Saturday, August 21, 2004

Balle Balle software

The new Punjabi word processor is here. It is user-friendly and has a string of dream features, finds Roopinder Singh

He's made a Word of difference: Dr Gurpreet Singh Lehal

This is the story of one man's passion, dedication and focus. Many years ago, a young man decided to work on making Punjabi computer-friendly. Initially, he developed an optical character recognition (OCR) programme for Punjabi and, after much work, produced one with 97 per cent accuracy.

He was also involved in the effort to standardise Punjabi and develop a Unicode protocol for the language, and he is now the man behind Akhar, the new Punjabi word processor.

A powerful Punjabi word processor, which looks and feels like Microsoft Word but has a number of additional useful features, has been developed by Dr Gurpreet Singh Lehal, Professor in the Department of Computer Science and Engineering, Punjabi University, Patiala.

With built-in font converters, a spell checker, a bilingual English-Punjabi dictionary, a sophisticated find-and-replace feature and a transliteration facility from Gurmukhi to Shahmukhi (the Urdu script used for writing Punjabi in Pakistan), the word processor comes with a string of dream features. All files created in it are Microsoft Word files.

"Till now, Indian-language software has been bedevilled by the problems of incompatible fonts, which is because of lack of standardisation. This software gives a way out with its font converter that supports more than 100 different Gurmukhi fonts and 32 keyboard layouts," says Prof Lehal, Director of the newly established Advanced Centre for Technical Development of Punjabi Language, Literature and Culture at the university.

He has 16 years of experience in teaching, research and software development. He worked for many summers developing the first OCR system for the Gurmukhi script. He also has the first Punjabi spell checker and the first Punjabi sorting utility to his credit.

The Gurmukhi OCR was tested at the Software Testing and Quality Control unit of the Department of Information Technology, Government of India, which certified its accuracy at 97 per cent, the best among all the Indian-language OCR systems tested by the STQC, including those for Devanagari and Bangla.

The software is impressive because it is practical. It sorts out the common problems encountered while inputting data. Many data-entry operators are familiar with only one keyboard layout, and layouts can differ from font to font. In Akhar, the typist selects the keyboard layout he is familiar with and the font he wants to type in, and the word processor matches the two seamlessly. This writer tested it and it worked well.
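One way to picture this layout-and-font matching is to pivot through Unicode: keystrokes are first mapped to Unicode Gurmukhi letters using the typist's layout, and the resulting text is then encoded for whichever font was chosen. The article does not say how Akhar works internally, so the following is only a minimal Python sketch under that assumption. The tables `LAYOUT_A` and `LEGACY_FONT_B` and all function names are hypothetical three-letter toys, not real Punjabi keyboard layouts or font encodings.

```python
from typing import Dict, Optional

# HYPOTHETICAL keyboard layout (illustration only): key pressed -> Unicode letter.
LAYOUT_A: Dict[str, str] = {"k": "\u0a15", "x": "\u0a16", "g": "\u0a17"}  # ka, kha, ga

# HYPOTHETICAL legacy font (illustration only): Unicode letter -> the code
# that this particular font happens to draw as that letter.
LEGACY_FONT_B: Dict[str, str] = {"\u0a15": "a", "\u0a16": "b", "\u0a17": "c"}


def keys_to_unicode(keys: str, layout: Dict[str, str]) -> str:
    """Map each keystroke through the typist's layout; unmapped keys pass through."""
    return "".join(layout.get(k, k) for k in keys)


def unicode_to_font(text: str, font: Optional[Dict[str, str]]) -> str:
    """Re-encode Unicode text for the target font; None means a Unicode font."""
    if font is None:
        return text
    return "".join(font.get(ch, ch) for ch in text)


def type_text(keys: str, layout: Dict[str, str],
              font: Optional[Dict[str, str]] = None) -> str:
    """Combine any layout with any font by going through Unicode in the middle."""
    return unicode_to_font(keys_to_unicode(keys, layout), font)
```

For example, `type_text("kg", LAYOUT_A)` returns the Unicode string `"\u0a15\u0a17"` (ਕਗ), while `type_text("kg", LAYOUT_A, LEGACY_FONT_B)` returns `"ac"`, the codes the hypothetical legacy font draws as those letters. In this sketch, the Unicode pivot means each new layout or font needs only one table rather than one per layout-font pair.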

The onscreen keyboard also helps, especially since it includes 25 commonly used words, which Punjabi writers, students, translators and researchers can refer to onscreen. However, unlike the Character Map accessory in Microsoft Windows, the letters shown onscreen do not change when you change the typeface; the display remains static, which is not a bad thing, given the multiplicity of keyboards in Punjabi.

The dictionaries are another USP. The 1.5-lakh-word (150,000-word) spell checker is Unicode/ISCII compliant. It supports popular Punjabi fonts and keyboards, and users can build additional dictionaries of their own; there is also a dictionary of all the words occurring in the Guru Granth Sahib. The built-in Punjabi-English and English-Punjabi dictionaries would be especially useful to translators, or to anyone who simply wants to find the meaning of a particular term. Click a word to open the dictionary; another click replaces the word in the text.

Lehal likes to demonstrate the powerful text analysis utility. It performs quantitative analysis of text, generating word-frequency lists, character-frequency lists and other statistics such as the count of running words and of unique words in a text, the token-to-type ratio, mean word length and the percentage frequency of each word length. The lists can be displayed in alphabetical order, in order of occurrence in the text, by frequency or by word length, in ascending or descending order. He uses the utility to find, for instance, the 10 most frequently used words in the Guru Granth Sahib, or the number of occurrences of each word.
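The statistics listed above are simple to compute once a text is split into words. The following is a minimal Python sketch, not Akhar's implementation: it assumes words are separated by whitespace, measures word length in Unicode code points (in Gurmukhi a vowel sign is a separate code point, so this can differ from the letters a reader counts), and sorts "alphabetically" by code point rather than by proper Punjabi collation. The names `text_stats` and `sort_words` are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def text_stats(text: str) -> Dict[str, object]:
    """Quantitative statistics for a text (assumes whitespace-separated words)."""
    words = text.split()
    tokens = len(words)                       # running words
    freq = Counter(words)                     # word-frequency list, first-occurrence order
    types = len(freq)                         # unique words
    lengths = Counter(len(w) for w in words)  # word length in code points
    return {
        "running_words": tokens,
        "unique_words": types,
        "token_type_ratio": tokens / types if types else 0.0,
        "mean_word_length": sum(len(w) for w in words) / tokens if tokens else 0.0,
        # Percentage of running words having each length, shortest first.
        "pct_by_length": {n: 100 * c / tokens for n, c in sorted(lengths.items())},
        "word_freq": freq,
        "char_freq": Counter(ch for w in words for ch in w),
    }


def sort_words(freq: Counter, by: str = "frequency",
               descending: bool = False) -> List[Tuple[str, int]]:
    """Order the word list by occurrence, alphabetically, by frequency or by length."""
    items = list(freq.items())  # Counter keeps first-occurrence (insertion) order
    if by == "occurrence":
        return items[::-1] if descending else items
    keys = {
        "alphabetical": lambda kv: kv[0],  # code-point order, not Punjabi collation
        "frequency": lambda kv: kv[1],
        "length": lambda kv: len(kv[0]),
    }
    return sorted(items, key=keys[by], reverse=descending)
```

For a "top 10" list like the one Lehal describes, `text_stats(text)["word_freq"].most_common(10)` would return the 10 most frequent words of whatever text is passed in, each paired with its count.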

Lehal and his team are now working on converting Gurmukhi text to Shahmukhi. This would surely help Punjabi transcend its two scripts, just as Akhar helps turn a computer into a powerful Punjabi literary tool.