Patiala, June 16
Punjabi University, along with teams from IIT-Hyderabad and the Centre for Development of Advanced Computing (CDAC), Noida, will convert all Punjab Vidhan Sabha debates archived since 1947 into a searchable Unicode format as part of a new project launched by the Punjab Government. The government has entrusted the project to the university.
The project, “OCRs and Applications in Indian Languages”, will be carried out jointly by teams from the three institutions. The teams are already engaged in digitising all Lok Sabha debates since 1947 and making them searchable.
4 lakh pages of images to be converted
- Project to make archived Vidhan Sabha debates searchable will be carried out jointly by Punjabi University, IIT-Hyderabad and CDAC, Noida
- Debates can’t be searched as these exist as images or in non-Unicode font formats; these will be converted into a searchable Unicode format
- Advanced AI technologies, including optical character recognition and script recognition, will be used for conversion
- Project entails conversion of over four lakh pages of archival images and will be completed within a year
Prof Gurpreet Singh Lehal said the archives of debates and resumes in the Vidhan Sabha exist as images or in non-Unicode font formats, which makes them impossible to search.
“In order to enable searchability, it is necessary to convert these into textual form and transform the existing non-Unicode text into Unicode format. The project entails leveraging advanced artificial intelligence technologies, including optical character recognition (OCR) and script recognition, to convert the existing non-searchable images and non-Unicode text into searchable formats,” he said.
The multilingual nature of the debates, which encompass English, Punjabi, Hindi and Urdu, presents significant challenges that necessitate the development of robust and highly accurate systems. “We have already developed high-accuracy font converters for conversion of non-Unicode text in Hindi and Punjabi to Unicode.”
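To illustrate what a font converter of this kind has to do, here is a minimal sketch in Python. It is not the university's converter: the mapping table is hypothetical and does not correspond to any real legacy font, and the function name `legacy_to_unicode` is an assumption made for this example. It shows the two basic steps such a tool typically needs: replacing each legacy key code with the Unicode Gurmukhi character it was drawn as, and moving the sihari vowel sign, which legacy fonts usually place before the consonant, to after it, as Unicode requires.

```python
# Hypothetical legacy-font table: ASCII characters that a non-Unicode font
# would display as Gurmukhi glyphs. Illustrative only, not a real font's table.
LEGACY_TO_UNICODE = {
    "p": "\u0A2A",  # GURMUKHI LETTER PA
    "j": "\u0A1C",  # GURMUKHI LETTER JA
    "b": "\u0A2C",  # GURMUKHI LETTER BA
    "M": "\u0A70",  # GURMUKHI TIPPI (nasalisation mark)
    "w": "\u0A3E",  # GURMUKHI VOWEL SIGN AA
    "I": "\u0A40",  # GURMUKHI VOWEL SIGN II
    "i": "\u0A3F",  # GURMUKHI VOWEL SIGN I (sihari)
}

# Sihari is drawn to the left of its consonant, so legacy text stores it
# first; Unicode stores it after the consonant it belongs to.
PREBASE_VOWEL = "\u0A3F"


def legacy_to_unicode(text: str) -> str:
    """Convert legacy-font text to Unicode using the hypothetical table."""
    out = []
    pending = None  # a sihari waiting for the character it attaches to
    for ch in text:
        mapped = LEGACY_TO_UNICODE.get(ch, ch)  # unknown characters pass through
        if mapped == PREBASE_VOWEL:
            pending = mapped
            continue
        out.append(mapped)
        if pending is not None:
            out.append(pending)  # emit sihari after the following character
            pending = None
    if pending is not None:
        out.append(pending)  # stray sihari at end of input is kept as-is
    return "".join(out)


print(legacy_to_unicode("pMjwbI"))
```

Under this hypothetical table, the legacy string `pMjwbI` becomes the Unicode word ਪੰਜਾਬੀ (Punjabi), which a search engine can index. A production converter would also have to handle conjuncts, multiple fonts and ambiguous key codes, none of which this sketch attempts.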
The project will be completed within a year, during which the teams will convert over four lakh pages of archival images of Vidhan Sabha debates and resumes from 1947 into searchable text.
Vice Chancellor Prof Arvind said the university’s initiative aligned with its commitment to scholarly pursuits and dissemination of knowledge. He said the project would significantly contribute to public accessibility of archives encompassing debates and resumes of Vidhan Sabha since 1947.