Punjabi varsity to make archived Vidhan Sabha debates searchable

Patiala, June 16

Punjabi University, along with teams from IIT-Hyderabad and the Centre for Development of Advanced Computing (CDAC), Noida, will convert all Punjab Vidhan Sabha debates archived since 1947 into a searchable Unicode format as part of a new project launched by the Punjab Government. The government has entrusted the project to the university.

The project, “OCRs and Applications in Indian Languages”, will be carried out jointly by teams from the three institutions. The teams are already engaged in digitising all Lok Sabha debates since 1947 and making them searchable.

4L pages of images to be converted

  • Project to make archived Vidhan Sabha debates searchable will be carried out jointly by Punjabi University, IIT-Hyderabad and CDAC, Noida
  • Debates can’t be searched as these exist as images or in non-Unicode font formats; these will be converted into searchable Unicode format
  • Advanced AI technologies, including optical character recognition and script recognition, will be used for conversion
  • Project entails conversion of over four lakh pages of archival images and will be completed within a year

Prof Gurpreet Singh Lehal said the archives of debates and resumes in the Vidhan Sabha exist as images or in non-Unicode font formats, which means search engines cannot index them.

“In order to enable searchability, it is necessary to convert these into textual form and transform the existing non-Unicode text into Unicode format. The project entails leveraging advanced artificial intelligence technologies, including optical character recognition (OCR) and script recognition, to convert the existing non-searchable images and non-Unicode text into searchable formats,” he said.

The debates are multilingual, spanning English, Punjabi, Hindi and Urdu, which poses significant challenges and calls for robust, highly accurate systems. “We have already developed high accuracy font converters for conversion of non-Unicode text in Hindi and Punjabi to Unicode,” he said.
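To give a sense of what such a font converter does, here is a minimal Python sketch. It is not the university's converter. The mapping table is a small hypothetical one made up for illustration, in which Latin keystrokes stand in for Gurmukhi glyphs, as is common in legacy fonts. The function name `legacy_to_unicode` is likewise an assumption. Besides mapping each character, the sketch handles one known complication: legacy fonts store the short-i vowel sign before its consonant (visual order), whereas Unicode stores it after (logical order).

```python
# Hypothetical legacy-font table (illustrative only, not a real font's full map):
# each Latin keystroke stands in for one Gurmukhi character.
LEGACY_TO_UNICODE = {
    "p": "\u0A2A",  # GURMUKHI LETTER PA
    "j": "\u0A1C",  # GURMUKHI LETTER JA
    "b": "\u0A2C",  # GURMUKHI LETTER BA
    "s": "\u0A38",  # GURMUKHI LETTER SA
    "G": "\u0A18",  # GURMUKHI LETTER GHA
    "w": "\u0A3E",  # GURMUKHI VOWEL SIGN AA
    "I": "\u0A40",  # GURMUKHI VOWEL SIGN II
    "i": "\u0A3F",  # GURMUKHI VOWEL SIGN I (typed before its consonant)
    "M": "\u0A70",  # GURMUKHI TIPPI
}

# The one pre-base sign in this sketch: it must move after the next character.
PREBASE_SIGN = "\u0A3F"


def legacy_to_unicode(text: str) -> str:
    """Map legacy-font characters to Unicode and reorder the pre-base vowel sign."""
    out = []
    pending = False  # True while a pre-base sign waits for its consonant
    for ch in text:
        mapped = LEGACY_TO_UNICODE.get(ch, ch)  # unmapped characters pass through
        if mapped == PREBASE_SIGN:
            pending = True
            continue
        out.append(mapped)
        if pending:
            out.append(PREBASE_SIGN)  # emit the sign after the character it precedes
            pending = False
    if pending:  # a trailing sign with nothing after it is kept as-is
        out.append(PREBASE_SIGN)
    return "".join(out)


if __name__ == "__main__":
    print(legacy_to_unicode("pMjwbI"))
    print(legacy_to_unicode("isMG"))
```

A production converter would need a complete table for each legacy font, plus rules for conjuncts and other reordering cases that this sketch leaves out.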

The project will be completed within a year, during which the teams will convert over four lakh pages of archival images of Vidhan Sabha debates and resumes from 1947 into searchable text.

Vice Chancellor Prof Arvind said the university’s initiative aligned with its commitment to scholarly pursuits and dissemination of knowledge. He said the project would significantly contribute to public accessibility of archives encompassing debates and resumes of Vidhan Sabha since 1947.
