惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

W
WeLiveSecurity
T
The Exploit Database - CXSecurity.com
C
CXSECURITY Database RSS Feed - CXSecurity.com
S
Security @ Cisco Blogs
T
Threat Research - Cisco Blogs
TaoSecurity Blog
TaoSecurity Blog
Recent Commits to openclaw:main
Recent Commits to openclaw:main
cs.AI updates on arXiv.org
cs.AI updates on arXiv.org
腾讯CDC
Exploit-DB.com RSS Feed
Exploit-DB.com RSS Feed
OSCHINA 社区最新新闻
OSCHINA 社区最新新闻
T
The Blog of Author Tim Ferriss
Microsoft Azure Blog
Microsoft Azure Blog
罗磊的独立博客
F
Full Disclosure
博客园 - 【当耐特】
C
CERT Recently Published Vulnerability Notes
Engineering at Meta
Engineering at Meta
Application and Cybersecurity Blog
Application and Cybersecurity Blog
T
Threatpost
I
Intezer
V2EX - 技术
V2EX - 技术
H
Hackread – Cybersecurity News, Data Breaches, AI and More
The Hacker News
The Hacker News
小众软件
小众软件
Google DeepMind News
Google DeepMind News
T
Tailwind CSS Blog
D
Darknet – Hacking Tools, Hacker News & Cyber Security
B
Blog RSS Feed
Microsoft Security Blog
Microsoft Security Blog
N
News | PayPal Newsroom
MyScale Blog
MyScale Blog
AI
AI
Vercel News
Vercel News
Spread Privacy
Spread Privacy
美团技术团队
CTFtime.org: upcoming CTF events
CTFtime.org: upcoming CTF events
The GitHub Blog
The GitHub Blog
V
Vulnerabilities – Threatpost
Schneier on Security
Schneier on Security
Cyberwarzone
Cyberwarzone
G
GRAHAM CLULEY
Help Net Security
Help Net Security
Hacker News: Ask HN
Hacker News: Ask HN
Google DeepMind News
Google DeepMind News
MongoDB | Blog
MongoDB | Blog
L
LINUX DO - 热门话题
U
Unit 42
L
LangChain Blog
Recent Announcements
Recent Announcements

On a trail less travelled

Anandabazar Patrika, Unicode and proxy Mobile phone, internet and websites in Indian languages Unicode adoption for Bengali – The Change is happening Improved Firefox Padma for reading Anandabazar and Bartaman When you hit 34 degree Celsius below zero Virtual Keyboard English to Bengali Machine Translation System Anubadok is now 0.2 Writing Unicode Bengali in LaTeX
Reading Anandabazar, Bartaman in Linux, Mac and the Bigger Issue
2008-07-03 · via On a trail less travelled

[Update: Please see at the bottom of this post for a link to an improved version of Padma.]

Anandabazar Patrika (ABP) and Bartaman Patrika (BP) are two (among big four) well-known Bengali news papers that are published from West Bengal, India. In the Internet era, their online versions are not just a matter of convenience rather the only route of access for many of us. Unfortunately, their online versions continue to live in the past by using non-standard, ancient dynamic font technology instead of upgrading to standard Unicode.

The worst part is that to view their website you need to have Internet Explorer installed in your machine. So if you are Linux, Mac (or any non-Windows) users then you are left at your own.

Fortunately, there is now a simple way for Firefox users in Linux and Mac to read these websites using a Mozilla extension named Padma by Nagarjuna Venna and his team. To get Padma working, (a) you need to have Unicode Bengali font (Linux users may already have one. Mac users can get one from Ekushey), (b) you need to have Firefox (version 3 is recommended for Linux but must have for Mac), and (c) and you need to install Padma.

Padma can transform given non-standard encoding to standard Unicode on the fly. Of course, for Padma to work, it must know the font-encoding of the particular website.

As it turns out, I wrote support for ABP in Padma more than a year ago. My job was made simple by an earlier CGI program by Tanmoy Bhattacharya who had already decoded font-mapping for ABP. Couple of months ago, I also added support for Bartaman Patrika in Padma. So, courtesy Tanmoy’s font-map decoding, latest version of Padma (0.4.13) supports both ABP and BP.

There is a known issue of incorrect rendering of Bengali Matras in certain situations. See for example Runa-Sankarshan’s photostream here. Many of these were due to a simple bug and has been fixed in the latest version (0.4.13). However, fixing of the remaining requires significant changes in Padma. ABP and BP both use three different fonts simultaneously. Most ligatures often come from 2nd and 3rd font whereas Matras come from the 1st font. Padma transforms each font separately and doesn’t merge these different fonts elements into a single element. This leads to the incorrect rendering which is hard to solve without changing the core of Padma.

The Bigger Issue however is the need for Padma itself. I tend to agree with the concerns expressed by Sankarshan in a discussion thread here. The real question is then how long are these websites going to keep themselves confined using their own non-standard encoding?

This led me to wonder: don’t their technical staffs realize what they are missing by not upgrading to Unicode? Firstly, by upgrading to Unicode they could readily expand their current user base. Secondly, the use of Unicode will make their contents search-able in search engine like Google. This could lead to additional search-engine generated revenue for them. The number of Bengali internet users is going to increase in coming future, and a significant portion of new internet users will be coming from the interior part. Undoubtedly, many of these users will be more comfortable in searching using Bengali keywords. Thirdly, by continuing the use of non-standard encoding, they are piling up their archive with non-standard contents which would require a big effort by them to bring into standard form. So, in my humble opinion, it would be prudent decision for them to upgrade their website to use Unicode sooner than later.

Nevertheless, there is now a positive sign that Star Ananda, a sister group of Anandabazar Patrika, has started using Unicode (though their defined “charset” doesn’t say so) for their Bengali website. I hope, this marks the beginning of change.

Update (May 9, 2009): Please see this post for an update on the above mentioned incorrect rendering issue.

Posted in Bengali Computing | Tagged , , , , , , , | 11 Comments