On a trail less travelled

English to Bengali Machine Translation System Anubadok is now 0.2
2008-07-09 · via On a trail less travelled

After a gap of almost two years, I am happy to announce the second official release (version 0.2.0) of Anubadok, a free (as in freedom) machine translation system for English to Bengali. Anubadok is written in Perl and uses the Penn Treebank annotation system for natural language processing. To run Anubadok 0.2.0, you need the part-of-speech tagger GPoSTTL installed on your system. Anubadok can also be accessed online through Anubadok Online, an interface run by Ankur.

The first official release (ver. 0.1) of Anubadok was experimental and mainly served as a proof of concept for an open-source English to Bengali machine translation system.

With the release of version 0.2.0, I am glad to upgrade its official tag from “an experimental software” to “a software under development” with clear and specific implementation targets. However, given the nature of the project, there are no specific time-frames for future releases. Further, since machine translation is still an open research topic in computational linguistics, you should expect some surprises 😉 even in well-implemented situations, especially if you compare machine translations with human translations.

In English, there are four types of sentences by purpose: Declarative, Imperative, Interrogative and Exclamatory. Each of these further falls into one of four basic structural types: Simple, Compound, Complex and Compound-Complex.

The table below gives the approximate implementation status of each sentence type in the current release; conversely, it shows the targets for future implementation.

Status Table (Version: Anubadok-0.2.0)

                    Declar.   Imper.   Interro.   Exclam.
Simple                 W        W         W          M
Compound               M        M         M          M
Complex                N        N         N          N
Compound-Complex       N        N         N          N

W: Well implemented
M: Moderately implemented
N: Not/Not-well implemented
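
The status table can also be read as data. The sketch below is purely illustrative and is not part of Anubadok (which is written in Perl); the names `PURPOSES`, `STATUS` and `future_targets` are made up for this example. It encodes the table as a Python dictionary and lists every combination that is not yet well implemented, i.e. the targets for future releases.

```python
# Illustrative only: the 0.2.0 status table as data. Not part of Anubadok.
PURPOSES = ["Declarative", "Imperative", "Interrogative", "Exclamatory"]

# One status letter per purpose, in the order of PURPOSES.
# W: well implemented, M: moderately implemented, N: not/not-well implemented.
STATUS = {
    "Simple": "WWWM",
    "Compound": "MMMM",
    "Complex": "NNNN",
    "Compound-Complex": "NNNN",
}


def future_targets(status=STATUS):
    """Return (structure, purpose, letter) for every cell that is not 'W'."""
    targets = []
    for structure, letters in status.items():
        for purpose, letter in zip(PURPOSES, letters):
            if letter != "W":
                targets.append((structure, purpose, letter))
    return targets


if __name__ == "__main__":
    for structure, purpose, letter in future_targets():
        print(f"{structure:17} {purpose:14} {letter}")
```

Only three of the sixteen combinations are marked W, so the function returns the remaining thirteen cells.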

Anubadok does not yet have any code to handle Complex or Compound-Complex sentences, not even moderately. This is where the next push for development is needed.

A few other salient features of this release:

  • The execution method of the Anubadok system has been rewritten. Anubadok itself is now implemented as a Perl module. This means one can use Anubadok directly in a Perl program by including the Anubadok libraries (Perl modules), or in any other program by using an appropriate Perl module wrapper.
  • The notion of “testsuites” has been introduced for Anubadok. For a given English sentence, a testsuite compares the machine-translated sentence with the expected Bengali sentence. This is quite a useful tool while adding new features or experimenting, as it ensures that already implemented algorithms are not affected.
  • Anubadok can now handle several kinds of input documents, including plain text files, arbitrary XML documents, and HTML files with in-line JavaScript and CSS. Further, as before, it can translate Portable Object (PO) files directly.
  • Anubadok packaging has been completely reorganized to follow the basic structure of a standard Perl package. Consequently, Anubadok can be installed using the standard Perl module installation method.
  • Anubadok-0.2.0 comes with an updated dictionary with 15K+ entries in its database, almost double the number of entries in the 0.1 release. Credit for this goes to all the contributors to the Ankur English to Bengali dictionary project. Anubadok’s dictionary is now updated regularly using database dumps of the Ankur E2B dictionary.
  • Anubadok has now moved to its new website hosted by SourceForge.

    http://anubadok.sourceforge.net

    The latest source code of Anubadok can be downloaded from the “trunk” branch of its SVN repository.

  • Anubadok Online, the online interface to the Anubadok system, has been upgraded substantially. It runs directly on the SVN version of the Anubadok engine. New entries contributed by users through this interface are submitted automatically to the Ankur E2B dictionary project.
  • A brief document is now available for download as a PDF file from the website. It describes the internal workings and the algorithm used by the Anubadok system by working through a specific example sentence.
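
The testsuite idea from the list above can be sketched in a few lines. This is a minimal illustration in Python, not Anubadok's actual testsuite (which ships with the Perl package): `run_testsuite`, `toy_translate` and the two sentence pairs are hypothetical stand-ins, and the "translator" is just a lookup table so that the example runs on its own.

```python
# Illustrative sketch of a translation regression testsuite.
# Nothing here is Anubadok's real API; `translate` is a hypothetical stand-in.


def run_testsuite(translate, cases):
    """Compare translate(english) with the expected output for each case.

    Returns a list of (english, expected, actual) tuples for failing cases.
    """
    failures = []
    for english, expected in cases:
        actual = translate(english)
        if actual.strip() != expected.strip():
            failures.append((english, expected, actual))
    return failures


# Toy stand-in translator: a lookup table instead of a real MT engine.
_TOY_TABLE = {"I eat rice.": "আমি ভাত খাই।"}


def toy_translate(sentence):
    """Return the stored translation, or an empty string if unknown."""
    return _TOY_TABLE.get(sentence, "")


# Hypothetical test cases: (English input, expected Bengali output).
CASES = [
    ("I eat rice.", "আমি ভাত খাই।"),
    ("He reads a book.", "সে একটি বই পড়ে।"),
]

if __name__ == "__main__":
    for english, expected, actual in run_testsuite(toy_translate, CASES):
        print(f"FAIL: {english!r}: expected {expected!r}, got {actual!r}")
```

Running the cases after every change shows at once whether a new rule has broken a sentence that used to translate correctly; here the second case fails because the toy table has no entry for it.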
