惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

Simon Willison's Weblog
Simon Willison's Weblog
P
Privacy International News Feed
www.infosecurity-magazine.com
www.infosecurity-magazine.com
T
Troy Hunt's Blog
Hacker News - Newest:
Hacker News - Newest: "LLM"
Attack and Defense Labs
Attack and Defense Labs
S
Secure Thoughts
V2EX - 技术
V2EX - 技术
cs.AI updates on arXiv.org
cs.AI updates on arXiv.org
O
OpenAI News
Cloudbric
Cloudbric
Google Online Security Blog
Google Online Security Blog
Schneier on Security
Schneier on Security
cs.CV updates on arXiv.org
cs.CV updates on arXiv.org
Help Net Security
Help Net Security
Cyberwarzone
Cyberwarzone
G
GRAHAM CLULEY
L
Lohrmann on Cybersecurity
Threat Intelligence Blog | Flashpoint
Threat Intelligence Blog | Flashpoint
Spread Privacy
Spread Privacy
NISL@THU
NISL@THU
N
News and Events Feed by Topic
T
Tenable Blog
S
Security @ Cisco Blogs
N
News and Events Feed by Topic
The Hacker News
The Hacker News
C
CXSECURITY Database RSS Feed - CXSecurity.com
宝玉的分享
宝玉的分享
月光博客
月光博客
酷 壳 – CoolShell
酷 壳 – CoolShell
美团技术团队
奇客Solidot–传递最新科技情报
奇客Solidot–传递最新科技情报
Google DeepMind News
Google DeepMind News
钛媒体:引领未来商业与生活新知
钛媒体:引领未来商业与生活新知
T
Tailwind CSS Blog
V
Visual Studio Blog
P
Proofpoint News Feed
Webroot Blog
Webroot Blog
让小产品的独立变现更简单 - ezindie.com
让小产品的独立变现更简单 - ezindie.com
博客园 - 三生石上(FineUI控件)
cs.CL updates on arXiv.org
cs.CL updates on arXiv.org
Jina AI
Jina AI
雷峰网
雷峰网
T
The Blog of Author Tim Ferriss
Hugging Face - Blog
Hugging Face - Blog
腾讯CDC
L
LangChain Blog
The Register - Security
The Register - Security
OSCHINA 社区最新新闻
OSCHINA 社区最新新闻
博客园 - 聂微东

On a trail less travelled

Anandabazar Patrika, Unicode and proxy Mobile phone, internet and websites in Indian languages Unicode adoption for Bengali – The Change is happening Improved Firefox Padma for reading Anandabazar and Bartaman When you hit 34 degree Celsius below zero Virtual Keyboard Reading Anandabazar, Bartaman in Linux, Mac and the Bigger Issue Writing Unicode Bengali in LaTeX
English to Bengali Machine Translation System Anubadok is now 0.2
2008-07-09 · via On a trail less travelled

After a gap of almost two years, I am happy to announce the second official release (version 0.2.0) of Anubadok a free (as in freedom) machine translation system for English to Bengali. Anubadok is written in Perl and it uses Penn Treebank annotation system for natural language processing. To run Anubadok 0.2.0, you need to have Part-of-Speech tagger GPoSTTL installed in your system. The Anubadok system can be accessed online using the interface Anubadok Online run by Ankur.

First official release (ver. 0.1) of Anubadok was an experimental release which mainly served as a proof-of-concept for an open-source English to Bengali machine translation system.

With the release of version 0.2.0, I am glad to upgrade its official tag from “an experimental software” to “a software under development” with clear-and-specific implementation targets. However given the nature of the project, there are no specific time-frames for future releases. Further, given machine translation is considered an open research topic in Computational Linguistic, you should expect to see some surprises 😉 even for well implemented situations. Specially, if you are comparing results of machine translations with human translations.

In English, there are four types of sentences: Declarative, Imperative, Interrogative and Exclamatory. These sentence types further fall into four basic sentence type: Simple, Compound, Complex and Compound-Complex.

The table below gives approximate status of implementation for each sentence type in the current release and inversely it gives the targets for future implementations.

Status Table (Version: Anubadok-0.2.0 )
Declar. Imper. Interro. Exclam.
Simple W W W M
Compound M M M M
Complex N N N N
Compound – Complex N N N N

W: Well implemented
M: Moderately implemented
N: Not/Not-well implemented

Anubadok does not yet have any code to handle Complex or Compound-Complex sentences, not even moderately. This is where next push for development is needed.

Few other salient features of this release:

  • The execution method of Anubadok system has been re-written. Anubadok itself has been implemented as Perl module. This means one can now access Anubadok in a Perl program directly by including Anubadok libraries (Perl modules) or in any other program by using appropriate Perl module wrapper.
  • The notion of “testsuites” has been introduced for Anubadok. For a given English sentence, it compares a machine translated sentence with the expected Bengali sentence. This is quite an useful tool while adding new features or doing some experimentations as it would ensure that already implemented algorithm are not affected.
  • Anubadok system can now handle several kinds of input documents including plain text files, any XML documents, HTML files with in-line javascript, CSS. Further, as earlier, it is capable of translating Portable Object (PO) files directly.
  • Anubadok packaging has been completely reorganized to ensure that it has the basic structure of a standard Perl package. Consequently, Anubadok can be installed following the method of standard Perl module installation.
  • Anubadok-0.2.0 comes with an updated dictionary having 15K+ entries in its database. This is almost double the number of entries it had in 0.1 release. Credit for this goes to all the contributors of Ankur English to Bengali dictionary project. Anubadok’s dictionary are now updated regularly using database dumps of Ankur E2B dictionary.
  • Anubadok has now moved to its new website hosted by SourceForge.

    http://anubadok.sourceforge.net

    Latest source codes of Anubadok can be downloaded from the “trunk” branch of its SVN repository.

  • Anubadok Online, the online interface to Anubadok system, has been upgraded substantially. It runs directly using SVN version of Anubadok engine. User contributed new entries though this interface are submitted automatically to Ankur E2B dictionary project.
  • A brief document is now available for download as a PDF file from its website. It describes the internal working and the algorithm used by Anubadok system by considering specific example sentence.

Posted in Bengali Computing | Tagged , , , , | 116 Comments