NLTK vs SKLearn vs Gensim vs TextBlob vs spaCy
Donal · 2017-05-24 · via 博客园 (cnblogs) - Donal
Generally,
- NLTK is used primarily for general NLP tasks (tokenization, POS tagging, parsing, etc.)
- Sklearn is used primarily for machine learning (classification, clustering, etc.)
- Gensim is used primarily for topic modeling and document similarity.
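Gensim's document-similarity use case comes down to representing documents as vectors and comparing them. The sketch below is dependency-free and uses only the standard library. It is not Gensim's or NLTK's API. It tokenizes with a simple regex (a crude stand-in for an NLTK tokenizer), turns each document into term counts, and compares documents with cosine similarity. Real libraries build on the same idea with stop-word handling, TF-IDF weighting, or learned topic and word vectors.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split on runs of letters/digits (crude stand-in for an NLTK tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bag_of_words(text: str) -> Counter:
    """Map a document to its term counts (a sparse bag-of-words vector)."""
    return Counter(tokenize(text))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors; 0.0 if either is empty."""
    # Counter returns 0 for missing terms without inserting them.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


docs = [
    "NLTK handles tokenization and POS tagging.",
    "Tokenization and POS tagging are handled by NLTK.",
    "Gensim builds topic models.",
]
query = bag_of_words(docs[0])
for doc in docs:
    # Score each document against the first one.
    print(f"{cosine_similarity(query, bag_of_words(doc)):.2f}  {doc}")
```

This prints 1.00, 0.72 and 0.00 for the three documents. Note that "handles" and "handled" count as different terms here, because nothing stems or lemmatizes them.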
Having said that, NLTK provides a nice wrapper for sklearn's classifiers in the nltk.classify package (see "Combining Scikit-Learn and NLTK" and "Python NLP - NLTK and scikit-learn").
And, to confuse you further, there are also TextBlob ("Simplified Text Processing") and spaCy ("spaCy.io | Build Tomorrow's Language Technologies"). spaCy aims to provide industry-ready NLP modules as an alternative to NLTK. It offers a single fast algorithm for each of tokenization, POS tagging and parsing, plus word vectors for similarity calculation.
I suggest that you mix and match according to your needs.
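The actual NLTK wrapper for sklearn classifiers is `SklearnClassifier` in `nltk.classify.scikitlearn`. You construct it with a scikit-learn estimator (for example `LinearSVC()`), call `.train()` on a list of `(feature_dict, label)` pairs, and then call `.classify()` on a feature dict. Internally it converts feature dicts into vectors before passing them to the estimator.

To show that adapter idea without any dependencies, here is a standard-library-only sketch. `FeatureDictVectorizer`, `NearestCentroid` and `WrappedClassifier` are hypothetical names, not NLTK or scikit-learn classes. In particular, a tiny nearest-centroid classifier stands in for a real sklearn estimator.

```python
from collections import defaultdict


class FeatureDictVectorizer:
    """Hypothetical stand-in for a dict vectorizer: maps feature dicts to dense lists."""

    def fit(self, feature_dicts):
        # Fix one column per feature name seen during training, in sorted order.
        names = sorted({name for d in feature_dicts for name in d})
        self.index = {name: i for i, name in enumerate(names)}
        return self

    def transform(self, feature_dicts):
        rows = []
        for d in feature_dicts:
            row = [0.0] * len(self.index)
            for name, value in d.items():
                if name in self.index:  # features unseen during training are dropped
                    row[self.index[name]] = float(value)
            rows.append(row)
        return rows


class NearestCentroid:
    """Hypothetical tiny classifier with an sklearn-like fit/predict interface."""

    def fit(self, X, y):
        sums, counts = {}, defaultdict(int)
        for row, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(row))
            for i, v in enumerate(row):
                acc[i] += v
            counts[label] += 1
        # Each class is represented by the mean of its training vectors.
        self.centroids = {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}
        return self

    def predict(self, X):
        def sq_dist(a, b):
            return sum((p - q) ** 2 for p, q in zip(a, b))

        # Pick the class whose centroid is closest in squared Euclidean distance.
        return [min(self.centroids, key=lambda lab: sq_dist(row, self.centroids[lab])) for row in X]


class WrappedClassifier:
    """Hypothetical adapter: trains any fit/predict estimator on (feature_dict, label) pairs."""

    def __init__(self, estimator):
        self.estimator = estimator
        self.vectorizer = FeatureDictVectorizer()

    def train(self, labeled_featuresets):
        dicts, labels = zip(*labeled_featuresets)
        X = self.vectorizer.fit(dicts).transform(dicts)
        self.estimator.fit(X, list(labels))
        return self

    def classify(self, featureset):
        return self.estimator.predict(self.vectorizer.transform([featureset]))[0]


train = [
    ({"contains(great)": 1, "contains(boring)": 0}, "pos"),
    ({"contains(great)": 1, "contains(fun)": 1}, "pos"),
    ({"contains(boring)": 1, "contains(awful)": 1}, "neg"),
    ({"contains(boring)": 1}, "neg"),
]
clf = WrappedClassifier(NearestCentroid()).train(train)
print(clf.classify({"contains(great)": 1}))  # pos
print(clf.classify({"contains(awful)": 1}))  # neg
```

The design point is the split of responsibilities. NLTK code keeps speaking in feature dicts and labels, while the numeric estimator only ever sees fixed-width vectors. That separation is what lets any sklearn classifier be dropped in behind the same interface.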