惯性聚合 高效追踪和阅读你感兴趣的博客、新闻、科技资讯
阅读原文 在惯性聚合中打开

推荐订阅源

F
Full Disclosure
WordPress大学
WordPress大学
小众软件
小众软件
Cloudbric
Cloudbric
AWS News Blog
AWS News Blog
腾讯CDC
量子位
人人都是产品经理
人人都是产品经理
大猫的无限游戏
大猫的无限游戏
freeCodeCamp Programming Tutorials: Python, JavaScript, Git & More
V
Vulnerabilities – Threatpost
Scott Helme
Scott Helme
Hugging Face - Blog
Hugging Face - Blog
博客园_首页
C
CXSECURITY Database RSS Feed - CXSecurity.com
The Hacker News
The Hacker News
奇客Solidot–传递最新科技情报
奇客Solidot–传递最新科技情报
IT之家
IT之家
Jina AI
Jina AI
Attack and Defense Labs
Attack and Defense Labs
S
SegmentFault 最新的问题
Simon Willison's Weblog
Simon Willison's Weblog
The Cloudflare Blog
阮一峰的网络日志
阮一峰的网络日志
T
Tailwind CSS Blog
Last Week in AI
Last Week in AI
博客园 - 【当耐特】
Google Online Security Blog
Google Online Security Blog
美团技术团队
OSCHINA 社区最新新闻
OSCHINA 社区最新新闻
V
Visual Studio Blog
罗磊的独立博客
L
LINUX DO - 最新话题
博客园 - Franky
博客园 - 叶小钗
Apple Machine Learning Research
Apple Machine Learning Research
The Last Watchdog
The Last Watchdog
J
Java Code Geeks
AI
AI
C
Cisco Blogs
酷 壳 – CoolShell
酷 壳 – CoolShell
C
Cyber Attacks, Cyber Crime and Cyber Security
Cisco Talos Blog
Cisco Talos Blog
博客园 - 三生石上(FineUI控件)
雷峰网
雷峰网
Help Net Security
Help Net Security
钛媒体:引领未来商业与生活新知
钛媒体:引领未来商业与生活新知
云风的 BLOG
云风的 BLOG
I
Intezer
S
Securelist

Keenformatics

Using Tox with Poetry dependency groups Ports blocked by browsers Manually verifying an SSL certificate How to generate documentation with Sphinx How To Override Sublime Text Packages Shortcuts and Preferences How To Find out Sublime Text Key Binding Commands How To Make Terminator Behave Like Guake (Ubuntu) Sentiment Analysis lexicons and datasets Model-Driven approach vs hardcore coding
PyLaDe
fievelk · 2023-03-23 · via Keenformatics

I was recently having a look at my GitHub repositories page. I sometimes tidy up my projects as I would do with my living room, trying to understand what a guest would think of me based on what I keep around. It’s interesting to realize how far I’ve gotten with respect to some of my old projects. Still, one of them caught my attention: I hadn’t noticed before, but some users starred it and even forked it over the time.

I don’t mean a lot of people, but I still feel glad thinking that someone used this code despite the fact that I had never advertised it or talked about it, anywhere. Thinking about it, I know programmers who spend weeks at work writing code that is never once used by users. And it’s not like this was my first open source project: I was lucky to be an active and enthusiastic member of the NLTK team, so I have experienced the feeling of contributing to a much larger community (exciting!). Still, these GitHub stars for my little project feel extremely rewarding, probably because I wasn’t expecting them and they come from an old time, before I relocated to another country and kindly nudged my life into new directions.

PyLaDe's GitHub repository
PyLaDe's GitHub repository

The project is called PyLaDe (Python Language Detection). It’s a simple command-line interface for language detection, with some additional features to train and test custom models. The interesting part (for me) is that I spent quite some time refining it at the time, as I wanted to fiddle a bit with Python packaging and documentation. It looks like someone appreciated the effort, so I decided to update it a little bit and bring it up-to-speed with today’s practices.

The first thing to do was to remove support for old Python versions and add support for new ones. I expanded Tox tests to run against the latest versions, and at this point I realized that the project’s dependencies were outdated. Rather than just modifying the requirements.txt file, I decided to use Poetry and modernize part of the stack.

Adding Poetry and its pyproject.toml configuration meant getting rid of the old setup.py. This has been interesting, since it allowed me to also get an historical look over decades of Python’s best practices and community questions. Besides enjoying this little paleo-programming research, I got to appreciate the elegance of Poetry as a new-ish tool, as well as the simplicity of the good ol’ ways.

After these changes, it was time to brush up on Semantic Versioning and finally add tags for the project’s releases. One of the biggest complaints I had about the old version of the project is that it could only be installed locally, after cloning the repository from GitHub. That’s why I decided to go full-in this time and upload PyLaDe on PyPi: the installation is now as easy as it gets:

I bumped some minor versions along the way and enjoyed seeing my creature evolve.

One unexpectedly tricky part was finding a good way to automatically generate documentation using Sphinx and upload it to GitHub Pages. As odd as it sounds, Sphinx doesn’t yet provide any straightforward way to build and publish on GitHub Pages (see this issue). I could find several GitHub actions that offer a similar workflow, but they always feel like an overkill. In the end I decided to just add a few lines to the Sphinx Makefile to build the docs and move them to the docs/ folder (the one used by GitHub Pages) in the gh-pages branch.

The project is far from being “optimal” and it’s mostly intended for study purposes. It’s also quite fun to think about it today, in the era of Large Language Models and their capabilities. Still, it provided me with some weekend fun and learning. In case someone else enjoys it and wants to use it to study the Cavnar-Trenkle N-Gram-based approach for text categorization, feel free to star the project: I’ll be grateful.