Python Trending Weekly #35 (2024-01-13)
豌豆花下猫
Posted on January 13, 2024
Welcome to the Python Trending Weekly, a weekly newsletter about Python, AI and general programming techniques, with the majority links in English and a small portion in Chinese.
The original version of the weekly was written in Chinese. What you are reading here is mostly translated by LLMs.
Substack Channel : Click to subscribe
🦄Articles & Tutorials
- Python 3.13 gets a JIT
What is JIT (Just in Time)? How does it work? What are the benefits of Python + JIT? Copy-and-patch JIT is a design proposed in 2021, which is a high-speed algorithm specially designed for dynamic language runtime. Python 3.13 is expected to implement it! The first sharing of the last issue of the weekly was about it, and we will continue in this issue~
- NumPy 2 is coming: preventing breakage, updating your code
NumPy 2 is a major release scheduled for March/April 2024. It is a backwards incompatible release, so it is important to prepare in advance to ensure that our applications do not break. This article discusses the incompatible changes in the new version, how to make sure that you install the new version at the right time, and how to upgrade your code painlessly.
- The counter-intuitive rise of Python in scientific computing
The article raises a question: in the field of scientific computing that focuses on performance, Fortran used to be very popular, why is Python, which is slower, increasingly used now? The reason may be that people overestimate the importance of execution speed, and the agility and maintainability of programming are more important, and the performance of alternative solutions is not bad. (See also: Fortran community discussion.)
- Fast and Efficient Inequality Joins in Pandas
Pandas supports equi-joins using the merge and join functions, but what about inequality joins? This article presents two alternatives to the usual Cartesian product: using the pyjanitor library's conditional_join function, which is both memory-efficient and performant; and using DuckDB's SQL to query a DataFrame, which is extremely performant.
- Pandas Profiling: A Detailed Explanation
Pandas profiling is a popular library (renamed to ydata-profiling) that generates a profiling report of a dataset in just one line of code. This tutorial covers how it works, how to import and generate a report, how to analyze and handle sensitive data, how to profile big data, its alternatives, and its drawbacks.
- A Deep Dive Into Python's functools.wraps Decorator
Python decorators are one of my favorite features. When we define our own decorators, we need to consider the loss of metadata, and functools.wraps is the key. The article introduces its usage, how to use it, and how to pass custom arguments.
- SIMD in Pure Python
The author shares how to implement Game of Life in pure Python (with pysdl2 for graphics), running at 4K resolution at 180fps, which is ~3800x faster than the naive implementation.
- Best practices to protect your Flask applications
What are the best practices to secure your Flask project and protect it from security vulnerabilities? Based on the OWASP Top 10 most common vulnerabilities, this article introduces methods for loading JSON with yaml.safe_load, parsing XML with defusedxml, securing forms with flask_wtf, handling file paths with secure_filename, preventing XSS and CSRF, and nine recommendations for building secure APIs, among other things. It covers libraries such as Flask-SSLify, Flask-RESTful, Flask-HTTPAuth, Flask-JWT-Extended, and Flask-Limiter.
- Pushing real-time updates to clients with Server-Sent Events (SSEs)
Server-Sent Events (SSEs) is a way for a web server to send real-time information to a web page without the page having to repeatedly poll the server. This article provides a complete example of how to implement it in Python, and also points out its two limitations at the end.
- Building AI-powered TODO app
What would a TODO app look like in the era of AI? The author implemented a TODO project with Django + simple HTML + Whisper + mixtral-8x7b-instruct + SQLite, which is worth learning and referencing!
- PEP 736 – Shorthand syntax for keyword arguments at invocation
This PEP proposes to introduce a syntactic sugar f(x=) as a shorthand for f(x=x) when the name of the keyword argument and the value are the same. It is similar to f'{x=}' in f-string, and similar shorthands can be found in Ruby, JavaScript, and Rust. Statistically, this pattern accounts for 10-20% of keyword argument usage.
- How To Remove Background From Image In Python
This tutorial introduces how to remove the background of an image using Tkinter and rembg, and the effect is quite good.
🎁 Python Trending Weekly 🎁 organizes its content into seasons, with every 30 issues forming a season. The highlights from the first season have been compiled for your convenience. You can access them online here (Chinese).
🐿️Projects & Resources
- ydata-profiling: 1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames
Similar to Pandas' df.describe() function, ydata-profiling is very easy to use, providing extended analysis of a DataFrame with just one line of code, supporting output of the analysis report in formats such as html and json. (star 11.7K)
- pdfsyntax: A Python library to inspect and modify the internal structure of a PDF file
This is a lightweight library implemented in pure Python to inspect and modify the internal structure of a PDF file, supporting both CLI and API usage.
- harlequin: The SQL IDE for Your Terminal
Visualize your SQL in your terminal. (1.6k stars)
Call all LLM APIs using a unified OpenAI format. Works with Bedrock, Azure, OpenAI, Cohere, Anthropic, Ollama, Sagemaker, HuggingFace, Replicate, and 100+ other LLMs. (4.4k stars)
- Unstructured: Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines https://github.com/Unstructured-IO/unstructured
Preprocess unstructured data such as documents (PDF, HTML, WORD, etc.) and images, providing methods for partitioning, cleaning, staging, extracting, chunking, and embedding. (4.2k stars)
- chatgpt-on-wechat: A chatbot built on large models, supporting WeChat, WeChat Work, public accounts, and Feishu
You can choose GPT3.5/GPT4.0/Claude/Wenxin Yanyi/Xunfei Xinghuo/Tongyi Qianwen/Gemini/LinkAI, which can process text, voice, and images, access the operating system and the Internet, and support customization based on your own knowledge base Enterprise intelligent customer service. (star 19.9K)
- whisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)
An augmentation of the speech recognition model Whisper, with more accurate timestamps, multi-speaker diarization, and reduced hallucination via improved voice activity detection, all while being faster and having a smaller memory footprint. (7.4k stars)
- Python Programming Exercises, Gently Explained
A programming practice website that provides 42 small Python project exercises with solution analysis and reference answers.
- mealie: a self hosted recipe manager and meal planner
A recipe management project with a REST API backend and a reactive frontend built with Vue. Works on desktop, tablet, and mobile. Easily add recipes using a URL, user and group management. (4.3k stars)
- guardrails: Adding guardrails to large language models
What if large language models don't respond as expected? This library allows you to specify the output structure and type, verify and correct the output of large models, and improve the quality of the content. (2.7k stars)
- chainlit: Build Python LLM apps in minutes
Rapidly build web applications like ChatGPT, integrated with Langchain, Autogen, OpenAI Assistant, Llama, and Haystack. Customize the frontend to implement the full range of features, including monitoring and observability, authentication mechanisms, multi-tenancy, and seamless integration with various tools. (4.3k stars)
- functime: Time-series machine learning at scale. Built with Polars for embarrassingly parallel feature extraction and forecasts on panel data
Production-ready global forecasting and time-series feature extraction on large datasets. Supports time-series preprocessing, cross-validation splitters, and forecast metrics (MASE, SMAPE, etc.).
🥂Discussion & Questions
- Why does Python have the concept of a .venv virtual environment?
Why does Python need to use a virtual environment? Why does Python use this package management mechanism? What problems does package management software need to solve?
- How to call a Python project with third-party dependencies in Java?
When the API cannot be called, how can a Java project call a Python project? What are the problems with the JNI-CPython-Python solution? How to package it as EXE and so?
🐼Subscribe Welcome
- Blog: Explore my independent blog where you can find a collection of original/translated technical articles over the years, along with some reflections since 2009.
- Newsletter: Subscribe to my channel on Substack for a curated newsletter delivered straight to your inbox, keeping you updated on current affairs.
- Github: Access the Markdown source files of this weekly digest on Github and feel free to use them for anything you have in mind!
- Telegram: Beyond notifications for the weekly digest, I consider it an "extra edition," providing additional, more diverse information.
- Twitter: Follow me on Twitter where my feed is filled with numerous accounts of developers and organizations in the Python community.
Posted on January 13, 2024
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.