How to Generate Text in Python

Text is my personal favorite medium for machine learning. Here is why: In computing, a picture is worth a (few hundred) thousand words. As a result, modeling text is more space- and compute-efficient than modeling images. Text arrived on the internet first. This lead time has resulted in better algorithms and bottomless data. Interpretability…


A Beginner’s Guide to Topic Modeling in Python

Since the launch of its public beta in 2008, Glassdoor has become the gold standard for company satisfaction ratings. It also has a classifieds section comparable to those of LinkedIn, Indeed, and others. Glassdoor provides examples of job descriptions with company ratings in context. This makes it easy to compare what good and bad companies have in…


Exploring Latent Word Vectors using Path Finding

Welcome to my latest post, where we dive into the exciting world of natural language processing! One of the key tasks in NLP is understanding the relationships between words. In my previous post, I created a product recommendation system using word embeddings. Today, we’ll take it a step further and explore how we can use…


Tokenization in Python Using SentencePiece

What is tokenization? Tokenization involves breaking text into smaller units called tokens, such as words or subword pieces, making it easier for computers to process and analyze. It is a first step in many Natural Language Processing (NLP) applications such as language translation, text summarization, and sentiment analysis. In this post, we will explore using SentencePiece, a widely used open-source library for tokenization in…
