Text generation in Python

Text is my personal favorite medium for machine learning. Here is why: In computing, a picture is worth a (few hundred) thousand words. As a result, modeling text is more space and compute efficient than visual models. Text arrived first to the internet. This lead time has resulted in better algorithms, and bottomless data. Interpretability…

Read More


Writing better job postings using Glassdoor data

Since the launch of its public beta in 2008, Glassdoor has become the gold standard for company satisfaction ratings. They also have a classifieds section comparable to LinkedIn, Indeed, and others. Glassdoor provides examples of job descriptions with company ratings in context. This makes it easy to compare what good and bad companies have in…

Read More


Bridging concepts by applying pathfinding to word vectors

Society has unlocked vast amounts of value by harnessing the power of the graphs. First, we used graphs as infrastructure for running water, electricity, and the internet. Then, we used graphs as platforms for further applications. The refrigerator and washing machine are second-layer graph technologies, as are Google and Facebook. The networks people are looking…

Read More


Word segmentation in Python

What is word segmentation? Word segmentation (also called tokenization) is the process of splitting text into a list of words. Humans can do this pretty easily, but computers need help sometimes. At a higher level, you can think of segmentation as a way of boosting character-level models that also makes them more human-interpretable. Setup First,…

Read More