Text is my personal favorite medium for machine learning. Here is why: In computing, a picture is worth a (few hundred) thousand words. As a result, modeling text is more space and compute efficient than visual models. Text arrived first to the internet. This lead time has resulted in better algorithms, and bottomless data. Interpretability…

# Tag: Natural Language Processing

## Exploring Latent Word Vectors using Path Finding

In my previous post, I created a product recommendation system using word embeddings. Today, we’ll take it a step further and explore how we can use these vectors to find the shortest path between pairs of words. Dijkstra’s algorithm Dijkstra’s algorithm is a method for finding the shortest path between any two vertices of a…

## Tokenization in Python Using SentencePiece

What is tokenization? Tokenization involves breaking text into individual words, making it easier for computers to understand and analyze meaning. This task applies to various Natural Language Processing (NLP) applications such as language translation, text summarization, and sentiment analysis. In this post, we will explore using SentencePiece, a widely used open-source library for tokenization in…