Modeling Glassdoor job descriptions as a Bag-of-Words

Since the launch of its public beta in 2008, Glassdoor has become the gold standard for company satisfaction ratings. They also have a classifieds section comparable to LinkedIn, Indeed, and others. Glassdoor provides examples of job descriptions with company ratings in context. This makes it easy to compare what good and bad companies have in…

Read More


Applying A* path finding to latent word vectors

Path finding algorithms There are many algorithms for finding the shortest path between two points, but few are better known than Dijkstra’s algorithm and the A* algorithm. Dijkstra’s algorithm Dijkstra’s algorithm is often described as being “shortest-path first”. This is a greedy approach that isn’t suitable for large graphs. Source: wikimedia.org A* search algorithm A*,…

Read More


Building a recommendation system in Python

For competitive, low-margin businesses, a store’s layout can be the difference between surviving and getting wiped out. To drive more sales, businesses are using recommendation systems in online stores, and data-driven nudging at brick-and-mortar locations. How can purchasing data be turned into sales? We can use an unaltered version of the word2vec algorithm used in…

Read More


Word segmentation in Python using SentencePiece

What is word segmentation? Word segmentation (also called tokenization) is the process of splitting text into a list of words. Humans can do this pretty easily, but computers need help sometimes. At a higher level, you can think of segmentation as a way of boosting character-level models that also makes them more human-interpretable. Setup First,…

Read More