Problem Statement

Modern working life is full of text-based feeds: social media posts, news channels, customer emails, group chats, and internal company posts. Such feeds can provide invaluable insights and leads, but these are often buried in an ocean of irrelevant data. These ‘needles in a haystack’ very often follow regular, albeit vague, patterns that allow machine learning algorithms to identify them.

Given labelled examples of relevant content, a text classification model can be trained to predict the relevance of future posts and highlight the most relevant ones to the user, freeing them from the repetitive and time-consuming task of reviewing these feeds.

Approach

The technology at the core of this system is a text classifier built on word embeddings. The model is hosted and managed on Google Cloud ML Engine, which allows us to separate the model from the application logic, monitor performance, retrain models, revert to a previous version if the training data becomes polluted, and integrate with other Google Cloud Platform services.
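As a rough illustration of a classifier built on word embeddings, the sketch below averages per-word vectors and fits a logistic regression on top. It is not the deployed model: the two-dimensional `EMBEDDINGS` table, the training posts, and all function names are made up for the example, and a real system would use learned or pretrained embeddings with far more dimensions.

```python
import math

# Hypothetical 2-D embeddings, hand-written for illustration only.
EMBEDDINGS = {
    "outage": [1.0, 0.0], "urgent": [0.9, 0.1], "refund": [0.8, 0.2],
    "lunch": [0.0, 1.0], "party": [0.1, 0.9], "weather": [0.2, 0.8],
}
DIM = 2


def embed(text):
    """Average the embeddings of known words; unknown words are skipped."""
    vectors = [EMBEDDINGS[w] for w in text.lower().split() if w in EMBEDDINGS]
    if not vectors:
        return [0.0] * DIM
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(DIM)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(examples, epochs=500, lr=0.5):
    """Fit logistic regression weights with batch gradient descent."""
    weights, bias = [0.0] * DIM, 0.0
    features = [(embed(text), label) for text, label in examples]
    n = len(features)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * DIM, 0.0
        for x, y in features:
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for i in range(DIM):
                grad_w[i] += error * x[i]
            grad_b += error
        # Step along the mean gradient of the log loss.
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict_relevance(model, text):
    """Return the predicted probability that a post is relevant."""
    weights, bias = model
    x = embed(text)
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


# Made-up labelled posts: 1 = relevant, 0 = irrelevant.
LABELLED = [
    ("urgent outage", 1), ("refund urgent", 1),
    ("lunch party", 0), ("weather lunch", 0),
]
model = train(LABELLED)
```

On this toy data, `predict_relevance(model, "refund outage")` comes out above 0.5 even though that exact post was never seen in training, because both words sit close to the relevant examples in the embedding space.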

Text Classification on text-based feeds

This machine learning model is supported by an application which:

1. Schedules the scraping, storage, and prediction of new data

2. Schedules the retraining of the model

3. Sends notifications of relevant data to the user

4. Provides a labelling interface to the user
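
The four responsibilities above can be sketched as a single polling cycle. Everything here is hypothetical: `fetch_posts`, `score`, `notify`, the in-memory `store`, and the 0.8 threshold stand in for whatever scraper, hosted model call, notification channel, and database the real application uses. Scheduling itself is assumed to be handled by an external scheduler that calls `run_cycle` periodically.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Post:
    post_id: str
    text: str
    score: Optional[float] = None   # predicted relevance, set by run_cycle
    label: Optional[int] = None     # user-supplied label from the labelling UI


@dataclass
class FeedApp:
    fetch_posts: Callable[[], List[Post]]    # hypothetical scraper
    score: Callable[[str], float]            # hypothetical call to the hosted model
    notify: Callable[[Post], None]           # hypothetical notification channel
    threshold: float = 0.8                   # assumed cut-off for "relevant"
    store: Dict[str, Post] = field(default_factory=dict)

    def run_cycle(self) -> List[Post]:
        """Scrape new posts, store them with predictions, notify on relevant ones."""
        notified = []
        for post in self.fetch_posts():
            if post.post_id in self.store:
                continue  # already processed in an earlier cycle
            post.score = self.score(post.text)
            self.store[post.post_id] = post
            if post.score >= self.threshold:
                self.notify(post)
                notified.append(post)
        return notified

    def record_label(self, post_id: str, label: int) -> None:
        """Save a label coming from the labelling interface."""
        self.store[post_id].label = label

    def training_examples(self) -> List[Tuple[str, int]]:
        """Collect labelled posts for the next scheduled retraining run."""
        return [(p.text, p.label) for p in self.store.values()
                if p.label is not None]
```

Passing the scraper, model call, and notifier in as plain callables keeps the application logic separate from the model, in the same spirit as hosting the model outside the application.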

Similar Use Cases

Text classification is a core machine learning technique with many use cases beyond the one described above.

One related problem that could be solved with a similar approach is priority classification of customer emails, which makes sure that the most urgent queries are dealt with first.
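
A minimal sketch of that idea: given any function that scores urgency, emails can be served from a priority queue so the highest-scoring ones come out first. The `toy_urgency` keyword matcher below is only a stand-in for a trained classifier, and its keywords are invented for the example.

```python
import heapq
from typing import Callable, Iterable, Iterator, Tuple


def by_urgency(emails: Iterable[str],
               urgency: Callable[[str], float]) -> Iterator[Tuple[float, str]]:
    """Yield (score, email) pairs, most urgent first."""
    # heapq is a min-heap, so scores are negated; the index breaks ties
    # in arrival order.
    heap = [(-urgency(text), i, text) for i, text in enumerate(emails)]
    heapq.heapify(heap)
    while heap:
        neg_score, _, text = heapq.heappop(heap)
        yield -neg_score, text


def toy_urgency(text: str) -> float:
    """Stand-in for a trained classifier: fraction of keywords matched."""
    keywords = ("urgent", "outage", "cannot log in")
    return sum(k in text.lower() for k in keywords) / len(keywords)
```

In practice `urgency` would be replaced by the predicted probability from a classifier trained on emails labelled by priority.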