Dimitris Poulopoulos

Machine Learning Engineer | Researcher

Learning Rate - Catching Coronavirus mutations using NLP

07 January 2021

Catching Coronavirus mutations using NLP

Natural Language Processing (NLP) took off during the second half of the previous decade, and it continues to do so with innovations like OpenAI’s DALL·E, which we discussed in the last edition of the Learning Rate. Language models like GPT-3 can generate coherent paragraphs of text and perform rudimentary reading comprehension, machine translation, question answering, and document summarization.

Recent breakthroughs are due to a novel neural network architecture, the Transformer, which was introduced in 2017. Like Recurrent Neural Networks (RNNs), Transformers are designed to handle sequential data, making them a strong candidate for modeling natural language.

Still, RNNs, specifically Long Short-Term Memory (LSTM) networks, remain a powerful idea if applied thoughtfully. In a collaborative effort by scientists from MIT and Harvard, Brian Hie et al. used NLP to predict viral mutations that could make those pathogens undetectable by our immune system. It is well known that viruses can mutate, but not every mutation leaves a virus able to infect a host. Thus, the authors of the paper Learning the language of viral evolution and escape, published in Science, had a brilliant idea: what if we view a virus’s mutations as edits that change a sentence’s semantics by altering certain words? Moreover, what if we say that harmful mutations remain grammatically valid, while mutations that do a poor job of infecting a host are grammatically invalid?
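To make the analogy a bit more concrete, here is a minimal toy sketch of the idea in Python. It is not the authors' model: there is no LSTM here, and every name and number in it (the `CORPUS` sequences, the `FEATURES` table, the `beta` weight, and the rank-based scoring) is an assumption made up for illustration. A position-wise frequency count stands in for "grammaticality", and a distance between hand-made sequence embeddings stands in for "semantic change". Mutations that score high on both are the interesting ones.

```python
from collections import Counter

# Hypothetical toy corpus: a few aligned sequences standing in for training data.
CORPUS = ["MKVL", "MKIL", "MRVL", "MKVF"]

# Hypothetical per-residue features (hydrophobicity-like, charge-like).
# The numbers are invented for illustration only.
FEATURES = {
    "M": (0.6, 0.0), "K": (-0.9, 1.0), "R": (-1.0, 1.0), "D": (-0.8, -1.0),
    "V": (0.9, 0.0), "I": (1.0, 0.0), "L": (0.8, 0.0), "F": (0.7, 0.0),
}
ALPHABET = sorted(FEATURES)


def grammaticality(position, residue, corpus=CORPUS):
    """Smoothed frequency of `residue` at `position` in the corpus.

    A crude stand-in for the probability a language model would assign.
    """
    counts = Counter(seq[position] for seq in corpus)
    return (counts[residue] + 1) / (len(corpus) + len(ALPHABET))


def embed(sequence):
    """Mean of the per-residue feature vectors (stand-in for a learned embedding)."""
    n = len(sequence)
    return tuple(sum(FEATURES[r][d] for r in sequence) / n for d in range(2))


def semantic_change(original, mutant):
    """L1 distance between the embeddings of the original and mutated sequence."""
    return sum(abs(a - b) for a, b in zip(embed(original), embed(mutant)))


def rank_mutations(sequence, beta=1.0):
    """Rank all single-residue mutations, best candidates first.

    Each candidate is a tuple (position, residue, semantic_change, grammaticality).
    The score adds the rank by semantic change to beta times the rank by
    grammaticality, so a candidate must do well on both to reach the top.
    """
    candidates = []
    for pos, wild_type in enumerate(sequence):
        for residue in ALPHABET:
            if residue == wild_type:
                continue
            mutant = sequence[:pos] + residue + sequence[pos + 1:]
            candidates.append(
                (pos, residue, semantic_change(sequence, mutant), grammaticality(pos, residue))
            )

    # Higher rank index means larger semantic change / higher grammaticality.
    change_rank = {c[:2]: i for i, c in enumerate(sorted(candidates, key=lambda c: c[2]))}
    gram_rank = {c[:2]: i for i, c in enumerate(sorted(candidates, key=lambda c: c[3]))}

    def score(candidate):
        key = candidate[:2]
        return change_rank[key] + beta * gram_rank[key]

    return sorted(candidates, key=score, reverse=True)


if __name__ == "__main__":
    for pos, residue, change, gram in rank_mutations("MKVL")[:3]:
        print(f"position {pos} -> {residue}: change={change:.3f}, grammaticality={gram:.3f}")
```

Raising `beta` favors mutations that look "grammatical" (plausible for a working virus), while lowering it favors mutations that change the "meaning" the most. A real system would replace both toy functions with quantities derived from a trained sequence model.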

The team tested this approach by trying to predict mutations of viruses such as HIV and one coronavirus strain. Read the full story, and how analogies like this can open new frontiers in the field, in their excellent blog post.

This week on Medium

Last week, we examined the world of Visual Studio Code: how it can help you work remotely and launch and edit Jupyter Notebooks, and how you can supercharge it with various extensions.

This week, to complete the cycle, we examine how to create and manage our own VS Code Server. Working from home is not the same as remote working. Remote working means working from any location, anywhere in the world, 24 hours a day. So, what if you could package VS Code alongside your project dependencies and create a truly flexible environment, one that fulfills the dream of remote working and lets you onboard new team members without any hassle? Read more about this on Medium.


Learning Rate is a newsletter for those who are curious about the world of AI and MLOps. You’ll hear from me every Friday with updates and thoughts on the latest AI news and articles. Subscribe here for more content, like GitHub repos to star and book recommendations. Also, visit the resources page to start building your own Data Science curriculum!

Image by Miroslava Chrienova from Pixabay