Natural language processing (NLP) is an exciting branch of artificial intelligence (AI) that allows machines to break down and understand human language. As a data scientist, I often use NLP techniques to interpret text data that I'm working with for my analysis. During this tutorial, I plan to walk through text pre-processing techniques, machine learning techniques and Python libraries for NLP.
Text pre-processing techniques include tokenization, text normalization and data cleaning. Once in a standard format, various machine learning techniques can be applied to better understand the data. This includes using popular modeling techniques to classify emails as spam or not, or to score the sentiment of a tweet on Twitter. Newer, more complex techniques can also be used such as topic modeling, word embeddings or text generation with deep learning.
We will walk through an example in Jupyter Notebook that goes through all of the steps of a text analysis project, using several NLP libraries in Python including NLTK, TextBlob, spaCy and gensim along with the standard machine learning libraries including pandas and scikit-learn.