It is no surprise that many news headlines are worded to lure you into clicking on them. This year has brought a flood of new retail investors into the crypto market, and if you’re one of them, I bet you have been trying to keep up with every piece of news about it. One website you might be familiar with is Coinmarketcap.com. We will perform a sentiment analysis on news headlines from Coinmarketcap.com to create a model that predicts whether Ethereum’s price will close higher or lower than the previous day.
In this article, we will explore a widely used Natural Language Processing (NLP) technique known as sentiment analysis. As more of life moves online, businesses benefit greatly from extracting information about people's opinions on the products or services they provide. With the help of NLP, we can extract the sentiment of the content you are reading. Using the NLTK Python library, we will explore a built-in sentiment analyzer that scores the sentiment of a text so it can be classified as positive, negative, or neutral. This lets businesses understand how their audience feels about their products or services and build strategies to improve them. I will review two popular sentiment analysis tools, NLTK's VADER and TextBlob.
One of the most recognized tools among beginners, VADER (Valence Aware Dictionary and sEntiment Reasoner) is a rule-based, lexicon-based sentiment analysis tool created to extract sentiment from social media text. It quantifies the level of negativity and positivity found in a text.
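To calculate the compound score of an entire text, VADER takes the valence scores of all the sentiment-bearing features it recognizes and adjusts them according to its rules. These values are summed and normalized to a final score ranging from -1 to 1 using the following equation: compound = x / sqrt(x^2 + α), where x is the sum of the adjusted valence scores and α is a normalization constant (15 by default).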
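To make the normalization step concrete, here is a minimal sketch of it in Python. The function names `normalize`, `label_from_compound`, and `vader_scores` are my own, chosen for illustration. The ±0.05 cutoffs are the commonly used convention for turning a compound score into a label, not something computed in this article, and `vader_scores` assumes NLTK is installed and the `vader_lexicon` resource has been downloaded.

```python
import math


def normalize(score, alpha=15.0):
    """Squash an unbounded sum of valence scores into the range -1 to 1.

    Mirrors the normalization described above: x / sqrt(x^2 + alpha).
    """
    norm = score / math.sqrt(score * score + alpha)
    # Clamp as a safeguard against floating-point overshoot.
    return max(-1.0, min(1.0, norm))


def label_from_compound(compound, threshold=0.05):
    """Turn a compound score into a label (threshold is an assumed convention)."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def vader_scores(headline):
    """Score one headline with NLTK's VADER.

    Requires nltk and a prior nltk.download("vader_lexicon").
    Returns a dict with the keys neg, neu, pos, and compound.
    """
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    analyzer = SentimentIntensityAnalyzer()
    return analyzer.polarity_scores(headline)
```

For example, a valence sum of 1 normalizes to 1 / sqrt(1 + 15) = 0.25, which the helper above would label as positive. Calling `vader_scores` on a headline and passing its `compound` value to `label_from_compound` gives the positive, negative, or neutral classification.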
TextBlob is an NLP library in Python that provides NLP functionality, such as extracting sentiment from a text, through simple high-level APIs. It was built to be user-friendly. The values we will examine are subjectivity and polarity.
Subjectivity quantifies how much personal opinion, as opposed to factual information, is found in the text. It ranges from 0 to 1: the higher the value, the more the text contains personal opinion rather than factual information.
Polarity quantifies the positive and negative sentiment found in the text on a scale from -1 to 1, where 1 means a positive statement and -1 means a negative statement.
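As a rough sketch of how these two values can be read off a headline, the snippet below wraps TextBlob's `sentiment` property. The helper names `textblob_scores` and `describe_scores` are hypothetical, and the 0.5 subjectivity cutoff is simply the halfway point of the scale, used here as an assumption for illustration. `textblob_scores` assumes the `textblob` package is installed.

```python
def textblob_scores(text):
    """Return TextBlob's polarity (-1 to 1) and subjectivity (0 to 1) for a text.

    Requires the textblob package.
    """
    from textblob import TextBlob

    sentiment = TextBlob(text).sentiment
    return {
        "polarity": sentiment.polarity,
        "subjectivity": sentiment.subjectivity,
    }


def describe_scores(polarity, subjectivity, subjectivity_cutoff=0.5):
    """Translate the two numbers into plain-language labels.

    The cutoff is an assumed midpoint, not a value defined by TextBlob.
    """
    if polarity > 0:
        tone = "positive"
    elif polarity < 0:
        tone = "negative"
    else:
        tone = "neutral"

    if subjectivity > subjectivity_cutoff:
        style = "mostly opinion"
    else:
        style = "mostly factual"
    return tone, style
```

A headline that scores a polarity of 0.4 and a subjectivity of 0.2 would be described as positive and mostly factual.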
To better understand our sentiment analysis, I graphed the sentiment features against the dates we are examining. Plotting these features together lets us start to see the differences between them.
Polarity and compound are similar quantities but are calculated differently. In this exercise, you can see that the compound score captures more negativity on certain days. You can also see that subjectivity on these days is below 50%, suggesting these headlines provide more factual information than opinion.
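The grouping and plotting step can be sketched roughly as follows. This is not the exact code behind the chart discussed here, only one way to do it under some assumptions: each headline has already been scored, dates are ISO strings (so sorting them as text is chronological), and the sample rows are made-up numbers rather than real Coinmarketcap.com results. The names `daily_mean_scores`, `plot_daily`, and `SAMPLE_ROWS` are hypothetical, and `plot_daily` assumes matplotlib is installed.

```python
from collections import defaultdict
from statistics import mean

# Made-up (date, scores) rows for illustration only, not real headline data.
SAMPLE_ROWS = [
    ("2021-05-01", {"compound": 0.5, "polarity": 0.25, "subjectivity": 0.25}),
    ("2021-05-01", {"compound": -0.5, "polarity": 0.75, "subjectivity": 0.75}),
    ("2021-05-02", {"compound": 0.25, "polarity": 0.0, "subjectivity": 0.5}),
]


def daily_mean_scores(rows):
    """Average every sentiment feature per day.

    rows: iterable of (date_string, {feature_name: value}) pairs.
    Returns {date_string: {feature_name: mean_value}} ordered by date.
    """
    by_day = defaultdict(lambda: defaultdict(list))
    for day, scores in rows:
        for feature, value in scores.items():
            by_day[day][feature].append(value)
    return {
        day: {feature: mean(values) for feature, values in by_day[day].items()}
        for day in sorted(by_day)
    }


def plot_daily(daily):
    """Draw one line per sentiment feature against the date (needs matplotlib)."""
    import matplotlib.pyplot as plt

    days = list(daily)
    features = sorted({name for feats in daily.values() for name in feats})
    fig, ax = plt.subplots()
    for feature in features:
        # Use NaN where a feature is missing for a day so the line has a gap.
        values = [daily[day].get(feature, float("nan")) for day in days]
        ax.plot(days, values, marker="o", label=feature)
    ax.axhline(0, linewidth=0.8, color="gray")  # neutral-sentiment reference line
    ax.set_xlabel("Date")
    ax.set_ylabel("Score")
    ax.legend()
    fig.autofmt_xdate()
    return fig
```

Calling `plot_daily(daily_mean_scores(SAMPLE_ROWS))` puts compound, polarity, and subjectivity on the same axes, which makes it easy to spot days where compound dips below polarity.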
Stay tuned for my next article, where I will examine which sentiment feature is most important in determining whether Ethereum closes higher or lower than the previous day.