Ethereum Close Price Prediction Model: Random Forest & XGBoost

Let us continue from where we left off in my last article, where I ran a sentiment analysis on news headlines about the cryptocurrency Ethereum. The goal of this article is to determine whether those sentiment features can support a model that predicts if the market will close higher or lower than the day before. Let us start by looking at the dataset I created.

I decided to create a class label that marks whether the current day's closing price is higher or lower than the previous day's, using the values 1 and 0, respectively.
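As a minimal sketch of how such a label could be built with pandas: the column names `date` and `close` and the prices below are made up for illustration and are not the dataset from this article. I am also assuming that a day closing at exactly the previous close gets a 0.

```python
import pandas as pd

# Toy closing prices, invented for illustration only.
prices = pd.DataFrame({
    "date": pd.date_range("2021-01-01", periods=5, freq="D"),
    "close": [730.0, 775.0, 760.0, 760.0, 800.0],
})

# 1 if today's close is higher than yesterday's, otherwise 0.
prices["label"] = (prices["close"] > prices["close"].shift(1)).astype(int)

# The first day has no previous close to compare against, so drop it.
prices = prices.iloc[1:].reset_index(drop=True)

print(prices)
```

The comparison against `shift(1)` keeps the label aligned with the day it describes, which matters once the sentiment features for that day are joined in.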

Let us start by creating X and y values followed by setting up a train and test set. In this example, I will be performing an 80/20 split.
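A rough sketch of that step with scikit-learn's `train_test_split` could look like the following. The feature names follow the ones used later in this article, but the values are random placeholders, and the `label` column name and `random_state` are my own assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Random placeholder data; NOT the article's dataset.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "id": rng.integers(0, 20, n),
    "subjectivity": rng.random(n),
    "polarity": rng.uniform(-1, 1, n),
    "comp": rng.uniform(-1, 1, n),
    "positive": rng.random(n),
    "negative": rng.random(n),
    "neutral": rng.random(n),
    "label": rng.integers(0, 2, n),
})

# Features (X) are every column except the class label (y).
X = df.drop(columns="label")
y = df["label"]

# 80/20 split; random_state only makes the shuffle reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)
```

By default `train_test_split` shuffles the rows. For price data, passing `shuffle=False` to keep the test set strictly after the training set in time is an alternative worth considering.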

For this article I will be focusing on two models:

The first model we will fit and analyze will be a Random Forest.

Random forest is a supervised learning algorithm that is used for both classification and regression problems. It consists of multiple decision trees and uses randomness to improve accuracy and reduce overfitting.

Some of the applications this algorithm is known for are recommendation engines, image classification, feature selection, disease prediction, and classifying loyal loan applicants. It is designed as an ensemble method in which each tree is grown on a random sample of the dataset. The individual trees choose their splits using an attribute selection measure such as information gain, gain ratio, or the Gini index.
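To make two of those attribute selection measures concrete, here is a small sketch in plain Python. The helper names `gini`, `entropy` and `information_gain` are my own, and this is only an illustration of the formulas, not the code scikit-learn runs internally.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy of the class distribution, in bits."""
    n = len(labels)
    return -sum((count / n) * log2(count / n) for count in Counter(labels).values())


def information_gain(parent, left, right):
    """Entropy of the parent minus the size-weighted entropy of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted


parent = [0, 0, 1, 1]
print(gini(parent))                              # evenly mixed node
print(information_gain(parent, [0, 0], [1, 1]))  # a split that separates the classes
```

A tree prefers the split with the largest drop in impurity, so a split that separates the classes cleanly scores higher than one that leaves both children mixed.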

The last model we will fit will be an XGBClassifier.

eXtreme Gradient Boosting, aka XGBoost, is an optimized distributed gradient boosting library designed to be highly efficient, flexible, and portable. It builds on gradient boosted decision trees and adds regularization to reduce overfitting. Gradient boosting creates new models that predict the residuals, or errors, of prior models and then adds them together to make the final prediction.
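The residual idea can be sketched in a few lines. This is plain gradient boosting with squared error on a toy regression problem, using scikit-learn's `DecisionTreeRegressor` as the weak learner; it is not XGBoost itself, and the data, learning rate and number of rounds are arbitrary choices of mine.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy regression data, invented for illustration.
rng = np.random.default_rng(0)
X = rng.uniform(0, 6, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, 200)

learning_rate = 0.1
prediction = np.full_like(y, y.mean())  # start from a constant model
initial_mse = np.mean((y - prediction) ** 2)

trees = []
for _ in range(50):
    residuals = y - prediction                    # errors of the ensemble so far
    tree = DecisionTreeRegressor(max_depth=2, random_state=0)
    tree.fit(X, residuals)                        # new model predicts those errors
    prediction += learning_rate * tree.predict(X) # add it to the running total
    trees.append(tree)

final_mse = np.mean((y - prediction) ** 2)
print(f"MSE before boosting: {initial_mse:.4f}")
print(f"MSE after boosting:  {final_mse:.4f}")
```

Each round only has to correct what the previous rounds got wrong, which is why many shallow trees added together can fit a pattern that a single shallow tree cannot.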

XGBoost does extremely well on structured or tabular datasets for classification and regression predictive modeling problems. It is chosen mainly for two reasons: execution speed and model performance.

Model 1: Random Forest

  • 1 indicates the Ethereum market price closes higher than the day before
  • 0 indicates the Ethereum market price closes lower than the day before

The model does reasonably well, with an accuracy of 65%. With the sentiment we extracted from news headlines and the unique ids given to every news source published on Coinmarketcap, I was able to build a strong foundation for this classifier.
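For readers who want to reproduce this kind of evaluation, here is a hedged sketch of fitting a Random Forest and printing its accuracy and classification report. It uses synthetic data from `make_classification` as a stand-in for the seven sentiment features, and the hyperparameters are assumptions of mine, so the numbers it prints will not match the 65% above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the seven sentiment features; NOT the article's data.
X, y = make_classification(
    n_samples=500, n_features=7, n_informative=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Hyperparameters here are illustrative assumptions.
rf = RandomForestClassifier(n_estimators=100, random_state=42)
rf.fit(X_train, y_train)

y_pred = rf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
report = classification_report(
    y_test, y_pred, target_names=["0 (lower)", "1 (higher)"]
)

print(f"accuracy: {accuracy:.2f}")
print(report)
```

The report lists precision, recall and F1 per class, which is more informative than accuracy alone when one class is more common than the other.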

One useful feature that both models share is the ability to rank the importance of each feature.

Feature importance:

Random forest uses the mean decrease in impurity to quantify the importance of features. In this case the features id, subjectivity and polarity are the most significant in making decisions throughout the model.
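In scikit-learn, the mean decrease in impurity is exposed through the fitted model's `feature_importances_` attribute. The sketch below again uses synthetic data with the article's feature names attached purely as labels, so the ranking it prints says nothing about the real features.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Names borrowed from the article; the data itself is synthetic.
feature_names = [
    "id", "subjectivity", "polarity", "comp", "positive", "negative", "neutral"
]
X, y = make_classification(
    n_samples=500, n_features=7, n_informative=3, random_state=0
)

rf = RandomForestClassifier(n_estimators=100, random_state=42).fit(X, y)

# Mean decrease in impurity per feature, normalized to sum to 1.
importances = rf.feature_importances_

# Sort from most to least important.
ranking = sorted(zip(feature_names, importances), key=lambda pair: pair[1], reverse=True)
for name, score in ranking:
    print(f"{name:>12}: {score:.3f}")
```

Impurity-based importance can favor features with many distinct values, so a high score for something like an id column is worth double-checking with another method.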

Model 2: XGBoost

As you can see, the accuracy of the model has increased by 2% compared with the Random Forest. You can also see a slight increase in all three metrics we use to evaluate the performance of the model.

Feature key: f0: id | f1: subjectivity | f2: polarity | f4: positive | f6: neutral | f3: comp | f5: negative

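XGBoost uses the F score to rank the importance of each feature. In its importance plot this is, by default, the number of times a feature is used to split the data across all trees. Once again you can see the top 3 features, id, subjectivity and polarity, carrying the largest weight in the decision on the class.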


Hi! My name is Eric Gustavo Romano. I am a data science enthusiast and practitioner located in Jersey.
