Application of ML Techniques to Forecasting of Oil Prices. Model, Backtesting, Validation

Student: Dmitry Bagdasaryan

Supervisor: Vasily M. Solodkov

Faculty: HSE Banking Institute

Educational Programme: Master of Finance (Master)

Final Grade: 8

Year of Graduation: 2024

This study examines the application of machine learning methods to forecasting oil prices, using crude oil as an example. A review of about 40 sources on oil price forecasting methods provided preliminary data on the effectiveness of various models and their optimal parameters. In view of the Efficient Market Hypothesis (EMH), which states that the prices of financial assets at any moment fully reflect all available information, only historical price data without additional regressors are used for forecasting. The main machine learning model is a recurrent neural network (RNN), specifically an LSTM with added convolutional layers and attention mechanisms, built with the TensorFlow Python library. Based on the random walk theory (RWT), a probabilistic programming model from the PyMC Python library is also used. Forecasts are made from daily prices for several planning horizons in working days: 1, 3, 5 (1 calendar week), 10, 25 (1 calendar month), 75 (1 calendar quarter), and 250 (1 calendar year). The results are compared with forecasts from econometric methods (SARIMA, ARIMA-GARCH, OLS), as well as from a VECM model for several financial assets cointegrated with crude oil. Finally, we integrate and compare the forecasting methodologies to identify the best prediction algorithm for each of these horizons. We use two approaches: one combines different models into an average or weighted prediction, and the other obtains a separate prediction from each model to determine which model is best suited to a given horizon. Forecast accuracy is evaluated with the metrics R², RMSE, MAE, and MAPE. Since the original oil prices are non-stationary, several methods of transforming the series to a stationary one are used to improve the accuracy and stability of predictions: log transformation, differencing, and component extraction.
Key outcomes of our research:
1. Different models demonstrate different performance across various horizons.
2. Machine learning models outperform traditional models applied in FHP due to their ability to capture complex patterns over longer-term horizons.
3. PCA components used for forecasting significantly improve the forecasting power of OLS and RNN models.
4. The best performance on a short horizon of 1-5 days is demonstrated by autoregressive OLS with 2-4 PCA components.
5. RNN provides the best metrics on the 10-day horizon.
6. Autoregressive OLS with 5-6 PCA components shows the best performance on the 25 and 75-day horizons.
7. The ARIMA-GARCH model provides the best performance among all models over the 250-day period, but its results are close to the standard prediction based on the average.
We see the main direction for further development of the proposed models in the integration of macroeconomic and oil-related indicators to assess the impact of exogenous variables. We believe the results of our research are useful for practical implementation in different investment strategies and can also serve as a basis for future research into various prediction algorithms.
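The log transformation, differencing, and accuracy metrics named in the abstract can be illustrated with a minimal sketch. This is not the thesis code: the function names (`log_diff`, `invert_log_diff`, `forecast_metrics`) are hypothetical, the price series is synthetic, and the naive "last known price" forecast is only a stand-in baseline in the spirit of the random walk theory, not one of the models evaluated in the study.

```python
import numpy as np

def log_diff(prices):
    """Log-transform, then first-difference: r_t = ln(P_t) - ln(P_{t-1})."""
    return np.diff(np.log(np.asarray(prices, dtype=float)))

def invert_log_diff(first_price, log_returns):
    """Rebuild the price path from its first value and the log returns."""
    cumulative = np.concatenate(([0.0], np.cumsum(log_returns)))
    return first_price * np.exp(cumulative)

def forecast_metrics(y_true, y_pred):
    """Return R2, RMSE, MAE, and MAPE (in percent) for a forecast."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    ss_res = np.sum(err ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": np.sqrt(np.mean(err ** 2)),
        "MAE": np.mean(np.abs(err)),
        # MAPE is undefined if y_true contains zeros; prices are positive here.
        "MAPE": 100.0 * np.mean(np.abs(err / y_true)),
    }

# Synthetic daily prices, for illustration only (assumption, not real data).
prices = np.array([80.0, 82.0, 81.0, 84.0, 83.0, 85.0])

returns = log_diff(prices)                     # transformed series
rebuilt = invert_log_diff(prices[0], returns)  # back to price levels

# Naive baseline at horizon h: the forecast for day t+h is the price on day t.
h = 1
naive_forecast = prices[:-h]
actual = prices[h:]
scores = forecast_metrics(actual, naive_forecast)
print(scores)
```

A negative R² for the baseline, as happens on this toy series, means the forecast is worse than always predicting the sample mean of the evaluated values. That is one reason to report R² alongside RMSE, MAE, and MAPE.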

Full text (added May 18, 2024)

Student Theses at HSE must be completed in accordance with the University Rules and regulations specified by each educational programme.

Summaries of all theses must be published and made freely available on the HSE website.

The full text of a thesis can be published in open access on the HSE website only if the authoring student (copyright holder) agrees, or, if the thesis was written by a team of students, if all the co-authors (copyright holders) agree. After a thesis is published on the HSE website, it obtains the status of an online publication.

Student theses are objects of copyright and their use is subject to limitations in accordance with the Russian Federation’s law on intellectual property.

In the event that a thesis is quoted or otherwise used, reference to the author’s name and the source of quotation is required.
