Machine learning (ML) is a methodology that analyses input data and uses statistical techniques to predict an output value. It is becoming popular in the finance and investing industry. The basic idea of ML in investing is to feed all available data to a computer and let the algorithm learn the relationship between the data and stock price movement.
Traditionally, financial and economic data have been analysed statistically to find their relationships with the KLCI. More recently, variables that are only indirectly related, or seemingly unrelated, to the KLCI, such as weather, traffic conditions, concert ticket sales and celebrity news, have also been fed into ML algorithms.
Let’s do a simple experiment, using Google Trends data to predict the Kuala Lumpur Composite Index (KLCI). Three search terms, “Malaysia”, “1MDB” and “KLCI”, were selected. The popularity of each search term over time was plotted together with the KLCI. Chart 1 shows the search term “Malaysia” and the KLCI; Chart 2 shows “1MDB” and the KLCI; and Chart 3 shows the search term “KLCI” and the KLCI.
Based on a cursory inspection, Charts 1 and 2 do not reveal any strong relationship between the search term and KLCI movement. Although there was a sharp drop in the KLCI when the popularity of “1MDB” surged in Aug 2015, the subsequent surge did not move the KLCI drastically. Chart 3, on the other hand, is more interesting: each time the popularity of the search term “KLCI” peaked, the KLCI tended to reverse its downtrend.
Next, the data were analysed using a basic machine learning algorithm. Generally, there are two main types of machine learning used in quantitative finance: regression and classification. For simplicity, the classification method was chosen for this analysis (read more here).
The KLCI data was transformed into four classes, “Up”, “Down”, “Flat” and “Dunno”, based on the change in weekly closing price. For example, if the week 2 closing price is higher than the week 1 closing price, week 2 is classified as “Up”. “Down” and “Flat” were calculated similarly. Additionally, the “Dunno” category was introduced to filter out noise in periods where no high search popularity occurred.
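The labelling step described above can be sketched in a few lines of Python. This is only a minimal illustration: the function names, the `flat_tol` tolerance, the `min_popularity` threshold and the sample numbers are all assumptions made for the example, since the exact rules used for “Flat” and “Dunno” are not stated here.

```python
def label_weeks(closes, flat_tol=0.0):
    """Label each week after the first by comparing its close with the prior week's.

    flat_tol is a hypothetical tolerance: changes within +/- flat_tol count as "Flat".
    """
    labels = []
    for prev, curr in zip(closes, closes[1:]):
        change = curr - prev
        if change > flat_tol:
            labels.append("Up")
        elif change < -flat_tol:
            labels.append("Down")
        else:
            labels.append("Flat")
    return labels


def mask_quiet_weeks(labels, popularity, min_popularity):
    """Replace a label with "Dunno" when search popularity is below a threshold.

    min_popularity is an assumed cut-off, not a value taken from the analysis.
    """
    return [
        label if pop >= min_popularity else "Dunno"
        for label, pop in zip(labels, popularity)
    ]


# Made-up weekly closes and search popularity, for illustration only.
closes = [1700.0, 1712.5, 1712.5, 1698.0]
popularity = [40, 5, 60]  # aligned with weeks 2, 3 and 4

labels = label_weeks(closes)
masked = mask_quiet_weeks(labels, popularity, min_popularity=10)
```

With these sample values, week 3 has low search popularity, so its label is replaced by “Dunno” while the other weeks keep their price-based labels.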
A time lag was also introduced into the model to “predict” whether the KLCI will be “Up”, “Down”, “Flat” or “Dunno” in the coming week. As such, the current week’s search-term results are treated as affecting the following week’s KLCI behaviour.
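One simple way to express this lag, sketched here under assumptions, is to pair each week’s search-term values with the label of the week that follows. The function name `make_lagged_pairs` and the sample values are hypothetical and only illustrate the alignment.

```python
def make_lagged_pairs(features, labels, lag=1):
    """Pair week t's features with the label of week t + lag.

    features and labels must be the same length and aligned week by week.
    The last `lag` feature rows have no future label and are dropped.
    """
    if lag < 1:
        raise ValueError("lag must be at least 1")
    if len(features) != len(labels):
        raise ValueError("features and labels must be aligned")
    return list(zip(features[:-lag], labels[lag:]))


# Made-up values: (Malaysia, 1MDB, KLCI) search popularity for three weeks.
features = [(2, 1, 25), (3, 1, 10), (2, 2, 5)]
labels = ["Down", "Up", "Flat"]

pairs = make_lagged_pairs(features, labels)
# pairs[0] joins week 1's search data with week 2's label.
```

Shifting the labels rather than the features keeps each training row in the form "what was searched this week, what the index did next week".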
Several algorithms were tested, and the k-nearest neighbours (KNN) algorithm was chosen because its accuracy was the highest among them. See Pictures 1 and 2 for details.
Picture 1.
Picture 2.
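For readers unfamiliar with KNN, the idea can be sketched with the Python standard library alone: find the k training weeks whose search-term values are closest to the query and take a majority vote of their labels. The code below is a minimal sketch, not the tool or settings used for the results above. The value k=3, the Euclidean distance and every number in the sample data are assumptions invented for illustration.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Return the majority label among the k training rows nearest to query.

    train is a list of (feature_tuple, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k=3):
    """Fraction of held-out rows whose predicted label matches the true label."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)


# Invented training rows: (Malaysia, 1MDB, KLCI) search changes -> next-week label.
train = [
    ((2, 1, 25), "Down"), ((3, 2, 30), "Down"), ((1, 1, 28), "Down"),
    ((2, 1, -20), "Up"), ((1, 2, -25), "Up"), ((3, 1, -18), "Up"),
]
# Invented held-out rows used only to show how accuracy is computed.
held_out = [((2, 2, 27), "Down"), ((2, 1, -22), "Up")]

prediction = knn_predict(train, (2, 1, 22))
score = accuracy(train, held_out)
```

Comparing several algorithms then amounts to computing the same held-out accuracy for each one and keeping the best performer.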
Now, let’s run a hypothetical test case to predict KLCI movement. In Test case 1, assume the search-term popularity values for “Malaysia”, “1MDB” and “KLCI” are 2, 1 and 25 respectively. This means “Malaysia” and “1MDB” search traffic is almost flat, but “KLCI” search traffic increased by 25%. The KNN algorithm predicted that the KLCI would go down in the following week. In Test case 5, both “Malaysia” and “1MDB” are almost flat, but “KLCI” retreated from a high peak. The KNN algorithm predicted that the KLCI would go up in the coming week. The machine learning algorithm gives results similar to the eyeball observation. Table 1 shows the KLCI movement predicted by the KNN algorithm for various test cases.
The above is just an illustrative example of how Google Trends and a machine learning algorithm work together. Actual algorithmic trading requires far more intensive research and data processing effort!