Predicting Markets on Quantiacs using Machine Learning: A Ridge Regression Example
This article was published on The Startup on Medium: check it out here.
The new Quantiacs platform allows quants to download financial data for free. Market predictions can be developed offline, by downloading the Quantiacs backtester, or online, using our cloud for free. In this article, we describe a supervised learning example based on Ridge regression.
When we built Quantiacs we focused on a platform that allows quants to perform realistic trading simulations without getting lost in more technical details. We believe that high-quality trading systems for global financial markets can be developed by talented research scientists, software developers and students who are not part of the quant hedge fund industry.
Our best users, who are currently receiving a fee for their systems, indeed include students from technical faculties, such as Alex, a Physics PhD student at UC Santa Barbara who received 1M USD in allocations when he won a Quantiacs contest, as well as Daniel, a mechatronics engineer and professional trader who received 2M USD in allocations after winning two competitions. More material can be found on our web page.
Are you a data scientist who wants to step into quantitative trading, or do you want to test whether some of your ideas work? Then you should try Quantiacs.
Using Quantiacs
Joining Quantiacs is easy:
- Register on the platform. As we value the privacy of our users, you only need to provide a username and a password. You can also sign in with your GitHub, Google or Facebook credentials if you prefer.
- You will have at your disposal a private user area with a Development section. Here you can select one of the provided Examples and click on the Clone button. Afterwards you will be able to edit the code in the My Strategies section using Jupyter Notebook or JupyterLab on our cloud. You will be able to run each strategy on our cloud using 8 GB of RAM and 1 Xeon E5-1650 core, and you will have online access to all the financial data we provide.
- If you prefer to work on your own workstation, you can download all financial data and our backtester. For the local installation we recommend using conda and reading the detailed instructions we provide on our page.
- Once you are satisfied with your code, you can click on the Submit button in your user area and submit it to our servers for the live evaluation. We are currently running two contests (the 15th classic Quantiacs Futures contest and the 1st Bitcoin Futures contest), and the best 7 quants for each contest, ranked according to the live Sharpe ratio of their systems, will share the guaranteed allocation of 4 million USD. However, all submitted systems will be considered eligible for allocations provided they develop an interesting live track record.
The first submissions are arriving. You can check them on the Leaderboard page.
When we wrote the new version of the Quantiacs platform, we took particular care to deliver an environment suitable for machine learning methods. In this article we describe the implementation of a system which trades the Bitcoin Futures contract and is based on Ridge regression (or, more generally, on Tikhonov regularization).
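For readers new to the method, ridge regression is ordinary least squares with an added L2 penalty on the weights. The following is a minimal sketch of the closed-form solution, not part of the Quantiacs code: the function name `ridge_fit`, the penalty values and the synthetic data are illustrative assumptions, and no intercept is fitted here.

```python
import numpy as np


def ridge_fit(X, y, alpha=1.0):
    """Solve min_w ||X w - y||^2 + alpha * ||w||^2 in closed form."""
    n_features = X.shape[1]
    # Normal equations with the Tikhonov term: (X^T X + alpha I) w = X^T y
    gram = X.T @ X + alpha * np.eye(n_features)
    return np.linalg.solve(gram, X.T @ y)


# Synthetic data (illustration only): y depends linearly on two features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
true_w = np.array([1.5, -0.5])
y = X @ true_w + 0.1 * rng.normal(size=200)

w_ols = ridge_fit(X, y, alpha=0.0)     # no penalty: ordinary least squares
w_ridge = ridge_fit(X, y, alpha=50.0)  # strong penalty shrinks the weights

print("OLS weights:  ", w_ols)
print("Ridge weights:", w_ridge)
```

The penalty pulls the weight vector towards zero, which trades a little bias for lower variance; this can help when the training window is short, as it is in the strategy described here.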
The Full Code
The full code is very compact and can be copied into a Notebook cell:

import xarray as xr
import numpy as np
import logging

import qnt.data as qndata
import qnt.backtester as qnbt


def load_data(period):
    return qndata.cryptofutures.load_data(tail=period)


def predict_weights(market_data):

    def get_ml_model():
        from sklearn.linear_model import RidgeClassifier
        model = RidgeClassifier(random_state=18)
        return model

    def get_features(data):

        def preprocess_data(raw_prices):
            log_prices = raw_prices.copy(True)
            assets = log_prices.columns
            for asset in assets:
                log_prices[asset] = np.log(log_prices[asset])
            return log_prices

        prices = data.sel(field="close").\
            ffill("time").bfill("time").fillna(0)
        prices_df = prices.to_pandas()
        features = preprocess_data(prices_df)
        return features

    def get_target_classes(data):
        price_current = data.sel(field="close").dropna("time")
        price_future = price_current.shift(time=-1).dropna("time")

        class_positive = 1
        class_negative = 0

        target_up = xr.where(price_future > price_current,\
            class_positive, class_negative)

        return target_up.to_pandas()

    data = market_data.copy(True)

    asset_names = data.coords["asset"].values
    features = get_features(data)
    target = get_target_classes(data)

    prediction_df = data.sel(field="close").isel(time=-1).to_pandas()

    for asset_name in asset_names:
        target_asset = target[asset_name]
        features_asset = features[asset_name][:-1]
        target_asset, features_asset = \
            target_asset.align(features_asset, axis=0, join="inner")

        model = get_ml_model()

        try:
            model.fit(features_asset.values.reshape(-1, 1), target_asset)
            features_asset_predict = features[asset_name][-1:]
            prediction_asset = \
                model.predict(features_asset_predict.values.reshape(-1, 1))
            prediction_df[asset_name] = prediction_asset
        except:
            logging.exception("model failed")
            return xr.zeros_like(data.isel(field=0, time=0))

    return prediction_df.to_xarray()


weights = qnbt.backtest(
    competition_type = "cryptofutures",
    load_data = load_data,
    lookback_period = 18,
    start_date = "2014-01-01",
    strategy = predict_weights,
    analyze = True,
    build_plots = True
)
Discussion
Let us analyze it block by block. At the very beginning we import basic general-purpose libraries and the modules we need from the Quantiacs library:

import xarray as xr
import numpy as np
import logging

import qnt.data as qndata
import qnt.backtester as qnbt

xarray is the open source project we chose for structuring our software: it is inspired by and borrows from pandas, and it makes working with multi-dimensional arrays very efficient.
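As a small illustration of the xarray operations used later in the strategy, the sketch below builds a synthetic array with the dimensions "field", "time" and "asset". The dimension order, the asset label "BTC" and the values are assumptions made for the example; real data comes from the Quantiacs library.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for market data (illustration only).
times = pd.date_range("2021-01-01", periods=4, freq="D")
fields = ["open", "close"]
assets = ["BTC"]

values = np.arange(len(fields) * len(times) * len(assets), dtype=float)
values = values.reshape(len(fields), len(times), len(assets)) + 100.0

data = xr.DataArray(
    values,
    dims=("field", "time", "asset"),
    coords={"field": fields, "time": times, "asset": assets},
)

close = data.sel(field="close")   # label-based selection: 2-D, time x asset
last_close = close.isel(time=-1)  # position-based selection: 1-D, asset
close_df = close.to_pandas()      # pandas DataFrame indexed by time

print(close_df)
```

Selecting by label with `sel` and by position with `isel`, then converting to pandas with `to_pandas`, are the same calls the strategy code relies on.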
The function for loading the Bitcoin Futures data is simply defined by:

def load_data(period):
    return qndata.cryptofutures.load_data(tail=period)

More details on loading data can be found by inspecting the source code of the Quantiacs library, available on our GitHub repository, and by reading the Quantiacs documentation page.
The core of the code, where the strategy is defined, is contained in the block:

def predict_weights(market_data):

    def get_ml_model():
        from sklearn.linear_model import RidgeClassifier
        model = RidgeClassifier(random_state=18)
        return model

    def get_features(data):

        def preprocess_data(raw_prices):
            log_prices = raw_prices.copy(True)
            assets = log_prices.columns
            for asset in assets:
                log_prices[asset] = np.log(log_prices[asset])
            return log_prices

        prices = data.sel(field="close").\
            ffill("time").bfill("time").fillna(0)
        prices_df = prices.to_pandas()
        features = preprocess_data(prices_df)
        return features

    def get_target_classes(data):
        price_current = data.sel(field="close").dropna("time")
        price_future = price_current.shift(time=-1).dropna("time")

        class_positive = 1
        class_negative = 0

        target_up = xr.where(price_future > price_current,\
            class_positive, class_negative)

        return target_up.to_pandas()

    data = market_data.copy(True)

    asset_names = data.coords["asset"].values
    features = get_features(data)
    target = get_target_classes(data)

    prediction_df = data.sel(field="close").isel(time=-1).to_pandas()

    for asset_name in asset_names:
        target_asset = target[asset_name]
        features_asset = features[asset_name][:-1]
        target_asset, features_asset = \
            target_asset.align(features_asset, axis=0, join="inner")

        model = get_ml_model()

        try:
            model.fit(features_asset.values.reshape(-1, 1), target_asset)
            features_asset_predict = features[asset_name][-1:]
            prediction_asset = \
                model.predict(features_asset_predict.values.reshape(-1, 1))
            prediction_df[asset_name] = prediction_asset
        except:
            logging.exception("model failed")
            return xr.zeros_like(data.isel(field=0, time=0))

    return prediction_df.to_xarray()

The function "predict_weights", which predicts the weights, contains three auxiliary logical blocks:
- The function "get_ml_model", where the machine learning model is defined. We chose the Ridge regression classifier from scikit-learn, but any other classifier or library available in the Python ecosystem can be used. Your online environment contains an init Notebook for installing the needed libraries and resolving dependencies using conda or pip. For offline development, we recommend working in a dedicated Quantiacs environment and using conda for installing packages as described here.
- The function "get_features", where the predictive features are defined. Here we simply take as a feature the logarithm of the close price of each asset.
- The function "get_target_classes", where we define the targets. Our choice is very simple: if the close price on the next day is higher than the close price on the current day, we assign the label "1", otherwise the label "0". The label "1" will later be used for taking a positive exposure to the asset (going long), while the label "0" will be used for taking no position (closing the long position and investing everything in cash).
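The alignment between features and targets is easy to get wrong, so here is a minimal sketch of the same idea on a five-day synthetic price series. It uses plain pandas rather than xarray for brevity, and the prices and variable names are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Synthetic close prices (illustration only).
close = pd.Series(
    [100.0, 102.0, 101.0, 105.0, 104.0],
    index=pd.date_range("2021-01-01", periods=5, freq="D"),
    name="BTC",
)

# Feature: logarithm of the close price.
feature = np.log(close)

# Target: 1 if the next day's close is higher than the current close, else 0.
next_close = close.shift(-1)
target = (next_close > close).astype(int).iloc[:-1]  # last day has no known future

# Keep only the days for which both a feature and a target exist.
feature_train, target_train = feature.iloc[:-1].align(target, join="inner")

print(target_train.tolist())
```

The last day is excluded from training because its future price is not yet known; its feature is what the trained model receives when it makes a prediction.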
The rest of the function performs supervised learning: for each past value of the feature (the logarithm of the asset price), the model is shown the realized value of the target (1 for a positive price move, otherwise 0). The Ridge classifier is trained every day on the last L available days, where L=18 is the lookback period passed to the backtesting function. After training, the model reads the newest value of the feature and predicts a target value of 1 or 0, which becomes the trading decision for the next day. Training is repeated every day on a rolling basis.
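The rolling scheme can be sketched without the backtester. The example below is only an approximation of what the backtester does internally: the helper `predict_next`, the random-walk prices and the single-class fallback are assumptions introduced for illustration.

```python
import numpy as np
from sklearn.linear_model import RidgeClassifier

LOOKBACK = 18  # same window length as lookback_period in the article

# Synthetic random-walk prices (illustration only).
rng = np.random.default_rng(42)
prices = 100.0 * np.exp(np.cumsum(rng.normal(0.0, 0.02, size=60)))
log_prices = np.log(prices)


def predict_next(window):
    """Train on one window of log prices and predict the next-day class."""
    features = window[:-1].reshape(-1, 1)  # days with a known next day
    targets = (window[1:] > window[:-1]).astype(int)
    if len(np.unique(targets)) < 2:
        # A classifier needs both classes; fall back to the only one seen.
        return int(targets[0])
    model = RidgeClassifier(random_state=18)
    model.fit(features, targets)
    return int(model.predict(window[-1:].reshape(-1, 1))[0])


# Re-train every day on the most recent LOOKBACK observations.
signals = [
    predict_next(log_prices[end - LOOKBACK:end])
    for end in range(LOOKBACK, len(log_prices) + 1)
]

print(signals)
```

Each entry of `signals` is the decision for the day after the window ends: 1 for a long position, 0 for staying in cash.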
Finally we call the backtester function:

weights = qnbt.backtest(
    competition_type = "cryptofutures",
    load_data = load_data,
    lookback_period = 18,
    start_date = "2014-01-01",
    strategy = predict_weights,
    analyze = True,
    build_plots = True
)

which computes the weights we assign to the Bitcoin Futures contract every day.
The performance of the system is quite good, as is evident from the profit-and-loss and underwater charts.
The system has an in-sample Sharpe ratio larger than 1 and can be submitted to the contest. We provide a ready version in the Examples section of your account (Machine Learning-Ridge Classifier).
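For orientation, a Sharpe ratio can be estimated from a series of daily returns as sketched below. This is a common textbook convention, not necessarily the exact formula used by the Quantiacs backtester: the 252-day annualization factor, the zero risk-free rate and the function name `annualized_sharpe` are assumptions.

```python
import numpy as np

TRADING_DAYS_PER_YEAR = 252  # assumed annualization factor


def annualized_sharpe(daily_returns):
    """Annualized Sharpe ratio with a zero risk-free rate (assumption)."""
    daily_returns = np.asarray(daily_returns, dtype=float)
    volatility = daily_returns.std(ddof=1)
    if volatility == 0:
        return float("nan")  # undefined when returns never vary
    return daily_returns.mean() / volatility * np.sqrt(TRADING_DAYS_PER_YEAR)


# Synthetic daily returns (illustration only, not backtest results).
rng = np.random.default_rng(1)
returns = rng.normal(loc=0.001, scale=0.01, size=1000)
print(round(annualized_sharpe(returns), 2))
```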
Note that we perform a correlation check on the submitted systems, and a carbon copy of the example will not pass our selection filters, as we do not allow our templates to be submitted unchanged. You can start modifying the system by changing the parameters, the features, the targets or the machine learning model itself.
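As one possible starting point for such modifications, the sketch below swaps the model and the feature. Everything in it is a hypothetical variation rather than a recommended strategy: the `kind` switch, the `alpha` and `C` values and the helper `log_return_feature` are assumptions introduced for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression, RidgeClassifier


def get_ml_model(kind="ridge"):
    """Return an untrained classifier; `kind` is a hypothetical switch."""
    if kind == "ridge":
        # Stronger regularization than the scikit-learn default alpha=1.0.
        return RidgeClassifier(alpha=10.0, random_state=18)
    if kind == "logistic":
        return LogisticRegression(C=1.0, random_state=18)
    raise ValueError(f"unknown model kind: {kind}")


def log_return_feature(prices):
    """Alternative feature: one-day log return instead of the log price."""
    log_prices = np.log(np.asarray(prices, dtype=float))
    return np.diff(log_prices)


# Tiny synthetic check (illustration only).
prices = [100.0, 102.0, 101.0, 105.0, 104.0, 108.0]
features = log_return_feature(prices)[:-1].reshape(-1, 1)  # returns up to day t
targets = (np.diff(prices)[1:] > 0).astype(int)            # move from t to t+1

model = get_ml_model("logistic")
model.fit(features, targets)
print(model.predict(features))
```

Inside the strategy, the same pattern would replace the bodies of "get_ml_model" and "get_features", while the rest of "predict_weights" stays unchanged.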
GitHub source code: https://github.com/quantiacs/strategy-cryptofutures-ml-ridge
Have questions? Use our Community Forum: https://quantiacs.com/community/
Or write us by mail: info@quantiacs.com