Q18 Supervised Learning
This template shows how to make a submission to the Q18 NASDAQ-100 contest using supervised learning.
You can clone and edit this example there (tab Examples).
This example uses Ridge regression to predict whether the price will go up or down. Based on these predictions, we define the certainty thresholds at which we go long, go short, or do nothing. This example was built for the Q18 NASDAQ-100 Stock Long-Short contest.
Strategy idea: We go long or short on NASDAQ-100 stocks depending on whether Ridge regression predicts that the price will move up or down.
Features for learning - trend indicator, stochastic oscillator, volatility
To have a look at all the technical indicators we offer, go to Technical Indicators
We will use a specialized version of the Quantiacs backtester for this purpose, which dramatically speeds up the backtesting process when the models should be retrained on a regular basis.
Need help? Check the Documentation and find solutions/report problems in the Forum section.
More help with Jupyter? Check the official Jupyter page.
Once you are done, click on Submit to the contest and take part in our competitions.
Learn more about RidgeRegression and other ML models: scikit-learn
API reference:
data: check how to work with data;
backtesting: read how to run the simulation and check the results.
%%javascript
IPython.OutputArea.prototype._should_scroll = function(lines) { return false; }
// disable widget scrolling
import logging
import pandas as pd
import xarray as xr
import numpy as np
import qnt.backtester as qnbt
import qnt.ta as qnta
def create_model():
    """This is a constructor for the ML model which can be easily modified using a
       different model.
    """
    from sklearn.linear_model import Ridge
    model = Ridge(random_state=18)
    return model
def get_features(data):
    """Builds the features used for learning:
       * a trend indicator;
       * the stochastic oscillator;
       * volatility.
       These features can be modified and new ones can be added easily.
    """
    # trend:
    trend = qnta.roc(qnta.lwma(data.sel(field='close'), 70), 1)
    # stochastic oscillator:
    k, d = qnta.stochastic(data.sel(field='high'), data.sel(field='low'), data.sel(field='close'), 14)
    # volatility:
    volatility = qnta.tr(data.sel(field='high'), data.sel(field='low'), data.sel(field='close'))
    # combine the three selected features:
    result = xr.concat(
        [trend, d, volatility],
        pd.Index(
            ['trend', 'stochastic_d', 'volatility'],
            name='field'
        )
    )
    return result.transpose('time', 'field', 'asset')
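To make the trend feature less opaque, here is a minimal NumPy sketch of a linearly weighted moving average followed by a one-period rate of change. It uses common textbook definitions. The helper names lwma_sketch and roc_sketch are hypothetical, and qnta.lwma and qnta.roc may differ in scaling or edge handling.

```python
import numpy as np

def lwma_sketch(values, periods):
    """Linearly weighted moving average: the newest point gets the largest weight."""
    weights = np.arange(1, periods + 1, dtype=float)
    out = np.full(len(values), np.nan)  # not enough history -> NaN
    for i in range(periods - 1, len(values)):
        window = values[i - periods + 1 : i + 1]
        out[i] = np.dot(window, weights) / weights.sum()
    return out

def roc_sketch(values, periods):
    """Rate of change in percent relative to the value `periods` steps earlier."""
    out = np.full(len(values), np.nan)
    out[periods:] = (values[periods:] / values[:-periods] - 1.0) * 100.0
    return out

# made-up closing prices, for illustration only
close = np.array([10.0, 10.5, 11.0, 10.8, 11.4, 12.0])
trend = roc_sketch(lwma_sketch(close, 3), 1)
print(trend)  # the first three entries are NaN because the averages need history
```

The leading NaN values are the reason the training code drops missing values before fitting.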
def get_target_classes(data):
    """Builds target classes for predicting if the price goes up or down. This will later be used
       to decide whether we go long or short.
    """
    price_current = data.sel(field='close')
    price_future = qnta.shift(price_current, -1)
    class_positive = 1  # price goes up
    class_negative = 0  # price goes down (or stays the same)
    target_is_price_up = xr.where(price_future > price_current, class_positive, class_negative)
    return target_is_price_up
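A small NumPy sketch of the same labelling rule on made-up prices shows two details worth knowing. An unchanged price is labelled as class 0, and the last day, which has no next-day price, also falls into class 0 because a comparison with NaN is False. The price values below are illustrative only.

```python
import numpy as np

# made-up closing prices, for illustration only
close = np.array([10.0, 11.0, 10.5, 10.5, 12.0])

# next-day close; the last day has no future price, so it is NaN
future = np.append(close[1:], np.nan)

# 1 if tomorrow's close is strictly higher than today's, else 0
target = np.where(future > close, 1, 0)
print(target)
```

If the trailing label matters for your model, one option is to drop the last row before training instead of letting it count as a "down" day.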
def train_model(data):
    """Create and train the models working on an asset-by-asset basis."""
    asset_name_all = ['NAS:AAPL', 'NAS:AMZN', 'NAS:MSFT']
    features_all = get_features(data)
    target_all = get_target_classes(data)
    models = dict()
    for asset_name in asset_name_all:
        model = create_model()
        try:
            # drop missing values:
            target_cur = target_all.sel(asset=asset_name).dropna('time', how='any')
            features_cur = features_all.sel(asset=asset_name).dropna('time', how='any')
        except KeyError:
            # the asset is not present in the data, so skip it
            logging.exception('asset %s not found', asset_name)
            continue
        # align features and targets:
        target_for_learn_df, feature_for_learn_df = xr.align(target_cur, features_cur, join='inner')
        if len(feature_for_learn_df.time) < 10:
            # not enough points for training
            continue
        try:
            model.fit(feature_for_learn_df.values, target_for_learn_df.values)
            models[asset_name] = model
        except KeyboardInterrupt as e:
            raise e
        except:
            logging.exception('model training failed')
    return models
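Because Ridge is a regression model fitted on 0/1 labels, its output is a continuous score rather than a probability. The following sketch implements the standard closed-form ridge solution with an unpenalized intercept in NumPy to illustrate this. The function names and the toy data are hypothetical, and this is not scikit-learn's actual solver.

```python
import numpy as np

def ridge_fit_sketch(X, y, alpha=1.0):
    """Closed-form ridge regression; the intercept is handled by centering."""
    x_mean = X.mean(axis=0)
    y_mean = y.mean()
    Xc = X - x_mean
    yc = y - y_mean
    # solve (Xc^T Xc + alpha * I) w = Xc^T yc
    coef = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    intercept = y_mean - x_mean @ coef
    return coef, intercept

def ridge_predict_sketch(X, coef, intercept):
    return X @ coef + intercept

# toy data: one feature, labels 0 (down) and 1 (up)
X = np.array([[-2.0], [-1.0], [1.0], [2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
coef, intercept = ridge_fit_sketch(X, y, alpha=1.0)

# scores for feature values outside the training range
scores = ridge_predict_sketch(np.array([[-5.0], [0.0], [5.0]]), coef, intercept)
print(scores)  # the first score is below 0 and the last is above 1
```

This is why the prediction step compares the score against thresholds instead of treating it as a class label.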
def predict(models, data):
    """The model predicts if the price is going up or down and we then use this information to determine
       if we want to go long, short or do nothing.
       Prediction is performed for several days in order to speed up the evaluation.
    """
    asset_name_all = ['NAS:AAPL', 'NAS:AMZN', 'NAS:MSFT']
    weights = xr.zeros_like(data.sel(field='close'))
    # compute the features once; they do not depend on the loop variable
    features_all = get_features(data)
    for asset_name in asset_name_all:
        features_cur = features_all.sel(asset=asset_name).dropna('time', how='any')
        if len(features_cur.time) < 1:
            continue
        try:
            # regression score for each day (typically close to [0, 1], but not bounded)
            prediction = models[asset_name].predict(features_cur.values)
            for i in range(len(prediction)):
                p = prediction[i]
                if p > 0.5:  # model predicts price is going up
                    prediction[i] = 1  # long
                elif p < 0.4:  # model is fairly certain price is going down
                    prediction[i] = -1  # short
                else:  # model is not so sure about price going down
                    prediction[i] = 0  # do nothing
            weights.loc[dict(asset=asset_name, time=features_cur.time.values)] = prediction
        except KeyboardInterrupt as e:
            raise e
        except:
            logging.exception('model prediction failed')
    return weights
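The thresholding loop in predict can also be written in vectorized form. The sketch below uses the same cut-offs (0.5 and 0.4) as the code above. The function name scores_to_positions and the sample scores are hypothetical.

```python
import numpy as np

def scores_to_positions(scores, long_above=0.5, short_below=0.4):
    """Map regression scores to +1 (long), -1 (short) or 0 (do nothing)."""
    scores = np.asarray(scores, dtype=float)
    return np.where(scores > long_above, 1.0,
                    np.where(scores < short_below, -1.0, 0.0))

# made-up scores, for illustration only
positions = scores_to_positions([0.62, 0.45, 0.31, 0.50, 0.40])
print(positions)
```

Scores that fall exactly on a threshold, and NaN scores, end up as 0 because both comparisons are strict.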
weights = qnbt.backtest_ml(
    train=train_model,
    predict=predict,
    train_period=10*365,  # the data length for training in calendar days
    retrain_interval=365,  # how often we have to retrain models (calendar days)
    retrain_interval_after_submit=1,  # how often to retrain models after submission during evaluation (calendar days)
    predict_each_day=False,  # Is it necessary to call prediction for every day during backtesting?
                             # Set it to True if you suspect that get_features is looking forward.
    competition_type='stocks_nasdaq100',  # competition type
    lookback_period=365,  # how many calendar days are needed by the predict function to generate the output
    start_date='2006-01-01',  # backtest start date
    build_plots=True  # do you need the chart?
)
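To get a feel for how train_period and retrain_interval interact, the following sketch lists the training windows of a simple walk-forward schedule. It only illustrates the idea of periodic retraining and is not how qnbt.backtest_ml is implemented internally. The function retrain_schedule and the end date are hypothetical.

```python
from datetime import date, timedelta

def retrain_schedule(start, end, train_period, retrain_interval):
    """Return (train_start, train_end) windows, one per retraining date."""
    schedule = []
    current = start
    while current <= end:
        # each model only sees the `train_period` days before its retraining date
        schedule.append((current - timedelta(days=train_period), current))
        current += timedelta(days=retrain_interval)
    return schedule

schedule = retrain_schedule(
    start=date(2006, 1, 1),
    end=date(2008, 1, 1),      # hypothetical end date, for illustration only
    train_period=10 * 365,
    retrain_interval=365,
)
for train_start, train_end in schedule:
    print(train_start, "->", train_end)
```

A shorter retrain_interval means more training runs and therefore a slower backtest.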
What libraries are available?
Our library makes extensive use of xarray, pandas and numpy.
Function definitions can be found in the qnt folder in your private root directory.
# Import basic libraries.
import xarray as xr
import pandas as pd
import numpy as np
# Import quantnet libraries.
import qnt.data as qndata # load and manipulate data
import qnt.output as output # manage output
import qnt.backtester as qnbt # backtester
import qnt.stats as qnstats # statistical functions for analysis
import qnt.graph as qngraph # graphical tools
import qnt.ta as qnta # indicators library
May I import libraries?
Yes, please refer to the file init.ipynb in your home directory. You can for example use:
! conda install -y scikit-learn
How to load data?
Futures:
data= qndata.futures.load_data(tail = 15*365, dims = ("time", "field", "asset"))
BTC Futures:
data= qndata.cryptofutures.load_data(tail = 15*365, dims = ("time", "field", "asset"))
Cryptocurrencies:
data= qndata.crypto.load_data(tail = 15*365, dims = ("time", "field", "asset"))
How to view a list of all tickers?
data.asset.to_pandas().to_list()
How to see which fields are available?
data.field.to_pandas().to_list()
How to load specific tickers?
data = qndata.futures.load_data(tail=15 * 365, assets=['F_O', 'F_DX', 'F_GC'])
How to select specific tickers after loading all data?
def get_data_filter(data, assets):
    filtered = data.sel(asset=assets)
    return filtered

get_data_filter(data, ["F_O", "F_DX", "F_GC"])
How to get the prices for the previous day?
qnta.shift(data.sel(field="open"), periods=1)
or:
data.sel(field="open").shift(time=1)
How do I get a list of the top 10 assets ranked by Sharpe ratio?
import qnt.stats as qnstats
data= qndata.futures.load_data(tail=16 * 365)
def get_best_instruments(data, weights, top_size):
    # compute statistics:
    stats_per_asset = qnstats.calc_stat(data, weights, per_asset=True)
    # calculate ranks of assets by "sharpe_ratio":
    ranks = (-stats_per_asset.sel(field="sharpe_ratio")).rank("asset")
    # select top assets by rank "top_period" days ago:
    top_period = 300
    rank = ranks.isel(time=-top_period)
    top = rank.where(rank <= top_size).dropna("asset").asset
    # select top stats:
    top_stats = stats_per_asset.sel(asset=top.values)
    # print results:
    print("SR tail of the top assets:")
    display(top_stats.sel(field="sharpe_ratio").to_pandas().tail())
    print("avg SR = ", top_stats[-top_period:].sel(field="sharpe_ratio").mean("asset")[-1].item())
    display(top_stats)
    return top_stats.coords["asset"].values

get_best_instruments(data, weights, 10)
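As a rough illustration of the ranking idea, the sketch below computes an annualized Sharpe ratio per asset from daily returns and picks the best ones. It assumes the common definition (mean divided by standard deviation, scaled by the square root of 252 trading days), which may differ from what qnstats.calc_stat computes. The asset names and returns are made up.

```python
import numpy as np

def sharpe_ratio_sketch(daily_returns, periods_per_year=252):
    """Annualized Sharpe ratio per column (asset), risk-free rate assumed to be 0."""
    mean = daily_returns.mean(axis=0)
    std = daily_returns.std(axis=0)
    return mean / std * np.sqrt(periods_per_year)

# hypothetical assets and made-up daily returns (rows: days, columns: assets)
assets = np.array(["ASSET_A", "ASSET_B", "ASSET_C"])
returns = np.array([
    [0.01,  0.01, -0.01],
    [-0.01, 0.02, -0.02],
    [0.01,  0.01, -0.01],
    [-0.01, 0.02, -0.02],
])

sr = sharpe_ratio_sketch(returns)
order = np.argsort(-sr)    # indices sorted from best to worst Sharpe ratio
top2 = assets[order[:2]]
print(top2)
```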
How can I check the results for only the top 10 assets ranked by Sharpe ratio?
Select the top assets and then load their data:
best_assets= get_best_instruments(data, weights, 10)
data= qndata.futures.load_data(tail=15 * 365, assets=best_assets)
...
How can prices be processed?
Simply import standard libraries, for example numpy:
import numpy as np
high= np.log(data.sel(field="high"))
How can you reduce slippage impact when trading?
Just apply some technique to reduce turnover:
def get_lower_slippage(weights, rolling_time=6):
    return weights.rolling({"time": rolling_time}).max()

improved_weights = get_lower_slippage(weights, rolling_time=6)
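The effect of this kind of smoothing can be checked by comparing turnover before and after. The sketch below reproduces a rolling maximum in NumPy for a single asset and measures turnover as the sum of absolute day-to-day weight changes. Both helper functions and the weight series are hypothetical.

```python
import numpy as np

def rolling_max_sketch(values, window):
    """Rolling maximum over the last `window` points (NaN until the window is full)."""
    out = np.full(len(values), np.nan)
    for i in range(window - 1, len(values)):
        out[i] = values[i - window + 1 : i + 1].max()
    return out

def turnover_sketch(values):
    """Sum of absolute day-to-day changes, ignoring NaN entries."""
    return float(np.nansum(np.abs(np.diff(values))))

# made-up weights of one asset that flip in and out of a long position
raw = np.array([0, 1, 0, 1, 0, 0, 0, 1, 0, 0], dtype=float)
smoothed = rolling_max_sketch(raw, 3)

print(turnover_sketch(raw), turnover_sketch(smoothed))
```

Note that a rolling maximum holds long positions open for longer but pulls short (negative) weights towards zero, so for a long-short strategy a symmetric smoother such as a rolling mean may be worth comparing.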
How to use technical analysis indicators?
For available indicators see the source code of the library: /qnt/ta
ATR
def get_atr(data, days=14):
    high = data.sel(field="high") * 1.0
    low = data.sel(field="low") * 1.0
    close = data.sel(field="close") * 1.0
    return qnta.atr(high, low, close, days)

atr = get_atr(data, days=14)
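For reference, here is a minimal NumPy sketch of what an average true range measures. It uses the textbook true range and, as a simplifying assumption, a simple moving average. qnta.atr may use a different smoothing, so the numbers are not expected to match it exactly. The function names and prices are hypothetical.

```python
import numpy as np

def true_range_sketch(high, low, close):
    """Largest of: high - low, |high - previous close|, |low - previous close|."""
    prev_close = np.concatenate(([np.nan], close[:-1]))
    candidates = np.vstack([
        high - low,
        np.abs(high - prev_close),
        np.abs(low - prev_close),
    ])
    # nanmax: the first day has no previous close, so only high - low is used
    return np.nanmax(candidates, axis=0)

def atr_sketch(high, low, close, days):
    """Simple moving average of the true range (assumption, see the text above)."""
    tr = true_range_sketch(high, low, close)
    out = np.full(len(tr), np.nan)
    for i in range(days - 1, len(tr)):
        out[i] = tr[i - days + 1 : i + 1].mean()
    return out

# made-up prices, for illustration only
high = np.array([10.0, 12.0, 11.0])
low = np.array([9.0, 10.0, 9.0])
close = np.array([9.5, 11.0, 10.0])

tr_values = true_range_sketch(high, low, close)
atr_values = atr_sketch(high, low, close, days=2)
print(tr_values, atr_values)
```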
EMA
prices= data.sel(field="high")
prices_ema= qnta.ema(prices, 15)
TRIX
prices= data.sel(field="high")
prices_trix= qnta.trix(prices, 15)
ADL and EMA
adl= qnta.ad_line(data.sel(field="close")) * 1.0
adl_ema= qnta.ema(adl, 18)
How can you check the quality of your strategy?
import qnt.output as qnout
qnout.check(weights, data)
or
stat= qnstats.calc_stat(data, weights)
display(stat.to_pandas().tail())
or
import qnt.graph as qngraph
statistics= qnstats.calc_stat(data, weights)
display(statistics.to_pandas().tail())
performance= statistics.to_pandas()["equity"]
qngraph.make_plot_filled(performance.index, performance, name="PnL (Equity)", type="log")
display(statistics[-1:].sel(field = ["sharpe_ratio"]).transpose().to_pandas())
qnstats.print_correlation(weights, data)
An example using pandas
One can work with pandas DataFrames at intermediate steps and at the end convert them to xarray data structures:
def get_price_pct_change(prices):
    prices_pandas = prices.to_pandas()
    # take the assets from the input rather than from the global `data`
    assets = prices.coords["asset"].values
    for asset in assets:
        prices_pandas[asset] = prices_pandas[asset].pct_change()
    return prices_pandas

prices = data.sel(field="close") * 1.0
prices_pct_change = get_price_pct_change(prices).unstack().to_xarray()
How to submit a strategy to the competition?
Check that weights are fine:
import qnt.output as qnout
qnout.check(weights, data)
If everything is ok, write the weights to file:
qnout.write(weights)
In your personal account:
- choose a strategy;
- click on the Submit button;
- select the type of competition.
At the beginning you will find the strategy under the Checking area (Competition > Checking). If the Sharpe ratio is larger than 1 and the technical checks are successful, the strategy will move to the Running area (Competition > Running). Otherwise it will be Filtered (Competition > Filtered) and you should inspect the error and warning messages.