
Q18 Supervised Learning

This template shows how to make a submission to the Q18 NASDAQ-100 Stock Long-Short contest using supervised learning.

You can clone and edit this example on the Quantiacs platform (tab Examples).


This example uses Ridge regression to predict whether the price is going up or down. Based on these predictions we then define at which confidence levels we go long, go short or do nothing. This example was built for the Q18 NASDAQ-100 Stock Long-Short contest.

Strategy idea: we go long or short on NASDAQ-100 stocks depending on whether Ridge regression predicts the price to move up or down.

Features for learning: a trend indicator, the stochastic oscillator and volatility.

To have a look at all the technical indicators we offer, go to Technical Indicators.


We use a specialized version of the Quantiacs backtester, which dramatically speeds up the backtesting process when models need to be retrained on a regular basis.

Need help? Check the Documentation and find solutions/report problems in the Forum section.

More help with Jupyter? Check the official Jupyter page.

Once you are done, click on Submit to the contest and take part in our competitions.

Learn more about Ridge regression and other ML models: scikit-learn.

API reference:

  • data: check how to work with data;

  • backtesting: read how to run the simulation and check the results.

In [1]:
%%javascript
IPython.OutputArea.prototype._should_scroll = function(lines) { return false; }
// disable widget scrolling
In [2]:
import logging

import pandas as pd
import xarray as xr
import numpy as np

import qnt.backtester as qnbt
import qnt.ta as qnta
In [3]:
def create_model():
    """This is a constructor for the ML model which can be easily modified using a
       different model. 
    """
    from sklearn.linear_model import Ridge
    model = Ridge(random_state=18)
    
    return model
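
The constructor can return any estimator that implements fit and predict. As an illustration only (not part of the original template), a hedged sketch of a variant that scales the features before fitting a Bayesian ridge model:

def create_model_scaled():
    """Hypothetical alternative constructor: scale the features, then fit Bayesian ridge regression."""
    from sklearn.linear_model import BayesianRidge
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    # scaling can help when features live on very different scales
    # (trend in percent, stochastic %D in [0, 100], true range in price units)
    return make_pipeline(StandardScaler(), BayesianRidge())

Since the pipeline exposes the same fit/predict interface, train_model and predict below work with it unchanged.
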
In [4]:
def get_features(data):
    """Builds the features used for learning:
       * a trend indicator;
       * the stochastic oscillator;
       * volatility.

       These features can be modified and new ones can be added easily.
    """
    # trend: rate of change of a 70-day linearly weighted moving average of the close price
    trend = qnta.roc(qnta.lwma(data.sel(field='close'), 70), 1)

    # stochastic oscillator (we use the %D line):
    k, d = qnta.stochastic(data.sel(field='high'), data.sel(field='low'), data.sel(field='close'), 14)

    # volatility: true range
    volatility = qnta.tr(data.sel(field='high'), data.sel(field='low'), data.sel(field='close'))

    # combine the three selected features:
    result = xr.concat(
        [trend, d, volatility],
        pd.Index(
            ['trend', 'stochastic_d', 'volatility'],
            name='field'
        )
    )

    return result.transpose('time', 'field', 'asset')
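
Adding a feature only requires concatenating one more DataArray along the field dimension. A minimal sketch that mirrors the function above and appends a simple momentum proxy (the 10-day period is an arbitrary choice for illustration):

def get_features_with_momentum(data):
    """Hypothetical variant of get_features with an extra momentum feature."""
    trend = qnta.roc(qnta.lwma(data.sel(field='close'), 70), 1)
    k, d = qnta.stochastic(data.sel(field='high'), data.sel(field='low'), data.sel(field='close'), 14)
    volatility = qnta.tr(data.sel(field='high'), data.sel(field='low'), data.sel(field='close'))

    # extra feature: 10-day rate of change of the close price
    momentum = qnta.roc(data.sel(field='close'), 10)

    result = xr.concat(
        [trend, d, volatility, momentum],
        pd.Index(['trend', 'stochastic_d', 'volatility', 'momentum'], name='field')
    )
    return result.transpose('time', 'field', 'asset')
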
In [5]:
def get_target_classes(data):
    """Builds the target classes for predicting whether the price goes up or down.
       The predictions will later be used to decide whether we go long or short.
    """

    price_current = data.sel(field='close')
    price_future = qnta.shift(price_current, -1)

    class_positive = 1  # price goes up
    class_negative = 0  # price goes down

    target_is_price_up = xr.where(price_future > price_current, class_positive, class_negative)

    return target_is_price_up
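
Since Ridge is a regressor, one could also predict the next-day return directly instead of a 0/1 class; the thresholds in the predict function would then be centered around zero. A hedged sketch of such an alternative target (not used in this template):

def get_target_returns(data):
    """Hypothetical alternative target: next-day relative return instead of a 0/1 class."""
    price_current = data.sel(field='close')
    price_future = qnta.shift(price_current, -1)

    # relative return over the next day; in predict(), the long/short thresholds
    # would then be placed around 0 instead of around 0.5
    return (price_future - price_current) / price_current
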
In [6]:
def train_model(data):
    """Create and train the models working on an asset-by-asset basis."""

    asset_name_all = ['NAS:AAPL', 'NAS:AMZN', 'NAS:MSFT']

    features_all = get_features(data)
    target_all = get_target_classes(data)

    models = dict()

    for asset_name in asset_name_all:
        model = create_model()

        # drop missing values:
        try:
            target_cur = target_all.sel(asset=asset_name).dropna('time', 'any')
        except KeyError:
            # the asset is not present in the data, skip it
            print(target_all['asset'])
            continue
        features_cur = features_all.sel(asset=asset_name).dropna('time', 'any')

        # align features and targets:
        target_for_learn_df, feature_for_learn_df = xr.align(target_cur, features_cur, join='inner')

        if len(feature_for_learn_df.time) < 10:
            # not enough points for training
            continue

        try:
            model.fit(feature_for_learn_df.values, target_for_learn_df)
            models[asset_name] = model
        except KeyboardInterrupt as e:
            raise e
        except:
            logging.exception('model training failed')

    return models
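
The three tickers are hard-coded to keep the example fast. If you prefer to train one model per asset available in the data, the asset list can be taken from the data itself, at the cost of a much longer runtime; a minimal sketch:

def get_asset_names(data):
    """Hypothetical helper: use every asset present in the data instead of a fixed list."""
    return data.coords['asset'].values.tolist()

# inside train_model and predict one would then write:
# asset_name_all = get_asset_names(data)
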
In [7]:
def predict(models, data):
    """The model predicts whether the price is going up or down, and we use this information
       to decide whether to go long, go short or do nothing.
       Prediction is performed for several days at once in order to speed up the evaluation.
    """

    asset_name_all = ['NAS:AAPL', 'NAS:AMZN', 'NAS:MSFT']
    weights = xr.zeros_like(data.sel(field='close'))

    features_all = get_features(data)

    for asset_name in asset_name_all:
        if asset_name not in models:
            # no trained model for this asset (for example, not enough training data)
            continue
        features_cur = features_all.sel(asset=asset_name).dropna('time', 'any')
        if len(features_cur.time) < 1:
            continue
        try:
            # raw regression output for each day; roughly in [0, 1] since the targets are 0/1
            prediction = models[asset_name].predict(features_cur.values)

            for i in range(len(prediction)):
                p = prediction[i]

                if p > 0.5:    # the model predicts the price is going up
                    prediction[i] = 1   # long
                elif p < 0.4:  # the model is fairly certain the price is going down
                    prediction[i] = -1  # short
                else:          # the model is not sure enough
                    prediction[i] = 0   # do nothing

            weights.loc[dict(asset=asset_name, time=features_cur.time.values)] = prediction

        except KeyboardInterrupt as e:
            raise e
        except:
            logging.exception('model prediction failed')

    return weights
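
Before launching the full ML backtester you can sanity-check train_model and predict on recent data. The sketch below assumes the NASDAQ-100 stock loader qndata.stocks.load_ndx_data used in the Quantiacs stock templates; adjust the call if your qnt version differs:

# quick manual check of train_model/predict outside the backtester
# (assumption: the stock loader below is available in your qnt version)
import qnt.data as qndata
import qnt.output as qnout

data = qndata.stocks.load_ndx_data(tail=5 * 365)  # roughly the last 5 years of NASDAQ-100 data
models = train_model(data)
weights = predict(models, data)
qnout.check(weights, data)  # liquidity, missed dates and correlation checks
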
In [8]:
weights = qnbt.backtest_ml(
    train=train_model,
    predict=predict,
    train_period=10*365,   # the data length for training in calendar days
    retrain_interval=365,  # how often we have to retrain models (calendar days)
    retrain_interval_after_submit=1, # how often retrain models after submission during evaluation (calendar days)
    predict_each_day=False,  # is it necessary to call the predict function for every day during backtesting?
                             # Set it to True if you suspect that get_features is looking forward.
    competition_type='stocks_nasdaq100',  # competition type
    lookback_period=365,      # how many calendar days are needed by the predict function to generate the output
    start_date='2006-01-01',  # backtest start date
    build_plots=True          # do you need the chart?
)
Run the last iteration...
100% (367973 of 367973) |################| Elapsed Time: 0:00:00 Time:  0:00:00
100% (39443 of 39443) |##################| Elapsed Time: 0:00:00 Time:  0:00:00
100% (14866184 of 14866184) |############| Elapsed Time: 0:00:00 Time:  0:00:00
fetched chunk 1/4 1s
100% (14866112 of 14866112) |############| Elapsed Time: 0:00:00 Time:  0:00:00
fetched chunk 2/4 3s
100% (14866112 of 14866112) |############| Elapsed Time: 0:00:00 Time:  0:00:00
fetched chunk 3/4 4s
100% (554156 of 554156) |################| Elapsed Time: 0:00:00 Time:  0:00:00
fetched chunk 4/4 4s
Data loaded 4s
100% (756972 of 756972) |################| Elapsed Time: 0:00:00 Time:  0:00:00
fetched chunk 1/1 3s
Data loaded 3s
Output cleaning...
fix uniq
ffill if the current price is None...
Check liquidity...
Ok.
Check missed dates...
Ok.
Normalization...
Output cleaning is complete.
Write output: /root/fractions.nc.gz
State saved.
---
Run First Iteration...
100% (39443 of 39443) |##################| Elapsed Time: 0:00:00 Time:  0:00:00
100% (14913448 of 14913448) |############| Elapsed Time: 0:00:00 Time:  0:00:00
fetched chunk 1/4 1s
100% (14919284 of 14919284) |############| Elapsed Time: 0:00:00 Time:  0:00:00
fetched chunk 2/4 3s
100% (14895652 of 14895652) |############| Elapsed Time: 0:00:00 Time:  0:00:00
fetched chunk 3/4 4s
100% (554596 of 554596) |################| Elapsed Time: 0:00:00 Time:  0:00:00
fetched chunk 4/4 4s
Data loaded 4s
---
Run all iterations...
Load data...
100% (39443 of 39443) |##################| Elapsed Time: 0:00:00 Time:  0:00:00
100% (14914612 of 14914612) |############| Elapsed Time: 0:00:00 Time:  0:00:00
fetched chunk 1/9 1s
100% (14912592 of 14912592) |############| Elapsed Time: 0:00:00 Time:  0:00:00
fetched chunk 2/9 2s
100% (14912592 of 14912592) |############| Elapsed Time: 0:00:00 Time:  0:00:00
fetched chunk 3/9 3s
100% (14918636 of 14918636) |############| Elapsed Time: 0:00:00 Time:  0:00:00
fetched chunk 4/9 3s
100% (14920656 of 14920656) |############| Elapsed Time: 0:00:00 Time:  0:00:00
fetched chunk 5/9 4s
100% (14918580 of 14918580) |############| Elapsed Time: 0:00:00 Time:  0:00:00
fetched chunk 6/9 5s
100% (14912520 of 14912520) |############| Elapsed Time: 0:00:00 Time:  0:00:00
fetched chunk 7/9 6s
100% (14912520 of 14912520) |############| Elapsed Time: 0:00:00 Time:  0:00:00
fetched chunk 8/9 7s
100% (13318028 of 13318028) |############| Elapsed Time: 0:00:00 Time:  0:00:00
fetched chunk 9/9 8s
Data loaded 9s
100% (39443 of 39443) |##################| Elapsed Time: 0:00:00 Time:  0:00:00
100% (14617736 of 14617736) |############| Elapsed Time: 0:00:00 Time:  0:00:00
fetched chunk 1/6 1s
100% (14620904 of 14620904) |############| Elapsed Time: 0:00:00 Time:  0:00:00
fetched chunk 2/6 2s
100% (14617704 of 14617704) |############| Elapsed Time: 0:00:00 Time:  0:00:00
fetched chunk 3/6 3s
100% (14617616 of 14617616) |############| Elapsed Time: 0:00:00 Time:  0:00:00
fetched chunk 4/6 4s
100% (14617616 of 14617616) |############| Elapsed Time: 0:00:00 Time:  0:00:00
fetched chunk 5/6 5s
100% (9640884 of 9640884) |##############| Elapsed Time: 0:00:00 Time:  0:00:00
fetched chunk 6/6 6s
Data loaded 6s
Backtest...
100% (39443 of 39443) |##################| Elapsed Time: 0:00:00 Time:  0:00:00
100% (14741444 of 14741444) |############| Elapsed Time: 0:00:00 Time:  0:00:00
fetched chunk 1/6 1s
100% (14744612 of 14744612) |############| Elapsed Time: 0:00:00 Time:  0:00:00
fetched chunk 2/6 2s
100% (14741412 of 14741412) |############| Elapsed Time: 0:00:00 Time:  0:00:00
fetched chunk 3/6 3s
100% (14741324 of 14741324) |############| Elapsed Time: 0:00:00 Time:  0:00:00
fetched chunk 4/6 4s
100% (14741324 of 14741324) |############| Elapsed Time: 0:00:00 Time:  0:00:00
fetched chunk 5/6 5s
100% (9722472 of 9722472) |##############| Elapsed Time: 0:00:00 Time:  0:00:00
fetched chunk 6/6 6s
Data loaded 6s
Output cleaning...
fix uniq
ffill if the current price is None...
Check liquidity...
Ok.
Check missed dates...
Ok.
Normalization...
Output cleaning is complete.
Write output: /root/fractions.nc.gz
State saved.
---
Analyze results...
Check...
Check liquidity...
Ok.
Check missed dates...
Ok.
Check the sharpe ratio...
Period: 2006-01-01 - 2024-04-24
Sharpe Ratio = 0.32185602130355845
ERROR! The Sharpe Ratio is too low. 0.32185602130355845 < 1
Improve the strategy and make sure that the in-sample Sharpe Ratio more than 1.
Check correlation.
WARNING! Can't calculate correlation.
Correlation check failed.
---
Align...
Calc global stats...
---
Calc stats per asset...
Build plots...
---
Output:
asset NAS:AAL NAS:AAPL NAS:ABNB NAS:ADBE NAS:ADI NAS:ADP NAS:ADSK NAS:AEP NAS:AKAM NAS:ALGN
time
2024-04-11 0.0 0.000000 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2024-04-12 0.0 0.333333 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2024-04-15 0.0 0.333333 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2024-04-16 0.0 0.333333 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2024-04-17 0.0 0.333333 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2024-04-18 0.0 0.333333 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2024-04-19 0.0 0.333333 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2024-04-22 0.0 0.333333 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2024-04-23 0.0 0.333333 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
2024-04-24 0.0 0.333333 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0
Stats:
field equity relative_return volatility underwater max_drawdown sharpe_ratio mean_return bias instruments avg_turnover avg_holding_time
time
2024-04-11 5.313187 0.023605 0.287633 0.000000 -0.762377 0.333112 0.095814 1.0 3.0 0.323270 6.078106
2024-04-12 5.238533 -0.014051 0.287622 -0.014051 -0.762377 0.330097 0.094943 1.0 3.0 0.323346 6.084592
2024-04-15 5.166289 -0.013791 0.287610 -0.027648 -0.762377 0.327142 0.094089 1.0 3.0 0.323425 6.084243
2024-04-16 5.134412 -0.006170 0.287583 -0.033647 -0.762377 0.325810 0.093697 1.0 3.0 0.323358 6.084243
2024-04-17 5.090139 -0.008623 0.287560 -0.041980 -0.762377 0.323960 0.093158 1.0 3.0 0.323290 6.084243
2024-04-18 5.029985 -0.011818 0.287543 -0.053302 -0.762377 0.321432 0.092426 1.0 3.0 0.323221 6.084243
2024-04-19 4.945340 -0.016828 0.287541 -0.069233 -0.762377 0.317837 0.091391 1.0 3.0 0.323151 6.084243
2024-04-22 4.985607 0.008142 0.287515 -0.061654 -0.762377 0.319477 0.091854 1.0 3.0 0.323085 6.084243
2024-04-23 5.045283 0.011970 0.287496 -0.050422 -0.762377 0.321897 0.092544 1.0 3.0 0.323016 6.084243
2024-04-24 5.045203 -0.000016 0.287465 -0.050437 -0.762377 0.321856 0.092522 1.0 3.0 0.322948 6.090466
---
100% (4609 of 4609) |####################| Elapsed Time: 0:00:30 Time:  0:00:30

What libraries are available?

Our library makes extensive use of xarray, pandas and numpy.

Function definitions can be found in the qnt folder in your private root directory.

# Import basic libraries.
import xarray as xr
import pandas as pd
import numpy as np

# Import Quantiacs libraries.
import qnt.data    as qndata  # load and manipulate data
import qnt.output as output   # manage output
import qnt.backtester as qnbt # backtester
import qnt.stats   as qnstats # statistical functions for analysis
import qnt.graph   as qngraph # graphical tools
import qnt.ta      as qnta    # indicators library

May I import libraries?

Yes, please refer to the file init.ipynb in your home directory. You can for example use:

! conda install -y scikit-learn
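
Depending on the environment, pip works as well, for example:

! pip install scikit-learn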

How to load data?

Futures:

data= qndata.futures.load_data(tail = 15*365, dims = ("time", "field", "asset"))

BTC Futures:

data= qndata.cryptofutures.load_data(tail = 15*365, dims = ("time", "field", "asset"))

Cryptocurrencies:

data= qndata.crypto.load_data(tail = 15*365, dims = ("time", "field", "asset"))
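
NASDAQ-100 stocks, as used in this example (assumption: the loader below is the one used in the Quantiacs stock templates; check the data documentation of your qnt version for the exact name and arguments):

data = qndata.stocks.load_ndx_data(tail = 15*365)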

How to view a list of all tickers?

data.asset.to_pandas().to_list()

How to see which fields are available?

data.field.to_pandas().to_list()

How to load specific tickers?

data = qndata.futures.load_data(tail=15 * 365, assets=['F_O', 'F_DX', 'F_GC'])

How to select specific tickers after loading all data?

def get_data_filter(data, assets):
    filtered = data.sel(asset=assets)
    return filtered

get_data_filter(data, ["F_O", "F_DX", "F_GC"])

How to get the prices for the previous day?

qnta.shift(data.sel(field="open"), periods=1)

or:

data.sel(field="open").shift(time=1)

How do I get a list of the top 10 assets ranked by Sharpe ratio?

import qnt.stats as qnstats

data= qndata.futures.load_data(tail=16 * 365)

def get_best_instruments(data, weights, top_size):
    # compute statistics:
    stats_per_asset= qnstats.calc_stat(data, weights, per_asset=True)
    # calculate ranks of assets by "sharpe_ratio":
    ranks= (-stats_per_asset.sel(field="sharpe_ratio")).rank("asset")
    # select top assets by rank "top_period" days ago:
    top_period= 300
    rank= ranks.isel(time=-top_period)
    top= rank.where(rank <= top_size).dropna("asset").asset

    # select top stats:
    top_stats= stats_per_asset.sel(asset=top.values)

    # print results:
    print("SR tail of the top assets:")
    display(top_stats.sel(field="sharpe_ratio").to_pandas().tail())

    print("avg SR = ", top_stats[-top_period:].sel(field="sharpe_ratio").mean("asset")[-1].item())
    display(top_stats)
    return top_stats.coords["asset"].values

get_best_instruments(data, weights, 10)

How can I check the results for only the top 10 assets ranked by Sharpe ratio?

Select the top assets and then load their data:

best_assets= get_best_instruments(data, weights, 10)

data= qndata.futures.load_data(tail=15 * 365, assets=best_assets)
...

How can prices be processed?

Simply import standard libraries, for example numpy:

import numpy as np

high= np.log(data.sel(field="high"))

How can you reduce slippage impact when trading?

Just apply some technique to reduce turnover:

def get_lower_slippage(weights, rolling_time=6):
    return weights.rolling({"time": rolling_time}).max()

improved_weights = get_lower_slippage(weights, rolling_time=6)

How to use technical analysis indicators?

For available indicators see the source code of the library: /qnt/ta

ATR

def get_atr(data, days=14):
    high = data.sel(field="high") * 1.0 
    low  = data.sel(field="low") * 1.0 
    close= data.sel(field="close") * 1.0

    return qnta.atr(high, low, close, days)

atr= get_atr(data, days=14)

EMA

prices= data.sel(field="high")
prices_ema= qnta.ema(prices, 15)

TRIX

prices= data.sel(field="high")
prices_trix= qnta.trix(prices, 15)

ADL and EMA

adl= qnta.ad_line(data.sel(field="close")) * 1.0 
adl_ema= qnta.ema(adl, 18)

How can you check the quality of your strategy?

import qnt.output as qnout
qnout.check(weights, data)

or

stat= qnstats.calc_stat(data, weights)
display(stat.to_pandas().tail())

or

import qnt.graph   as qngraph
statistics= qnstats.calc_stat(data, weights)
display(statistics.to_pandas().tail())

performance= statistics.to_pandas()["equity"]
qngraph.make_plot_filled(performance.index, performance, name="PnL (Equity)", type="log")

display(statistics[-1:].sel(field = ["sharpe_ratio"]).transpose().to_pandas())
qnstats.print_correlation(weights, data)

An example using pandas

One can work with pandas DataFrames at intermediate steps and at the end convert them to xarray data structures:

def get_price_pct_change(prices):
    prices_pandas = prices.to_pandas()
    assets = prices.coords["asset"].values
    for asset in assets:
        prices_pandas[asset] = prices_pandas[asset].pct_change()
    return prices_pandas


prices= data.sel(field="close") * 1.0
prices_pct_change= get_price_pct_change(prices).unstack().to_xarray()

How to submit a strategy to the competition?

Check that weights are fine:

import qnt.output as qnout
qnout.check(weights, data)

If everything is ok, write the weights to file:

qnout.write(weights)

In your personal account:

  • choose a strategy;
  • click on the Submit button;
  • select the type of competition.

At the beginning you will find the strategy under the Checking area (Competition > Checking). If the Sharpe ratio is larger than 1 and the technical checks are successful, the strategy will go to the Running area (Competition > Running). Otherwise it will be Filtered (Competition > Filtered) and you should inspect the error and warning messages.