Machine Learning with a Voting Classifier

This template combines classifiers by voting and shows how to use the backtester with the retraining option.

You can clone and edit this example from the Examples tab.


With Quantiacs you can use machine learning methods for forecasting financial time series.

In this template we show how to use the Quantiacs toolkit to retrain your model efficiently on a rolling basis.

We will work with BTC futures contracts and apply voting to a combination of Ridge classifiers and Stochastic Gradient Descent (SGD) classifiers as implemented in scikit-learn.

For this purpose we will use a specialized version of the Quantiacs backtester, which dramatically speeds up backtesting when the models have to be retrained on a regular basis.

Need help? Check the Documentation, and find solutions or report problems in the Forum section.

Need more help with Jupyter? Check the official Jupyter page.

Once you are done, click on Submit to the contest and take part in our competitions.

API reference:

  • data: check how to work with data;

  • backtesting: read how to run the simulation and check the results.

In [1]:
%%javascript
window.IPython && (IPython.OutputArea.prototype._should_scroll = function(lines) { return false; })
// disable widget scrolling
In [2]:
import logging

import pandas as pd
import xarray as xr
import numpy as np

import qnt.backtester as qnbt
import qnt.ta as qnta
In [3]:
def create_model():
    """This is a constructor for the ML model which can be easily modified using
       different models or another logic for the combination.
    """
    
    from sklearn.ensemble import VotingClassifier
    from sklearn.linear_model import SGDClassifier, RidgeClassifier
    import random
    
    # The model combines Ridge classifiers and SGD classifiers by voting;
    # each member gets its own random seed to reduce overfitting:
    classifiers = []
    r = random.Random(13)
    for i in range(42):
        classifiers.append(('ridge' + str(i), RidgeClassifier(random_state=r.randint(0, pow(2, 32) - 1)),))
        classifiers.append(('sgd' + str(i), SGDClassifier(random_state=r.randint(0, pow(2, 32) - 1)),))
    model = VotingClassifier(classifiers)

    return model
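
The constructor above relies on the default `voting='hard'` mode of `VotingClassifier`, where each member casts one vote and the most frequent label wins. The NumPy sketch below illustrates that rule only; it is not scikit-learn's code, and `majority_vote` is a hypothetical helper. According to the scikit-learn documentation, ties are resolved in ascending order of the class labels, which matters here because the ensemble has an even number of members (42 + 42).

```python
import numpy as np

def majority_vote(votes):
    """Hard voting over integer class labels (hypothetical helper).

    votes: array of shape (n_classifiers, n_samples) with labels 0, 1, ...
    """
    votes = np.asarray(votes)
    n_classes = votes.max() + 1
    # Count the votes each class receives for every sample.
    counts = np.stack([(votes == c).sum(axis=0) for c in range(n_classes)])
    # argmax returns the first maximum, so a tie goes to the lowest label.
    return counts.argmax(axis=0)

# Four classifiers (rows) voting on four samples (columns).
votes = np.array([
    [1, 0, 1, 0],
    [1, 0, 0, 1],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
])
print(majority_vote(votes))  # the last two samples are 2-2 ties
```

With an even number of voters, a tie resolves to class 0, which in this template means no long position.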
In [4]:
def get_features(data):
    """Builds the features used for learning:
       * a trend indicator;
       * the stochastic oscillator;
       * volatility;
       * volume.
    """
    
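    # trend: 1-period rate of change of the 70-period linearly weighted moving average of the close
    trend = qnta.roc(qnta.lwma(data.sel(field='close'), 70), 1)

    # stochastic oscillator (only d is used as a feature):
    k, d = qnta.stochastic(data.sel(field='high'), data.sel(field='low'), data.sel(field='close'), 14)

    # volatility: true range relative to the close, smoothed with a 14-period LWMA
    volatility = qnta.tr(data.sel(field='high'), data.sel(field='low'), data.sel(field='close'))
    volatility = volatility / data.sel(field='close')
    volatility = qnta.lwma(volatility, 14)

    # volume: ratio of the 5-period to the 60-period average volume; non-finite values become 0
    volume = data.sel(field='vol')
    volume = qnta.sma(volume, 5) / qnta.sma(volume, 60)
    volume = volume.where(np.isfinite(volume), 0)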

    # combine the selected four features:
    result = xr.concat(
        [trend, d, volatility, volume],
        pd.Index(
            ['trend', 'stochastic_d', 'volatility', 'volume'],
            name = 'field'
        )
    )
    
    return result.transpose('time', 'field', 'asset')
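
To make the trend feature more concrete, here is a minimal pandas sketch of a linearly weighted moving average followed by a rate of change. It uses the conventional textbook definitions; the functions `lwma` and `roc` below are stand-ins written for this illustration, and the actual `qnta.lwma` and `qnta.roc` may differ in scaling or in how they treat missing values.

```python
import numpy as np
import pandas as pd

def lwma(series, window):
    """Linearly weighted moving average: the newest value gets weight `window`."""
    weights = np.arange(1, window + 1, dtype=float)
    return series.rolling(window).apply(
        lambda x: np.dot(x, weights) / weights.sum(), raw=True
    )

def roc(series, periods):
    """Rate of change in percent over `periods` steps (assumed convention)."""
    return (series / series.shift(periods) - 1.0) * 100.0

close = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0])
smoothed = lwma(close, 3)   # the first two values are NaN (warm-up)
trend = roc(smoothed, 1)    # the first three values are NaN
print(pd.DataFrame({"close": close, "lwma": smoothed, "trend": trend}))
```

The warm-up NaNs are the reason the training and prediction code drops missing values before passing features to the model.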
In [5]:
def get_target_classes(data):
    """Builds target classes which will be later predicted."""

    price_current = data.sel(field='close')
    price_future = qnta.shift(price_current, -1)

    class_positive = 1
    class_negative = 0

    target_is_price_up = xr.where(price_future > price_current, class_positive, class_negative)
    
    return target_is_price_up
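
The labelling step can be illustrated with a small pandas analogue: shift the close by -1 so that tomorrow's price sits next to today's, then compare. This is a sketch on made-up prices, not the `qnta.shift` implementation. It also shows a detail worth checking in any next-day target: the last row has no future price, and a comparison against NaN evaluates to False, so that row silently receives class 0 unless it is masked.

```python
import numpy as np
import pandas as pd

close = pd.Series(
    [100.0, 101.0, 99.0, 102.0],
    index=pd.date_range("2024-01-01", periods=4),
)

future = close.shift(-1)              # tomorrow's close aligned to today
label = (future > close).astype(int)  # 1 if the price goes up, else 0

# The last day has no known future price: NaN > x is False, so label is 0 there.
# Masking keeps that unknown label out of the training set.
label_masked = label.where(future.notna())

print(pd.DataFrame({"close": close, "future": future,
                    "label": label, "label_masked": label_masked}))
```

Whether the same masking is needed in the xarray code above depends on how the missing last value propagates through `xr.where`, so it is worth inspecting the final rows of the target once.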
In [6]:
def create_and_train_models(data):
    """Create and train the models working on an asset-by-asset basis."""
    
    asset_name_all = data.coords['asset'].values

    data = data.sel(time=slice('2013-05-01',None)) # cut the noisy data head before 2013-05-01

    features_all = get_features(data)
    target_all = get_target_classes(data)

    models = dict()

    for asset_name in asset_name_all:
        
        # drop missing values:
        target_cur = target_all.sel(asset=asset_name).dropna('time', 'any')
        features_cur = features_all.sel(asset=asset_name).dropna('time', 'any')

        # align features and targets:
        target_for_learn_df, feature_for_learn_df = xr.align(target_cur, features_cur, join='inner')

        if len(feature_for_learn_df.time) < 10:
            # not enough aligned points for training
            continue

        model = create_model()
        
        try:
            model.fit(feature_for_learn_df.values, target_for_learn_df.values)
            models[asset_name] = model
        except Exception:
            # KeyboardInterrupt is not a subclass of Exception, so it still propagates
            logging.exception('model training failed')

    return models
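
The drop-and-align step in the training function can be pictured with a small pandas analogue. The sketch below uses made-up data and `DataFrame.align` instead of `xr.align`, but the idea is the same: indicator warm-up removes rows at the head of the features, the missing future price removes the last target, and the inner join keeps only the timestamps present in both.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2024-01-01", periods=5)

# Features with NaNs at the head (indicator warm-up).
features = pd.DataFrame(
    {"trend": [np.nan, np.nan, 0.1, 0.2, 0.3],
     "volume": [1.0, 1.1, 0.9, 1.2, 1.0]},
    index=idx,
)
# Targets with an unknown label on the last day.
target = pd.Series([1.0, 0.0, 1.0, 0.0, np.nan], index=idx)

# Drop incomplete rows, then keep only the dates present in both.
X, y = features.dropna(how="any").align(target.dropna(), join="inner", axis=0)

print(X)
print(y)
```

Only two of the five dates survive here, which is why the number of training points should be checked after the alignment rather than before it.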
In [7]:
def predict(models, data):
    """Performs prediction and generates output weights.
       Generation is performed for several days in order to speed 
       up the evaluation.
    """
    
    asset_name_all = data.coords['asset'].values
    weights = xr.zeros_like(data.sel(field='close'))

    # the features do not depend on the loop variable, so compute them once:
    features_all = get_features(data)

    for asset_name in asset_name_all:
        if asset_name in models:
            model = models[asset_name]
            features_cur = features_all.sel(asset=asset_name).dropna('time','any')
            if len(features_cur.time) < 1:
                continue
            try:
                weights.loc[dict(asset=asset_name,time=features_cur.time.values)] = model.predict(features_cur.values)
            except Exception:
                # KeyboardInterrupt is not a subclass of Exception, so it still propagates
                logging.exception('model prediction failed')
                
    return weights

The following cell runs the backtester in Machine Learning retraining mode. We specify the maximum length of the training period and the interval between retrainings. Note that the model can be retrained every day after submission to the Quantiacs servers.
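
To picture how a training period and a retraining interval interact, the following sketch enumerates walk-forward windows in plain Python. It is only an illustration of the general scheme under simple assumptions (calendar-day arithmetic, training data ending the day before each retraining date); `retrain_schedule` is a hypothetical helper and does not reproduce the internals of `qnbt.backtest_ml`.

```python
from datetime import date, timedelta

def retrain_schedule(start, end, train_period, retrain_interval):
    """Yield (train_start, train_end, predict_until) for each retraining date.

    train_period and retrain_interval are given in calendar days.
    """
    cursor = start
    while cursor <= end:
        train_end = cursor - timedelta(days=1)               # last day seen in training
        train_start = cursor - timedelta(days=train_period)  # oldest day seen in training
        predict_until = min(cursor + timedelta(days=retrain_interval - 1), end)
        yield train_start, train_end, predict_until
        cursor += timedelta(days=retrain_interval)

windows = list(retrain_schedule(
    start=date(2014, 1, 1),
    end=date(2016, 6, 30),
    train_period=10 * 365,
    retrain_interval=365,
))

for train_start, train_end, predict_until in windows:
    print(f"train {train_start} .. {train_end}, predict until {predict_until}")
```

Each model only sees data that precedes the dates it predicts, which is what keeps a rolling backtest free of look-ahead as long as the features themselves do not look forward.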

In [8]:
weights = qnbt.backtest_ml(
    train=create_and_train_models,
    predict=predict,
    train_period=10*365,   # the data length for training in calendar days
    retrain_interval=365,  # how often we have to retrain models (calendar days)
    retrain_interval_after_submit=1, # how often retrain models after submission during evaluation (calendar days)
    predict_each_day=False,  # Is it necessary to call prediction for every day during backtesting?
                             # Set it to True if you suspect that get_features is looking forward.
    competition_type='cryptofutures',  # competition type
    lookback_period=365,      # how many calendar days are needed by the predict function to generate the output
    start_date='2014-01-01',  # backtest start date
    build_plots=True          # do you need the chart?
)
Run the last iteration...
100% (219152 of 219152) |################| Elapsed Time: 0:00:00 Time:  0:00:00
100% (4172 of 4172) |####################| Elapsed Time: 0:00:00 Time:  0:00:00
Output cleaning...
fix uniq
ffill if the current price is None...
Check missed dates...
Ok.
Normalization...
Output cleaning is complete.
Write output: /root/fractions.nc.gz
State saved.
---
Run First Iteration...
100% (17072 of 17072) |##################| Elapsed Time: 0:00:00 Time:  0:00:00
---
Run all iterations...
Load data...
100% (240632 of 240632) |################| Elapsed Time: 0:00:00 Time:  0:00:00
100% (224132 of 224132) |################| Elapsed Time: 0:00:00 Time:  0:00:00
Backtest...
100% (227732 of 227732) |################| Elapsed Time: 0:00:00 Time:  0:00:00
Output cleaning...
fix uniq
ffill if the current price is None...
Check missed dates...
Ok.
Normalization...
Output cleaning is complete.
Write output: /root/fractions.nc.gz
State saved.
---
Analyze results...
Check...
Check missed dates...
Ok.
Check the sharpe ratio...
Period: 2014-01-01 - 2024-03-22
Sharpe Ratio = 0.7782263541718313
ERROR! The Sharpe Ratio is too low. 0.7782263541718313 < 1
Improve the strategy and make sure that the in-sample Sharpe Ratio more than 1.
Check correlation.
WARNING! Can't calculate correlation.
Correlation check failed.
---
Align...
Calc global stats...
---
Calc stats per asset...
Build plots...
---
Output:
asset BTC
time
2024-03-13 1.0
2024-03-14 1.0
2024-03-15 1.0
2024-03-16 1.0
2024-03-17 1.0
2024-03-18 1.0
2024-03-19 1.0
2024-03-20 1.0
2024-03-21 1.0
2024-03-22 1.0
Stats:
field          equity  relative_return  volatility  underwater  max_drawdown  sharpe_ratio  mean_return  bias  instruments  avg_turnover  avg_holding_time
time
2024-03-13  74.900198         0.036746    0.648134    0.000000     -0.855027      0.814085     0.527636   1.0          1.0      0.070529         42.080000
2024-03-14  72.283409        -0.034937    0.648149   -0.034937     -0.855027      0.805587     0.522140   1.0          1.0      0.070530         42.080000
2024-03-15  69.827480        -0.033976    0.648158   -0.067726     -0.855027      0.797360     0.516815   1.0          1.0      0.070513         42.080000
2024-03-16  68.523184        -0.018679    0.648103   -0.085140     -0.855027      0.792842     0.513843   1.0          1.0      0.070499         42.080000
2024-03-17  69.981206         0.021278    0.648045   -0.065674     -0.855027      0.797480     0.516803   1.0          1.0      0.070494         42.080000
2024-03-18  69.134921        -0.012093    0.647972   -0.076973     -0.855027      0.794518     0.514825   1.0          1.0      0.070486         42.080000
2024-03-19  65.434941        -0.053518    0.648116   -0.126372     -0.855027      0.781516     0.506513   1.0          1.0      0.070480         42.080000
2024-03-20  68.668534         0.049417    0.648201   -0.083200     -0.855027      0.792167     0.513483   1.0          1.0      0.070487         42.080000
2024-03-21  67.060444        -0.023418    0.648161   -0.104669     -0.855027      0.786542     0.509806   1.0          1.0      0.070490         42.080000
2024-03-22  64.733322        -0.034702    0.648175   -0.135739     -0.855027      0.778226     0.504427   1.0          1.0      0.070473         42.315789
---
100% (3727 of 3727) |####################| Elapsed Time: 0:01:08 Time:  0:01:08
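
In the last row of the Stats output, the reported sharpe_ratio matches mean_return divided by volatility (0.504427 / 0.648175 ≈ 0.778). The sketch below reproduces that arithmetic and shows one common way to obtain the two annualized quantities from daily returns. The conventions in `annualized_sharpe` (252 periods per year, geometric mean return, population standard deviation) are assumptions made for this illustration and may not match the Quantiacs implementation exactly; the sample returns are made up.

```python
import numpy as np

# Values taken from the last row (2024-03-22) of the Stats output.
mean_return = 0.504427
volatility = 0.648175
sharpe_from_table = mean_return / volatility
print(round(sharpe_from_table, 4))

def annualized_sharpe(daily_returns, periods_per_year=252):
    """Annualized mean return divided by annualized volatility (assumed conventions)."""
    r = np.asarray(daily_returns, dtype=float)
    # Geometric annualized mean return.
    mean_ann = np.exp(np.mean(np.log1p(r)) * periods_per_year) - 1.0
    # Annualized volatility of the daily returns.
    vol_ann = np.std(r) * np.sqrt(periods_per_year)
    return mean_ann / vol_ann

sample_returns = [0.01, -0.005, 0.02, 0.0, -0.01]  # made-up daily returns
sample_sharpe = annualized_sharpe(sample_returns)
print(sample_sharpe)
```

The check in the log above requires an in-sample Sharpe ratio greater than 1, so raising the ratio means increasing the annualized mean return, lowering the volatility, or both.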