How to do sliding window of test/train
-
Hi,
I am starting from the simple EMA futures example, and I am trying to get comfortable with the API. I want to move my training window forward incrementally, and also move my test period forward. I started by trying to set min_date and max_date in the load_data function, and setting start_date in the backtest function. This leaves a slightly confusing test_period, which I would prefer to calculate myself by simply specifying the train and test dates; but I notice that the backtest code uses pd.Timestamp.today(), which means the backtester is somehow anchored to today's calendar date for the test interval.
Can someone help me parametrize these functions in such a way that I can perform iterative, windowed backtesting? I want to start far in the past to do some pre-training without burning too much data.
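To make the question concrete, here is roughly the windowing I have in mind, sketched with plain pandas (the dates, window lengths, and step size are arbitrary placeholders, not anything from the qnt API):

```python
import pandas as pd

# Hypothetical sliding train/test windows, anchored far in the past.
train_days, test_days, step_days = 365 * 2, 90, 90
start = pd.Timestamp("2014-01-01")
end = pd.Timestamp("2017-01-01")

windows = []
t = start
while t + pd.Timedelta(days=train_days + test_days) <= end:
    train = (t, t + pd.Timedelta(days=train_days))
    test = (train[1], train[1] + pd.Timedelta(days=test_days))
    windows.append((train, test))
    t += pd.Timedelta(days=step_days)  # slide both windows forward together

for (tr0, tr1), (te0, te1) in windows:
    print(f"train {tr0.date()}..{tr1.date()}  test {te0.date()}..{te1.date()}")
```

Each iteration would train on the train span and evaluate on the immediately following test span, never touching recent calendar dates.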
-
And to be more clear, sorry: I want the test period to also be in the distant past, not pinned to recent calendar time.
-
Hello.
@penrose-moore said in How to do sliding window of test/train:
started by trying to set the min_date and max_date in the load_data function, and setting start_date in the backtest function. This leaves a slightly confusing test_period, which I would prefer to calculate myself by simply specifying the train and test dates, but I notice that in the backtest code pd.Timestamp.today() is used, which means the backtester is bound to somehow use today's calendar date to anchor the test interval.
I guess you were confused by this code:
```python
if start_date is None:
    start_date = pd.Timestamp.today().to_datetime64() - np.timedelta64(test_period - 1, 'D')
else:
    start_date = pd.Timestamp(start_date).to_datetime64()
    test_period = (pd.Timestamp.today().to_datetime64() - start_date) / np.timedelta64(1, 'D')
```
Well, I will try to clarify it.
As you see, when you specify `start_date`, you don't need to specify `test_period`.
The backtester is designed to optimize execution time on our servers and on the client side. For this purpose, the `load_data` function receives the `period` parameter and makes these calculations.
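You can run that date logic in isolation to see the relationship (pure pandas/numpy, no qnt needed; `resolve_dates` is just a name I give this simplified sketch):

```python
import numpy as np
import pandas as pd

def resolve_dates(start_date=None, test_period=None):
    """Simplified mirror of the quoted backtester logic."""
    if start_date is None:
        # no start_date: anchor the test interval to today, going back test_period days
        start_date = pd.Timestamp.today().to_datetime64() - np.timedelta64(test_period - 1, 'D')
    else:
        # start_date given: test_period is derived, still measured up to today
        start_date = pd.Timestamp(start_date).to_datetime64()
        test_period = (pd.Timestamp.today().to_datetime64() - start_date) / np.timedelta64(1, 'D')
    return start_date, test_period

sd, tp = resolve_dates(start_date="2014-01-01")  # tp = days from 2014-01-01 until today
```

So with `start_date` set, `test_period` always covers everything from `start_date` up to the current date.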
On the server side, the evaluator runs the strategy day by day. For every day it runs these steps:
1. the backtester calls `load_data` with `period=lookback_period`;
2. the backtester runs the `strategy`;
3. the backtester saves the output;
4. the evaluator takes the last day from the output.
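A toy mock of that server-side loop, with stand-in `load_data` and `strategy` functions over a synthetic price series (this is only an illustration of the control flow, not the real qnt API):

```python
import pandas as pd

prices = pd.Series(range(1, 101),
                   index=pd.date_range("2020-01-01", periods=100))

def load_data(period, today):
    # stand-in for load_data: the last `period` days up to `today`
    return prices.loc[:today].tail(period)

def strategy(data):
    # toy signal: long when the last close is above the 20-day mean
    return 1.0 if data.iloc[-1] > data.tail(20).mean() else 0.0

outputs = []
for today in prices.index[30:]:       # the evaluator advances day by day
    data = load_data(30, today)       # 1. fresh load of the lookback window
    weight = strategy(data)           # 2. run the strategy
    outputs.append((today, weight))   # 3./4. keep only that day's output

result = pd.Series(dict(outputs))
```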
On the client side it works in a different way:
1. the backtester calls `load_data` with `period=(today - start_date) + lookback_period`;
2. for every day between `start_date` and today it does these steps:
   2.1 it calls the `window` function to cut the data for the iteration;
   2.2 it calls the `strategy` with this data fragment;
   2.3 it saves the last day from the output;
3. when it finishes all iterations, it joins the outputs from 2.3 and calculates statistics.

This multi-pass backtester with data isolation is for detecting and preventing look-ahead issues.
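The client-side flow can be mocked the same way: a single load, then a per-day cut so the strategy never sees data past the current iteration day (again a sketch with toy stand-ins, not the real qnt internals):

```python
import pandas as pd

prices = pd.Series(range(1, 201),
                   index=pd.date_range("2020-01-01", periods=200))
lookback_period = 30
start_date = prices.index[100]

# 1. one load covering (today - start_date) + lookback_period days
data = prices.loc[start_date - pd.Timedelta(days=lookback_period):]

def window(data, day, lookback):
    # 2.1 cut the fragment the strategy may see: this is the data isolation
    return data.loc[:day].tail(lookback)

def strategy(fragment):
    return 1.0 if fragment.iloc[-1] > fragment.mean() else 0.0

last_days = {}
for day in data.loc[start_date:].index:            # 2. iterate start_date..today
    fragment = window(data, day, lookback_period)  # 2.1 cut
    weight = strategy(fragment)                    # 2.2 run
    last_days[day] = weight                        # 2.3 keep the last day only

result = pd.Series(last_days)                      # 3. join and compute stats
```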
-
And, yes, this backtester is for testing, not for training.
I suggest you pretrain your model, save it to a file, and load that file in the backtester.
Model training is a slow process, and your model could exceed the time limit during evaluation.
This is an example of how to do it:
train.ipynb:
```python
# ... train your model here ...
# ... or pretrain the model on your PC ...
import pickle, gzip

with gzip.open('model.pickle.gz', 'wb') as f:
    pickle.dump(model, f)
```
strategy.ipynb:
```python
import xarray as xr
import qnt.ta as qnta
import qnt.backtester as qnbt
import qnt.data as qndata
import pickle, gzip

with gzip.open('model.pickle.gz', 'rb') as f:
    model = pickle.load(f)

def load_data(period):
    return qndata.cryptofutures.load_data(tail=period, dims=("time", "field", "asset"))

def strategy(data):
    prediction = model.predict(data)
    ...

weights = qnbt.backtest(
    competition_type="cryptofutures",
    load_data=load_data,
    lookback_period=365,
    start_date="2014-01-01",
    strategy=strategy,
    analyze=True,
    build_plots=True,
)
```
-
If this example is not relevant to you, could you share your code example and tell me what you expect?
I will modify it.
Notice: This forum is public and all posts are public here.
Regards.
-
@support These clarifying details are helpful and relevant. I started with that minimal example, and the API has parameters that are confusing to me; I only realized that the backtest function is itself performing a loop after going into the code.
I don't have any futures data of my own anymore, so I will just focus on porting some models to this framework locally using your data and see if anything good happens. I have worked at a CTA, but on daily data the out-of-sample Sharpe ratios over a 10- or 15-year period are lower than 1.0 on average. I have been working more with equities, but it is all an uphill battle with one person and limited time and compute. Models that look good in the short term were never the ones that were best long-term, which is a bit perverse because a greedy model-selection strategy favors the short-term winners. I am not sure I can come up with futures models that place in the competition and are also good to trade long-term. I might be able to come up with models that are good in an ensemble setting; maybe your scoring knows how to judge models on their marginal contribution in a leave-one-out kind of way? Or maybe you could run gradient boosting on all the user models?
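For concreteness, the leave-one-out marginal contribution I am imagining could be computed like this (synthetic returns and an equal-weight ensemble; purely a sketch of the idea, not anything Quantiacs actually does):

```python
import numpy as np

rng = np.random.default_rng(0)
# daily returns of 5 hypothetical user models over one year
returns = rng.normal(0.0005, 0.01, size=(252, 5))

def sharpe(r):
    # annualized Sharpe ratio of a daily return series
    return r.mean() / r.std() * np.sqrt(252)

full = sharpe(returns.mean(axis=1))  # equal-weight ensemble of all models
marginal = {}
for i in range(returns.shape[1]):
    rest = np.delete(returns, i, axis=1).mean(axis=1)  # ensemble without model i
    marginal[i] = full - sharpe(rest)  # > 0 means model i helps the ensemble
```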
Thanks, I might have more questions soon.
-
@penrose-moore Hello, you are correct: competition winners will be systems that do well over a timespan of a few months, while there can be systems with only a moderate Sharpe ratio over 4 months that are very good over a long timespan. Winning the contest is one way of getting an allocation, but we are interested in the long-term performance of all submitted systems. Systems that did not win contests but perform well in the long term after submission are currently being traded, and the quants who developed them are receiving a fee.
-
@penrose-moore We finally released a template which allows you to perform retraining; it is available in the "Examples" section of your private space, or publicly in the documentation:
https://quantiacs.com/documentation/en/examples/machine_learning_with_a_voting_classifier.html