How to do sliding window of test/train
-
Hi,
I am starting from the simple EMA futures example, and I am trying to get comfortable with the API. I want to move my training window forward incrementally, and also move my test period forward. I started by trying to set min_date and max_date in the load_data function, and setting start_date in the backtest function. This leaves a slightly confusing test_period, which I would prefer to calculate myself by simply specifying the train and test dates; but I notice that the backtest code uses pd.Timestamp.today(), which means the backtester is somehow anchored to today's calendar date for the test interval.
Can someone help me parametrize these functions in such a way that I can perform iterative, windowed backtesting? I want to start far in the past to do some pre-training without burning too much data.
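To make the question concrete, here is roughly the windowing I have in mind, sketched with plain pandas (the dates, window lengths, and step size are arbitrary placeholders, not anything from the qnt API):

```python
import pandas as pd

# Hypothetical sliding train/test windows, anchored far in the past.
train_days, test_days, step_days = 365 * 2, 90, 90
start = pd.Timestamp("2014-01-01")
end = pd.Timestamp("2017-01-01")

windows = []
t = start
while t + pd.Timedelta(days=train_days + test_days) <= end:
    train = (t, t + pd.Timedelta(days=train_days))
    test = (train[1], train[1] + pd.Timedelta(days=test_days))
    windows.append((train, test))
    t += pd.Timedelta(days=step_days)  # slide both windows forward together

for (tr0, tr1), (te0, te1) in windows:
    print(f"train {tr0.date()}..{tr1.date()}  test {te0.date()}..{te1.date()}")
```

Each iteration would train on the train span and evaluate on the immediately following test span, never touching recent calendar dates.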
-
And to be more clear, sorry: I want the test period to also be in the distant past, not pinned to recent calendar time.
-
Hello.
@penrose-moore said in How to do sliding window of test/train:
started by trying to set the min_date and max_date in the load_data function, and setting start_date in the backtest function. This leaves a slightly confusing test_period, which I would prefer to calculate myself by simply specifying the train and test dates, but I notice that in the backtest code pd.Timestamp.today() is used, which means the backtester is bound to somehow use today's calendar date to anchor the test interval.
I guess you were confused by this code:
```python
if start_date is None:
    start_date = pd.Timestamp.today().to_datetime64() - np.timedelta64(test_period - 1, 'D')
else:
    start_date = pd.Timestamp(start_date).to_datetime64()
    test_period = (pd.Timestamp.today().to_datetime64() - start_date) / np.timedelta64(1, 'D')
```
Well, I will try to clarify it.
As you see, when you specify `start_date`, you don't need to specify `test_period`.
The backtester is designed to optimize execution time on our servers and on the client side. For this purpose, the `load_data` function receives the `period` parameter and makes these calculations.
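You can run that date logic in isolation to see the relationship (pure pandas/numpy, no qnt needed; `resolve_dates` is just a name I give this simplified sketch):

```python
import numpy as np
import pandas as pd

def resolve_dates(start_date=None, test_period=None):
    """Simplified mirror of the quoted backtester logic."""
    if start_date is None:
        # no start_date: anchor the test interval to today, going back test_period days
        start_date = pd.Timestamp.today().to_datetime64() - np.timedelta64(test_period - 1, 'D')
    else:
        # start_date given: test_period is derived, still measured up to today
        start_date = pd.Timestamp(start_date).to_datetime64()
        test_period = (pd.Timestamp.today().to_datetime64() - start_date) / np.timedelta64(1, 'D')
    return start_date, test_period

sd, tp = resolve_dates(start_date="2014-01-01")  # tp = days from 2014-01-01 until today
```

So with `start_date` set, `test_period` always covers everything from `start_date` up to the current date.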
On the server side, the evaluator runs the strategy day by day. For every day it runs these steps:
1. the backtester calls `load_data` with `period=lookback_period`;
2. the backtester runs the `strategy`;
3. the backtester saves the output;
4. the evaluator takes the last day from the output.
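A toy mock of that server-side loop, with stand-in `load_data` and `strategy` functions over a synthetic price series (this is only an illustration of the control flow, not the real qnt API):

```python
import pandas as pd

prices = pd.Series(range(1, 101),
                   index=pd.date_range("2020-01-01", periods=100))

def load_data(period, today):
    # stand-in for load_data: the last `period` days up to `today`
    return prices.loc[:today].tail(period)

def strategy(data):
    # toy signal: long when the last close is above the 20-day mean
    return 1.0 if data.iloc[-1] > data.tail(20).mean() else 0.0

outputs = []
for today in prices.index[30:]:       # the evaluator advances day by day
    data = load_data(30, today)       # 1. fresh load of the lookback window
    weight = strategy(data)           # 2. run the strategy
    outputs.append((today, weight))   # 3./4. keep only that day's output

result = pd.Series(dict(outputs))
```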
On the client side it works in a different way:
1. the backtester calls `load_data` with `period=(today - start_date) + lookback_period`;
2. for every day between `start_date` and today it does these steps:
   2.1 it calls the `window` function to cut the data for the iteration;
   2.2 it calls the `strategy` with this data fragment;
   2.3 it saves the last day from the output;
3. when it finishes all iterations, it joins the outputs from 2.3 and calculates statistics.

This multi-pass backtester with data isolation is for detecting and preventing look-ahead issues.
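The client-side flow can be mocked the same way: a single load, then a per-day cut so the strategy never sees data past the current iteration day (again a sketch with toy stand-ins, not the real qnt internals):

```python
import pandas as pd

prices = pd.Series(range(1, 201),
                   index=pd.date_range("2020-01-01", periods=200))
lookback_period = 30
start_date = prices.index[100]

# 1. one load covering (today - start_date) + lookback_period days
data = prices.loc[start_date - pd.Timedelta(days=lookback_period):]

def window(data, day, lookback):
    # 2.1 cut the fragment the strategy may see: this is the data isolation
    return data.loc[:day].tail(lookback)

def strategy(fragment):
    return 1.0 if fragment.iloc[-1] > fragment.mean() else 0.0

last_days = {}
for day in data.loc[start_date:].index:            # 2. iterate start_date..today
    fragment = window(data, day, lookback_period)  # 2.1 cut
    weight = strategy(fragment)                    # 2.2 run
    last_days[day] = weight                        # 2.3 keep the last day only

result = pd.Series(last_days)                      # 3. join and compute stats
```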
-
And, yes, this backtester is for testing, not for training.
I suggest you pretrain your model, save it to a file, and load that file in the backtester.
Model training is a slow process, and your model could exceed the time limit during evaluation.
This is an example of how to do it:
train.ipynb:
```python
# ... train your model here ...
# ... or pretrain the model on your PC ...
import pickle, gzip

with gzip.open('model.pickle.gz', 'wb') as f:
    pickle.dump(model, f)
```
strategy.ipynb:
```python
import xarray as xr
import qnt.ta as qnta
import qnt.backtester as qnbt
import qnt.data as qndata
import pickle, gzip

with gzip.open('model.pickle.gz', 'rb') as f:
    model = pickle.load(f)

def load_data(period):
    return qndata.cryptofutures.load_data(tail=period, dims=("time", "field", "asset"))

def strategy(data):
    prediction = model.predict(data)
    ...

weights = qnbt.backtest(
    competition_type="cryptofutures",
    load_data=load_data,
    lookback_period=365,
    start_date="2014-01-01",
    strategy=strategy,
    analyze=True,
    build_plots=True,
)
```
-
If this example is not relevant to you, could you share your code example and tell me what you expect?
I will modify it.
Notice: This forum is public and all posts are public here.
Regards.
-
@support These clarifying details are helpful and relevant. I started with that minimal example, and the API has parameters that are confusing to me; I only realized that the backtest function is itself performing a loop after going into the code.
I don't have any futures data of my own anymore, so I will just focus on porting some models to this framework locally using your data and see if anything good happens. I have worked at a CTA, but on daily data the out-of-sample Sharpe ratios over a 10- or 15-year period are lower than 1.0 on average. I have been working more with equities, but it is all an uphill battle with one person and limited time and compute. Models that look good in the short term were never the ones that were best long-term, which is a bit perverse because a greedy model-selection strategy favors the short-term winners. I am not sure I can come up with futures models that place in the competition and are also good to trade long-term. I might be able to come up with models that are good in an ensemble setting; maybe your scoring knows how to judge models on their marginal contribution in a leave-one-out kind of way? Or maybe you could run gradient boosting on all the user models?
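For concreteness, the leave-one-out marginal contribution I am imagining could be computed like this (synthetic returns and an equal-weight ensemble; purely a sketch of the idea, not anything Quantiacs actually does):

```python
import numpy as np

rng = np.random.default_rng(0)
# daily returns of 5 hypothetical user models over one year
returns = rng.normal(0.0005, 0.01, size=(252, 5))

def sharpe(r):
    # annualized Sharpe ratio of a daily return series
    return r.mean() / r.std() * np.sqrt(252)

full = sharpe(returns.mean(axis=1))  # equal-weight ensemble of all models
marginal = {}
for i in range(returns.shape[1]):
    rest = np.delete(returns, i, axis=1).mean(axis=1)  # ensemble without model i
    marginal[i] = full - sharpe(rest)  # > 0 means model i helps the ensemble
```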
Thanks, I might have more questions soon.
-
@penrose-moore Hello, you are correct: competition winners will be systems that do well over a timespan of a few months, while there can be systems with only a moderate Sharpe ratio over 4 months that are very good over a long timespan. Winning the contest is one way of getting an allocation, but we are interested in the long-term performance of all submitted systems. Systems that did not win contests but perform well in the long term after submission are currently being traded, and the quants who developed them are receiving a fee.
-
@penrose-moore We finally released a template which allows you to perform retraining; it is available in the "Examples" section of your private space, or publicly in the documentation:
https://quantiacs.com/documentation/en/examples/machine_learning_with_a_voting_classifier.html