single pass and multipass discrepancy

darwinps

Hi
I have been trying to write a simple example with both single pass and multi pass just to understand them better. But I can't get them to produce the same result. Here is what I use:

single pass version:

import xarray as xr
import qnt.ta as qnta
import qnt.data as qndata

data = qndata.futures_load_data(min_date='2006-01-01', max_date='2007-01-01', assets= ['F_ES'])

close = data.sel(field='close')
close_one_day_ago = qnta.shift(close, periods=1)

_open = data.sel(field='open')
open_one_day_ago = qnta.shift(_open, periods=1)

weights = xr.where(open_one_day_ago < close_one_day_ago, 1, 0)

The result looks like the following.

multi pass version:

import xarray as xr
import qnt.ta as qnta
import qnt.backtester as qnbt
import qnt.data as qndata

def load_data(period):
    return qndata.futures_load_data(tail=period, assets=['F_ES'])

def strategy(data):
    close = data.sel(field='close')
    close_one_day_ago = qnta.shift(close, periods=1)
    _open = data.sel(field='open')
    open_one_day_ago = qnta.shift(_open, periods=1)
    weights = xr.where(open_one_day_ago < close_one_day_ago, 1, 0)
    return weights

weights = qnbt.backtest(
    competition_type= 'futures',
    load_data= load_data,
    lookback_period= 365,
    start_date= '2006-01-01',
    end_date= '2007-01-01',
    strategy= strategy
)

The result looks like the following.

Any help/hint/suggestion is greatly appreciated.

support

Dear @darwinps,

This discrepancy is normal: the statistics are calculated differently at the beginning of the in-sample period. Over a longer period, they should become very similar or identical.
Thanks for bringing this up. If you notice any large discrepancies in the future, please report them to us.

Best regards

stefanm

@darwinps Hi,
There are two key elements missing in the single pass approach that lead to the discrepancy:

Insufficient Data Loaded
Most strategies need an initialization period, for example to calculate the technical indicators they use. Here it is one day: to produce weights for a given date, you need the previous day’s open and close prices. For example, to get accurate weights for '2006-01-03', you need the prices from the previous trading day, '2005-12-30'.

Statistics also depend on relative returns, which cannot be computed without accounting for slippage. Slippage is approximated as the 14-day Average True Range, ATR(14), multiplied by 0.04 for futures or 0.05 for stocks (that is, 4% or 5% of ATR(14)). Therefore, to get accurate stats from '2006-01-01', the loaded data must start at least 14 trading days earlier. The backtester handles this with lookback_period (365 days in this case), which loads data starting 365 days before start_date.

Manually Filtering Data to a Single Asset
When using the backtester, the strategy’s output is aligned with all assets in the dataset. This doesn’t affect the weights (assets not included in the strategy simply have their weights set to zero), but there may be days in the full dataset that don’t appear in the filtered data. To ensure proper alignment in single pass mode, the clean() and align() functions from qnt.output should be run manually.
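To make the slippage point concrete, here is a rough sketch in plain Python. It assumes ATR is a simple average of the last 14 true ranges; the platform's exact averaging may differ. The helper names and the prices are made up for illustration and are not part of qnt.

```python
def true_range(high, low, prev_close):
    """Largest of the three candidate ranges for one bar."""
    return max(high - low, abs(high - prev_close), abs(low - prev_close))


def slippage_estimate(highs, lows, closes, period=14, fraction=0.04):
    """Per-bar slippage: fraction * simple average of the last `period` true ranges.

    Returns None for bars that do not yet have `period` true ranges behind them.
    """
    # The first bar has no previous close, so its true range is undefined.
    ranges = [None] + [
        true_range(highs[i], lows[i], closes[i - 1]) for i in range(1, len(closes))
    ]
    result = []
    for i in range(len(closes)):
        if i < period:
            result.append(None)  # not enough history yet
        else:
            window = ranges[i - period + 1 : i + 1]
            result.append(fraction * sum(window) / period)
    return result


# Made-up prices: 20 bars, each with a high-low range of 2 around a flat close.
closes = [100.0] * 20
highs = [c + 1.0 for c in closes]
lows = [c - 1.0 for c in closes]

slippage = slippage_estimate(highs, lows, closes)
print(slippage[:14])  # bars without enough history
print(slippage[14])   # first bar with a full 14-bar window
```

In this sketch the first 14 bars have no slippage value at all, which is why stats that start on the first loaded day cannot match stats computed with a proper lookback.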

The code below should produce identical results:

import qnt.output as qnout
import qnt.stats as qnst

# 365 days lookback period + 60 additional days as minimum tail for loading data for alignment hardcoded in the backtester, therefore min_date='2004-11-05'
data = qndata.futures_load_data(min_date='2004-11-05', max_date='2007-01-01')
f_es_data = qndata.futures_load_data(min_date='2005-12-01', max_date='2007-01-01', assets=['F_ES'])  # enough for stats and weights, one month before
### OR ###
# f_es_data = data.sel(asset=['F_ES'])

weights = strategy(f_es_data)
weights = qnout.clean(weights, data, 'futures') # in the backtester, clean() uses the same data as strategy() (f_es_data)
weights = qnout.align(weights, data, start='2006-01-01') # assign weights from 2006-01-01

stats = qnst.calc_stat(data, weights)
display(stats.sel(time=slice("2006-01-01", None)).to_pandas().head(15)) # show stats from 2006-01-01

In general, it's acceptable to generate some signals based on specific assets, but manually selecting assets for weight allocation is not allowed. Weight allocation should be dynamic across the entire dataset. Therefore, it’s recommended to load the entire dataset for the corresponding competition, which the strategy function will use as "data" parameter.

The following example uses the same strategy foundation and outputs as the initial one but applied to the full dataset. This resulted in identical statistics for both single and multi-pass approaches (though it’s still not compliant with the rules due to the hand-picked asset).

def load_data(period):
    return qndata.futures_load_data(tail=period)

def strategy(data):
    _data = data.sel(asset=['F_ES'])
    close = _data.sel(field='close')
    close_one_day_ago = qnta.shift(close, periods=1)
    _open = _data.sel(field='open')
    open_one_day_ago = qnta.shift(_open, periods=1)
    weights = xr.where(open_one_day_ago < close_one_day_ago, 1, 0)
    return weights

weights_multi = qnbt.backtest(
    competition_type= 'futures',
    load_data= load_data,
    lookback_period= 365,
    start_date= '2006-01-01',
    end_date= '2007-01-01',
    strategy= strategy,
)


### For single pass
data = qndata.futures_load_data(min_date='2004-11-05', max_date='2007-01-01')
weights_single = strategy(data).sel(time=slice('2006-01-01', None))
weights_single = qnout.clean(weights_single, data, 'futures')

stats = qnst.calc_stat(data, weights_single)
display(stats.sel(time=slice("2006-01-01", None)).to_pandas().head(15))

The greatest advantage of the single pass is its execution speed, which is especially important during the optimization process. However, it requires more attention to ensure that all aspects are handled properly. For instance, it’s quite easy to incorporate forward-looking information in a single pass, which is precisely what the multi-pass approach aims to prevent.

To see this, reverse the shift direction in the strategy for the main variables that produce the signals:

close_one_day_ago = qnta.shift(close, periods=-1)

open_one_day_ago = qnta.shift(_open, periods=-1)

Run it in both single-pass and multi-pass mode and compare the results.
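If it helps, the effect can be reproduced without the platform. The sketch below is plain Python with made-up prices, and it assumes that the multi-pass run calls the strategy once per day on data truncated at that day and keeps only the last weight. The shift(), single_pass() and multi_pass() helpers are simplified stand-ins, not the qnt functions.

```python
def shift(values, periods):
    """Shift a series along time; positive periods lag it, negative periods look ahead."""
    n = len(values)
    shifted = [None] * n
    for i in range(n):
        j = i - periods
        if 0 <= j < n:
            shifted[i] = values[j]
    return shifted


def strategy(opens, closes, periods):
    """Weight 1 when the shifted open is below the shifted close, else 0."""
    shifted_open = shift(opens, periods)
    shifted_close = shift(closes, periods)
    return [
        1 if (o is not None and c is not None and o < c) else 0
        for o, c in zip(shifted_open, shifted_close)
    ]


def single_pass(opens, closes, periods):
    """Run the strategy once over the whole history."""
    return strategy(opens, closes, periods)


def multi_pass(opens, closes, periods):
    """Run the strategy once per day on data up to that day; keep the last weight."""
    weights = []
    for t in range(len(closes)):
        daily = strategy(opens[: t + 1], closes[: t + 1], periods)
        weights.append(daily[-1])
    return weights


# Made-up prices for five days.
opens = [10, 11, 12, 11, 13]
closes = [11, 10, 13, 12, 12]

print(single_pass(opens, closes, 1), multi_pass(opens, closes, 1))
print(single_pass(opens, closes, -1), multi_pass(opens, closes, -1))
```

With periods=1 both runs agree. With periods=-1 the single pass reads the next day's prices, while the truncated multi-pass run has no next day to read, so its weights stay at zero.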

darwinps

Dear @support

Thank you for your confirmation

regards

darwinps

Hi @stefanm

Thank you for your detailed explanation

To be honest, I did think about the initialization period. But I added the extra (earlier) period for all data, not only for 'F_ES' as in the example (which should not matter?).

By the way, I used an 'illegal' hand-picked single asset (as well as the trivial strategy) just to simplify the case.

Thanks for pointing out the opposite shift direction. It's a habit from a different language (mql5).

I copied and pasted the code you modified, but I still don't get the same results.

I only changed:

weights = strategy(f_es_data)

to

close = data.sel(field='close')
close_one_day_ago = qnta.shift(close, periods=1)

_open = data.sel(field='open')
open_one_day_ago = qnta.shift(_open, periods=1)

weights = xr.where(open_one_day_ago < close_one_day_ago, 1, 0)

kind regards

stefanm

@darwinps Hi,

Your change should produce exactly the same weights, provided you used the same input (data). If the f_es_data that you passed to the strategy() function differs from the data you use afterwards, e.g. in close = data.sel(field='close'), you will get a different output.

If you still get a discrepancy, please share the entire code you used.
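A quick way to narrow such a discrepancy down is to find the first date on which the two weight series differ. This is only a sketch in plain Python: first_mismatch() is a hypothetical helper, and the date-to-weight mappings below are made up rather than taken from a real run.

```python
def first_mismatch(weights_a, weights_b):
    """Return (date, a, b) for the earliest date where the two mappings differ, or None."""
    # ISO date strings sort chronologically, so a plain sort is enough here.
    for date in sorted(set(weights_a) | set(weights_b)):
        a = weights_a.get(date)  # None if the date is missing on this side
        b = weights_b.get(date)
        if a != b:
            return date, a, b
    return None


# Made-up weights keyed by ISO date string.
single = {"2006-01-03": 0, "2006-01-04": 1, "2006-01-05": 1}
multi = {"2006-01-03": 0, "2006-01-04": 1, "2006-01-05": 0}

print(first_mismatch(single, multi))
```

A date that exists on only one side shows up as a mismatch against None, which also catches alignment problems.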
Best regards,

darwinps

Hi @stefanm ,

How reckless of me, "data" should have been f_es_data. They are perfectly synced now.
Thank you so much. I really appreciate the help.

sincerely