Question about the Q17 Machine Learning Example Algo

cespadilla

Hi guys,
I was just checking out the Q17 Machine Learning Algo (With Retraining). I don't know if it's just me, but I find the following strange:

The initial algorithm (which has look-ahead bias and whatnot) uses 54 different instruments during the backtest. As far as I can see, there is no "is liquid" filter anywhere, since this is just for educational purposes.
When the ML algorithm is passed through the backtester, it only trades 8 instruments in the same timeframe. What gives? Is there some parameter that is tuned when using the backtester instead of the whole data? Is this an error? I'd love to keep testing and exploring ML algorithms, but I think that the total number of traded instruments over 8 years should be more than 8, right?

Please let me know what I can change (the code, the data, the competition type or other backtester parameters), or whether this is by design.

Full data "test":
54 instruments

Backtester:
8 instruments

support

@cespadilla Hi, sorry for the late answer. We are checking and will let you know soon.

Vyacheslav_B

@cespadilla Hello.

The reason is in the "train_model" function.

```python
import logging

import xarray as xr

# get_features, get_target_classes and get_model are defined earlier in the example notebook.


def train_model(data):
    asset_name_all = data.coords['asset'].values
    features_all = get_features(data)
    target_all = get_target_classes(data)

    models = dict()

    for asset_name in asset_name_all:
        # drop missing values:
        target_cur = target_all.sel(asset=asset_name).dropna('time', 'any')
        features_cur = features_all.sel(asset=asset_name).dropna('time', 'any')

        # keep only the dates present in both the target and the features
        target_for_learn_df, feature_for_learn_df = xr.align(target_cur, features_cur, join='inner')

        # skip assets with fewer than 10 days of feature data: no model means no trades
        if len(features_cur.time) < 10:
            continue

        model = get_model()
        try:
            model.fit(feature_for_learn_df.values, target_for_learn_df)
            models[asset_name] = model
        except Exception:
            logging.exception('model training failed')

    return models
```

If an asset has fewer than 10 days (time points) of feature data in the training window, no model is created for it (if len(features_cur.time) < 10), so that asset is never traded.

This condition makes sense: a model fitted on fewer than 10 samples would be unreliable, so I would not remove it.
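To see how assets drop out, here is a minimal pure-Python sketch of the same filter, with no xarray or qnt dependency. It is an illustration, not the example's code: each series is a plain list indexed by day, with None for days on which the asset had no data yet, and the helper trainable_assets and the tickers AAA, BBB and CCC are hypothetical. Instead of fitting a model, it records how many aligned rows each asset would train on.

```python
from typing import Dict, List, Optional, Set

MIN_POINTS = 10  # same threshold as `if len(features_cur.time) < 10`


def valid_days(series: List[Optional[float]]) -> Set[int]:
    """Day indices that hold a value, i.e. what survives dropna('time')."""
    return {day for day, value in enumerate(series) if value is not None}


def trainable_assets(features: Dict[str, List[Optional[float]]],
                     target: Dict[str, List[Optional[float]]]) -> Dict[str, int]:
    """Map each asset that would get a model to its number of aligned training rows."""
    models = {}
    for asset in features:
        feature_days = valid_days(features[asset])
        target_days = valid_days(target[asset])
        aligned_days = feature_days & target_days  # like xr.align(..., join='inner')
        if len(feature_days) < MIN_POINTS:  # the check from train_model
            continue  # no model, so the asset is never traded
        models[asset] = len(aligned_days)  # stand-in for model.fit(...)
    return models


WINDOW = 30  # days in the hypothetical training window
features = {
    "AAA": [1.0] * WINDOW,            # data for the whole window
    "BBB": [None] * 15 + [1.0] * 15,  # listed halfway through the window
    "CCC": [None] * 25 + [1.0] * 5,   # listed 5 days before the window ends
}
target = {asset: list(series) for asset, series in features.items()}

models = trainable_assets(features, target)
print(models)  # {'AAA': 30, 'BBB': 15}
```

An asset listed shortly before the end of the training window is silently skipped, which is exactly what happens to late-listed instruments in the real backtest.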

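The second factor is the model retraining interval ("retrain_interval"). With the settings below, the models are trained on the last two years of data (train_period=2*365) and retrained only every ten years (retrain_interval=10*365). Over an 8-year backtest starting on 2014-01-01, this most likely means the models are fitted only once, so only assets that already had enough history before 2014 get a model, and assets listed later are never traded. A shorter retrain_interval (for example 365) should let later assets get models, at the cost of a slower backtest.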


```python
import qnt.backtester as qnbt

weights = qnbt.backtest_ml(
    train=train_model,
    predict=predict_weights,
    train_period=2 * 365,  # the data length for training in calendar days
    retrain_interval=10 * 365,  # how often we have to retrain models (calendar days)
    retrain_interval_after_submit=1,  # how often to retrain models after submission during evaluation (calendar days)
    predict_each_day=False,  # Is it necessary to call prediction for every day during backtesting?
                             # Set it to True if you suspect that get_features is looking forward.
    competition_type='crypto_daily_long_short',  # competition type
    lookback_period=365,  # how many calendar days are needed by the predict function to generate the output
    start_date='2014-01-01',  # backtest start date
    analyze=True,
    build_plots=True  # do you need the chart?
)
```
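The effect of retrain_interval can be illustrated with a small simulation. It rests on assumptions that are my reading of the parameter comments, not the backtester's actual code: models are retrained at start_date and then every retrain_interval calendar days, each time on the preceding train_period days of daily data without gaps, and an asset gets a model only if it has at least 10 days of data in that window. The helpers retrain_dates and assets_with_models and the listing dates of the hypothetical assets AAA to DDD are made up for illustration.

```python
from datetime import date, timedelta

MIN_POINTS = 10  # same threshold as in train_model


def retrain_dates(start, end, retrain_interval_days):
    """Dates on which models would be (re)trained, assuming the first training is at start."""
    dates = []
    current = start
    while current <= end:
        dates.append(current)
        current += timedelta(days=retrain_interval_days)
    return dates


def assets_with_models(listing_dates, start, end, train_period_days, retrain_interval_days):
    """Assets that get a model at least once during the backtest."""
    modelled = set()
    for retrain_day in retrain_dates(start, end, retrain_interval_days):
        window_start = retrain_day - timedelta(days=train_period_days)
        for asset, listed in listing_dates.items():
            # days of data inside the training window (negative if not listed yet)
            points = (retrain_day - max(listed, window_start)).days
            if points >= MIN_POINTS:
                modelled.add(asset)
    return modelled


# Hypothetical assets and listing dates, for illustration only.
LISTING_DATES = {
    "AAA": date(2012, 1, 1),
    "BBB": date(2013, 12, 1),
    "CCC": date(2016, 6, 1),
    "DDD": date(2020, 3, 1),
}
START, END = date(2014, 1, 1), date(2021, 12, 31)

print(sorted(assets_with_models(LISTING_DATES, START, END, 2 * 365, 10 * 365)))  # ['AAA', 'BBB']
print(sorted(assets_with_models(LISTING_DATES, START, END, 2 * 365, 365)))  # ['AAA', 'BBB', 'CCC', 'DDD']
```

With a ten-year interval there is only one training date inside the 8-year backtest, so assets listed after 2014 never get a model; with a yearly interval they are picked up at the next retraining.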
