Q17 Machine learning - RidgeRegression (Long/Short); there is an error in the code

EDDIEE

The loop "for asset_name in asset_name_all:" fits a model for each asset, but the individual models are never saved: a single model object is created before the loop and refitted on every pass. At the end, only the model fitted on the last asset ('XRP') is returned, and all predictions are based on it.

def train_model(data):
    """Create and train the models working on an asset-by-asset basis."""

    asset_name_all = data.coords['asset'].values

    data = data.sel(time=slice('2013-05-01', None))  # cut the noisy data head before 2013-05-01

    features_all = get_features(data)
    target_all = get_target_classes(data)

    model = create_model()

    for asset_name in asset_name_all:

        # drop missing values:
        target_cur = target_all.sel(asset=asset_name).dropna('time', 'any')
        features_cur = features_all.sel(asset=asset_name).dropna('time', 'any')

        # align features and targets:
        target_for_learn_df, feature_for_learn_df = xr.align(target_cur, features_cur, join='inner')

        if len(features_cur.time) < 10:
            # not enough points for training
            continue

        try:
            model.fit(feature_for_learn_df.values, target_for_learn_df)
        except KeyboardInterrupt as e:
            raise e
        except:
            logging.exception('model training failed')

    return model
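
The effect can be reproduced without the notebook. The sketch below is a toy illustration, not the template's code: MeanModel is a hypothetical stand-in for whatever create_model() returns, the asset data is made up, and it assumes that each fit call replaces the previously learned state (as scikit-learn estimators do unless warm_start is used).

```python
class MeanModel:
    """Hypothetical stand-in estimator: predicts the mean of its training targets."""

    def __init__(self):
        self.mean_ = None

    def fit(self, features, targets):
        # refitting discards whatever was learned before
        self.mean_ = sum(targets) / len(targets)
        return self

    def predict(self, features):
        return [self.mean_ for _ in features]


# made-up training targets for two assets; XRP comes last, as in the report
training_targets = {"BTC": [1.0, 1.0, 1.0], "XRP": [0.0, 0.0, 0.0]}

model = MeanModel()  # created once, outside the loop
for asset_name, targets in training_targets.items():
    features = [[0.0]] * len(targets)  # dummy features, ignored by MeanModel
    model.fit(features, targets)  # overwrites the previous asset's fit

# Only the fit on the last asset survives, so a BTC prediction would
# silently use what was learned from XRP.
last_fit_prediction = model.predict([[0.0]])[0]
print(last_fit_prediction)
```

Nothing learned from BTC is left in the model after the loop; only the XRP fit remains.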

support

@eddiee Thanks a lot. If you have a good fix, could you upload it to our repo on GitHub? Or send a snippet here and we will happily update it. Sorry for the issue.

EDDIEE

@support

This is a possible fix, but no guarantee. You also have to adjust the prediction function.

import logging

import xarray as xr


def train_model(data):
    """Create and train one model per asset; return them in a dict keyed by asset name."""

    models = dict()

    asset_name_all = data.coords['asset'].values

    data = data.sel(time=slice('2013-05-01', None))  # cut the noisy data head before 2013-05-01

    features_all = get_features(data)
    target_all = get_target_classes(data)

    for asset_name in asset_name_all:

        # drop missing values:
        target_cur = target_all.sel(asset=asset_name).dropna('time', 'any')
        features_cur = features_all.sel(asset=asset_name).dropna('time', 'any')

        # align features and targets:
        target_for_learn_df, feature_for_learn_df = xr.align(target_cur, features_cur, join='inner')

        if len(features_cur.time) < 10:
            # not enough points for training
            continue

        # create a fresh model for every asset; a single shared instance stored
        # under every key would leave all entries pointing at the last fit
        model = create_model()

        try:
            model.fit(feature_for_learn_df.values, target_for_learn_df)
            models[asset_name] = model
        except KeyboardInterrupt as e:
            raise e
        except:
            logging.exception('model training failed')

    return models
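
On the prediction side, each asset then has to be scored with its own entry from the returned dict. Below is a minimal sketch of that lookup only, not the notebook's actual prediction function: MeanModel, predict_per_asset, and the data are hypothetical, and plain lists stand in for the xarray features.

```python
class MeanModel:
    """Hypothetical stand-in estimator: predicts the mean of its training targets."""

    def fit(self, features, targets):
        self.mean_ = sum(targets) / len(targets)
        return self

    def predict(self, features):
        return [self.mean_ for _ in features]


def predict_per_asset(models, features_by_asset):
    """Score every asset with its own model; skip assets that have none."""
    predictions = {}
    for asset_name, features in features_by_asset.items():
        model = models.get(asset_name)
        if model is None or len(features) == 0:
            # no trained model (e.g. too little history) or nothing to score
            continue
        predictions[asset_name] = model.predict(features)
    return predictions


# one separately created and fitted model per asset (made-up data)
models = {
    "BTC": MeanModel().fit([[0.0], [0.0]], [1.0, 1.0]),
    "XRP": MeanModel().fit([[0.0], [0.0]], [0.0, 0.0]),
}
# ETH has features but no trained model, so it is skipped
features_by_asset = {"BTC": [[0.0]], "XRP": [[0.0]], "ETH": [[0.0]]}

predictions = predict_per_asset(models, features_by_asset)
print(predictions)
```

The important part is that the dict holds a distinct model object per asset. How the predictions are written back into the weights depends on the notebook's prediction code, which is not shown in this thread.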
