Q17 Machine learning - RidgeRegression (Long/Short); there is an error in the code
-
The loop "for asset_name in asset_name_all:" fits the model on each asset in turn, but the individual fitted models are never saved. At the end, only the model fitted on the last asset is returned, so all predictions are based on this last model (asset 'XRP').
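The effect can be reproduced in isolation. This is a minimal sketch, not the template code: MeanModel is a hypothetical stand-in for whatever create_model() returns, and the asset names and target values are made up for illustration.

```python
class MeanModel:
    """Hypothetical stand-in for an estimator with fit/predict."""

    def __init__(self):
        self.mean_ = None

    def fit(self, X, y):
        # every call to fit() overwrites the previously learned state
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


# made-up targets: one asset always goes up, the other always down
targets = {"BTC": [1.0, 1.0, 1.0], "XRP": [-1.0, -1.0, -1.0]}
features = {name: [[0.0]] * len(y) for name, y in targets.items()}

# pattern from the template: one shared model, refit for every asset
shared = MeanModel()
for name in targets:
    shared.fit(features[name], targets[name])

# only the last fit (XRP) survives, whatever asset is predicted
shared_prediction = shared.predict([[0.0]])

# alternative: one fresh model per asset, kept in a dict
per_asset = {}
for name in targets:
    per_asset[name] = MeanModel().fit(features[name], targets[name])

per_asset_prediction = {name: m.predict([[0.0]]) for name, m in per_asset.items()}
```

With the shared model, the prediction for any asset reflects only the XRP fit; with the dict, each asset keeps its own fit. The training function from the template, which follows the shared-model pattern, is: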
def train_model(data):
    """Create and train the models working on an asset-by-asset basis."""
    asset_name_all = data.coords['asset'].values
    data = data.sel(time=slice('2013-05-01', None))  # cut the noisy data head before 2013-05-01
    features_all = get_features(data)
    target_all = get_target_classes(data)
    model = create_model()

    for asset_name in asset_name_all:
        # drop missing values:
        target_cur = target_all.sel(asset=asset_name).dropna('time', 'any')
        features_cur = features_all.sel(asset=asset_name).dropna('time', 'any')

        # align features and targets:
        target_for_learn_df, feature_for_learn_df = xr.align(target_cur, features_cur, join='inner')

        if len(features_cur.time) < 10:
            # not enough points for training
            continue

        try:
            model.fit(feature_for_learn_df.values, target_for_learn_df)
        except KeyboardInterrupt as e:
            raise e
        except:
            logging.exception('model training failed')

    return model
-
@eddiee Thanks a lot. If you have a good fix, could you upload it to our repo on GitHub? Or send a snippet here and we will happily update it. Sorry for the issue.
-
This is a possible fix, but no guarantee. You also have to adjust the prediction function.
def train_model(data):
    """Create and train the models working on an asset-by-asset basis."""
    models = dict()
    asset_name_all = data.coords['asset'].values
    data = data.sel(time=slice('2013-05-01', None))  # cut the noisy data head before 2013-05-01
    features_all = get_features(data)
    target_all = get_target_classes(data)

    for asset_name in asset_name_all:
        # drop missing values:
        target_cur = target_all.sel(asset=asset_name).dropna('time', 'any')
        features_cur = features_all.sel(asset=asset_name).dropna('time', 'any')

        # align features and targets:
        target_for_learn_df, feature_for_learn_df = xr.align(target_cur, features_cur, join='inner')

        if len(features_cur.time) < 10:
            # not enough points for training
            continue

        # create a fresh model for each asset; reusing a single instance would
        # make every dict entry point to the same, last-fitted model
        model = create_model()
        try:
            model.fit(feature_for_learn_df.values, target_for_learn_df)
            models[asset_name] = model
        except KeyboardInterrupt as e:
            raise e
        except:
            logging.exception('model training failed')

    return models
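For the prediction side, something along these lines might work. It is only a sketch, under the assumptions that the prediction function receives the dict returned by train_model and that data has the dimensions field, time and asset. The function name predict_weights and the get_features argument are assumptions made to keep the example self-contained; the template may organize this differently.

```python
import logging

import numpy as np
import xarray as xr


def predict_weights(models, data, get_features):
    """Predict weights asset by asset, using the model trained for that asset.

    models: dict mapping asset name -> fitted model (as returned by train_model).
    get_features: passed in as an argument only to keep this sketch self-contained.
    """
    features_all = get_features(data)
    # start from zero weights: assets without a trained model stay at 0
    weights = xr.zeros_like(data.sel(field='close'))

    for asset_name in data.coords['asset'].values:
        model = models.get(asset_name)
        if model is None:
            # training was skipped or failed for this asset
            continue

        features_cur = features_all.sel(asset=asset_name).dropna('time', how='any')
        if len(features_cur.time) < 1:
            continue

        try:
            weights.loc[dict(asset=asset_name, time=features_cur.time.values)] = \
                model.predict(features_cur.values)
        except KeyboardInterrupt:
            raise
        except Exception:
            logging.exception('model prediction failed')

    return weights
```

Assets that have no entry in the dict (too few data points, or a failed fit) simply keep a weight of zero instead of being predicted by another asset's model.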