Machine Learning Strategy
-
Hi @support
I have some questions on the Machine Learning strategy:
1. When you set up the feature, you treated missing data in one way:
price = data.sel(field="close").ffill('time').bfill('time').fillna(0) # fill NaN
But when you set up the target, you treated it differently,
price_current = data.sel(field="close").dropna('time') # rm NaN
Doesn't that cause misalignment and introduce possible errors, even though I see you tried to align the datasets after?
2. Why do you use only a buy up target?
class_positive = 1
class_negative = 0
target_is_price_up = xr.where(price_future > price_current, class_positive, class_negative)
Does that mean the ML strategy will never sell? If you have a priori knowledge that cryptos have been going up all this time, then it seems you will inevitably end up with a well-performing strategy.
I admit that, to test that theory, I tried a basic buy-only strategy with no ML, and it did not perform as well as the ML strategy.
But what will happen when cryptos start going down? Maybe that will never happen as more people in the world keep piling in.
3. What does this correlation failure mean?
INFO: 2021-03-30T19:06:59Z: pass started: 655331
INFO: 2021-03-30T19:07:15Z: pass completed: 655331
INFO: 2021-03-30T19:07:17Z: stats received light=false
INFO: 2021-03-30T19:07:17Z: progress: 1.0
INFO: 2021-03-30T19:07:17Z: checking: last pass
INFO: 2021-03-30T19:07:17Z: filter passed: source exists
INFO: 2021-03-30T19:07:17Z: filter passed: output html exists
INFO: 2021-03-30T19:07:17Z: filter passed: output exists
INFO: 2021-03-30T19:07:17Z: filter passed: strategy uses the last data
INFO: 2021-03-30T19:07:17Z: filter passed: in-sample size enough
INFO: 2021-03-30T19:07:17Z: Sharpe ratio = 1.94598167418714
INFO: 2021-03-30T19:07:17Z: filter passed: sharpe ratio > 1
FAIL: 2021-03-30T19:07:17Z: filter failed: the strategy correlates with other strategies: [{"id":"222363","cofactor":0.9446207002399428,"sharpeRatio":1.7638719827031102},{"id":"222367","cofactor":0.9615233769176826,"sharpeRatio":2.011604789293279}]
I used a different ML classifier strategy, which has completely different hyperparameters from a Ridge Classifier. By how much do I have to change the template?
Is it correlating with the Quantiacs example or with other submitted strategies?
Thanks.
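On question 1, a minimal sketch of what the two NaN treatments do once the inner join is applied. It uses plain pandas and made-up prices for a single asset rather than the template's xarray data, so the names and numbers here are assumptions for illustration only:

```python
import numpy as np
import pandas as pd

# Made-up closing prices for one asset; NaN marks a missing day.
idx = pd.date_range("2021-01-01", periods=6, freq="D")
close = pd.Series([np.nan, 10.0, 11.0, np.nan, 12.0, 11.5], index=idx)

# Feature: gaps are filled, so every date is kept.
feature = close.ffill().bfill().fillna(0)

# Target: missing days are dropped, then each price is compared
# with the next available price.
price_current = close.dropna()
price_future = price_current.shift(-1).dropna()
price_current, price_future = price_current.align(price_future, join="inner")
target = (price_future > price_current).astype(int)

# The inner join keeps only the dates present in both series,
# so features and targets match row for row.
target_aligned, feature_aligned = target.align(feature.iloc[:-1], join="inner")

print(pd.DataFrame({"feature": feature_aligned, "target": target_aligned}))
```

In this sketch the filled-in dates simply drop out of the training rows, so the rows stay matched. What does change is the meaning of "next day": after a gap, the target compares against the next available price, which can be more than one day ahead.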
-
Hello.
This strategy correlates with the examples.
The cofactor (correlation factor) must be lower than 0.9, or the Sharpe Ratio of your strategy must be higher than that of the correlated strategy (for the last 3 years).
Try to use the other features: volume, ROC(rate of change), or other technical indicators.
Regards.
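For illustration only, the rule above can be sketched in plain Python. How the platform actually computes the cofactor is not stated in this thread; the sketch assumes it behaves like a Pearson correlation of daily returns, and the function names and return series are hypothetical:

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equally long return series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def passes_correlation_filter(cofactor, my_sharpe, other_sharpe, max_cofactor=0.9):
    """Rule as described above: low correlation OR a higher Sharpe ratio."""
    return cofactor < max_cofactor or my_sharpe > other_sharpe


# Hypothetical daily returns of a submission and of an example strategy.
my_returns = [0.010, -0.020, 0.015, 0.003, -0.007]
example_returns = [0.012, -0.018, 0.013, 0.004, -0.006]
print(pearson(my_returns, example_returns))

# Rounded values from the log above (Sharpe ratio 1.946 for the submission).
print(passes_correlation_filter(0.9446, 1.946, 1.7639))  # strategy 222363
print(passes_correlation_filter(0.9615, 1.946, 2.0116))  # strategy 222367
```

Under this reading, the submission clears the check against 222363 because its Sharpe ratio is higher, and fails against 222367, which has both a cofactor above 0.9 and a higher Sharpe ratio.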
-
@support
Thank you. I'll try that.
-
@support
OK, the system let me submit a new strategy. I hope this one works.
I can keep working on getting a higher Sharpe Ratio, and update the strategy, right?
Thanks.
-
Yes, you can continue. The system saves a copy when you submit the strategy.
-
@support
Yaay! I got one accepted
I know the SR is at the bottom of the barrel on the Leaderboard, but I'm still grateful I got one accepted.
Ok, I'm inspired that this is doable for me.
Btw, thanks to everyone on your team for responding to my support requests & helping me understand the Quantiacs platform in a few short weeks.
-
- Can you help pls with an example on how to include more than one feature, such as from the fields (OHLCV)?
- And also from the qnt.ta library?
I am running into a problem converting the feature set to pandas when there is more than one feature.
price = data.sel(field="close").ffill('time').bfill('time').fillna(0) # fill NaN
for_result = price.to_pandas()
Thank you.
-
@spancham Hello, could you elaborate more on your request? In principle, you could just repeat the procedure you use for the "close" and you will work with more dataframes.
-
Hi @support
Ok, let me think about what you are suggesting & see if I can get that to work.
Will let you know.
Thanks.
-
@support
ok guys, I tried what you suggested and I am running into all sorts of problems.
I want to pass several features altogether in one dataframe.
Are you guys thinking that I want to 'test' one feature at a time and that is why you are suggesting working with more than one dataframe?
Here is an example of some code I tried, but I would still have to merge the dataframes in order to pass the feature set to the classifier:
def get_features(data):
    # let's come up with features for machine learning
    # take the logarithm of closing prices
    def remove_trend(prices_pandas_):
        prices_pandas = prices_pandas_.copy(True)
        assets = prices_pandas.columns
        print(assets)
        for asset in assets:
            print(prices_pandas[asset])
            prices_pandas[asset] = np.log(prices_pandas[asset])
        return prices_pandas

    # Feature 1
    price = data.sel(field="close").ffill('time').bfill('time').fillna(0)  # fill NaN
    price_df = price.to_dataframe()

    # Feature 2
    vol = data.sel(field="vol").ffill('time').bfill('time').fillna(0)  # fill NaN
    vol_df = vol.to_dataframe()

    # Merge dataframes
    for_result = pd.merge(price_df, vol_df, on='time')
    for_result = for_result.drop(['field_x', 'field_y'], axis=1)

    features_no_trend_df = remove_trend(for_result)
    return features_no_trend_df
Can you help with some code as to what you are suggesting?
Thanks
-
@spancham Hello. Try this
import xarray as xr

import qnt.backtester as qnbt
import qnt.data as qndata
import numpy as np
import pandas as pd
import logging


def load_data(period):
    return qndata.cryptofutures.load_data(tail=period)


def predict_weights(market_data):

    def get_ml_model():
        # you can use any machine learning model
        from sklearn.linear_model import RidgeClassifier
        model = RidgeClassifier(random_state=18)
        return model

    def get_features_dict(data):
        def get_features_for(asset_name):
            data_for_instrument = data.copy(True).sel(asset=[asset_name])

            # Feature 1
            price = data_for_instrument.sel(field="close").ffill('time').bfill('time').fillna(0)  # fill NaN
            price_df = price.to_dataframe()

            # Feature 2
            vol = data_for_instrument.sel(field="vol").ffill('time').bfill('time').fillna(0)  # fill NaN
            vol_df = vol.to_dataframe()

            # Merge dataframes
            for_result = pd.merge(price_df, vol_df, on='time')
            for_result = for_result.drop(['field_x', 'field_y'], axis=1)

            return for_result

        features_all_assets = {}

        asset_all = data.asset.to_pandas().to_list()
        for asset in asset_all:
            features_all_assets[asset] = get_features_for(asset)

        return features_all_assets

    def get_target_classes(data):
        # for classifiers, you need to set classes
        # if 1 then the price will rise tomorrow

        price_current = data.sel(field="close").dropna('time')  # rm NaN
        price_future = price_current.shift(time=-1).dropna('time')

        class_positive = 1
        class_negative = 0

        target_is_price_up = xr.where(price_future > price_current, class_positive, class_negative)
        return target_is_price_up.to_pandas()

    data = market_data.copy(True)

    asset_name_all = data.coords['asset'].values
    features_all_df = get_features_dict(data)
    target_all_df = get_target_classes(data)

    predict_weights_next_day_df = data.sel(field="close").isel(time=-1).to_pandas()

    for asset_name in asset_name_all:
        target_for_learn_df = target_all_df[asset_name]
        feature_for_learn_df = features_all_df[asset_name][:-1]  # last value reserved for prediction

        # align features and targets
        target_for_learn_df, feature_for_learn_df = target_for_learn_df.align(feature_for_learn_df, axis=0, join='inner')

        model = get_ml_model()
        try:
            model.fit(feature_for_learn_df.values, target_for_learn_df)

            feature_for_predict_df = features_all_df[asset_name][-1:]

            predict = model.predict(feature_for_predict_df.values)
            predict_weights_next_day_df[asset_name] = predict
        except:
            logging.exception("model failed")
            # if there is exception, return zero values
            return xr.zeros_like(data.isel(field=0, time=0))

    return predict_weights_next_day_df.to_xarray()


weights = qnbt.backtest(
    competition_type="cryptofutures",
    load_data=load_data,
    lookback_period=18,
    start_date='2014-01-01',
    strategy=predict_weights,
    analyze=True,
    build_plots=True
)
Here is an example with indicators (Sharpe Ratio = 0.8)
import qnt.ta  # needed for qnt.ta.roc below

def get_features_for(asset_name):
    data_for_instrument = data.copy(True).sel(asset=[asset_name])

    # Feature 1: one-period rate of change of the closing price
    price = data_for_instrument.sel(field="close")
    price = qnt.ta.roc(price, 1)
    price = price.ffill('time').bfill('time').fillna(0)
    price_df = price.to_pandas()

    # Feature 2: volume
    vol = data_for_instrument.sel(field="vol")
    vol = vol.ffill('time').bfill('time').fillna(0)  # fill NaN
    vol_df = vol.to_pandas()

    # Merge dataframes
    for_result = pd.merge(price_df, vol_df, on='time')
    return for_result
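For readers who want to see the "several features in one dataframe" idea without the Quantiacs data, here is a minimal pandas-only sketch. The prices and volumes are made up, the column names are hypothetical, and the rate of change is computed by hand as a stand-in, so it may not match the scaling of qnt.ta.roc:

```python
import pandas as pd

# Made-up data for a single asset.
idx = pd.date_range("2021-01-01", periods=5, freq="D")
close = pd.Series([100.0, 102.0, 101.0, 104.0, 104.0], index=idx)
vol = pd.Series([5.0, 7.0, 6.0, 9.0, 8.0], index=idx)

# One-period rate of change, computed by hand (stand-in for an indicator).
roc_1 = close / close.shift(1) - 1

# One column per feature, all sharing the same time index.
features = pd.concat({"roc_1": roc_1, "vol": vol}, axis=1).fillna(0)

# Same split as in the strategy above: the last row is kept for prediction.
X_train = features.iloc[:-1].values
X_predict = features.iloc[-1:].values

print(features)
print(X_train.shape, X_predict.shape)
```

Concatenating named series gives each feature its own column name, which avoids the `_x` / `_y` suffixes that `pd.merge` generates when both frames use the same column name.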
-
@vyacheslav_b
Thank you!