Machine Learning Strategy

spancham

Hi @support
I have some questions on the Machine Learning strategy:

1. When you set up the feature, you treated missing data in one way:

price = data.sel(field="close").ffill('time').bfill('time').fillna(0) # fill NaN

But when you set up the target, you treated it differently,

price_current = data.sel(field="close").dropna('time') # rm NaN

Doesn't that cause misalignment and introduce possible errors, even though I see you tried to align the datasets after?

2. Why do you use only a "price up" target?

class_positive = 1
class_negative = 0
target_is_price_up = xr.where(price_future > price_current, class_positive, class_negative)

Does that mean the ML strategy will never sell? If you know a priori that cryptos have been going up all this time, then it seems you will inevitably end up with a well-performing strategy.
To test that theory, I tried a basic buy-only strategy with no ML, and I admit it did not perform as well as the ML strategy.
But what will happen when cryptos start going down? Maybe that will never happen as more people in the world keep piling in.

3. What does this correlation failure mean?

INFO: 2021-03-30T19:06:59Z: pass started: 655331
INFO: 2021-03-30T19:07:15Z: pass completed: 655331
INFO: 2021-03-30T19:07:17Z: stats received light=false
INFO: 2021-03-30T19:07:17Z: progress: 1.0
INFO: 2021-03-30T19:07:17Z: checking: last pass
INFO: 2021-03-30T19:07:17Z: filter passed: source exists
INFO: 2021-03-30T19:07:17Z: filter passed: output html exists
INFO: 2021-03-30T19:07:17Z: filter passed: output exists
INFO: 2021-03-30T19:07:17Z: filter passed: strategy uses the last data
INFO: 2021-03-30T19:07:17Z: filter passed: in-sample size enough 
INFO: 2021-03-30T19:07:17Z: Sharpe ratio = 1.94598167418714
INFO: 2021-03-30T19:07:17Z: filter passed: sharpe ratio > 1
FAIL: 2021-03-30T19:07:17Z: filter failed: the strategy correlates with other strategies: [{"id":"222363","cofactor":0.9446207002399428,"sharpeRatio":1.7638719827031102},{"id":"222367","cofactor":0.9615233769176826,"sharpeRatio":2.011604789293279}]

I used a different ML classifier, with completely different hyperparameters from the Ridge Classifier. By how much do I have to change the template?
Is it correlating with the Quantiacs example or other submitted strategies?
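
On question 1, here is a minimal sketch of how a gap-filled feature and a NaN-dropped target line up after an inner join on the time index. It uses made-up prices in plain pandas rather than the template's xarray data, and all names and values are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical closing prices with two gaps (Jan 1 and Jan 4).
idx = pd.date_range("2021-01-01", periods=6, freq="D", name="time")
close = pd.Series([np.nan, 10.0, 11.0, np.nan, 12.0, 11.5], index=idx)

# Feature: gaps are filled, so every date keeps a row.
feature = close.ffill().bfill().fillna(0)

# Target: rows with NaN are dropped first, then each close is compared
# with the next available close.
price_current = close.dropna()
price_future = price_current.shift(-1).dropna()
price_current, price_future = price_current.align(price_future, join="inner")
target = (price_future > price_current).astype(int)

# The inner join keeps only the dates present in both series.
target_aligned, feature_aligned = target.align(feature, join="inner")
print(pd.DataFrame({"feature": feature_aligned, "target": target_aligned}))
```

In this sketch the join keeps rows matched by date, so features and targets do not slide against each other. What does change is the meaning of "next day": once Jan 4 is dropped, the target for Jan 3 compares against the Jan 5 close.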

Thanks.

support

Hello.

This strategy correlates with the examples.

The cofactor (correlation factor) must be lower than 0.9, or the Sharpe Ratio of your strategy must be higher (for the last 3 years).
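
One way to read this rule, sketched in plain Python with the numbers from the log above. The function name and the exact comparison are assumptions, not the platform's actual check, and the Sharpe ratios in the log may not be the 3-year values the filter really uses.

```python
def blocking_strategies(own_sharpe, others, max_cofactor=0.9):
    """Return the entries that would block a submission under this reading:
    the cofactor is too high AND the other strategy's Sharpe ratio is not lower."""
    return [
        o for o in others
        if o["cofactor"] >= max_cofactor and o["sharpeRatio"] >= own_sharpe
    ]


# Values copied from the FAIL line in the log.
others = [
    {"id": "222363", "cofactor": 0.9446207002399428, "sharpeRatio": 1.7638719827031102},
    {"id": "222367", "cofactor": 0.9615233769176826, "sharpeRatio": 2.011604789293279},
]

blocked_by = blocking_strategies(1.94598167418714, others)
print([o["id"] for o in blocked_by])
```

Under this reading only 222367 blocks the submission: both entries exceed the 0.9 cofactor, but the submitted Sharpe ratio is higher than that of 222363.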

Try using other features: volume, ROC (rate of change), or other technical indicators.

Regards.

spancham

@support
Thank you. I'll try that.

spancham

@support
OK, the system let me submit a new strategy.

I hope this one works.
I can keep working on getting a higher Sharpe Ratio, and update the strategy, right?
Thanks.

support

@spancham

Yes, you can continue. The system saves a copy when you submit the strategy.

spancham

@support
Yaay! I got one accepted
I know the SR is at the bottom of the barrel on the Leaderboard, but I'm still grateful I got one accepted.

Ok, I'm inspired that this is doable for me.
Btw, thanks to everyone on your team for responding to my support requests & helping me understand the Quantiacs platform in a few short weeks.

spancham

@support

Can you help pls with an example on how to include more than one feature, such as from the fields (OHLCV)?
And also from the qnt.ta library?
I am running into a problem converting the feature set to pandas when there is more than one feature.

price = data.sel(field="close").ffill('time').bfill('time').fillna(0) # fill NaN
for_result = price.to_pandas()

Thank you.

support

@spancham Hello, could you elaborate more on your request? In principle, you could just repeat the procedure you use for the "close" and you will work with more dataframes.
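
A minimal sketch of what "repeating the procedure" could look like for a single asset: each field is filled separately, then the results are placed side by side in one frame. It uses made-up pandas Series instead of the platform's xarray data, and `fill_gaps` is a hypothetical helper.

```python
import numpy as np
import pandas as pd

# Hypothetical close and volume series for one asset on a shared time index.
idx = pd.date_range("2021-01-01", periods=5, freq="D", name="time")
close = pd.Series([10.0, np.nan, 11.0, 12.0, 11.5], index=idx)
vol = pd.Series([100.0, 120.0, np.nan, 90.0, 95.0], index=idx)


def fill_gaps(series):
    # Same filling chain as in the template: forward, backward, then zero.
    return series.ffill().bfill().fillna(0)


# One column per feature; the dict keys become the column names.
features = pd.concat({"close": fill_gaps(close), "vol": fill_gaps(vol)}, axis=1)
print(features)
```

Because both series share the same time index, `pd.concat(..., axis=1)` needs no merge keys, and `features.values` gives one row per day with one column per feature.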

spancham

Hi @support
Ok, let me think about what you are suggesting & see if I can get that to work.
Will let you know.
Thanks.

spancham

@support
ok guys, I tried what you suggested and I am running into all sorts of problems.
I want to pass several features altogether in one dataframe.
Are you guys thinking that I want to 'test' one feature at a time and that is why you are suggesting working with more than one dataframe?
Here is an example of some code I tried, but I would still have to merge the dataframes in order to pass the feature set to the classifier:

def get_features(data):
        # let's come up with features for machine learning
        # take the logarithm of closing prices
        def remove_trend(prices_pandas_):
            prices_pandas = prices_pandas_.copy(True)
            assets = prices_pandas.columns
            print(assets)
            for asset in assets:
                print(prices_pandas[asset])
                prices_pandas[asset] = np.log(prices_pandas[asset])
            return prices_pandas
        
        # Feature 1
        price = data.sel(field="close").ffill('time').bfill('time').fillna(0) # fill NaN
        price_df = price.to_dataframe()
        
        # Feature 2
        vol = data.sel(field="vol").ffill('time').bfill('time').fillna(0) # fill NaN
        vol_df = vol.to_dataframe()
        
        # Merge dataframes
        for_result = pd.merge(price_df, vol_df, on='time')
        for_result = for_result.drop(['field_x', 'field_y'], axis=1)
            
        features_no_trend_df = remove_trend(for_result)
        return features_no_trend_df

Can you help with some code as to what you are suggesting?
Thanks

Vyacheslav_B

@spancham Hello. Try this:

import xarray as xr

import qnt.backtester as qnbt
import qnt.data as qndata
import numpy as np
import pandas as pd
import logging


def load_data(period):
    return qndata.cryptofutures.load_data(tail=period)


def predict_weights(market_data):

    def get_ml_model():
        # you can use any machine learning model
        from sklearn.linear_model import RidgeClassifier
        model = RidgeClassifier(random_state=18)
        return model

    def get_features_dict(data):
        def get_features_for(asset_name):
            data_for_instrument = data.copy(True).sel(asset=[asset_name])

            # Feature 1
            price = data_for_instrument.sel(field="close").ffill('time').bfill('time').fillna(0)  # fill NaN
            price_df = price.to_dataframe()

            # Feature 2
            vol = data_for_instrument.sel(field="vol").ffill('time').bfill('time').fillna(0)  # fill NaN
            vol_df = vol.to_dataframe()

            # Merge dataframes
            for_result = pd.merge(price_df, vol_df, on='time')
            for_result = for_result.drop(['field_x', 'field_y'], axis=1)

            return for_result

        features_all_assets = {}

        asset_all = data.asset.to_pandas().to_list()
        for asset in asset_all:
            features_all_assets[asset] = get_features_for(asset)

        return features_all_assets

    def get_target_classes(data):
        # for classifiers, you need to set classes
        # if 1 then the price will rise tomorrow

        price_current = data.sel(field="close").dropna('time')  # rm NaN
        price_future = price_current.shift(time=-1).dropna('time')

        class_positive = 1
        class_negative = 0

        target_is_price_up = xr.where(price_future > price_current, class_positive, class_negative)
        return target_is_price_up.to_pandas()

    data = market_data.copy(True)

    asset_name_all = data.coords['asset'].values
    features_all_df = get_features_dict(data)
    target_all_df = get_target_classes(data)

    predict_weights_next_day_df = data.sel(field="close").isel(time=-1).to_pandas()

    for asset_name in asset_name_all:
        target_for_learn_df = target_all_df[asset_name]
        feature_for_learn_df = features_all_df[asset_name][:-1]  # last value reserved for prediction

        # align features and targets
        target_for_learn_df, feature_for_learn_df = target_for_learn_df.align(feature_for_learn_df, axis=0,
                                                                              join='inner')

        model = get_ml_model()
        try:
            model.fit(feature_for_learn_df.values, target_for_learn_df)

            feature_for_predict_df = features_all_df[asset_name][-1:]

            predict = model.predict(feature_for_predict_df.values)
            predict_weights_next_day_df[asset_name] = predict
        except Exception:
            logging.exception("model failed")
            # if there is exception, return zero values
            return xr.zeros_like(data.isel(field=0, time=0))

    return predict_weights_next_day_df.to_xarray()


weights = qnbt.backtest(
    competition_type="cryptofutures",
    load_data=load_data,
    lookback_period=18,
    start_date='2014-01-01',
    strategy=predict_weights,
    analyze=True,
    build_plots=True
)
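
The target in get_target_classes above has two classes, so the model can only output 1 (hold a long position) or 0 (hold nothing); it never outputs a short. Below is a minimal sketch with made-up prices, contrasting that encoding with a hypothetical three-class variant. Treating -1 as a short weight is an assumption that nothing in this thread confirms.

```python
import numpy as np

# Hypothetical prices today and on the next day.
price_current = np.array([100.0, 102.0, 101.0, 101.0])
price_future = np.array([102.0, 101.0, 101.0, 105.0])

# Binary target as in the code above: 1 = price rises, 0 = otherwise.
target_binary = np.where(price_future > price_current, 1, 0)

# Hypothetical signed variant: 1 = rises, 0 = unchanged, -1 = falls.
target_signed = np.sign(price_future - price_current).astype(int)

print(target_binary)
print(target_signed)
```

With the binary encoding, a falling price and an unchanged price look the same to the classifier. The signed variant separates them, at the cost of a harder three-class problem.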

Here is an example with indicators (Sharpe Ratio = 0.8):

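import qnt.ta as qnta  # add next to the other imports at the top of the script

def get_features_for(asset_name):
    data_for_instrument = data.copy(True).sel(asset=[asset_name])

    # Feature 1: one-day rate of change (ROC) of the close
    price = data_for_instrument.sel(field="close")
    price = qnta.roc(price, 1)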
    price = price.ffill('time').bfill('time').fillna(0)
    price_df = price.to_pandas()

    # Feature 2
    vol = data_for_instrument.sel(field="vol")
    vol = vol.ffill('time').bfill('time').fillna(0)  # fill NaN
    vol_df = vol.to_pandas()

    # Merge dataframes
    for_result = pd.merge(price_df, vol_df, on='time')

    return for_result
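
Both frames in the merge above carry the same column name (the asset), so pandas falls back to its default `_x` and `_y` suffixes. Here is a minimal sketch of the same idea in plain pandas with explicit suffixes. The `rate_of_change` helper is an assumed percent-change definition, not necessarily what `qnt.ta.roc` computes, and the "BTC" label and all values are made up.

```python
import numpy as np
import pandas as pd


def rate_of_change(series, periods=1):
    # Percent change versus the value `periods` rows earlier (assumed definition).
    return (series / series.shift(periods) - 1.0) * 100.0


idx = pd.date_range("2021-01-01", periods=4, freq="D", name="time")
close = pd.Series([10.0, 11.0, 12.1, 12.1], index=idx)
vol = pd.Series([100.0, np.nan, 90.0, 95.0], index=idx)

# Same filling chain as in the thread, applied after the indicator.
roc_df = rate_of_change(close).ffill().bfill().fillna(0).to_frame("BTC")
vol_df = vol.ffill().bfill().fillna(0).to_frame("BTC")

# Explicit suffixes keep the two features distinguishable after the merge.
features = pd.merge(roc_df, vol_df, on="time", suffixes=("_roc", "_vol"))
print(features)
```

Note that the backward fill copies the second day's ROC into the first row, which has no previous price; whether that is acceptable depends on how the first rows are used in training.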

spancham

@vyacheslav_b
Thank you!
