Quantiacs Community

Vyacheslav_B

Compare the two versions of the code.

def regime_trade(series):
    return ""


def strategy(data):
    result = regime_trade(data)
    return result


backtest(strategy=strategy)

In the first case, only the data available up to a specific date is used.

def regime_trade(series):
    return ""


result = regime_trade(data)

def strategy(data):
    return result


backtest(strategy=strategy)

In the second case, all the data is used, including dates that lie in the future relative to the day being traded.

I see that you have a very large Sharpe ratio, so I think you are using something similar to the second option.
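A toy sketch of the difference, using a plain Python list instead of market data. The names `toy_backtest`, `signal`, `strategy_inside` and `strategy_outside` are hypothetical stand-ins and not part of qnt; the only assumption is that a backtester calls the strategy once per day with the history visible on that day.

```python
# Toy stand-ins: a price list and a "backtester" that reveals data day by day.
prices = [10, 11, 9, 12, 15, 14]


def signal(series):
    # Hypothetical rule: 1 if the last visible price is the highest seen so far.
    return 1 if series[-1] == max(series) else 0


def toy_backtest(strategy):
    # Calls the strategy once per day with the history available on that day.
    return [strategy(prices[: day + 1]) for day in range(len(prices))]


# Version 1: the signal is computed inside the strategy, from visible data only.
def strategy_inside(data):
    return signal(data)


# Version 2: part of the computation runs once, outside, on the full history.
full_max = max(prices)  # already contains the future high


def strategy_outside(data):
    return 1 if data[-1] == full_max else 0


honest = toy_backtest(strategy_inside)
leaky = toy_backtest(strategy_outside)
print(honest)  # [1, 1, 0, 1, 1, 0]
print(leaky)   # [0, 0, 0, 0, 1, 0]
```

The second version skips the early highs because it already knows that a higher price is coming, which is exactly the information a real strategy does not have.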

Vyacheslav_B

@spancham Hello. Try this:

import xarray as xr

import qnt.backtester as qnbt
import qnt.data as qndata
import numpy as np
import pandas as pd
import logging


def load_data(period):
    return qndata.cryptofutures.load_data(tail=period)


def predict_weights(market_data):

    def get_ml_model():
        # you can use any machine learning model
        from sklearn.linear_model import RidgeClassifier
        model = RidgeClassifier(random_state=18)
        return model

    def get_features_dict(data):
        def get_features_for(asset_name):
            data_for_instrument = data.copy(True).sel(asset=[asset_name])

            # Feature 1
            price = data_for_instrument.sel(field="close").ffill('time').bfill('time').fillna(0)  # fill NaN
            price_df = price.to_dataframe()

            # Feature 2
            vol = data_for_instrument.sel(field="vol").ffill('time').bfill('time').fillna(0)  # fill NaN
            vol_df = vol.to_dataframe()

            # Merge dataframes
            for_result = pd.merge(price_df, vol_df, on='time')
            for_result = for_result.drop(['field_x', 'field_y'], axis=1)

            return for_result

        features_all_assets = {}

        asset_all = data.asset.to_pandas().to_list()
        for asset in asset_all:
            features_all_assets[asset] = get_features_for(asset)

        return features_all_assets

    def get_target_classes(data):
        # for classifiers, you need to set classes
        # if 1 then the price will rise tomorrow

        price_current = data.sel(field="close").dropna('time')  # rm NaN
        price_future = price_current.shift(time=-1).dropna('time')

        class_positive = 1
        class_negative = 0

        target_is_price_up = xr.where(price_future > price_current, class_positive, class_negative)
        return target_is_price_up.to_pandas()

    data = market_data.copy(True)

    asset_name_all = data.coords['asset'].values
    features_all_df = get_features_dict(data)
    target_all_df = get_target_classes(data)

    predict_weights_next_day_df = data.sel(field="close").isel(time=-1).to_pandas()

    for asset_name in asset_name_all:
        target_for_learn_df = target_all_df[asset_name]
        feature_for_learn_df = features_all_df[asset_name][:-1]  # last value reserved for prediction

        # align features and targets
        target_for_learn_df, feature_for_learn_df = target_for_learn_df.align(feature_for_learn_df, axis=0,
                                                                              join='inner')

        model = get_ml_model()
        try:
            model.fit(feature_for_learn_df.values, target_for_learn_df)

            feature_for_predict_df = features_all_df[asset_name][-1:]

            predict = model.predict(feature_for_predict_df.values)
            predict_weights_next_day_df[asset_name] = predict
        except Exception:
            logging.exception("model failed")
            # if there is exception, return zero values
            return xr.zeros_like(data.isel(field=0, time=0))

    return predict_weights_next_day_df.to_xarray()


weights = qnbt.backtest(
    competition_type="cryptofutures",
    load_data=load_data,
    lookback_period=18,
    start_date='2014-01-01',
    strategy=predict_weights,
    analyze=True,
    build_plots=True
)

Here is an example with indicators (Sharpe ratio = 0.8). It is a variant of get_features_for from the code above:

def get_features_for(asset_name):
    data_for_instrument = data.copy(True).sel(asset=[asset_name])

    # Feature 1: one-day rate of change of the close price
    price = data_for_instrument.sel(field="close")
    price = qnta.roc(price, 1)  # requires: import qnt.ta as qnta
    price = price.ffill('time').bfill('time').fillna(0)
    price_df = price.to_pandas()

    # Feature 2
    vol = data_for_instrument.sel(field="vol")
    vol = vol.ffill('time').bfill('time').fillna(0)  # fill NaN
    vol_df = vol.to_pandas()

    # Merge dataframes
    for_result = pd.merge(price_df, vol_df, on='time')

    return for_result

Vyacheslav_B

@theflyingdutchman

Hello. Try running the code that saves the weights in a separate cell. This should help.
As far as I understand, this can happen when there is no data for the first day.
The cell where the error occurs will be ignored, and the next one will be executed.

qnout.check(weights, data, "stocks_nasdaq100")
qnout.write(weights)

Vyacheslav_B

@cespadilla Hello.

The reason is in the "train_model" function.

def train_model(data):
    asset_name_all = data.coords['asset'].values
    features_all = get_features(data)
    target_all = get_target_classes(data)


    models = dict()

    for asset_name in asset_name_all:

        # drop missing values:
        target_cur = target_all.sel(asset=asset_name).dropna('time', 'any')
        features_cur = features_all.sel(asset=asset_name).dropna('time', 'any')
        
        
        target_for_learn_df, feature_for_learn_df = xr.align(target_cur, features_cur, join='inner')
        if len(features_cur.time) < 10:
            continue  # too few observations to train on
        model = get_model()
        try:
            model.fit(feature_for_learn_df.values, target_for_learn_df)
            models[asset_name] = model
        except Exception:
            logging.exception('model training failed')

    return models

If an asset has fewer than 10 time points of feature data, no model is created for it (if len(features_cur.time) < 10).

This condition makes sense. I would not remove it.

The second thing that can affect the result is the retraining interval of the model ("retrain_interval").


weights = qnbt.backtest_ml(
    train=train_model,
    predict=predict_weights,
    train_period=2 * 365,  # the data length for training in calendar days
    retrain_interval=10 * 365,  # how often we have to retrain models (calendar days)
    retrain_interval_after_submit=1,  # how often retrain models after submission during evaluation (calendar days)
    predict_each_day=False,  # Is it necessary to call prediction for every day during backtesting?
    # Set it to true if you suspect that get_features is looking forward.
    competition_type='crypto_daily_long_short',  # competition type
    lookback_period=365,  # how many calendar days are needed by the predict function to generate the output
    start_date='2014-01-01',  # backtest start date
    analyze=True,
    build_plots=True  # do you need the chart?
)
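To get a feel for what such a long retraining interval means, here is a rough count of training runs. This is only a sketch under an assumed rule (train on the first day, then again once at least `retrain_interval` calendar days have passed); the helper `retrain_dates` and the end date are hypothetical, and the real scheduling inside `backtest_ml` may differ.

```python
from datetime import date, timedelta


def retrain_dates(start, end, retrain_interval_days):
    # Assumed rule: train on the first day, then again every
    # retrain_interval_days calendar days until the end date.
    dates = []
    current = start
    while current <= end:
        dates.append(current)
        current += timedelta(days=retrain_interval_days)
    return dates


start, end = date(2014, 1, 1), date(2021, 1, 1)  # end date chosen for illustration
rare = retrain_dates(start, end, 10 * 365)
yearly = retrain_dates(start, end, 365)
print(len(rare), len(yearly))  # 1 8
```

Under this assumption, retrain_interval=10 * 365 trains the models only once over the whole period, so assets without enough data on that single training date never get a model.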

Vyacheslav_B

@magenta-kabuto

Hello

An example using pandas
One can work with pandas DataFrames at intermediate steps and at the end convert them to xarray data structures:

def get_price_pct_change(prices):
    prices_pandas = prices.to_pandas()
    assets = prices.coords["asset"].values  # use the argument, not the global data
    for asset in assets:
        prices_pandas[asset] = prices_pandas[asset].pct_change()
    return prices_pandas

prices = data.sel(field="close") * 1.0
prices_pct_change = get_price_pct_change(prices).unstack().to_xarray()

Vyacheslav_B

@eddiee

Hello.

This code looks into the future.
That is needed to train the model.
Pay attention to the name of the variable.

Vyacheslav_B

@illustrious-felice Hello. Sometimes an error can occur at the data preprocessing stage. It's possible to inadvertently use future data.

Quantiacs has an excellent mechanism for quickly checking such errors.

In any strategy, there is a file:
https://github.com/quantiacs/toolbox/blob/main/qnt/precheck.ipynb

Run it on 3-5 splits and compare the statistics. If there are discrepancies, then it is likely that the strategy is peeking into the future.

If you set a very large number of splits, it will be an example of how the online check of submitted strategies in the contest works.

Intermediate results can be viewed in HTML format in the folder.
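The idea behind comparing splits can be sketched in a few lines of plain Python. This is not the logic of precheck.ipynb, only an illustration of the principle: a causal strategy gives the same output for past days no matter how much later data it is given. The names `is_consistent`, `momentum` and `peeks_ahead` are hypothetical.

```python
def is_consistent(strategy, series, n_splits=3):
    # Re-run the strategy on shorter histories and compare the outputs for
    # the shared days with the full-history run.
    full = strategy(series)
    for k in range(1, n_splits + 1):
        cut = len(series) - k
        if strategy(series[:cut]) != full[:cut]:
            return False
    return True


def momentum(series):
    # Causal: today's signal uses today's and yesterday's prices only.
    return [0] + [1 if series[i] > series[i - 1] else 0
                  for i in range(1, len(series))]


def peeks_ahead(series):
    # Leaky: today's signal uses tomorrow's price.
    return [1 if series[i + 1] > series[i] else 0
            for i in range(len(series) - 1)] + [0]


prices = [10, 11, 9, 12, 15, 14]
print(is_consistent(momentum, prices))     # True
print(is_consistent(peeks_ahead, prices))  # False
```

In this toy data the leaky strategy still passes the first cut by coincidence and only fails on the second one, which is one reason to run more than one split.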

Vyacheslav_B

@magenta-kabuto Hello. If you are developing your strategies in a local environment, I recommend running your strategy code in an online environment before submitting it to the competition.

There may be errors related to the absence of certain Python libraries in the online environment, the use of local files, and the application of variables or settings from the local environment.

It is important that the lines

import qnt.output as qnout
qnout.write(weights)

are placed in a separate cell.

Vyacheslav_B

@magenta-kabuto Add

print(state)

before saving it. What are you trying to save?

You may need to restart the kernel.

Answer from ChatGPT:

The error message you're encountering, AttributeError: Can't pickle local object 'Layer._initializer_tracker.<locals>.<lambda>', indicates that the pickle module is unable to serialize a lambda function (or possibly another local object) that is part of the object state you're attempting to dump to a file. This is a common limitation of pickle, as it cannot serialize lambda functions, local functions, classes defined within functions, or instances of such classes, among other things.

Avoid Using Lambda Functions in Serializable Objects
If possible, replace lambda functions with defined functions (even if they're one-liners). Functions defined at module level can be pickled because pickle stores them by name and looks them up again on load. For example, if you have:

lambda x: x + 1

Replace it with:

def increment(x):
    return x + 1
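A minimal check of this behaviour with the standard library only. The helper `can_pickle` is a hypothetical name introduced for this sketch, and it assumes the code runs as a normal script or notebook cell so that `increment` is importable by name.

```python
import pickle


def increment(x):
    # Module-level function: pickle stores a reference to it by name.
    return x + 1


def can_pickle(obj):
    # Returns True if pickle can serialize obj, False otherwise.
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


print(can_pickle(lambda x: x + 1))  # a lambda has no importable name
print(can_pickle(increment))
```

The same helper can be called on each attribute of the state object to find which one holds the object that cannot be pickled.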

Vyacheslav_B

@illustrious-felice

Incorporating seed initialization into your PyTorch code ensures reproducibility by making the random number generation predictable. This involves setting seeds for the PyTorch engine, NumPy, and the Python random module if you're using it. Below, I'll show you how to integrate seed initialization into your existing code. Remember, while this can make your experiments more reproducible, it does not guarantee identical results across different hardware or PyTorch versions due to the inherent nondeterminism in some GPU operations.

import xarray as xr  # xarray for data manipulation
import qnt.data as qndata  # functions for loading data
import qnt.backtester as qnbt  # built-in backtester
import qnt.ta as qnta  # technical analysis library
import numpy as np
import pandas as pd
import torch
from torch import nn, optim
import random

# Seed initialization function
def set_seed(seed_value=42):
    """Set seed for reproducibility."""
    random.seed(seed_value)
    np.random.seed(seed_value)
    torch.manual_seed(seed_value)
    torch.cuda.manual_seed(seed_value)
    torch.cuda.manual_seed_all(seed_value)  # if you are using multi-GPU.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

# Set the seed for reproducibility
set_seed(42)

asset_name_all = ['NAS:AAPL', 'NAS:AMZN', 'NAS:MSFT']

class LSTM(nn.Module):
    """
    Class to define our LSTM network.
    """
    def __init__(self, input_dim=3, hidden_layers=64):
        super(LSTM, self).__init__()
        self.hidden_layers = hidden_layers
        self.lstm1 = nn.LSTMCell(input_dim, self.hidden_layers)
        self.lstm2 = nn.LSTMCell(self.hidden_layers, self.hidden_layers)
        self.linear = nn.Linear(self.hidden_layers, 1)

    def forward(self, y, future_preds=0):
        outputs = []
        n_samples = y.size(0)
        h_t = torch.zeros(n_samples, self.hidden_layers, dtype=torch.float32)
        c_t = torch.zeros(n_samples, self.hidden_layers, dtype=torch.float32)
        h_t2 = torch.zeros(n_samples, self.hidden_layers, dtype=torch.float32)
        c_t2 = torch.zeros(n_samples, self.hidden_layers, dtype=torch.float32)

        for time_step in range(y.size(1)):
            x_t = y[:, time_step, :]  # Ensure x_t is [batch, input_dim]

            h_t, c_t = self.lstm1(x_t, (h_t, c_t))
            h_t2, c_t2 = self.lstm2(h_t, (h_t2, c_t2))
            output = self.linear(h_t2)
            outputs.append(output.unsqueeze(1))

        outputs = torch.cat(outputs, dim=1).squeeze(-1)
        return outputs

def get_model():
    model = LSTM(input_dim=3)
    return model

def get_features(data):
    close_price = data.sel(field="close").ffill('time').bfill('time').fillna(1)
    open_price = data.sel(field="open").ffill('time').bfill('time').fillna(1)
    high_price = data.sel(field="high").ffill('time').bfill('time').fillna(1)
    log_close = np.log(close_price)
    log_open = np.log(open_price)
    features = xr.concat([log_close, log_open, high_price], "feature")
    return features

def get_target_classes(data):
    price_current = data.sel(field='close')
    price_future = qnta.shift(price_current, -1)

    class_positive = 1  # price goes up
    class_negative = 0  # price goes down

    target_price_up = xr.where(price_future > price_current, class_positive, class_negative)
    return target_price_up

def load_data(period):
    return qndata.stocks.load_ndx_data(tail=period, assets=asset_name_all)

def train_model(data):
    features_all = get_features(data)
    target_all = get_target_classes(data)
    models = dict()

    for asset_name in asset_name_all:
        model = get_model()
        target_cur = target_all.sel(asset=asset_name).dropna('time', 'any')
        features_cur = features_all.sel(asset=asset_name).dropna('time', 'any')
        target_for_learn_df, feature_for_learn_df = xr.align(target_cur, features_cur, join='inner')
        criterion = nn.MSELoss()
        optimiser = optim.LBFGS(model.parameters(), lr=0.08)
        epochs = 1
        for i in range(epochs):
            def closure():
                optimiser.zero_grad()
                feature_data = feature_for_learn_df.transpose('time', 'feature').values
                in_ = torch.tensor(feature_data, dtype=torch.float32).unsqueeze(0)
                out = model(in_)
                target = torch.zeros(1, len(target_for_learn_df.values))
                target[0, :] = torch.tensor(np.array(target_for_learn_df.values))
                loss = criterion(out, target)
                loss.backward()
                return loss
            optimiser.step(closure)
        models[asset_name] = model
    return models

def predict(models, data):
    weights = xr.zeros_like(data.sel(field='close'))
    features_all = get_features(data)  # compute once, not once per asset
    for asset_name in asset_name_all:
        features_cur = features_all.sel(asset=asset_name).dropna('time', 'any')
        if len(features_cur.time) < 1:
            continue
        feature_data = features_cur.transpose('time', 'feature').values
        in_ = torch.tensor(feature_data, dtype=torch.float32).unsqueeze(0)
        out = models[asset_name](in_)
        prediction = out.detach()[0]
        weights.loc[dict(asset=asset_name, time=features_cur.time.values)] = prediction
    return weights

weights = qnbt.backtest_ml(
    load_data=load_data,
    train=train_model,
    predict=predict,
    train_period=55,
    retrain_interval=55,
    retrain_interval_after_submit=1,
    predict_each_day=False,
    competition_type='stocks_nasdaq100',
    lookback_period=55,
    start_date='2024-01-01',
    build_plots=True
)
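A quick way to confirm that seeding has the intended effect is to reseed and compare the draws. The sketch below uses only the standard `random` module so that it runs without extra dependencies; its `set_seed` is a stripped-down stand-in for the function above, and `draw` is a hypothetical helper. The same check can be repeated with NumPy or PyTorch random calls.

```python
import random


def set_seed(seed_value=42):
    # Stdlib-only stand-in for the set_seed function above.
    random.seed(seed_value)


def draw(n=5):
    # Draws n pseudo-random floats from the global generator.
    return [random.random() for _ in range(n)]


set_seed(42)
first = draw()
set_seed(42)
second = draw()
set_seed(7)
other = draw()

print(first == second)  # same seed, same sequence
print(first == other)   # different seed, different sequence
```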

I probably won't be available next week, so if you have more questions, please don't expect an answer from me until after that.

Vyacheslav_B

@dark-pidgeot Hi! With the release of qnt version 0.0.402, the issue with data loading in the local environment has been resolved. The library now uses newer dependencies, including pandas version 2.2.2.

Vyacheslav_B

@buyers_are_back Hello.
Here is a new example of stock prediction using index data.
I recommend using the single-pass version.
https://quantiacs.com/documentation/en/data/indexes.html

Vyacheslav_B

@blackpearl Hello. I don’t use machine learning in trading, and I don’t have similar examples. If you know Python and know how to develop such systems, or if you use ChatGPT (or similar tools) for development, you should not have difficulties modifying existing examples. You will need to change the model training and prediction functions.

One of the competitive advantages of the Quantiacs platform is the ability to test machine learning models from a financial performance perspective.

I haven’t encountered similar tools. Typically, models are evaluated using metrics like F1 score and cross-validation (for example, in the classification task of predicting whether the price will rise tomorrow).

However, there are several problems:

It is unclear how much profit this model can generate. In real trading, there will be commissions, slippage, data errors, and the F1 score doesn’t account for these factors.
It is possible to inadvertently look into the future. For instance, data preprocessing techniques like standardization can leak future information into the past. If you subtract the mean or maximum value from each point in the time series, the maximum value reached in 2021 would be known in 2015, which is unacceptable.
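The normalization problem can be shown with a small sketch in plain Python, assuming positive prices. The function names are hypothetical; the point is only the contrast between a statistic taken over the whole series and one taken over the data seen so far.

```python
def scale_by_full_max(series):
    # Leaky: every point is divided by the maximum of the whole series.
    peak = max(series)
    return [x / peak for x in series]


def scale_by_running_max(series):
    # Causal: each point is divided by the maximum seen up to that point.
    out, peak = [], float("-inf")
    for x in series:
        peak = max(peak, x)
        out.append(x / peak)
    return out


prices = [10.0, 20.0, 40.0]
print(scale_by_full_max(prices))     # [0.25, 0.5, 1.0]
print(scale_by_running_max(prices))  # [1.0, 1.0, 1.0]

# The first day's leaky value changes when later data is removed.
print(scale_by_full_max(prices[:2])[0])  # 0.5
```

With full-series scaling, the value for the first day depends on a price reached two days later, so the feature for that day could not have been computed at the time.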

The Quantiacs platform provides a tool for evaluating models from a financial performance perspective.

However, practice shows that finding a good machine learning model requires significant computational resources and time for training and testing. My results when testing strategies on real data have not been very good.

Vyacheslav_B

@machesterdragon Hello. I have already answered this question for you. See a few posts above.

Single-pass Version for Participation in the Contest
This code helps submissions get processed faster in the contest. The backtest system calculates the weights for each day, while the provided function calculates weights for only one day.

Vyacheslav_B

@illustrious-felice Hi,

https://github.com/quantiacs/strategy-ml_lstm_state/blob/master/strategy.ipynb

This repository provides an example of using state, calculating complex indicators, dynamically selecting stocks for trading, and implementing basic risk management measures, such as normalizing and reducing large positions. It also includes recommendations for submitting strategies to the competition.

Vyacheslav_B

@buyers_are_back Hello. Look at the bottom of the table. Only 5 rows are displayed there. At the bottom right, you can click a button to scroll to the first row.

Vyacheslav_B

@illustrious-felice Hello.

Show me an example of the code.

I don't quite understand what you are trying to do.

Maybe you just don't have enough data in the functions to get the value.

Please note that in the following lines I intentionally reduce the data to 1 day so that only the last day is predicted:

last_time = data.time.values[-1]
data_last = data.sel(time=slice(last_time, None))

Calculate your indicators before this code, and then slice the values.
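Why the order matters can be seen in a small sketch with a plain list and a 3-day moving average. The `sma` helper is a hypothetical stand-in for whatever indicator is used; it is not a qnt function.

```python
def sma(series, window):
    # Simple moving average; None where fewer than `window` points exist.
    out = []
    for i in range(len(series)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(series[i + 1 - window: i + 1]) / window)
    return out


prices = [10.0, 11.0, 12.0, 13.0, 14.0]

# Correct order: indicator on the full history, then keep the last day.
sma_then_slice = sma(prices, 3)[-1:]

# Wrong order: slicing first leaves one point, too few for a 3-day window.
slice_then_sma = sma(prices[-1:], 3)

print(sma_then_slice)  # [13.0]
print(slice_then_sma)  # [None]
```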

Vyacheslav_B

@multi_byte-wildebeest Hi. Without an example, it's unclear what the problem might be.

If you use a state and a function that returns the prediction for one day, you will not get correct results with precheck.

This was discussed here: https://quantiacs.com/community/topic/555/access-previous-weights/18

Vyacheslav_B

@buyers_are_back Hello.

Here is an example: example link.

You can view the list of available indexes here.

If you want to use the load_data function, take a look at this example. You can implement the index download by analogy:

example link.

Vyacheslav_B

@machesterdragon
That's how it should be. This code is needed so that submissions are processed faster when sent to the contest. The backtest system will calculate the weights for each day. The function I provided calculates weights for only one day.
