```shell
!pip install --force-reinstall python_utils
```
should fix the issue.
But I have no idea what could have caused it; the line in converters.py is totally messed up. The only thing that comes to mind is a cat on the keyboard.
interpolate_na() only eliminates NaNs between 2 valid data points. Take a look at this example:
```python
import qnt.data as qndata
import numpy as np

stocks = qndata.stocks_load_ndx_data()
sample = stocks[:, -5:, -6:]  # the latest 5 dates for the last 6 assets
print(sample.sel(field='close').to_pandas())
"""
asset       NYS:NCLH  NYS:ORCL  NYS:PRGO  NYS:QGEN  NYS:RHT  NYS:TEVA
time
2023-05-12     13.24     97.85     35.21     45.09      NaN      8.03
2023-05-15     13.71     97.26     34.23     45.36      NaN      8.07
2023-05-16     13.48     98.25     32.84     45.25      NaN      8.13
2023-05-17     14.35     99.77     32.86     44.95      NaN      8.13
2023-05-18     14.53    102.34     33.43     44.92      NaN      8.26
"""

# Let's add some more NaN values:
sample.values[3, (1, 3), 0] = np.nan
sample.values[3, 1:4, 1] = np.nan
sample.values[3, :2, 2] = np.nan
sample.values[3, 2:, 3] = np.nan
sample.values[3, :-1, 5] = np.nan
print(sample.sel(field='close').to_pandas())
"""
asset       NYS:NCLH  NYS:ORCL  NYS:PRGO  NYS:QGEN  NYS:RHT  NYS:TEVA
time
2023-05-12     13.24     97.85       NaN     45.09      NaN       NaN
2023-05-15       NaN       NaN       NaN     45.36      NaN       NaN
2023-05-16     13.48       NaN     32.84       NaN      NaN       NaN
2023-05-17       NaN       NaN     32.86       NaN      NaN       NaN
2023-05-18     14.53    102.34     33.43       NaN      NaN      8.26
"""

# Interpolate the NaN values:
print(sample.interpolate_na('time').sel(field='close').to_pandas())
"""
asset       NYS:NCLH    NYS:ORCL  NYS:PRGO  NYS:QGEN  NYS:RHT  NYS:TEVA
time
2023-05-12    13.240   97.850000       NaN     45.09      NaN       NaN
2023-05-15    13.420  100.095000       NaN     45.36      NaN       NaN
2023-05-16    13.480  100.843333     32.84       NaN      NaN       NaN
2023-05-17    14.005  101.591667     32.86       NaN      NaN       NaN
2023-05-18    14.530  102.340000     33.43       NaN      NaN      8.26
"""
```
As you can see, only the NaNs in the first 2 columns are being replaced. The others remain untouched and might be dropped when you use dropna().
Another thing you should keep in mind is that you might introduce lookahead bias with interpolation, e.g. in a single-run backtest. In my example, for instance (pretend the NaNs I added were already in the data), you would know on 2023-05-15 that ORCL will rise, when in reality you would first know that on 2023-05-18.
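If you want to fill gaps without peeking ahead, forward-filling only propagates past values (in xarray that's `ffill('time')`). Here's a minimal pure-Python sketch of the idea, with `None` standing in for NaN and the ORCL column from the example above:

```python
# Forward-fill: each gap gets the last known value, so no future data leaks in.
# Leading gaps stay unfilled, just like with interpolate_na().
def forward_fill(series):
    filled, last = [], None
    for value in series:
        if value is not None:
            last = value
        filled.append(last)
    return filled

orcl = [97.85, None, None, None, 102.34]
print(forward_fill(orcl))  # [97.85, 97.85, 97.85, 97.85, 102.34]
```

On 2023-05-15 you'd still be looking at the 2023-05-12 price, which is exactly what you would have known at the time.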
Assuming whatever `train` is has a similar structure to the usual stock data, I get the same error as you with:
```python
import itertools
import qnt.data as qndata

stocks = qndata.stocks_load_ndx_data(tail=100)
for comb in itertools.combinations(stocks.asset, 2):
    print(stocks.sel(asset=[comb]))
```
There are 2 things to consider:
My example works when the loop looks like this:
```python
for comb in itertools.combinations(stocks.asset.values, 2):
    print(stocks.sel(asset=list(comb)))
```
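To see the difference without any market data, here's the same loop over made-up ticker strings:

```python
import itertools

# combinations() yields tuples; .sel(asset=...) wants a flat list of labels.
assets = ['AAPL', 'MSFT', 'GOOG']
pairs = [list(comb) for comb in itertools.combinations(assets, 2)]
print(pairs)  # [['AAPL', 'MSFT'], ['AAPL', 'GOOG'], ['MSFT', 'GOOG']]
# Note: [comb] would instead give [('AAPL', 'MSFT')] - a tuple nested
# inside a list, which is what the selection chokes on in the original loop.
```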
It's safe to ignore these notices, but if they bother you, you can set the variables together with your API key using the defaults, and the messages go away:
```python
import os

os.environ['API_KEY'] = 'YOUR-API-KEY'
os.environ['DATA_BASE_URL'] = 'https://data-api.quantiacs.io/'
os.environ['CACHE_RETENTION'] = '7'
os.environ['CACHE_DIR'] = 'data-cache'
```
Could you please add CIKs to the NASDAQ100 stock list?
In order to load fundamental data from secgov we need the CIKs for the stocks but they're currently not in the list we get from qnt.data.stocks_load_ndx_list().
Although it is still possible to get fundamentals using qnt.data.stocks_load_list(), it takes a bit of acrobatics, for instance:
```python
import pandas as pd
import qnt.data as qndata

stocks = qndata.stocks_load_ndx_data()
df_ndx = pd.DataFrame(qndata.stocks_load_ndx_list()).set_index('symbol')
df_all = pd.DataFrame(qndata.stocks_load_list()).set_index('symbol')
idx = sorted(set(df_ndx.index) & set(df_all.index))
df = df_ndx.loc[idx]
df['cik'] = df_all.cik[idx]
symbols = list(df.reset_index().T.to_dict().values())
fundamentals = qndata.secgov_load_indicators(symbols, stocks.time)
```
It would be nice if we could get them with just 2 lines like so:
```python
stocks = qndata.stocks_load_ndx_data()
fundamentals = qndata.secgov_load_indicators(qndata.stocks_load_ndx_list(), stocks.time)
```
Also, the workaround doesn't work locally because qndata.stocks_load_list() seems to return the same list as qndata.stocks_load_ndx_list().
Thanks in advance!
@eddiee Try step 4 without quotes; this should start Jupyter Notebook. And if that's your real API key we see in the image, delete your last post. It's a bad idea to post it in a public forum.
Yes, I noticed that too. And after fixing it the backtest takes forever...
Another thing to consider is that it redefines the model with each training, but I believe you can retrain already trained NNs with new data, so they learn based on what they previously learned.
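To sketch that warm-start idea without any NN framework (a one-weight linear model and made-up numbers; with Keras you'd simply call `model.fit()` again on the existing model instead of rebuilding it):

```python
# Toy "model": y ~ weight * x, trained by gradient descent on least squares.
def train(weight, data, lr=0.01, epochs=100):
    for _ in range(epochs):
        for x, y in data:
            weight -= lr * 2 * (weight * x - y) * x
    return weight

w = 0.0
w = train(w, [(1.0, 2.0), (2.0, 4.0)])  # initial training
w = train(w, [(3.0, 6.0)])              # later: continue from the learned w
print(round(w, 3))                      # 2.0 - the previously learned slope is kept
```

The second call starts from the weights the first call produced, instead of from scratch.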
About the slicing error, I had that too a while ago. It took me some time to figure out that it wasn't enough to have the right pandas version in the environment. Because I had another python install with the same version in my PATH, the qntdev-python also looked there and always used the newer pandas. So I placed the -s flag everywhere the qntdev python is supposed to run (PyCharm, Jupyter, terminal) like this
```shell
/path/to/quantiacs/python -s strategy.py
```
Of course one could simply remove the other python install from PATH but I needed it there.
In case you don't want to run init.py every time in order to install external libraries, I came up with a solution for this. You basically install the library in a folder in your home directory and let the strategy create symlinks to the module path at runtime. More details in this post.
Since the live period for the Q16 contest is coming to an end I'm watching my participating algorithms more closely and noticed something odd:
The closing prices are the same on 2022-02-24 and 2022-02-25 to the last decimal for almost all cryptos (49 out of 54).
```python
import qnt.data as qndata

crypto = qndata.cryptodaily.load_data(tail=10)
c = crypto.sel(field='close').to_pandas().iloc[-3:]
liquid = crypto.sel(field='is_liquid').fillna(0).values.astype(bool)[-3:]
# only showing the cryptos which were liquid for the last 3 days:
c.iloc[:, liquid.all(axis=0)]
```

```
asset          ADA   AVAX    BNB        BTC    DOGE    DOT        ETH   LINK    SOL       XRP
time
2022-02-23  0.8664  73.47  365.6  37264.053  0.1274  15.97  2580.9977  13.34  84.64  0.696515
2022-02-24  0.8533  76.39  361.2  38348.744  0.1242  16.16  2598.0195  13.27  89.41  0.696359
2022-02-25  0.8533  76.39  361.2  38348.744  0.1242  16.16  2598.0195  13.27  89.41  0.696359
```
```python
(c.values[-1] == c.values[-2]).sum(), c.shape
# (49, 54)
```
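For a quick check of how widespread the repetition is, a hypothetical helper like this works on any two rows (plain lists stand in for the close-price table):

```python
# Count how many columns carry the exact same value on two consecutive days.
def count_unchanged(prev_row, last_row):
    return sum(a == b for a, b in zip(prev_row, last_row))

prev = [0.8533, 76.39, 361.2, 38348.744]
last = [0.8533, 76.39, 361.2, 38348.744]
print(count_unchanged(prev, last))  # 4 - every column repeated
```

A handful of identical values can happen by chance, but 49 out of 54 assets repeating to the last decimal points to a data problem.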
Could you please have a look?
Yes, pip is way faster. Thanks!
I might have found an even faster solution but I guess I have to wait a few hours to find out if it really works.
Here's what I did:
```shell
!mkdir modules && pip install --target=modules cvxpy
```
```python
try:
    import cvxpy as cp
except ImportError:
    import os

    source = '/root/book/modules/'
    target = '/usr/local/lib/python3.7/site-packages/'
    for dirpath, dirnames, filenames in os.walk(source):
        source_path = dirpath.replace(source, '')
        target_path = os.path.join(target, source_path)
        if not os.path.exists(target_path) and not os.path.islink(target_path):
            os.symlink(dirpath, target_path)
            continue
        for file in filenames:
            source_file = os.path.join(dirpath, file)
            target_file = os.path.join(target, source_path, file)
            if not os.path.exists(target_file) and not os.path.islink(target_file):
                os.symlink(source_file, target_file)
    import cvxpy as cp
```
Creating the symlinks only takes 0.07 seconds, so fingers crossed.
UPDATE (a few hours later):
It actually worked. When I just reopened the strategy, the environment was newly initialized. First I tried just importing cvxpy and got the ModuleNotFoundError. Then I ran the strategy including the code above: cvxpy was imported correctly and the strategy ran.
I'm not sure if that solution works for every module because I don't know if pip might also write something to other directories than site-packages.
Anyway, I'm happy with this solution.
It's actually the same strategy / environment, not a new one.
If I haven't used it for a while (say, a few hours or a day) and open it again by clicking on the Jupyter button, it says:
Initialization of the virtual environment. The notebook will be ready in 15 seconds.
And when I try to run the strategy that worked fine a few hours or a day ago, I get the ModuleNotFoundError and have to install the module again.
Everything else is still there as it was before - the strategy, custom files - just not cvxpy.
Hello @support ,
I've been using cvxpy in the server environment which I installed by running
```shell
!conda install -y -c conda-forge cvxpy
```
in init.ipynb. But whenever this environment is newly initialized, the module is gone and I have to run this cell again (which takes awfully long).
Is this normal or is there something wrong with my environment?
My current workaround is placing these lines before the import
```python
try:
    import cvxpy as cp
except ImportError:
    import subprocess

    cmd = 'conda install -y -c conda-forge cvxpy'.split()
    rn = subprocess.run(cmd)
    import cvxpy as cp
```
Is there a better way?
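One alternative I'd try, assuming pip works in that environment: the same try/except, but installing with pip, which tends to be much faster than conda:

```python
try:
    import cvxpy as cp
except ImportError:
    import subprocess
    import sys

    # install into the running interpreter's environment via pip
    subprocess.run([sys.executable, '-m', 'pip', 'install', 'cvxpy'], check=True)
    import cvxpy as cp
```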
I'm having 2 issues with legacy.quantiacs.com:
Could you please take a look?
To get the actual statistics you currently have to calculate them like so:
```python
import qnt.data as qndata
import qnt.stats as qns

data = qndata.cryptodaily_load_data(min_date="2014-01-01")  # or whenever your backtest started
stats = qns.calc_stat(data, weights)
```
And if you really need them as xls file you can do:
```python
stats.to_pandas().to_excel('stats.xls')  # I got a ModuleNotFoundError the first time - pip install did the trick.
```
Although I can't recommend xls, because at least LibreOffice becomes very slow and unresponsive when handling such a file.
Getting the statistics after a backtest could be a little simpler, which brings me to a feature request:
Do you think you could add a parameter to the backtester which makes it return the statistics? They get calculated anyway by default, but we only see a truncated printout or the plots and can't use them for further analysis.
In my local environment I did it like this in qnt/backtester.py:
```python
        qnout.write(result)
        qnstate.write(state)

        if return_stats:
            analyze = True

        out = [result]
        if analyze:
            log_info("---")
            stats = analyze_results(result, data, competition_type, build_plots, start_date)
            if return_stats:
                out.append(stats)
        if args_count > 1:
            out.append(state)
        if len(out) == 1:
            out = out[0]
        return out
    finally:
        qndc.set_max_datetime(None)
```
```python
    if not build_plots:
        log_info(stat_global.to_pandas().tail())
        return stat_global  # here

    log_info("---")
    log_info("Calc stats per asset...")
    stat_per_asset = qnstat.calc_stat(data, output, per_asset=True)
    stat_per_asset = stat_per_asset.loc[output.time.values[0]:]

    if is_notebook():
        build_plots_jupyter(output, stat_global, stat_per_asset)
    else:
        build_plots_dash(output, stat_global, stat_per_asset)
    return stat_global  # and there
```
This might not be the most elegant solution but you get the idea.
Now I can get the statistics immediately after the backtest with
```python
weights, stats = backtest(..., return_stats=True)
```
and can do further analysis.
For instance, I started to calculate the correlations between my strategies to avoid uploading more of the same to the contest.
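As a sketch of that correlation analysis (made-up return series; in practice you'd feed in the daily relative returns taken from two strategies' stats):

```python
from math import sqrt

# Pearson correlation between two return series, in plain Python.
def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

returns_a = [0.01, -0.02, 0.03, 0.00]
returns_b = [0.02, -0.01, 0.02, 0.01]
print(round(pearson(returns_a, returns_b), 3))  # 0.906 - too similar to submit both
```

With real data, `np.corrcoef` does the same in one call; the point is just to flag strategy pairs whose returns move together.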
It would be nice to have this feature in a future version, so I don't have to mess with the backtester after each update.
Whenever I run a backtest on the server, I get the message

```
WARNING! Can't calculate correlation.
```
This has been happening since I started developing for the Q16 contest.
I don't know if this has any influence on the actual submission check and whether we'll soon have x times the quickstart template in the contest.
Anyway, it would be good if the correlation check worked before we submit algos that will eventually fail the correlation filter.
About that link: the important part is the one that starts with the question mark, with utm_medium being our unique identifier, right?
So, can we change the link to point to the contest description instead of the login page, like this?
Then interested people could first read more details about the contest before signing up...
@support In my posts I was merely thinking about unintentional lookahead bias, because when it comes to the intentional kind, there are lots of ways to do that, and I believe you can never make all of them impossible.
But I think that's what the rules are for and the live test is also a good measure to call out intentional or unintentional lookahead bias as well as simple innocent overfitting.
To clarify the Quantopian example a bit: I don't think what I described was meant to prevent lookahead bias. The 8000-something symbols were just all they had, and the rules for the tradable universe were publicly available (QTradableStocksUS on archive.org). I just thought that providing data for a larger set than what's actually tradable would make the scenarios I mentioned less likely. For that purpose, I think both sets could also be openly defined. Let's say the larger one has the top 100 symbols in terms of market cap, dollar volume or whatever, and the tradable ones could be the top 10 out of them by the same measure.
On the other hand, I still don't know if those scenarios could become a real problem. Because what good does this foreknowledge do if you can't trade those assets yet? And after they're in the top 10, it would be legitimate to use the fact that they just entered, because we would also have known this at that time in real life.