@cyan-gloom
interpolate_na() only fills NaNs that lie between two valid data points; leading and trailing NaNs are left as they are. Take a look at this example:
import qnt.data as qndata
import numpy as np
stocks = qndata.stocks_load_ndx_data()
sample = stocks[:, -5:, -6:] # The latest 5 dates for the last 6 assets
print(sample.sel(field='close').to_pandas())
"""
asset NYS:NCLH NYS:ORCL NYS:PRGO NYS:QGEN NYS:RHT NYS:TEVA
time
2023-05-12 13.24 97.85 35.21 45.09 NaN 8.03
2023-05-15 13.71 97.26 34.23 45.36 NaN 8.07
2023-05-16 13.48 98.25 32.84 45.25 NaN 8.13
2023-05-17 14.35 99.77 32.86 44.95 NaN 8.13
2023-05-18 14.53 102.34 33.43 44.92 NaN 8.26
"""
# Let's add some more NaN values:
sample.values[3, (1,3), 0] = np.nan
sample.values[3, 1:4, 1] = np.nan
sample.values[3, :2, 2] = np.nan
sample.values[3, 2:, 3] = np.nan
sample.values[3, :-1, 5] = np.nan
print(sample.sel(field='close').to_pandas())
"""
asset NYS:NCLH NYS:ORCL NYS:PRGO NYS:QGEN NYS:RHT NYS:TEVA
time
2023-05-12 13.24 97.85 NaN 45.09 NaN NaN
2023-05-15 NaN NaN NaN 45.36 NaN NaN
2023-05-16 13.48 NaN 32.84 NaN NaN NaN
2023-05-17 NaN NaN 32.86 NaN NaN NaN
2023-05-18 14.53 102.34 33.43 NaN NaN 8.26
"""
# Interpolate the NaN values:
print(sample.interpolate_na('time').sel(field='close').to_pandas())
"""
asset NYS:NCLH NYS:ORCL NYS:PRGO NYS:QGEN NYS:RHT NYS:TEVA
time
2023-05-12 13.240 97.850000 NaN 45.09 NaN NaN
2023-05-15 13.420 100.095000 NaN 45.36 NaN NaN
2023-05-16 13.480 100.843333 32.84 NaN NaN NaN
2023-05-17 14.005 101.591667 32.86 NaN NaN NaN
2023-05-18 14.530 102.340000 33.43 NaN NaN 8.26
"""
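As you can see, only the NaNs in the first two columns (NCLH and ORCL) are replaced, because they are the only ones enclosed by valid values on both sides. The leading NaNs (PRGO, TEVA), the trailing NaNs (QGEN) and the all-NaN column (RHT) remain untouched and may still be dropped when you use dropna().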
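To make the "only between two valid points" rule concrete, here is a minimal pure-Python sketch of it. This is not how xarray implements interpolate_na(); `interpolate_interior` is a hypothetical helper I made up for illustration, and the day offsets stand in for the time coordinate. The input values are taken from the table above.

```python
import math

def interpolate_interior(x, y):
    """Linearly fill NaNs in y that are enclosed by two valid points.

    Leading and trailing NaNs are left as they are.
    (Hypothetical helper for illustration, not part of xarray or qnt.)
    """
    out = list(y)
    valid = [i for i, v in enumerate(y) if not math.isnan(v)]
    # Walk over each pair of neighbouring valid points and fill the gap between them.
    for left, right in zip(valid, valid[1:]):
        for i in range(left + 1, right):
            weight = (x[i] - x[left]) / (x[right] - x[left])
            out[i] = y[left] + weight * (y[right] - y[left])
    return out

nan = math.nan
days = [0, 3, 4, 5, 6]  # 2023-05-12 ... 2023-05-18 as day offsets

orcl = interpolate_interior(days, [97.85, nan, nan, nan, 102.34])  # gap enclosed by valid values
prgo = interpolate_interior(days, [nan, nan, 32.84, 32.86, 33.43])  # leading NaNs
qgen = interpolate_interior(days, [45.09, 45.36, nan, nan, nan])    # trailing NaNs

print(orcl)
print(prgo)
print(qgen)
```

The ORCL gap gets filled, while the PRGO and QGEN lists come back with their NaNs still in place, which matches what the interpolated table shows.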
Another thing you should keep in mind is that interpolation can introduce lookahead bias, e.g. in a single-pass backtest. In my example (pretend the NaNs I added were already in the data), the interpolated value for 2023-05-15 already tells you that ORCL will rise, when in reality you would only learn that on 2023-05-18.
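One way to avoid peeking into the future is to fill gaps only with values that were already known at that point, i.e. a forward fill. Below is a rough pure-Python sketch of the idea; `forward_fill` is a hypothetical helper written for this post, not a qnt function. As far as I know, xarray has a similar `ffill('time')` method for DataArrays, but check the docs for its requirements before relying on it.

```python
import math

def forward_fill(y):
    """Replace each NaN with the most recent valid value seen so far.

    NaNs before the first valid value stay NaN, since nothing is known yet.
    (Hypothetical helper for illustration.)
    """
    out = []
    last = math.nan
    for v in y:
        if not math.isnan(v):
            last = v  # remember the latest observed value
        out.append(last)
    return out

nan = math.nan

# ORCL close for 2023-05-12 ... 2023-05-18, with the NaNs I added above
orcl = [97.85, nan, nan, nan, 102.34]
filled = forward_fill(orcl)
print(filled)

# Leading NaNs cannot be filled without looking ahead, so they remain
teva = forward_fill([nan, nan, nan, nan, 8.26])
print(teva)
```

With this approach the value used on 2023-05-15 is the last close that was actually available on that date, so the jump to 102.34 only shows up on 2023-05-18. The price is a flat, stale series during the gap, so whether this fits depends on what your strategy does with the data.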