this is a really important point about data leakage that often trips up people building ml trading strategies. the distinction between features used for prediction and how the target variables are constructed is crucial here.
when you're training an lstm on historical financial data, it's easy to accidentally introduce look-ahead bias if you're not careful about which data the model actually sees at each timestep. the target shift you mentioned is legitimate for training purposes, but the key question is whether the features used at prediction time only contain information available up to that point in time.
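to make that concrete, here's a tiny sketch (toy prices, plain numpy, not anyone's actual pipeline) of what a legitimate target shift looks like: the feature known at time t is paired with the return realized over the *next* step, so the model never sees the value it's asked to predict.

```python
import numpy as np

# toy price series (hypothetical numbers)
prices = np.array([100.0, 101.0, 99.5, 102.0, 103.5])
returns = np.diff(prices) / prices[:-1]  # returns[t] is the return over (t, t+1]

# feature at step t: the return just realized (known at time t)
# target at step t: the return over the *next* interval (unknown until t+1)
features = returns[:-1]
targets = returns[1:]

# every (feature, target) pair uses only information available at prediction time
```

this is the shift being legitimate for training: the leakage would only appear if the feature vector itself contained anything computed from `targets`.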
i've struggled with similar issues when building my own models, and one thing that helped was creating a strict pipeline where each training example only sees historical data. it can be tricky with lstms since they inherently process sequences, but the data preparation step needs to be very explicit about timestamps.
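for the sequence-windowing part, something like this helper (the name and shapes are my own invention, just a sketch) makes the "only historical data" rule explicit: each lstm input window ends at t, and the target is the observation at t+1, so the window can never overlap its own target.

```python
import numpy as np

def make_windows(series, window):
    """pair each input window series[t-window+1 .. t] with target series[t+1],
    so every input ends strictly before the value it predicts."""
    X, y = [], []
    for t in range(window - 1, len(series) - 1):
        X.append(series[t - window + 1 : t + 1])  # inputs up to and including t
        y.append(series[t + 1])                   # target is the next observation
    return np.array(X), np.array(y)

series = np.arange(10, dtype=float)
X, y = make_windows(series, window=3)
# X[0] is [0, 1, 2] and its target y[0] is 3: no overlap with the future
```

in a real pipeline you'd index by timestamp rather than position, but the invariant is the same: the last timestamp in the window is strictly earlier than the target's timestamp.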
for anyone dealing with similar concerns, i'd also recommend checking out some of the debugging techniques over at scritchy scratchy - they have some good resources on validating whether your model is actually learning from past patterns or inadvertently peeking at future data. it's saved me a lot of headaches in my own quant trading projects.
has anyone implemented proper walk-forward validation for their lstm strategies to confirm there's no look-ahead carrying over into the actual live trading phase?
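in case it helps anyone, here's roughly how i'd sketch the split logic (all the names and parameters are made up for illustration): train on a fixed window, test on the window immediately after it, then roll forward, so every test window sits strictly after the data the model was fit on.

```python
import numpy as np

def walk_forward_splits(n, train_size, test_size, step):
    """yield (train_indices, test_indices) pairs where the test window
    always starts right after the training window, then roll forward."""
    splits = []
    start = 0
    while start + train_size + test_size <= n:
        train = np.arange(start, start + train_size)
        test = np.arange(start + train_size, start + train_size + test_size)
        splits.append((train, test))
        start += step
    return splits

splits = walk_forward_splits(n=100, train_size=60, test_size=10, step=10)
for train, test in splits:
    assert train[-1] < test[0]  # no forward-looking: test strictly follows train
```

you'd refit (or at least re-standardize) the model per split using only the training indices; scikit-learn's `TimeSeriesSplit` does something similar with an expanding window if you'd rather not roll your own.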