Greetings --
There is a fundamental difference between the approach the developer takes with traditional trading system development and the approach taken with machine learning development.
When using a traditional trading system development platform to build a formula-based trading system, the indicators and rules are written and interpreted first; these generate buy and sell signals, resulting in trades, which are then analyzed. In short -- examine an indicator, see what happens later.
When using machine learning to build trading systems, a set of data that includes both indicator values (predictor variables) and buy and sell signals (target variables) is created, with the buy and sell signals formed so that the resulting trades are the ones the developer would like to take. All of this data is passed to the machine learning / pattern identification toolkit, where a set of rules or equations is formed that relates the indicator data to the signals. In short -- desirable trades first, then see what happened earlier.
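As one illustration of forming the target variable from desirable trades, here is a minimal sketch in Python. It labels a day Buy when the forward return over a hypothetical 5-day horizon exceeds an assumed 1% threshold; the horizon, threshold, and column name are illustrative assumptions, not recommendations.

    import pandas as pd

    def label_signals(prices, horizon=5, threshold=0.01):
        """Label each day Buy or Sell based on the forward return."""
        forward_return = prices.shift(-horizon) / prices - 1.0
        labels = pd.Series("Sell", index=prices.index)
        labels[forward_return > threshold] = "Buy"
        return labels.iloc[:-horizon]  # final rows have no future data to judge

    # Example: signals = label_signals(data["Close"])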
A machine learning toolkit, such as scikit-learn, has a large number of model templates that can be used to fit the indicator data to the signals. One of those models is the decision tree. Others include the support vector machine, linear regression, neural networks, and many more. Traditional trading system development platforms (almost always) use the decision tree template for the model. Decision trees are tree-like sequences of if-then-else statements guiding the computation to "leaves" that have values of Buy or Sell. They have the advantage of being easy to understand. As evidenced by the profitability of traditional systems, decision trees can produce excellent trading systems. One of their weaknesses is a tendency to overfit during training / optimization, fitting the in-sample data closely while performing poorly out-of-sample.
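For concreteness, a minimal scikit-learn sketch of fitting a decision tree to indicator data follows. The file name "indicators.csv" and the column names RSI, ATR, and Signal are assumptions for illustration only.

    import pandas as pd
    from sklearn.tree import DecisionTreeClassifier

    data = pd.read_csv("indicators.csv")   # hypothetical file of precomputed indicators
    X = data[["RSI", "ATR"]]               # predictor variables (assumed columns)
    y = data["Signal"]                     # target variable holding Buy / Sell labels

    # max_depth caps the tree's complexity -- one guard against overfitting
    model = DecisionTreeClassifier(max_depth=4, random_state=42)
    model.fit(X, y)
    print(model.predict(X.head()))         # predicted signals for the first rows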
The best validation for traditional platforms is walk forward. The out-of-sample test data consists of more recent values of the same data streams used to develop the system. The indicators and rules used are those ranked "best" by objective function value among all alternatives tested over the in-sample period.
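A sketch of the walk-forward mechanics, assuming illustrative window lengths of 500 in-sample rows and 100 out-of-sample rows:

    import numpy as np

    def walk_forward_splits(n_rows, in_sample=500, out_of_sample=100):
        """Yield (train, test) index arrays for each walk-forward step."""
        start = 0
        while start + in_sample + out_of_sample <= n_rows:
            train = np.arange(start, start + in_sample)
            test = np.arange(start + in_sample, start + in_sample + out_of_sample)
            yield train, test
            start += out_of_sample   # slide forward by one out-of-sample period

    # Example: for train, test in walk_forward_splits(len(data)): fit on train, score on test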
Monte Carlo techniques are typically not used in model development when using a traditional platform. They may be used to guide a non-exhaustive search for the best solution, but only to shorten that search by avoiding an exhaustive evaluation of every alternative.
Walk forward validation is also used for systems developed using machine learning, and in precisely the same way. There are three data sets used for machine learning development. In addition to the out-of-sample validation data, during the learning process, and before the validation step, the in-sample data is divided into two sets -- training and testing. The training set is extensively examined to determine the formulas or patterns. The testing set is used to guide the training. It is common to use random selection of data elements to divide the whole set of learning data into training and testing; that can be thought of as a Monte Carlo operation. There can also be guidance of the search using non-exhaustive methods, similar to those used for the traditional platform.
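A sketch of that random division, using scikit-learn's train_test_split on the in-sample data; the 70/30 ratio and the variable names X_insample and y_insample are assumptions:

    from sklearn.model_selection import train_test_split

    X_train, X_test, y_train, y_test = train_test_split(
        X_insample, y_insample,
        test_size=0.3,      # 30% reserved to guide the training (assumed ratio)
        random_state=42)    # fixed seed makes the Monte Carlo draw repeatable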
--------------
The most important use of Monte Carlo analysis -- whether the system has been developed using a traditional platform or machine learning -- is estimation of risk and profit potential. This occurs after the system has been validated. It is a separate step that uses the out-of-sample trades as its input data and produces risk and profit metrics as its output.
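A sketch of that separate step, assuming oos_trades holds per-trade profits from the validated out-of-sample period; the bootstrap resampling shown is one common approach, and the run counts are illustrative:

    import numpy as np

    def monte_carlo_metrics(oos_trades, n_runs=10000, trades_per_run=100, seed=42):
        """Bootstrap out-of-sample trade profits into profit and drawdown distributions."""
        rng = np.random.default_rng(seed)
        trades = np.asarray(oos_trades)
        profits = np.empty(n_runs)
        drawdowns = np.empty(n_runs)
        for i in range(n_runs):
            sample = rng.choice(trades, size=trades_per_run, replace=True)
            equity = np.cumsum(sample)
            profits[i] = equity[-1]
            # drawdown: largest drop from the running peak of the equity curve
            drawdowns[i] = np.max(np.maximum.accumulate(equity) - equity)
        return profits, drawdowns

    # Example: profits, drawdowns = monte_carlo_metrics(oos_trade_profits)
    #          np.percentile(drawdowns, 95) estimates the 95th percentile drawdown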
-------------
It is true that data points passed to the machine learning process must be self-contained and independent of each other. That means that historical data the developer thinks might be valuable as a pattern variable, such as yesterday's RSI value, must be explicitly computed, stored, and included as a variable. (Monte Carlo is not used in this operation.)
A traditional system might appear to have just a few variables -- RSI lookback length, buy level, sell level -- while the equivalent data file for the machine learning process probably has more: RSI today, RSI yesterday, and so on. These are called predictor variables. Traditional development implicitly uses previous data through built-in functions that refer to yesterday's value, such as Cross. It can do this because the time series data continues to be available during model development.
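A sketch of making those lagged values explicit, assuming a DataFrame with a precomputed RSI column; pandas shift() supplies yesterday's value so each row becomes self-contained:

    import pandas as pd

    def add_lags(df, column="RSI", n_lags=2):
        """Add columns such as RSI_lag1 holding prior-day values of an indicator."""
        out = df.copy()
        for lag in range(1, n_lags + 1):
            out[column + "_lag" + str(lag)] = out[column].shift(lag)
        return out.dropna()   # drop the leading rows that lack full history

    # Each row now carries today's and yesterday's RSI as separate predictor columns.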
It is tempting to compute and include a large number of predictor variables, hoping that the machine learning routine can sort out which are important. That is usually an unreasonable expectation. A smaller number of predictor variables (perhaps a maximum of ten or so) should be determined either by the developer using his or her experience (his or her "domain knowledge" -- in the jargon), or by using a subset chosen by a search.
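A sketch of the search-based alternative, using scikit-learn's recursive feature elimination to reduce an assumed larger candidate set of predictors down to ten; X and y are the predictor DataFrame and signal labels as before:

    from sklearn.feature_selection import RFE
    from sklearn.tree import DecisionTreeClassifier

    # X is assumed to hold many candidate predictor columns; y the Buy/Sell labels
    selector = RFE(DecisionTreeClassifier(max_depth=4, random_state=42),
                   n_features_to_select=10)
    selector.fit(X, y)
    chosen = X.columns[selector.support_]   # names of the ten surviving predictors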
---------------
Best regards,
Howard