2020 conference paper
Simulation Optimization Based Feature Selection, a Study on Data-Driven Optimization with Input Uncertainty
2020 Winter Simulation Conference (WSC), 2149–2160.
In machine learning, removing uninformative or redundant features from a dataset can significantly improve the construction, analysis, and interpretation of the prediction models, especially when the set of collected features is extensive. We approach this challenge with simulation optimization over a high dimensional binary space in place of the classic greedy search in forward or backward selection or regularization methods. We use genetic algorithms to generate scenarios, bootstrapping to estimate the contribution of the intrinsic and extrinsic noise and sampling strategies to expedite the procedure. By including the uncertainty from the input data in the measurement of the estimators’ variability, the new framework obtains robustness and efficiency. Our results on a simulated dataset exhibit improvement over state-of-the-art accuracy, interpretability, and reliability. Our proposed framework provides insight for leveraging Monte Carlo methodology in probabilistic data-driven modeling and analysis.