Simulation results for the article 'Variable selection in linear regression models: choosing the best subset is not always the best choice'
Hanke, M., Dijkstra, L., Foraita, R. and Didelez, V. (2022)
Synthetic data
Semi-synthetic data
Dimensionality
Low (p=100, n=1000)
Medium (p=500, n=500)
High (p=1000, n=100)
Correlation structure
Block
Toeplitz
Independent
Strength of correlation (ρ)
0.35
0.7
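The three correlation structures can be illustrated with a small sketch. It assumes that "Toeplitz" means Σ[i][j] = ρ^|i−j|, "Block" means a block-diagonal matrix with constant correlation ρ inside each block, and "Independent" means the identity matrix. The block size (`block_size`) and the helper names are hypothetical and are not taken from the article.

```python
# Sketch of the correlation structures offered above (assumed definitions).
# block_size is a hypothetical parameter; the article's value is not stated here.

def independent(p):
    """Identity matrix: all predictors are uncorrelated."""
    return [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]

def toeplitz(p, rho):
    """Correlation decays geometrically with the index distance |i - j|."""
    return [[rho ** abs(i - j) for j in range(p)] for i in range(p)]

def block(p, rho, block_size):
    """Constant correlation rho within consecutive blocks, zero between blocks."""
    return [
        [1.0 if i == j else (rho if i // block_size == j // block_size else 0.0)
         for j in range(p)]
        for i in range(p)
    ]

if __name__ == "__main__":
    # Top-left corner of a Toeplitz matrix with the stronger correlation rho = 0.7
    for row in toeplitz(4, 0.7):
        print([round(x, 3) for x in row])
```

With ρ = 0.7 the Toeplitz correlation between neighbouring predictors is 0.7 but drops below 0.03 once the predictors are ten positions apart, whereas the block structure keeps correlation constant within a block.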
Position of non-zeros
consecutive
equally spaced
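The two placements of the non-zero coefficients can be sketched as follows. The number of non-zeros `s`, their common value of 1, and the helper names are illustrative assumptions, not the article's settings.

```python
# Hypothetical helpers: where the s non-zero coefficients sit among p predictors.

def nonzero_positions(p, s, pattern):
    """Indices of the non-zero coefficients for the two placements."""
    if pattern == "consecutive":
        return list(range(s))              # a single run at the start
    if pattern == "equally spaced":
        step = p // s                      # spread evenly over all p indices
        return [k * step for k in range(s)]
    raise ValueError(f"unknown pattern: {pattern}")

def make_beta(p, s, pattern, value=1.0):
    """Coefficient vector with `value` at the chosen positions, zero elsewhere."""
    beta = [0.0] * p
    for idx in nonzero_positions(p, s, pattern):
        beta[idx] = value
    return beta

if __name__ == "__main__":
    print(nonzero_positions(100, 10, "consecutive"))
    print(nonzero_positions(100, 10, "equally spaced"))
```

Under a Toeplitz structure the placement matters: consecutive non-zeros are neighbours and therefore strongly correlated with each other, while equally spaced ones are far apart in index and nearly uncorrelated.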
Metric for performance evaluation
Best possible F1 score
Best possible F2 score
Best possible MCC value
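The three metrics compare a selected set of predictors with the true set of non-zeros. A minimal sketch, assuming that "best possible" means the maximum of the metric over the sequence of subsets a method produces (for example along its tuning-parameter path); the helper names are hypothetical. F-beta = (1+β²)TP / ((1+β²)TP + β²FN + FP), so F2 weights missed true predictors more heavily than false selections.

```python
import math

def confusion(selected, true_support, p):
    """TP, FP, FN, TN of a selected set against the true non-zero set."""
    selected, true_support = set(selected), set(true_support)
    tp = len(selected & true_support)
    fp = len(selected - true_support)
    fn = len(true_support - selected)
    return tp, fp, fn, p - tp - fp - fn

def f_beta(tp, fp, fn, beta):
    """F-beta score; beta = 1 gives F1, beta = 2 gives F2."""
    b2 = beta ** 2
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom > 0 else 0.0

def mcc(tp, fp, fn, tn):
    """Matthews correlation coefficient (0 by convention if undefined)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom > 0 else 0.0

def best_possible(path, true_support, p, metric):
    """Maximum metric value over a sequence of selected subsets (assumed meaning)."""
    scores = []
    for selected in path:
        tp, fp, fn, tn = confusion(selected, true_support, p)
        if metric == "F1":
            scores.append(f_beta(tp, fp, fn, 1.0))
        elif metric == "F2":
            scores.append(f_beta(tp, fp, fn, 2.0))
        elif metric == "MCC":
            scores.append(mcc(tp, fp, fn, tn))
        else:
            raise ValueError(f"unknown metric: {metric}")
    return max(scores)
```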
Signal-to-noise ratio τ (for tab "Performance based on subset size" only)
0.05
0.09
0.14
0.25
0.42
0.71
1.22
2.07
3.52
6.00
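After rounding to two decimals, the ten τ values above coincide with ten points equally spaced on the log scale between 0.05 and 6. The sketch below reproduces that grid and, assuming the common definition τ = βᵀΣβ / σ², computes the noise standard deviation σ that yields a given τ. The function names are hypothetical.

```python
import math

def snr_grid(lo=0.05, hi=6.0, n=10):
    """n values equally spaced on the log scale between lo and hi."""
    step = (math.log(hi) - math.log(lo)) / (n - 1)
    return [math.exp(math.log(lo) + k * step) for k in range(n)]

def noise_sd(beta, sigma, tau):
    """Noise SD such that beta' Sigma beta / sd**2 == tau (assumed SNR definition)."""
    p = len(beta)
    signal = sum(beta[i] * sigma[i][j] * beta[j] for i in range(p) for j in range(p))
    return math.sqrt(signal / tau)

print([round(t, 2) for t in snr_grid()])
# prints [0.05, 0.09, 0.14, 0.25, 0.42, 0.71, 1.22, 2.07, 3.52, 6.0]
```

Under this definition a larger τ means less noise for the same coefficients, so low τ values make variable selection harder for every method.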
Selected methods for different subset sizes (for tab "Performance based on subset size" only)
Enet 0.1
Enet 0.2
Enet 0.3
Enet 0.4
Enet 0.5
Enet 0.6
Enet 0.7
Enet 0.8
Enet 0.9
Lasso
FSS (forward stepwise selection)
BSS (best subset selection)
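The listed methods differ in how they restrict the coefficients. A minimal sketch, assuming the number after "Enet" is the elastic-net mixing parameter α in the glmnet parameterisation λ[(1−α)/2·‖β‖₂² + α‖β‖₁], where α = 1 recovers the lasso; best subset selection instead bounds the number of non-zero coefficients. Function names are hypothetical.

```python
def enet_penalty(beta, lam, alpha):
    """glmnet-style elastic-net penalty: alpha = 1 is the lasso, alpha = 0 ridge."""
    l1 = sum(abs(b) for b in beta)
    l2_sq = sum(b * b for b in beta)
    return lam * ((1 - alpha) / 2 * l2_sq + alpha * l1)

def subset_size(beta):
    """Number of non-zero coefficients, the quantity best subset selection bounds."""
    return sum(1 for b in beta if b != 0)

if __name__ == "__main__":
    beta = [1.0, -2.0, 0.0]
    for alpha in (0.1, 0.5, 0.9, 1.0):   # Enet 0.1, Enet 0.5, Enet 0.9, Lasso
        print(alpha, enet_penalty(beta, 1.0, alpha))
    print(subset_size(beta))
```

Smaller α adds more ridge-type shrinkage, which tends to keep groups of correlated predictors together rather than picking one of them.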
Variable selection performance
Performance based on subset size
Dimensionality
Low (p=100, n=1000)
High (p=1000, n=100)
Signal-to-noise ratio τ (for tab "Performance based on subset size" only)
0.42
1.22
3.52
Selected methods for different subset sizes (for tab "Performance based on subset size" only)
Enet 0.1
Enet 0.2
Enet 0.3
Enet 0.4
Enet 0.5
Enet 0.6
Enet 0.7
Enet 0.8
Enet 0.9
Lasso
FSS
BSS
Variable selection performance
Performance based on subset size