Authors: Li, Der-Chiang; Huang, Wen-Ting; Chen, Chien-Chih; Chang, Che-Jung
Publisher: Taylor & Francis Ltd
ISSN: 0020-7543
Source: International Journal of Production Research, Vol. 51, Iss. 11, 2013-06, pp. 3206-3224
Abstract
Machine learning algorithms are widely applied to extract useful information, but sample size is often an important factor in determining their reliability. The key difficulty in small-dataset learning tasks is that the information such datasets contain cannot fully represent the characteristics of the entire population. The principal approach of this study to overcoming this problem is to systematically add artificial samples that fill the data gaps; specifically, the mega-trend-diffusion technique is employed to generate virtual samples that extend the data size. This paper presents a real small-dataset learning task from the array process of a thin-film transistor liquid-crystal display (TFT-LCD) panel manufacturer, in which only 20 samples are available for learning the relationship between 15 input and 36 output attributes. The experimental results show that the approach is effective in building robust back-propagation neural network (BPN) and support vector regression (SVR) models. In addition, a sensitivity analysis is performed on the 20 samples using SVR to extract the relationship between the 15 factors and the 36 outputs, helping engineers infer process knowledge.
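The mega-trend-diffusion step admits a compact per-attribute implementation. The sketch below follows the commonly published formulation of the technique (a set centre at the midpoint of the observed min/max, skewness-weighted diffusion bounds, and a triangular membership function used for acceptance sampling of virtual values); the function names, the NumPy dependency, and the 1e-20 diffusion coefficient are illustrative assumptions, not details confirmed by this paper.

    import numpy as np

    def mtd_bounds(x, tiny=1e-20):
        # Estimate lower/upper diffusion bounds for one attribute
        # under the mega-trend-diffusion formulation.
        x = np.asarray(x, dtype=float)
        lo, hi = x.min(), x.max()
        cl = (lo + hi) / 2.0                       # set centre
        n_l = int(np.sum(x < cl))                  # samples below the centre
        n_u = int(np.sum(x > cl))                  # samples above the centre
        skew_l = n_l / max(n_l + n_u, 1)           # left skewness weight
        skew_u = n_u / max(n_l + n_u, 1)           # right skewness weight
        var = x.var(ddof=1)
        spread = -2.0 * np.log(tiny)               # positive tail-extension constant
        L = cl - skew_l * np.sqrt(spread * var / max(n_l, 1))
        U = cl + skew_u * np.sqrt(spread * var / max(n_u, 1))
        # Never shrink inside the observed range.
        return min(L, lo), max(U, hi), cl

    def mtd_virtual_samples(x, n_virtual, seed=None):
        # Draw virtual values in [L, U], accepting each draw with
        # probability equal to its triangular membership value.
        rng = np.random.default_rng(seed)
        L, U, cl = mtd_bounds(x)
        if U == L:                                 # degenerate attribute
            return np.full(n_virtual, cl)
        samples = []
        while len(samples) < n_virtual:
            v = rng.uniform(L, U)
            m = (v - L) / (cl - L) if v <= cl else (U - v) / (U - cl)
            if rng.uniform() <= m:                 # acceptance sampling
                samples.append(v)
        return np.array(samples)

Applied attribute by attribute to the 20 observed samples, a routine of this kind yields virtual values confined to the estimated population range [L, U] and concentrated near the set centre, which can then be pooled with the real data to train the BPN or SVR models.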