E-Thesis

Manufacturing Process Causal Knowledge Discovery using a Modified Random Forest-based Predictive Model / MESHARI EBRAHIM

Swansea University Author: MESHARI EBRAHIM

  • Ebrahim_Meshari_PhD_Thesis_Final_Cronfa.pdf

    PDF | E-Thesis – open access

    Copyright: The author, Meshari A. Al-Ebrahim, 2020.

DOI (Published version): 10.23889/SUthesis.59728

Published: Swansea 2020
Institution: Swansea University
Degree level: Doctoral
Degree name: Ph.D
Supervisor: Ransing, Rajesh
URI: https://cronfa.swan.ac.uk/Record/cronfa59728
Abstract: A Modified Random Forest algorithm (MRF)-based predictive model is proposed for use in manufacturing processes to estimate the effects of several potential interventions, such as (i) altering the operating ranges of selected continuous process parameters within specified tolerance limits, (ii) choosing particular categories of discrete process parameters, or (iii) choosing combinations of both types of process parameters. The model introduces a non-linear approach to defining the most critical process inputs by scoring the contribution made by each process input to the process output prediction power. It uses this contribution to discover optimal operating ranges for the continuous process parameters and/or optimal categories for discrete process parameters. The set of values used for the process inputs was generated from operating ranges identified using a novel Decision Path Search (DPS) algorithm and Bootstrap sampling. The odds ratio is the ratio between the occurrence probabilities of desired and undesired process output values. The effects of potential interventions, or of proposed confirmation trials, are quantified as posterior odds and used to calculate conditional probability distributions. The advantages of this approach are discussed in comparison to fitting these probability distributions to Bayesian Networks (BN). The proposed explainable data-driven predictive model is scalable to a large number of process factors with non-linear dependence on one or more process responses. It allows the discovery of data-driven process improvement opportunities that involve minimal interaction with domain expertise. An iterative Random Forest algorithm is proposed to predict the missing values for the mixed dataset (continuous and categorical process parameters). It is shown that the algorithm is robust even at high proportions of missing values in the dataset. The number of observations available in manufacturing process datasets is generally low, e.g. of a similar order of magnitude to the number of process parameters. Hence, Neural Network (NN)-based deep learning methods are generally not applicable, as these techniques require 50-100 times more observations than input factors (process parameters). The results are verified on a number of benchmark examples with datasets published in the literature. The results demonstrate that the proposed method outperforms the comparison approaches in terms of accuracy and causality, with linearity assumed. Furthermore, the computational cost is far lower and remains feasible for heterogeneous datasets.
Keywords: Random Forest, Causal Knowledge, Statistical Process Control, Missing Data Imputation, Machine Learning, Decision Tree, Process Operating Range, Critical Process Factor, Odds Ratio, Cross Validation, Neural Network
College: Faculty of Science and Engineering
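
Illustrative note: the abstract defines the odds ratio as the ratio between the occurrence probabilities of desired and undesired process output values. The sketch below shows one way that quantity might be computed, first over a whole dataset and then restricted to a candidate operating range of one continuous parameter. It is not taken from the thesis: the function names, the `temperature` readings, and the `good_part` labels are all hypothetical, and the range is chosen by hand rather than by the Decision Path Search algorithm.

```python
def odds(outcomes):
    """Odds of a desired outcome: count(desired) / count(undesired).

    This equals P(desired) / P(undesired) when both probabilities are
    estimated as simple frequencies over the same observations.
    """
    desired = sum(1 for ok in outcomes if ok)
    undesired = len(outcomes) - desired
    if undesired == 0:
        return float("inf")  # no undesired outcomes observed
    return desired / undesired


def odds_within_range(values, outcomes, low, high):
    """Odds of a desired outcome among observations with low <= value <= high."""
    selected = [ok for v, ok in zip(values, outcomes) if low <= v <= high]
    if not selected:
        raise ValueError("no observations fall inside the range")
    return odds(selected)


# Hypothetical process data (assumed for illustration only).
temperature = [640, 655, 660, 672, 680, 690, 701, 715]
good_part = [False, True, True, False, True, True, False, False]

baseline_odds = odds(good_part)
range_odds = odds_within_range(temperature, good_part, 650, 685)

print(f"odds over all observations: {baseline_odds:.2f}")
print(f"odds within 650-685:        {range_odds:.2f}")
```

With these made-up values, restricting the parameter to the candidate range raises the odds of a desired output from 1.0 to 3.0, which is the kind of comparison an intervention on an operating range is meant to support.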