E-Thesis
Manufacturing Process Causal Knowledge Discovery using a Modified Random Forest-based Predictive Model / MESHARI EBRAHIM
Swansea University Author: MESHARI EBRAHIM
DOI (Published version): 10.23889/SUthesis.59728
Abstract
A Modified Random Forest algorithm (MRF)-based predictive model is proposed for use in manufacturing processes to estimate the effects of several potential interventions, such as (i) altering the operating ranges of selected continuous process parameters within specified tolerance limits, (ii) choosi...
Published: | Swansea 2020 |
---|---|
Institution: | Swansea University |
Degree level: | Doctoral |
Degree name: | Ph.D |
Supervisor: | Ransing, Rajesh |
URI: | https://cronfa.swan.ac.uk/Record/cronfa59728 |
first_indexed |
2022-03-29T08:26:04Z |
---|---|
last_indexed |
2022-03-30T03:27:45Z |
id |
cronfa59728 |
recordtype |
RisThesis |
fullrecord |
<?xml version="1.0"?><rfc1807><datestamp>2022-03-29T10:44:55.3737662</datestamp><bib-version>v2</bib-version><id>59728</id><entry>2022-03-29</entry><title>Manufacturing Process Causal Knowledge Discovery using a Modified Random Forest-based Predictive Model</title><swanseaauthors><author><sid>fd17f92abca231a4dd2fc9f4cc3a0fa6</sid><firstname>MESHARI</firstname><surname>EBRAHIM</surname><name>MESHARI EBRAHIM</name><active>true</active><ethesisStudent>false</ethesisStudent></author></swanseaauthors><date>2022-03-29</date><abstract>A Modified Random Forest algorithm (MRF)-based predictive model is proposed for use in manufacturing processes to estimate the effects of several potential interventions, such as (i) altering the operating ranges of selected continuous process parameters within specified tolerance limits, (ii) choosing particular categories of discrete process parameters, or (iii) choosing combinations of both types of process parameters. The model introduces a non-linear approach to defining the most critical process inputs by scoring the contribution made by each process input to the process output prediction power. It uses this contribution to discover optimal operating ranges for the continuous process parameters and/or optimal categories for discrete process parameters. The set of values used for the process inputs was generated from operating ranges identified using a novel Decision Path Search (DPS) algorithm and Bootstrap sampling. The odds ratio is the ratio between the occurrence probabilities of desired and undesired process output values. The effects of potential interventions, or of proposed confirmation trials, are quantified as posterior odds and used to calculate conditional probability distributions. The advantages of this approach are discussed in comparison to fitting these probability distributions to Bayesian Networks (BN). The proposed explainable data-driven predictive model is scalable to a large number of process factors with non-linear dependence on one or more process responses. It allows the discovery of data-driven process improvement opportunities that involve minimal interaction with domain expertise. An iterative Random Forest algorithm is proposed to predict the missing values for the mixed dataset (continuous and categorical process parameters). It is shown that the algorithm is robust even at high proportions of missing values in the dataset. The number of observations available in manufacturing process datasets is generally low, e.g. of a similar order of magnitude to the number of process parameters. Hence, Neural Network (NN)-based deep learning methods are generally not applicable, as these techniques require 50-100 times more observations than input factors (process parameters). The results are verified on a number of benchmark examples with datasets published in the literature. The results demonstrate that the proposed method outperforms the comparison approaches in terms of accuracy and causality, with linearity assumed. Furthermore, the computational cost is both far better and very feasible for heterogeneous datasets.</abstract><type>E-Thesis</type><journal/><volume/><journalNumber/><paginationStart/><paginationEnd/><publisher/><placeOfPublication>Swansea</placeOfPublication><isbnPrint/><isbnElectronic/><issnPrint/><issnElectronic/><keywords>Random Forest, Causal Knowledge, Statistical Process Control, Missing Data Imputation, Machine Learning, Decision Tree, Process Operating Range, Critical Process Factor, Odds Ratio, Cross Validation, Neural Network</keywords><publishedDay>29</publishedDay><publishedMonth>9</publishedMonth><publishedYear>2020</publishedYear><publishedDate>2020-09-29</publishedDate><doi>10.23889/SUthesis.59728</doi><url/><notes/><college>COLLEGE NANME</college><CollegeCode>COLLEGE CODE</CollegeCode><institution>Swansea University</institution><supervisor>Ransing, Rajesh</supervisor><degreelevel>Doctoral</degreelevel><degreename>Ph.D</degreename><apcterm/><lastEdited>2022-03-29T10:44:55.3737662</lastEdited><Created>2022-03-29T09:23:54.7798165</Created><path><level id="1">Faculty of Science and Engineering</level><level id="2">School of Engineering and Applied Sciences - Uncategorised</level></path><authors><author><firstname>MESHARI</firstname><surname>EBRAHIM</surname><order>1</order></author></authors><documents><document><filename>59728__23708__0ee6fcedd066475d97bf6b30c5fd26ae.pdf</filename><originalFilename>Ebrahim_Meshari_PhD_Thesis_Final_Cronfa.pdf</originalFilename><uploaded>2022-03-29T09:29:41.3174630</uploaded><type>Output</type><contentLength>12491517</contentLength><contentType>application/pdf</contentType><version>E-Thesis – open access</version><cronfaStatus>true</cronfaStatus><documentNotes>Copyright: The author, Meshari A. Al-Ebrahim, 2020.</documentNotes><copyrightCorrect>true</copyrightCorrect><language>eng</language></document></documents><OutputDurs/></rfc1807> |
title |
Manufacturing Process Causal Knowledge Discovery using a Modified Random Forest-based Predictive Model |
author_id_str_mv |
fd17f92abca231a4dd2fc9f4cc3a0fa6 |
author_id_fullname_str_mv |
fd17f92abca231a4dd2fc9f4cc3a0fa6_***_MESHARI EBRAHIM |
author |
MESHARI EBRAHIM |
author2 |
MESHARI EBRAHIM |
format |
E-Thesis |
publishDate |
2020 |
institution |
Swansea University |
doi_str_mv |
10.23889/SUthesis.59728 |
college_str |
Faculty of Science and Engineering |
hierarchytype |
|
hierarchy_top_id |
facultyofscienceandengineering |
hierarchy_top_title |
Faculty of Science and Engineering |
hierarchy_parent_id |
facultyofscienceandengineering |
hierarchy_parent_title |
Faculty of Science and Engineering |
department_str |
School of Engineering and Applied Sciences - Uncategorised{{{_:::_}}}Faculty of Science and Engineering{{{_:::_}}}School of Engineering and Applied Sciences - Uncategorised |
document_store_str |
1 |
active_str |
0 |
description |
A Modified Random Forest algorithm (MRF)-based predictive model is proposed for use in manufacturing processes to estimate the effects of several potential interventions, such as (i) altering the operating ranges of selected continuous process parameters within specified tolerance limits, (ii) choosing particular categories of discrete process parameters, or (iii) choosing combinations of both types of process parameters. The model introduces a non-linear approach to defining the most critical process inputs by scoring the contribution made by each process input to the process output prediction power. It uses this contribution to discover optimal operating ranges for the continuous process parameters and/or optimal categories for discrete process parameters. The set of values used for the process inputs was generated from operating ranges identified using a novel Decision Path Search (DPS) algorithm and Bootstrap sampling. The odds ratio is the ratio between the occurrence probabilities of desired and undesired process output values. The effects of potential interventions, or of proposed confirmation trials, are quantified as posterior odds and used to calculate conditional probability distributions. The advantages of this approach are discussed in comparison to fitting these probability distributions to Bayesian Networks (BN). The proposed explainable data-driven predictive model is scalable to a large number of process factors with non-linear dependence on one or more process responses. It allows the discovery of data-driven process improvement opportunities that involve minimal interaction with domain expertise. An iterative Random Forest algorithm is proposed to predict the missing values for the mixed dataset (continuous and categorical process parameters). It is shown that the algorithm is robust even at high proportions of missing values in the dataset. The number of observations available in manufacturing process datasets is generally low, e.g. of a similar order of magnitude to the number of process parameters. Hence, Neural Network (NN)-based deep learning methods are generally not applicable, as these techniques require 50-100 times more observations than input factors (process parameters). The results are verified on a number of benchmark examples with datasets published in the literature. The results demonstrate that the proposed method outperforms the comparison approaches in terms of accuracy and causality, with linearity assumed. Furthermore, the computational cost is both far better and very feasible for heterogeneous datasets. |
published_date |
2020-09-29T20:10:47Z |
_version_ |
1821346991670558720 |
score |
11.04748 |