
Assessment of quality predictions achieved with machine learning using established measurement process capability procedures in manufacturing.

Schorr, Sebastian ; Bähre, Dirk ; et al.
In: Technisches Messen, Jg. 89 (2022-04-01), Heft 4, S. 240-252

Assessment of quality predictions achieved with machine learning using established measurement process capability procedures in manufacturing (Bewertung von mit maschinellem Lernen erzielten Qualitätsprognosen durch die Anwendung von etablierten Verfahren zum Nachweis der Messprozessfähigkeit in der Fertigung)

Abstract: The increasing amount of available process data from machining and other manufacturing processes together with machine learning methods provides new possibilities for quality control and condition monitoring. A prediction of the workpiece quality in an early machining stage can be used to alter current quality control strategies and could lead to savings in terms of time, cost, and resources. However, most methods are tested under controlled lab conditions, and few implementations in real manufacturing processes have been reported so far. The main reason for the slow uptake of this promising technology is the need to prove the capability of a machine learning method for quality prediction before it can be applied in serial production and supplement current quality control methods. This article introduces and compares approaches from the fields of machine learning and quality management in order to assess such predictions. The comparison and adaptation of the two approaches is carried out for an industrial use case at Bosch Rexroth AG, where the diameter and the roundness of bores are predicted with machine learning based on process data.

Zusammenfassung: The increasing availability of process data from manufacturing processes and the accessibility of machine learning methods open up new possibilities for quality control and condition monitoring. Predicting the quality of a workpiece in an early machining stage can lead to changes in the current quality control strategies and can, in addition, bring savings in terms of time, cost, and resources. Most prediction models are tested exclusively under controlled laboratory conditions, so only few implementations in real manufacturing processes have been carried out so far. The main reason for this slow integration of this promising technology into serial production and into current quality control strategies is the need to prove the capability of a machine learning method for quality prediction. This article presents one approach each from the fields of machine learning and quality management to assess the accuracy of a quality prediction. The two approaches are implemented for an industrial use case at Bosch Rexroth AG, in which the diameter and the roundness of bores are predicted with machine learning based on process data.

Keywords: Quality prediction; prediction assessment; machine learning; manufacturing; Qualitätsprognose; Prognosebewertung; maschinelles Lernen; Serienfertigung

1 Introduction

Through the integration of more and more sensors into machine tools as well as the accessibility of data from numerical controllers (NC) and programmable logic controllers (PLC), an increasing amount of process data is available for each workpiece [[1]], [[2]]. The rising availability of affordable database storage and computing power as well as increasing knowledge about machine learning allow manifold usage of the process data in the manufacturing industry [[3]]. One primary example is condition monitoring for predictive maintenance, which is seen as one main promise for improving efficiency and reducing cost in manufacturing [[4]]. Sensor data allow the prediction of machine faults as well as the monitoring of the integrity of the sensors themselves based on redundancy in networked sensor systems [[5]]. Current research also addresses questions of metrological base principles such as calibration, measurement uncertainties and thus traceability to the SI unit system for comparable and reproducible measurement results [[6]]. At the same time, manufacturing companies are forced to search for new solutions and approaches to overcome challenges like shortened product life cycles, increased product variance, enhanced requirements on product quality, and CO2-neutral manufacturing. Machine learning (ML) is seen as a contributing solution for the mentioned challenges, not only indirectly but also directly for quality control during the manufacturing process itself [[7]]. Applying machine learning to achieve a quality prediction for each workpiece in an early machining stage can, for example, increase the product quality and reduce the number of scrap parts, leading to more sustainable manufacturing. To establish a quality prediction based on machine learning and process data in manufacturing, it is necessary to alter the current quality control strategy. The current quality control of a manufacturing process is, for example, done with a sample inspection [[8]]. This means that workpieces are measured with industrial metrology, which leads to very precise measurements but is time and cost intensive. For switching to a 100 % quality control achieved with machine learning at minimal cost, it is not sufficient to determine suitable machine learning methods for the quality predictions; the capability of the prediction process itself must also be assessed. No standard exists to assess the capability of a machine learning method for quality prediction in manufacturing. Therefore, this article describes two different approaches, one from the field of machine learning and one from quality management, to assess the predictions. Depending on the chosen approach, the prediction accuracy, the systematic prediction error, or the reproducibility is assessed. A quality prediction based on machine learning in the manufacturing process of hydraulic valves is set up and utilized in this article for the application, derivation, and comparison of the approaches to assess the capability of a machine learning method for quality prediction.

2 Quality prediction based on process data and machine learning

The use case considered in this research paper addresses the quality prediction of drilled and reamed bores of hydraulic valves at Bosch Rexroth AG in Homburg (Germany). The process data were obtained from a milling machine in serial production during the machining of the pre-cast valve housings made of gray cast iron. Hydraulic valves are characterized by narrow tolerances to allow seal-less fits and prevent oil leakage. Even slight quality deviations can cause high scrap rates and financial losses. Therefore, quality control close to real time at minimal or ideally no additional cost is preferred.

Figure 1 Manufacturing process of a hydraulic valve with current and future quality control strategy.

During the manufacturing process, each valve housing is first machined, followed by the assembly of the valve, and finally an end-of-line test is carried out, as depicted in Figure 1. Sample inspection with industrial metrology is currently used to control the machining process. The sample inspection covers only a small percentage of all valves but still results in high costs. Moreover, the latency between the machining of a valve and the availability of the corresponding measurement results allows no direct feedback about the machining process, so scrap housings may continue to be manufactured until the measurement results are known. In addition, a considerable amount of resources, time, and money is lost if a valve is only detected as a scrap part during the end-of-line testing. To increase the transparency of the machining process and to determine the quality of the housings close to real time, a quality prediction based on process data and machine learning methods is pursued. The aim for the data collection is to use the process data (e. g., torque, current, and speed of the machining center) already available in the NC without the requirement for integration of any additional sensors. This would also allow a fast transfer of the technology to further machining centers and reduce maintenance cost. Hence, a feasible and cost-effective in-process quality control solution for machining processes under industrial conditions could be achieved.

To establish the in-process quality prediction, a gateway and an industrial PC were integrated into the control cabinet of the machining center (GROB G500). The gateway is required to connect the industrial PC with the drive controllers of the spindle and the z-axis as well as the PLC (programmable logic controller). The drive controllers gather the actual torque values at a high frequency and create data packages, which are sent to the gateway via the OPC UA interface. The data packages are forwarded to the industrial PC and are stored in a database (MongoDB). Python is used to calculate features from the raw data and to predict the workpiece quality with machine learning methods. Depending on the chosen feature extraction strategy [[9]], a quality prediction can be achieved within a few seconds after the machining of a bore is completed. Hence, quality control can be established in the machining stage, and the machining of the next bore could already be adjusted to achieve a better quality. The data preparation and the determination of the most suitable machine learning methods to predict the diameter and the roundness of a drilled bore have been described previously [[7]], [[9]], [[10]].
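As an illustration of this pipeline stage, a minimal feature-extraction sketch in Python could look as follows; the database, collection, and field names ("machining", "spindle_torque", "torque_values") are hypothetical placeholders, and the actual feature extraction strategy described in [[9]] is more elaborate.

```python
# Minimal sketch of the feature-extraction step on the industrial PC, assuming
# hypothetical database, collection, and field names; the real pipeline and the
# feature extraction strategy described in [9] are more elaborate.
import numpy as np
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")    # database on the industrial PC
packages = client["machining"]["spindle_torque"]     # assumed collection name

def torque_features(workpiece_id: str, bore_id: int) -> dict:
    """Condense the raw torque packages of one bore into a few scalar features."""
    docs = packages.find({"workpiece_id": workpiece_id, "bore_id": bore_id})
    signal = np.concatenate([np.asarray(d["torque_values"], dtype=float) for d in docs])
    return {
        "torque_mean": float(signal.mean()),
        "torque_std": float(signal.std()),
        "torque_max": float(signal.max()),
        "torque_energy": float(np.sum(np.square(signal))),
    }
```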

Comprehensive studies [[11]] have shown that precise diameter predictions are achieved with the machine learning method gradient boosting regressor (GBR) together with the torque measurements from the spindle and z-axis as well as the condition of the cutting tool (wear). For the prediction of the roundness, the machine learning method random forest regressor (RFR) performs best, but besides the torque measurements from the spindle and the z-axis and the tool condition, also the position measurements of the x- and y-axes as well as the speed measurement from the spindle are required. Further data sources (e. g., ambient conditions) were not taken into account because of the constant conditions in the serial production, which were verified with temperature measurements of the cooling lubricant and the machine bed. Furthermore, monitoring of the hardness of the raw material showed constant values, so this was also not considered as input data.

The applied machine learning methods belong to the ensemble methods [[12]]. Both RFR and GBR consist of an ensemble of decision trees. For RFR, each single tree is built using a randomly selected sample (bootstrap sample) of the original training dataset, and the final prediction is obtained by averaging the predictions of the whole ensemble of trees; for GBR, each new tree is fitted to the residual errors of the trees built so far, and the individual tree predictions are combined into the final prediction. The main difference between both methods is therefore that the trees are "grown" simultaneously for RFR and successively for GBR. The most suitable machine learning method and its hyperparameters are determined by a grid search and a 10-fold cross validation for each quality characteristic. The machine learning methods are trained with 80 % of the 450 data sets and tested with the remaining 20 %. For the diameter prediction (GBR) the determined number of trees is 217, and for the roundness prediction (RFR) 426 trees lead to the best prediction result [[11]].
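As a sketch of this model selection step, a grid search with 10-fold cross validation over an 80/20 split could look as follows in scikit-learn; the synthetic data and the searched hyperparameter grids are assumed examples, and only the finally selected tree counts (217 and 426) are taken from [[11]].

```python
# Sketch of the model selection described above (scikit-learn). The synthetic X and y
# stand in for the extracted features and the measured quality characteristic.
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = make_regression(n_samples=450, n_features=8, noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

candidates = {
    "GBR (diameter)": (GradientBoostingRegressor(random_state=0),
                       {"n_estimators": [100, 217, 400]}),     # assumed grid
    "RFR (roundness)": (RandomForestRegressor(random_state=0),
                        {"n_estimators": [100, 426, 800]}),    # assumed grid
}

for name, (estimator, grid) in candidates.items():
    search = GridSearchCV(estimator, grid, cv=10, scoring="neg_mean_absolute_error")
    search.fit(X_train, y_train)
    print(name, search.best_params_, search.score(X_test, y_test))
```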

The diameter of a bore is characterized by its dimensional tolerance, which describes the allowed deviation from the nominal diameter. The diameters of drilled bores of a production batch are usually normally distributed because of the tool wear. In Figure 2 (a) the diameter values of the training data have the shape of a bimodal distribution. The reason for this is the uneven composition of the collected training data, which moreover belong to different batches. Adding the validation data to the training data set would result in a distribution curve that is more similar to a normal distribution curve.

The roundness specifies how close the circular cross-section of a bore is to an ideal circle, i. e., the difference between the outer and the inner circle which envelop the circumferential line of the bore. Thus, an ideal bore has a roundness of zero, and the roundness cannot be negative. Hence, the corresponding measured values accumulate to the right of the zero point, which leads to a folded normal distribution. In Figure 2 (b) the measured roundness values are depicted, and the maximum is observed at 0.9 µm and not at 0 µm. Thus, for this machining operation a systematic roundness error exists, resulting in a density function very similar to a normal distribution (especially for the validation data). Therefore, statistics that are applicable to normally distributed data are used throughout the analysis.
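A short simulation with assumed distribution parameters illustrates this point: because the systematic offset of roughly 0.9 µm lies several standard deviations above zero, folding the distribution at zero hardly changes it, so treating the roundness values as approximately normally distributed is reasonable.

```python
# Illustration with assumed parameters: a roundness distribution with a systematic
# offset well above zero is almost unaffected by folding at zero.
import numpy as np

rng = np.random.default_rng(0)
unfolded = rng.normal(loc=0.9, scale=0.2, size=100_000)   # µm, assumed offset and scatter
folded = np.abs(unfolded)                                  # roundness cannot be negative

print(f"fraction of values affected by folding: {np.mean(unfolded < 0):.6f}")
print(f"mean: {unfolded.mean():.4f} vs {folded.mean():.4f} µm")
print(f"std:  {unfolded.std():.4f} vs {folded.std():.4f} µm")
```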

The main focus in this article is on the determination of suitable approaches to assess the quality predictions in manufacturing. This can be done either by using common performance metrics for evaluating predictions or by applying measurement process capability procedures established in manufacturing.

Figure 2 Distribution of the diameter and roundness predictions as well as the training and validation data sets [11].

3 Assessment of machine learning predictions with common performance metrics

In order to assess the prediction accuracy of a trained algorithm, it is necessary to have interpretable performance metrics. These performance metrics make it possible to evaluate the suitability of an algorithm for a use case and to compare algorithms with one another. The performance metrics are calculated on a test data set. Various performance metrics can be used to assess the prediction accuracy; the number of predictions, the predicted values, and the actual values (ground truth) are required to calculate them. Suitable performance metrics are, for example, the mean absolute error (MAE), the maximum error (MAX), and the coefficient of determination R² (one minus the residual sum of squares divided by the total sum of squares). These performance metrics are often used to assess the prediction accuracy of machine learning methods in the case of a regression problem. To achieve a valid result, it is necessary to calculate the performance metrics for as many combinations of training and test data sets as possible. So-called cross validation [[13]] is suitable for this because a different data sample is used for training and testing in each run. The final performance metric value is then the average of the performance metric values of all runs [[14]]. Due to the context of this work (prediction of workpiece quality), however, assessment methods are also conceivable that originate from quality management and are usually used to evaluate the capability of measuring equipment.
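A minimal sketch of how these metrics and their cross-validated average could be computed with scikit-learn is given below; the synthetic data only stand in for the process features and the measured quality values.

```python
# Sketch of the performance metrics and their cross-validated average (scikit-learn);
# the synthetic data stand in for the extracted features and measured targets.
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error, max_error, r2_score
from sklearn.model_selection import cross_val_score, train_test_split

X, y = make_regression(n_samples=450, n_features=8, noise=5.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)
y_pred = model.predict(X_test)

print("MAE:", mean_absolute_error(y_test, y_pred))
print("MAX:", max_error(y_test, y_pred))
print("R2: ", r2_score(y_test, y_pred))

# 10-fold cross validation: the final metric is the average over all runs.
print("CV MAE:", -cross_val_score(model, X, y, cv=10,
                                  scoring="neg_mean_absolute_error").mean())
```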

4 Procedures to assess the capability of measurement processes in manufacturing

Capability studies are necessary to ensure that a measuring device can determine a quality characteristic with sufficiently small uncertainty (measurement deviation and measurement value scatter) with regard to the feature tolerance [[15]]. There are different approaches to assess the capability of a measuring process. The approaches developed for practical use are based on the procedure described in the "Guide to the Expression of Uncertainty in Measurement" (GUM) [[16]], which is very comprehensive and precise, but also very complex. In the manufacturing industry, the established procedures are those published in Volume 5 "Measurement and inspection processes, capability, planning and management" [[17]] by the German Association of the Automotive Industry (VDA) and in the "Measurement Systems Analysis" (MSA) guide of the US-based Automotive Industry Action Group (AIAG) [[18]]. In addition, company guidelines exist, such as booklet 10 "Capability of measurement and test processes" published by the Bosch Group [[19]]. Each of these guidelines aims to calculate a GRR value (gage repeatability and reproducibility). The capability of a measurement process is finally assessed by comparing the determined GRR value with specified limit values. The mentioned procedures and company guidelines represent standard procedures with which the capability of a large number of measurement processes can be determined. Proof of capability must also be carried out for measurement processes that represent special cases. For this purpose, the standard procedures can be used in an adapted form or provide suggestions for a possible procedure for assessing the capability [[20]].

The procedures used in the Bosch Group for assessing the capability of measurement and test processes (booklet 10) [[19]] are applied in this paper. These procedures are also used outside of the Bosch Group and are described by Keferstein et al. [[15]] and Dietrich et al. [[20]]. The booklet describes a total of five different procedures with which an assessment of the measurement process can be carried out. To assess the capability of the machine learning methods for quality prediction, procedure one (systematic measurement error and repeatability) and procedure three (repeatability and reproducibility without operator influence) are sufficient. Both procedures serve to assess the measurement process capability of continuous characteristics. Standards or workpieces from the production are necessary, which are measured several times. In addition, a normal distribution of the measured values is assumed. Finally, the measuring equipment used for quality assessment must have a resolution that is less than or equal to 5 % of the tolerance of the characteristic to be tested.

4.1 Systematic measurement error and repeatability

Procedure one assesses the capability of a measuring process in terms of location and variation of the measured values within the tolerance field of the quality characteristic. The procedure is carried out with a standard, which is measured 50 times. It must be ensured that all working steps between the individual measurements of the measurement series are carried out completely, i. e., the measurement standard must be removed from the clamping and re-inserted before each measurement. If no standard is available, a calibrated workpiece can also be used. Procedure one requires quality characteristics with two-sided specification limits, i. e., with a lower and an upper specification limit (LSL and USL), so that the tolerance T is defined as the difference between USL and LSL. The reference value x_m of the standard should be in the middle of the tolerance if possible. The systematic measurement error results from the difference between the mean value x_g of the measured values and the reference value x_m. The standard deviation s_g of the measured values is a measure of the repeatability of the measurement. The capability of the measuring equipment is assessed by calculating the capability indices C_g and C_gk, which must have a value greater than or equal to 1.33 for the measuring process to be classified as capable [[15]], [[19]], [[20]].

Potential capability index:

$$C_g = \frac{0.2 \cdot T}{6 \cdot s_g} \qquad (1)$$

Critical capability index:

$$C_{gk} = \frac{0.1 \cdot T - |x_g - x_m|}{3 \cdot s_g} \qquad (2)$$
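A direct implementation of equations (1) and (2) for a series of 50 repeated results could look like the following sketch; the demo values are assumed and only roughly mirror the diameter use case discussed in Section 5.

```python
# Sketch of the capability indices from equations (1) and (2); the demo values
# are assumed and only roughly mirror the diameter use case reported later.
import numpy as np

def procedure_one(values, x_m, T):
    """values: 50 repeated results for the standard, x_m: reference value,
    T: tolerance. The process is classified as capable if both indices >= 1.33."""
    values = np.asarray(values, dtype=float)
    x_g = values.mean()              # mean of the repeated results
    s_g = values.std(ddof=1)         # repeatability (standard deviation)
    c_g = 0.2 * T / (6 * s_g)
    c_gk = (0.1 * T - abs(x_g - x_m)) / (3 * s_g)
    return c_g, c_gk

demo = np.random.default_rng(0).normal(17.9942, 0.00012, size=50)   # mm, assumed
print(procedure_one(demo, x_m=17.994, T=0.010))
```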

4.2 Repeatability and reproducibility (GRR) without operator influence

If procedure one has been successfully completed, procedure three can be used to assess the capability of a measurement process in terms of its variation behavior using measurements of workpieces from series production. It is assumed that the operator has no influence on the measurement process. In contrast to procedure one, procedure three includes possible interactions between the measurement process and the measuring object in the capability study: the influence of the production part variation on the measurement as well as the influence of the measurement on the behavior of the production parts. For the implementation, at least 25 workpieces from series production are required, which are randomly selected and whose characteristic values are within the tolerance. The selected serial parts are measured in random order in at least two measurement series under repeatability conditions. The aim is to determine the total variation %GRR (gage repeatability and reproducibility) of the measurement process. The capability of the measurement process is finally determined based on defined limit values for %GRR:

  • %GRR ≤ 10 %: measurement process is capable,
  • 10 % < %GRR ≤ 30 %: measurement process is conditionally capable,
  • 30 % < %GRR: measurement process is not capable.

The reference value for %GRR is the tolerance T of the measured characteristic and the GRR value is the same as the equipment variation value [[14]], [[17]], [[18]]:

$$\%GRR = \frac{6 \cdot GRR}{T} \cdot 100\,\% \qquad (3)$$
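Equation (3) can be evaluated, for example, as sketched below; GRR is estimated here as the pooled within-part standard deviation of the repeated series, which is an assumed simplification for the case without operator influence and not necessarily the exact estimator prescribed in [[19]].

```python
# Sketch of %GRR according to equation (3). GRR is estimated as the pooled
# within-part standard deviation of the repeated series (assumed simplification
# for the case without operator influence).
import numpy as np

def percent_grr(results, T):
    """results: array of shape (n_parts >= 25, n_series >= 2); T: tolerance."""
    results = np.asarray(results, dtype=float)
    grr = np.sqrt(results.var(axis=1, ddof=1).mean())   # equipment variation
    return 6 * grr / T * 100.0

demo = 17.994 + np.random.default_rng(0).normal(0, 0.0001, size=(25, 2))  # mm, assumed
print(percent_grr(demo, T=0.010))   # capable if <= 10 %
```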

5 Assessment of achieved quality predictions in practice

The predictions obtained are assessed in two ways. On the one hand, the performance metrics introduced in Section 3 are calculated from the prediction results, which are common for evaluating the prediction accuracy of machine learning methods. On the other hand, the procedures used in metrology to assess the capability of a measurement device are applied (cf. Section 4).

5.1 Assessment with common performance metrics in machine learning

Section 3 described performance metrics that are often used to assess prediction accuracy. They are also suitable for assessing the prediction accuracy achieved with machine learning methods. In addition, the performance metrics are set in relation to the tolerance of each quality characteristic. This allows a kind of "safety factor" to be determined, which indicates whether a characteristic is still within its tolerance limits or not. The observed distributions of the measured and the predicted values for the diameter and the roundness of the bores are shown in Figure 2 (a) and (b). The blue distribution curve and histogram depict the training data set, the validation data are shown in green, and the predictions in red. It can be seen that the predicted diameters do not exceed the value range of the validation data set. The training is based on the data represented by the blue distribution curve, but the predictions are only within the diameter range of the green curve. This is a very positive result, as it indicates that the trained machine learning method correctly determines the actual diameter values (value range). Because the diameter predictions converge slightly towards the arithmetic mean of the measured values, the density of the predictions increases somewhat compared to the actual values. The diameter predictions for the individual bores are characterized by high accuracy, so that the maximum error (MAX) and the mean absolute error (MAE) are only 0.374 µm and 0.126 µm, respectively. Comparing these values with the tolerance of the diameter (10 µm), the MAE is 1/79th and the MAX is 1/26th of the tolerance. Thus, based on the predictions, it can be determined with a high degree of reliability (safety factor) whether a bore is within the required tolerance. Furthermore, the R² is 0.77, which shows that the diameter can be predicted based on the recorded process data. The good correlation between the predicted and the measured values can be seen in Figure 2 (c).
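The "safety factor" mentioned above is simply the ratio of the tolerance to an error metric; with the reported values it can be reproduced as follows:

```python
# Relating the reported error metrics to the diameter tolerance ("safety factor").
tolerance = 10.0               # µm, diameter tolerance
mae, max_err = 0.126, 0.374    # µm, reported MAE and MAX for the diameter

print(f"tolerance / MAE = {tolerance / mae:.1f}")      # ~79  -> MAE is about 1/79 of T
print(f"tolerance / MAX = {tolerance / max_err:.1f}")  # ~26.7 -> MAX is about 1/26 of T
```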

The predictions for the roundness of the bores are centered in the middle of the distribution of the validation data set and do not cover its full range (Figure 2 (b)). The predictions are concentrated at approx. 0.86 µm, regardless of the measured roundness of a bore. The MAE and MAX are 0.038 µm and 0.08 µm, respectively, again allowing a high degree of certainty when determining whether a bore is within the tolerance (MAE and MAX are 1/65th and 1/30th of the tolerance, respectively). However, a roundness prediction for an individual bore is not possible; only the range in which the roundness values lie can be reliably predicted. The R² is negative (−0.1), which shows how imprecise the predictions are: the mean value would be a better prediction than the values obtained from the model. This can also be derived from Figure 2 (d). Hence, the information obtained from the data sources is not sufficient for an individual roundness prediction and should be supplemented by further data sources.

5.2 Adaptation of established measurement process capability procedures

In addition to the assessment of the prediction results with common performance metrics, procedures are used that are established in the manufacturing industry for assessing the capability of measuring equipment and measuring processes. By using these standardized, widely accepted procedures well known to production engineers, it is possible to express the prediction accuracy with common and comparable indices. Hence, the results from machine learning and industrial metrology can be compared with one another.

The basic principle of procedure one (cf. Section 4.1) is to measure a standard with a measuring device 50 times and to determine the capability indices C_g and C_gk from the measurement results. For the conventional implementation of procedure one, a measuring device and a standard with a known reference value are required. In the context of the quality prediction, the measuring device is replaced by a trained machine learning method and the standard is replaced by the gathered process data from each workpiece and the measured values of the quality characteristic. Replacing the measuring equipment by the trained method is easy to understand, as the measurement is ultimately to be replaced by a prediction. The use of a workpiece as a standard requires a more detailed derivation. In principle, a workpiece can also be used as a standard if it has been calibrated appropriately. Here, the idea is to "calibrate" a workpiece from the validation data set and to determine the reference value of the quality characteristic in order to be able to compare the achieved prediction with this reference value. It has to be considered that the machine learning method not only requires the process data of many workpieces for training but also the associated measured values of the respective quality characteristic. Thus, it is not sufficient to have only one calibrated workpiece; instead, all workpieces contained in the training data set would have to be measured or "calibrated" and the determined reference values used for the training. The calibration of one workpiece could still be implemented in terms of effort, but the calibration of several hundred workpieces is not affordable. In industrial practice, the training must therefore take place with measured values that do not represent reference values. In order to still obtain measured values that are close to the actual value of the quality characteristic, capable measurement processes are essential. The basis for the training is therefore not only reliably gathered process data, but also precisely measured quality characteristics. The fact that no reference values can be used for training should have a negligible effect from a statistical point of view due to the high number of training data. According to the law of large numbers, the mean of the measured values converges to the expected value as the number of measurements increases, and according to the central limit theorem of statistics the means of samples can be considered normally distributed [[21]]. Both statements are applicable here, as the measured values stem from production batches (random samples) which are relatively homogeneous in quality, so the true value (of the production batch) is determined on average by the measuring equipment. A machine learning method that is trained with data whose values deviate from the true values will inevitably learn slightly incorrect relationships; however, a type of error propagation that would lead to errors significantly larger than the original errors in the training data is not to be expected due to the aforementioned principles.
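A small simulation with assumed values illustrates this statistical argument: if a capable measurement process scatters around the true value of a homogeneous batch, the mean deviation of the training targets from the true value shrinks as the number of measured workpieces grows.

```python
# Illustration of the law-of-large-numbers argument with assumed values: the mean
# error of the training targets shrinks with the number of measured workpieces.
import numpy as np

rng = np.random.default_rng(1)
true_diameter = 17.994    # mm, assumed true value of a homogeneous production batch
meas_std = 0.0005         # mm, assumed scatter of a capable measurement process

for n in (10, 100, 360):  # 360 corresponds to 80 % of the 450 data sets
    measured = true_diameter + rng.normal(0.0, meas_std, size=n)
    print(n, f"{abs(measured.mean() - true_diameter) * 1000:.3f} µm mean deviation")
```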

Figure 3 Workflow of a capability assessment according to procedure one for measured values and its adaptation for predicted values [11].

Figure 3 shows the steps for carrying out procedure one in its conventional and its adapted form. For the conventional application of procedure one, it can be assumed that there is a ready-to-use measuring device with which a standard is measured 50 times before the capability indices are calculated. When carrying out procedure one to prove the capability of a machine learning method for quality prediction, a distinction can be made as to whether the training of the method is done individually for each run or not. In practice, an ML method is trained once and then used for prediction until a new training is required. If a trained method is used to provide predictions 50 times for the same data set, 50 identical prediction results are achieved (branch "A" in Figure 3). The fact that a trained method returns the same prediction result for the same input data on each run means that the standard deviation of the predictions is zero. Measuring a workpiece several times leads to slightly different measurement results, but this is not the case for a trained machine learning method. As a result, the repeatability of the prediction result examined with procedure one is excellent, because the same result is always achieved. To calculate the capability indices, the respective numerator is divided by a multiple of the standard deviation, which in this case is not possible from a mathematical point of view because the standard deviation is zero. Assuming that the standard deviation is not zero but almost zero, the quotient would be extremely large and therefore always above the limit values of the capability indices, which would be a positive proof of capability.

To be able to assess the capability of a machine learning method for quality prediction with procedure one, an adaptation of procedure one must therefore be made. Such an adaptation or modification is quite common but requires that it is documented and critically questioned [[19]]. Procedure one is modified in such a way that it includes a new training of the method with each run of the prediction (branch "B" in Figure 3). The machine learning method learns the same relationships between input data and target variable with each training, but the method-internal parameterization is individual after each training. This means that the general configuration of the used machine learning method determined during the grid search stays constant (e. g., number of trees), but the training procedure is carried out several times. For procedure one the training is carried out 50 times, which means that bootstrap samples are drawn from the training data set 50 times and the trees of the method are developed anew in each run. Each tree will be individual because of its freedom, for example, to choose the extracted features used to split a node or to choose the number of nodes and leaves to achieve a precise prediction. This leads to a certain spread or "uncertainty" of the prediction results. The lower the spread, the more valid the relationships recognized and learned by the method in the training data. Consequently, the spread (standard deviation) is an essential variable when calculating the capability indices. When performing the measurement process capability study with measuring equipment and standard, it is assumed that all handling steps are carried out in full for each run. This means that the standard must be unclamped and re-clamped for each measurement. Such a source of error or influence on the measurement result does not exist for a prediction process. Instead, the training of the method should be understood as a crucial handling step and should be carried out with each run. To carry out procedure one, a standard is measured a total of 50 times, resulting in 50 measured values (1×50 matrix). For the implementation of the adapted procedure, the predictions achieved for a single workpiece would be sufficient. However, it is advisable to predict a value 50 times for each of the m workpieces of the validation data set, so that at the end an m×50 matrix of predictions is obtained. These values are then used to calculate the capability indices. However, one of the m workpieces is sufficient for calculating the indices, and it is therefore advisable to select the workpiece with the largest standard deviation of the 50 predictions as the worst-case scenario. A high standard deviation leads to smaller values of the capability indices, which would result in a negative proof of capability if the capability limit value of 1.33 is not reached. If the quality prediction is capable for the workpiece with the largest standard deviation, then the applied machine learning method would also achieve a positive proof of capability for the remaining validation data. The choice of the largest standard deviation for the proof of capability can thus be viewed as a way of staying on the safe side. This approach makes it unnecessary to calibrate a workpiece for the proof of capability.
If changes are made to the manufacturing process that require fundamental adjustments of the prediction method, an efficient and quick proof of capability can be carried out with each newly collected validation data set.
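A sketch of branch "B" is given below, assuming scikit-learn models with the hyperparameters fixed by the grid search, synthetic stand-in data for the process features and measured diameters, and a different random_state per run as one way to obtain an individual internal parameterization per training; the numerical results of such a sketch are therefore not meaningful, only the workflow is.

```python
# Sketch of the adapted procedure one (branch "B"): re-train the method for each of
# the 50 runs, build the m x 50 prediction matrix, and evaluate the worst-case
# workpiece. The data are synthetic stand-ins; a new random_state per run stands in
# for the individual internal parameterization obtained by each training.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(450, 8))                               # stand-in features
y = 17.994 + 0.001 * X[:, 0] + rng.normal(0, 0.0003, 450)   # stand-in diameters (mm)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

n_runs, T = 50, 0.010           # 50 runs; diameter tolerance of 10 µm (here in mm)
P = np.column_stack([
    GradientBoostingRegressor(n_estimators=217, random_state=run)
    .fit(X_train, y_train).predict(X_val)
    for run in range(n_runs)
])                                                 # shape (m, 50)

worst = int(np.argmax(P.std(axis=1, ddof=1)))      # workpiece with the largest spread
x_g, s_g = P[worst].mean(), P[worst].std(ddof=1)
x_m = y_val[worst]                                 # measured value used as reference
c_g = 0.2 * T / (6 * s_g)
c_gk = (0.1 * T - abs(x_g - x_m)) / (3 * s_g)      # capable if both are >= 1.33
```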

Figure 4 Workflow of a capability assessment according to procedure three for measured values and its adaptation for predicted values [11].

If a measuring process is classified as capable according to procedure one, then procedure three must be used to check its capability for measuring workpieces from series production. For procedure three, at least 25 workpieces from series production are measured (predicted) twice and then the capability index is determined. The sequence of the two procedures is basically identical, as depicted in Figure 4. Procedure three must also be adapted, because identical predictions are achieved for a data set if the same trained method is used for each run. The adaptation again consists in carrying out the training of the method for each run (branch "D" in Figure 4). The predictions obtained in this way have a certain spread, which can be understood as analogous to the uncertainty of a measuring device and enables the use of the formulas to determine the capability. When performing the two series of measurements, the workpieces must be measured in a different order for each series and must therefore be clamped and unclamped twice. These influences on the measurement result from the measuring equipment and the handling of the workpieces do not occur for the ML-based quality prediction. They can only be considered through the renewed training of the method (similar to the approach described for procedure one), since they occur when measuring the quality characteristics of the workpieces in the training data set and are now expressed by the scatter of the predicted values. A selection of the predictions to be taken into account for the calculation of the capability indices does not have to be carried out, because at least 25 predictions are required; thus, all predictions of the two series are used to determine the capability.
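Branch "D" can be sketched analogously: the method is re-trained before each of the two prediction series for at least 25 serial parts, and %GRR is then computed according to equation (3); the data and the pooled-standard-deviation estimate of GRR are again assumed stand-ins.

```python
# Sketch of the adapted procedure three (branch "D"): re-train the method before each
# of the two prediction series for at least 25 serial parts, then evaluate %GRR.
# The data are synthetic stand-ins for the process features and diameters.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(450, 8))                               # stand-in features
y = 17.994 + 0.001 * X[:, 0] + rng.normal(0, 0.0003, 450)   # stand-in diameters (mm)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

T = 0.010                                    # diameter tolerance, 10 µm in mm
series = [
    GradientBoostingRegressor(n_estimators=217, random_state=100 + run)
    .fit(X_train, y_train).predict(X_val[:25])   # at least 25 serial parts
    for run in range(2)                          # two series, new training each time
]
M = np.column_stack(series)                  # shape (25, 2)
grr = np.sqrt(M.var(axis=1, ddof=1).mean())  # pooled repeatability (assumed estimator)
pct_grr = 6 * grr / T * 100                  # capable if <= 10 %
```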

5.3 Application of adapted methods for assessment of quality predictions

The software Q-DAS solara.MP from Hexagon is used for the assessment of the prediction series and the calculation of the capability indices. The assessment of the capability of a machine learning method for quality prediction is carried out for each quality characteristic. In this section, the results and the obtained values for the capability indices are discussed.

Figure 5 Results of procedure one and three for the prediction of the diameter [11].

The results of procedure one and three for the prediction of the diameter are shown in Figure 5. The C_g and C_gk values of procedure one are 2.75 and 2.24, respectively, and are well above the limit value of 1.33. This means that the selected machine learning method (here: GBR) is capable of predicting the diameter. The left diagram in Figure 5 shows the results of the 50 repeated predictions for the same bore (workpiece). It is the bore with the highest standard deviation s_g of the prediction results. The value for s_g is 0.12 µm, resulting in a C_g value of 2.75 for a diameter tolerance of 10 µm. The mean value x_g of the predictions is 17.9942 mm and is therefore 0.2 µm above the measured diameter x_m = 17.994 mm of the bore. The resulting C_gk value is 2.24, which is again above the required limit value. The predictions partly have the exact value of x_m but are mostly above this value. Performing a so-called one-sample t-test with a confidence level of 90 % leads to the result that this deviation is significant. In such a case, the AIAG MSA advises either to make a correction to the measuring device, which in this case would be a new training with further training data or additional features, or to take a correction factor into account for each measured value [[18]]. However, this is not considered necessary here, because the deviation is only 0.2 µm on average and it cannot be assured that the training data have a similar spread. In addition, the measured value x_m could itself deviate from the actual value by this amount, and the exact value x_m is achieved for seven of the 50 repeated predictions. Furthermore, x_g is significantly closer to x_m than to x_m + 0.1·T. Thus, from a manufacturing point of view, there is no clear offset that would justify taking measures. A further argument for not taking any measures comes from a statistical point of view. The predictions made by the 50 models are highly statistically dependent, because the 50 models are closely related: they are fitted on very similar bootstrap samples drawn from the same training data set. Therefore, the standard deviation of 50 predictions of one bore does not have the same meaning as the standard deviation of 50 independent repeated measurements. Due to the high degree of dependence of the 50 predictions, the independence assumption of the one-sample t-test is violated. As a result, the standard deviation is underestimated, which leads to a bias towards type I errors (false positives). As an alternative, a paired t-test is used to compare the x_g values to the corresponding reference values x_m for the validation data set of 50 bores. Applying a paired t-test with a confidence level of 90 % leads to the result that the deviations of the diameter predictions are not significant. This means that the x_g values are very close to the corresponding reference values and that no measures are required. Hence, the one-sample t-test commonly used in procedure one should be replaced by a paired t-test if the values are predicted and not measured.
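The two significance tests discussed here could be carried out with SciPy as sketched below, where P denotes the m×50 prediction matrix and y_val the corresponding measured reference values from the adapted procedure one; a 90 % confidence level corresponds to flagging deviations with p < 0.10.

```python
# Sketch of the two significance tests (SciPy); P is the m x 50 prediction matrix
# and y_val the corresponding measured reference values, e.g. from the branch-"B"
# sketch in Section 5.2.
import numpy as np
from scipy import stats

def significance_tests(P, y_val, alpha=0.10):        # 90 % confidence level
    P, y_val = np.asarray(P), np.asarray(y_val)
    worst = int(np.argmax(P.std(axis=1, ddof=1)))
    # One-sample t-test: do the 50 (dependent) predictions of the worst-case bore
    # deviate systematically from its measured reference value?
    _, p_one = stats.ttest_1samp(P[worst], popmean=y_val[worst])
    # Paired t-test: do the per-bore prediction means deviate from the references?
    _, p_paired = stats.ttest_rel(P.mean(axis=1), y_val)
    return {"one-sample significant": p_one < alpha,
            "paired significant": p_paired < alpha}
```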

The right diagram of Figure 5 shows the results of procedure three. The green horizontal line in Figure 5 represents the mean of the two predictions for each sample, which is set to zero to better compare the prediction deviations of all samples. The blue and pink curves represent the first and second prediction, respectively. The difference between the blue and the pink curve of a sample is the total difference between the two prediction values (delta); depending on whether a prediction for a sample is the higher or the lower one, it lies half of delta above or below the green line. The %GRR value of 3.98 % is well below the critical limit value of 10 %, confirming the capability of the prediction method. The spread of the results of the ML method is therefore within an acceptable range. The diameters are predicted twice for each of the 50 bores, and the deviations from the mean of the two predictions are shown for each bore. The maximum deviation is only 0.1 µm and therefore significantly smaller than the limit value, which is 5 % of the bore tolerance. The resolution achievable with the machine learning method is equal to the resolution of the measuring device (coordinate measuring machine) with which the quality data of the training data set were determined. The resolution %RE is 1 % of the tolerance and thus well below the maximum allowed 5 %. Hence, as all necessary capability indices (C_g, C_gk, %GRR, %RE) are within the required limits, the machine learning method, here GBR, is capable of predicting the diameter of a bore in this use case.

Figure 6 Results of procedure one and three for the prediction of the roundness [11].

For the prediction of the roundness, the machine learning method RFR is used, for which the capability of quality prediction must also be assessed. The diagrams in Figure 6 show that the method RFR can be regarded as capable of predicting the roundness in this use case. The standard deviation s_g of the 50 repeated predictions is only 0.019 µm, resulting in a high C_g value of 4.34. The difference between the measured value x_m and the average of the predictions x_g is extremely small (0.02 µm), indicating a low systematic prediction deviation, which results in a C_gk value of exactly 4.0. Both values are above the limit value of 1.33, and thus the prediction method can be classified as capable in terms of repeatability and systematic prediction error (procedure one). The systematic prediction deviation is classified as significant according to the one-sample t-test with a confidence level of 90 %, but from a manufacturing point of view again no action is required. That no measures are required is confirmed by a paired t-test with a confidence level of 90 %, which shows that no significant deviation exists.

In addition, the resolution %RE of the predictions (0.04 %) is far below the maximum limit value of 5 %. Furthermore, the %GRR value of 2.34 % is well below the limit value of 10 %, which confirms the capability of the prediction method. Moreover, a low spread or uncertainty of the prediction values is evident from the right diagram in Figure 6; only for the first 10 bores do the repeated predictions deviate somewhat more from one another. Hence, the roundness prediction in this use case can be regarded as capable according to procedures one and three.

6 Conclusion

The accuracy and the capability of the diameter and roundness predictions were assessed with two different approaches. First, performance metrics and charts commonly used to evaluate the prediction accuracy of machine learning methods were applied. Second, procedures established in quality management to determine the capability of measurement processes were used. Both approaches had to be adapted slightly to be used for assessing the quality predictions. The performance metrics were set in relation to the tolerances of the quality characteristics to better assess the prediction accuracy. The workflows of the capability assessment procedures for measurement equipment had to be adapted to calculate the required capability indices from predictions instead of measurements, as normally applied.

The performance metrics of the first approach are simple to calculate and show the prediction accuracy, but they do not consider the deviation and the distribution of the data. A probability density chart is necessary to evaluate the distributions of the predicted and the actual values. These performance metrics are suitable to check the general performance of a prediction method but are not sufficient to assess the capability of a method in terms of quality management. The procedures and the corresponding indices for assessing the capability of measurement processes are well known and established in quality management but are more difficult to calculate (second approach). Important information like deviation, repeatability, and systematic prediction error is determined and tolerances are considered, but the shapes of the distributions of the measured values (validation data) and the predicted values remain untested. A prediction method can be capable based on the indices while the distributions of measured and predicted values differ considerably; this is seen, for example, for the prediction of the roundness. Hence, the procedures, particularly one and three, are suitable for determining the capability of a prediction method for quality prediction in terms of quality management aspects but have to be supplemented by a comparison of the distribution curves.

In conclusion, with the advent of quality prediction a further development of quality management begins in which the existing approaches and methods are not necessarily replaced but supplemented by machine learning methods. This is evident from the nature of quality management in which almost all decisions are based on extensive data acquisition and analysis.

References
1. A. Schütze, N. Helwig, T. Schneider, "Sensors 4.0 – smart sensors and measurement technology enable Industry 4.0," J. Sens. Sens. Syst. 7, 2018, 359–371, doi: 10.5194/jsss-7-359-2018.
2. B. Denkena, M. Dittrich, F. Uhlich, "Self-optimizing cutting process using learning process models," Procedia Technology 26, 2016, 221–226, doi: 10.1016/j.protcy.2016.08.030.
3. D. Weichert, P. Link, A. Stol, S. Rüping, S. Ihlenfeldt, S. Wrobel, "A review of machine learning for the optimization of production processes," The International Journal of Advanced Manufacturing Technology 104, 2019, 1889–1902, doi: 10.1007/s00170-019-03988-5.
4. L. Wang, X. V. Wang, "Condition Monitoring for Predictive Maintenance," in: Cloud-Based Cyber-Physical Systems in Manufacturing, Springer, Cham, 2017, 163–192, doi: 10.1007/978-3-319-67693-7_7.
5. T. Schneider, S. Klein, A. Schütze, "Machine learning in industrial measurement technology for detection of known and unknown faults of equipment and sensors," tm – Technisches Messen 86 (11), 2019, 706–718, doi: 10.1515/teme-2019-0086.
6. S. Eichstädt, B. Ludwig, "Metrology for heterogeneous sensor networks in the IoT," tm – Technisches Messen 86 (11), 2019, 623–629, doi: 10.1515/teme-2019-0073.
7. S. Schorr, M. Möller, J. Heib, D. Bähre, "Quality Prediction of Drilled and Reamed Bores Based on Torque Measurements and the Machine Learning Method of Random Forest," Procedia Manufacturing 48, 2020, 894–901, doi: 10.1016/j.promfg.2020.05.127.
8. T. Pfeifer, Quality Management. Strategies, Methods, Techniques, 3rd edition, Hanser, München, 2002.
9. S. Schorr, M. Möller, J. Heib, D. Bähre, "In-process Quality Control of Drilled and Reamed Bores using NC-Internal Signals and Machine Learning Method," Procedia CIRP 93, 2020, 1328–1333, doi: 10.1016/j.procir.2020.03.020.
10. S. Schorr, M. Möller, J. Heib, D. Bähre, "Comparison of Machine Learning Methods for Quality Prediction of Drilled and Reamed Bores Based on NC-Internal Signals," Procedia CIRP 101, 2021, 77–80, doi: 10.1016/j.procir.2020.09.190.
11. S. Schorr, "Prozessparallele Prognose der Werkstückqualität mithilfe von NC-internen Daten und maschinellem Lernen," PhD thesis, Saarland University, 2021, doi: 10.22028/D291-34543.
12. O. Sagi, L. Rokach, "Ensemble learning: A survey," WIREs Data Mining and Knowledge Discovery 8 (4), 2018, doi: 10.1002/widm.1249.
13. C. Schaffer, "Selecting a classification method by cross-validation," Machine Learning 13, 1993, 135–143, doi: 10.1007/BF00993106.
14. J. Cleve, U. Lämmel, Data Mining, De Gruyter Studium, Berlin, 2016.
15. C. P. Keferstein, M. Marxer, C. Bach, Fertigungsmesstechnik. Alles zu Messunsicherheit, konventioneller Messtechnik und Multisensorik, 9. Auflage, Springer Vieweg, Wiesbaden, 2017.
16. Joint Committee for Guides in Metrology, "JCGM 100: Evaluation of measurement data – Guide to the expression of uncertainty in measurement," JCGM 100:2008, 2008. Available online: https://www.bipm.org/en/committees/jc/jcgm/publications.
17. German Association of the Automotive Industry (VDA), "Quality Management in the Automotive Industry, Measurement and inspection processes, capability, planning and management," 3rd edition, vol. 5, 2021.
18. AIAG Core Tools, "Measurement Systems Analysis (MSA)," 4th edition, 2010.
19. J. Tilsch, "Quality Management in the Bosch Group, 10. Capability of Measurement and Test Processes," 2019, available online.
20. E. Dietrich, "Messmanagementsystem/Prüfmittelmanagement," in: T. Pfeifer, R. Schmitt (eds.), Masing Handbuch Qualitätsmanagement, 6. Auflage, Hanser, München, 2014, 714–729.
21. J. Tilsch, "Quality Management in the Bosch Group, 9. Machine and Process Capability," 5th edition, 2019, available online.

By Sebastian Schorr; Dirk Bähre and Andreas Schütze


Sebastian Schorr studied Industrial Engineering at RWTH Aachen University and Tsinghua University and received his Master of Science degrees in 2017. From 2018 until 2021 he did his PhD at Saarland University in cooperation with Bosch Rexroth AG. Since 2021 he has been working at Bosch Rexroth AG as an engineer for machine learning and quality management.

Dirk Bähre studied mechanical engineering and completed his PhD in the field of cutting technologies at the Technical University of Kaiserslautern in 1994. After holding management positions in research at the TU Kaiserslautern and in process development at a large automotive supplier, he has held the Chair of Production Engineering at Saarland University since 2008. Since 2021, he has also been a scientific director at the Centre for Mechatronics and Automation Technology ZeMA in Saarbrücken. He researches and teaches in the field of manufacturing techniques for industrial applications. His research focus is on precision machining technologies, the analysis of machining effects on material properties, and resource efficiency as well as sustainability in production.

Andreas Schütze received his diploma in physics from RWTH Aachen in 1990 and his doctorate in Applied Physics from Justus-Liebig-Universität in Gießen in 1994 with a thesis on microsensors and sensor systems for the detection of reducing and oxidizing gases. From 1994 until 1998 he worked for VDI/VDE-IT, Teltow, Germany, mainly in the field of microsystems technology. From 1998 until 2000 he was professor for Sensors and Microsystem Technology at the University of Applied Sciences in Krefeld, Germany. Since April 2000 he has been professor for Measurement Technology in the Department of Systems Engineering at Saarland University, Saarbrücken, Germany, and head of the Laboratory for Measurement Technology (LMT). His research interests include smart gas sensor systems as well as data engineering methods for industrial applications.

Title: Assessment of quality predictions achieved with machine learning using established measurement process capability procedures in manufacturing
Authors: Schorr, Sebastian; Bähre, Dirk; Schütze, Andreas
Journal: Technisches Messen, vol. 89 (2022-04-01), issue 4, pp. 240–252
Year of publication: 2022
Media type: academic journal
ISSN: 0171-8096 (print)
DOI: 10.1515/teme-2021-0125
Subjects: process capability; Robert Bosch GmbH; machine learning; forecasting; total quality management; organizational learning; manufacturing processes; quality control
Keywords (English/German): quality prediction; prediction assessment; machine learning; manufacturing; Qualitätsprognose; Prognosebewertung; maschinelles Lernen; Serienfertigung
Language: English
Alternate title: Bewertung von mit maschinellem Lernen erzielten Qualitätsprognosen durch die Anwendung von etablierten Verfahren zum Nachweis der Messprozessfähigkeit in der Fertigung
Document type: Article
Author affiliations: 1 = Bosch Rexroth AG, Bexbacher Straße 72, 66424 Homburg, Germany; 2 = Universität des Saarlandes, Lehrstuhl für Fertigungstechnik LFT, 66123 Saarbrücken, Germany; 3 = Universität des Saarlandes, Lehrstuhl für Messtechnik LMT, 66123 Saarbrücken, Germany
