Nowadays, machine learning methods and data-driven models are widely used in different fields, including computer vision, biomedicine, and condition monitoring. However, these models show performance degradation when confronted with real-life situations. Domain or dataset shift, also referred to as out-of-distribution (OOD) prediction, is commonly cited as the reason for this problem. Especially in industrial condition monitoring, it is not clear when we should be concerned about domain shift and which methods are more robust against it. In this paper, prediction results are compared for a conventional machine learning workflow based on feature extraction, selection, and classification/regression (FESC/R) and for deep neural networks on two publicly available industrial datasets. We show that it is possible to visualize a possible domain shift using feature extraction and principal component analysis. Furthermore, the experimental comparison shows that the cross-domain validated results of FESC/R are comparable to the reported state-of-the-art methods. Finally, we show that the results for simple randomly selected validation sets do not correctly represent the model performance in real-world applications.
Zusammenfassung: Machine learning and data-driven models are widely used in the literature on computer vision, biomedicine, and condition monitoring. However, these methods often show weaknesses in real-world applications. Domain shift, or predictions outside the distribution of the training data, is frequently named as the cause. Especially in industrial condition monitoring, it is unclear when these problems occur and which algorithms are robust against them. In this contribution, the results of a classical ML processing chain consisting of feature extraction, feature selection, and classification or regression (FESC/R) are compared with those of multi-layer neural networks on two publicly available datasets. It is shown that possible data shifts can be made visible using feature extraction and principal component analysis. Furthermore, it is shown that the results achieved with FESC/R on domain shift problems are on par with those of multi-layer neural networks. Finally, it is shown that random cross-validation cannot adequately reflect the accuracy of an ML model to be expected in a real application.
Keywords: Machine learning; condition monitoring; domain adaptation; neural network; maschinelles Lernen; Zustandsüberwachung; Domänenadaption; neuronale Netze
Condition monitoring and predictive maintenance are important applications for machine learning (ML) algorithms. Input data in these applications comes from different industrial sensors, e. g., pressure, temperature, vibration, or microphones. Targets for these tasks are usually predicting fault types, remaining useful lifetime (RUL), or detecting anomalies. Detecting faults or anticipating upcoming failures can significantly reduce downtime of industrial systems and furthermore ensure the quality of products [[
The performance of modern data-driven models depends on the quality and quantity of supplied observations; however, acquiring proper data that covers all possible variations of a system and its environment to train these models is costly. A proper design of experiment (DoE) should include different control conditions and multiple recordings of a single target in different process situations and environments, e. g., for a ball bearing and an attached vibration sensor all possible combinations of temperatures, load and speed levels, lubrication conditions, vibrations transmitted by other machinery, and peculiarities of production tolerances. This is exacerbated further when taking outdoor applications into account, e. g., for hydraulic machinery, because of the wider temperature range and additional environmental factors. Usually, variables considered less important for a process or expensive to change are ignored or varied in a limited range or step size to limit experiment costs. Whether control variables are discrete or continuous, a design of experiment can cover only a limited number of them, so only subsets of the complete target space are available for training [[
Many real-life applications of ML for condition monitoring impose domain shift problems onto the algorithms and thereby decrease their performance. Supervised ML methods mostly rely on the assumption that both training and test data come from the same distribution. This distribution of data can be called a domain and ideally, there is only one domain in a supervised learning task [[
In classical measurement science, changes in the environment (in computer science: domain shifts) are tackled with calibration and adjustment of the measurement system, which is also possible for machine learning algorithms. To perform this adjustment of ML algorithms, different approaches have been proposed. The work of Moreno et al. [[
The rest of the paper is structured as follows: Section 2 first introduces a dataset from a hydraulic machine representing a regression problem and a dataset on damage detection in a ball bearing representing a classification task. Both datasets comprise domain shifts that are visualized. Furthermore, Section 2 introduces the two ML approaches compared in this study, i. e., a more classical approach based on feature extraction, feature selection and classification/regression and a more modern approach based on neural network architecture search. Section 3 shows how classification and regression results are affected by domain shifts in the mentioned datasets and how calibration and adjustment can help to compensate those effects before the study is concluded in Section 4.
In this section, we introduce the datasets and methods used in this study. The methods comprise artificial neural networks (ANNs) and FESC/R, which is based on conventional ML approaches. The two publicly available datasets are (
The first dataset used in this study is the recorded behavior of a hydraulic system (HS) where multiple common faults of such a system are simulated in a testbed [[
Figure 1 DoE of the ZeMA dataset. All possible combinations of faults were repeated three times (a) for different cooler states, 100 % (normal operation), 20 %, and 3 % performance (set by varying the duty cycle of the cooler ventilator). The combination of faults for valve, pump, and accumulator is plotted in figure (b).
Figure 2 Features from the ZeMA dataset after PCA. (a) All data samples colored by the cooler performance. (b) Subset of observations that have more similarities. Shifts in the distribution due to the cooler changes are visible.
The control variable with the biggest influence on the sensor data is the performance of the cooler. To show the influence of the process conditions on the distribution of the data, we extracted statistical features from the raw data using StatMom, which is described in Section 2.2.1. Then, Principal Component Analysis (PCA) was applied to the extracted features; the results for the first two components are shown in Figure 2. As is evident from Figure 2, the cooler has a major influence on the data distribution, and a change of the cooler performance results in a shift along the first principal component (PC), indicating the main source of variance in the dataset. Consequently, for this task the observations that belong to each cooler state can be considered as separate domains. Additionally, the cooler state is the most expensive control variable to change because after each change the machine has to run for several hours before a new temperature equilibrium is reached, and conditions are stable again [[
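The visualization described above can be sketched in a few lines. The following is an illustrative Python example with synthetic stand-in features rather than the real ZeMA recordings (feature values, dimensions, and the size of the shift are all made up), showing how a domain offset becomes visible along the first PC:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative stand-in features for two "cooler state" domains: same
# structure, but one domain is shifted in feature space (all values synthetic).
domain_a = rng.normal(0.0, 1.0, (200, 6))        # e.g. cooler at 100 %
domain_b = rng.normal(0.0, 1.0, (200, 6)) + 4.0  # e.g. cooler at 3 %
X = np.vstack([domain_a, domain_b])

# PCA via SVD on the mean-centered feature matrix
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt[:2].T  # projections onto the first two PCs

# The domain offset shows up as a separation along the first PC
gap = abs(scores[:200, 0].mean() - scores[200:, 0].mean())
```

Because the between-domain offset dominates the variance, the first PC aligns with it and the two domains separate along that axis, just as in Figure 2.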
The learning scenario chosen for this dataset is the assessment of the current valve switching characteristic from 72 % (barely working) to 100 % under the condition that only data from cooler state 20 % (equivalent to 55 °C average temperature) and 100 % (equivalent to 44 °C average temperature) are used for training. Correctly predicting the valve characteristic at cooler state 3 % (equivalent to 66 °C average temperature) [[
For calibration and adjustment of the models, data recorded at 3 % cooler state (new domain) and 100 % correct valve operation was considered. This is equivalent to using few measurements from a new machine (valve at 100 %) in a different environment for calibration and adjustment. The model is then evaluated on all data at cooler state 3 %.
Table 1 Summary of CWRU dataset.
Fault types   Fault size (mil)   Load (hp)    Rotational Speed (rpm)   Sensor Orientation
No Damage     0                  0, 1, 2, 3   1725–1796                12
Inner Ring    7, 14, 21          0, 1, 2, 3   1721–1796                12
Outer Ring    7, 14, 21          0, 1, 2, 3   1723–1796                3, 6, 12
Ball          7, 14, 21          0, 1, 2, 3   1721–1796                12
The second dataset that is used in this study was published by the Bearing Data Center of Case Western Reserve University (CWRU) [[
To demonstrate domain shifts and domain adaptation in classification tasks, the learning scenario was chosen to be the detection of the fault type (vs. fault severity). The four groups to be detected are damage at the outer ring (OR), inner ring (IR), ball (B), and no damage (None). In a real-world application this detection should be possible independent of the load. Therefore, the training data was chosen to be the data recorded at 1, 2, and 3 hp load, while the test data is the data recorded at 0 hp load.
As for the ZeMA dataset, we extracted features from the raw data; the result of a PCA performed on the extracted features is presented in Figure 3. In contrast to the ZeMA dataset, it is expected that the most relevant features come from the frequency domain of the vibration sensor. Therefore, a Time Frequency Extractor (TFEx, Section 2.2.1) was used for this use case. Figure 3a shows the PCA plot colored to indicate different loads of the motor and Figure 3b visualizes the same data by coloring according to the damage target for the defined scenario. The healthy state, highlighted with an ellipse in both figures, shows a shift of the data for the motor at zero hp, which can cause difficulties for a model trained only on the other load conditions (
Figure 3 Features from the CWRU dataset after PCA. Visualizing the features based on the motor loads (a). Visualizing the features based on the damage types (b).
Various data-driven models have been applied in condition monitoring and predictive maintenance, including linear discriminant analysis (LDA) [[
Conventional ML methods have been used for a long time [[
We can formulate the conventional ML methods in the form of a pipeline that consists of feature extraction (FE), feature selection (FS), and classification (FESC) or regression (FESR). Depending on the model and input dimensions, it is also possible to apply a classifier/regressor directly to the raw data, but in general FE methods are needed to reduce the dimensionality of the data. FE methods are usually necessary for condition monitoring applications because the raw data can be high-dimensional inputs [[
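As a minimal sketch of such a FESC/R pipeline (here the regression variant, FESR), assuming synthetic raw data and simple stand-ins for each stage — the actual toolbox methods differ — the three steps can be written as:

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy stand-in data: raw "signals" whose variance encodes the target
targets = rng.uniform(0, 1, 150)
raw = rng.normal(0, 1 + targets[:, None], (150, 1000))  # high-dimensional raw data

# 1) Feature extraction: reduce each signal to a handful of statistics
feats = np.column_stack([raw.mean(1), raw.var(1), raw.min(1), raw.max(1)])

# 2) Feature selection: rank by absolute Pearson correlation with the target
corr = [abs(np.corrcoef(f, targets)[0, 1]) for f in feats.T]
keep = np.argsort(corr)[::-1][:2]  # keep the 2 best-correlated features

# 3) Regression: ordinary least squares on the selected features
A = np.column_stack([feats[:, keep], np.ones(len(targets))])
coef, *_ = np.linalg.lstsq(A, targets, rcond=None)
rmse = np.sqrt(np.mean((A @ coef - targets) ** 2))
```

The point of the pipeline is visible even in this toy: the 1000-dimensional raw input is reduced to a few informative features before a simple model is fitted.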
In this study an open-source MATLAB toolbox [[
Here, the focus is on the characteristics of these methods in OOD problems, and the goal is to measure the robustness of the models in an OOD scenario. The toolbox is used to search for the best methods and HPs for both datasets; from these results, the following methods were selected. The first FE function is called StatMom [[
Table 2 Features used in TFEx and StatMom.
TFEx (time and frequency domain)   StatMom (time domain)
RMS                                Mean
Variance                           Variance
Skewness                           Skewness
Kurtosis                           Kurtosis
Position of maximum                Linear slope
Maximum                            –
Peak to RMS ratio                  –
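A possible implementation of StatMom-style time-domain features is sketched below; the exact definitions used in the toolbox may differ, and the input signal here is synthetic:

```python
import numpy as np

def statmom(x):
    """StatMom-style time-domain features (cf. Table 2): mean, variance,
    skewness, kurtosis, and linear slope of one raw signal `x`."""
    x = np.asarray(x, dtype=float)
    mu, sigma = x.mean(), x.std()
    z = (x - mu) / sigma                       # standardized signal
    slope = np.polyfit(np.arange(x.size), x, 1)[0]  # linear trend over time
    return np.array([mu, x.var(), (z**3).mean(), (z**4).mean(), slope])

# Synthetic example signal: a sine with a slight upward drift
sig = np.sin(np.linspace(0, 10, 1000)) + 0.01 * np.arange(1000)
feats = statmom(sig)
```

Each raw signal is thereby reduced to a five-dimensional feature vector, independent of its original length.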
As FS we used two methods, namely Relieff [[
ANNs with three or more layers are called deep neural networks (DNNs); therefore, many modern network architectures are classified as deep learning methods. Over the past decade deep learning algorithms have been used in various applications and achieved outstanding results [[
Designing and training DNNs requires tuning many hyper-parameters (HPs). Hyper-parameters in ANNs can be categorized into two groups: the first contains architecture HPs and the second training HPs. Architecture HPs are parameters that specify the structure of a network, i. e., number of layers, filter size, number of filters in a layer, number of neurons in a layer, and of course the type of a layer. Training HPs specify the training process for a network when the architecture is fixed. Initial learning rate, mini-batch size, and number of epochs are examples of training HPs. The process of choosing the best HPs is generally called HP optimization and, more specifically for architecture HPs, Neural Architecture Search (NAS) [[
Table 3 List of HPs for the CNN, including the search ranges. An iterative HP optimization approach was used; the ranges for the initial and final trials are reported.
HP                                    Initial trial                  Final trial
Initial learning rate (log scale)     10^−4–10^−2                    0.002
Kernel size                           2–10                           3–5
Depth                                 3–10 (Conv blocks)             5–10
# of neurons, fully connected layer   1–1000                         1–100
# of filters                          Fixed, relative to the depth   Fixed, relative to the depth
1st convolutional layer filter size   10–100                         10–35
Batch size                            32                             32
Table 4 List of HPs for the WaveNet-based network, including the search ranges. An iterative HP optimization approach was used; the ranges for the initial and final trials are reported.
HP                                  Initial trial        Final trial
Initial learning rate (log scale)   10^−4–10^−1          10^−3–10^−2
Kernel size                         2–10                 3–6
Depth                               3–10 (Conv blocks)   3–10 (WaveNet blocks)
# of filters                        8–100                40–80
1st conv layer filter size          20–100               20–50
Although NAS has shown superior results, outperforming human-designed networks [[
We used evolutionary parametric architectures together with Bayesian optimization [[
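The iterative narrowing of the search ranges (cf. Tables 3 and 4) can be illustrated with a toy random-search sketch. This is not the Bayesian optimizer used in the study, and the objective function below is merely a stand-in for training a network and returning its validation loss:

```python
import numpy as np

rng = np.random.default_rng(1)

def val_loss(lr, depth):
    # Toy stand-in for "train a network with these HPs, return validation
    # loss"; its optimum (lr = 1e-3, depth = 7) is chosen arbitrarily.
    return (np.log10(lr) + 3) ** 2 + (depth - 7) ** 2 / 10

# Initial (wide) ranges, loosely modeled on the "initial trial" column
lo_lr, hi_lr, lo_d, hi_d = 1e-4, 1e-2, 3, 10
for trial in range(2):  # initial trial, then one narrowed trial
    lrs = 10 ** rng.uniform(np.log10(lo_lr), np.log10(hi_lr), 30)
    depths = rng.integers(lo_d, hi_d + 1, 30)
    losses = np.array([val_loss(l, d) for l, d in zip(lrs, depths)])
    best = np.argsort(losses)[:5]  # keep the 5 best configurations
    # Narrow the ranges around the best configurations for the next trial
    lo_lr, hi_lr = lrs[best].min(), lrs[best].max()
    lo_d, hi_d = depths[best].min(), depths[best].max()

best_lr = lrs[np.argmin(losses)]
```

After the first trial, the search concentrates on the region that produced the lowest validation losses, mirroring how the "final trial" ranges in the tables were obtained.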
Table 5 List of fixed HPs during experiments.
HP                     CNN     WaveNet-based
Batch size             32      32
L2-regularization      0.001   0.001
Learn rate drop rate   0.9     0.9
Maximum epochs         100     10
Optimizer              ADAM    ADAM
Two types of CNNs are used in this study: conventional CNN with a single forward path and a WaveNet-style [[
Table 6 Domain adaptation compared with similar approaches.
                        Source and target tasks   Source and target domains (joint distribution)   Access to target domain
Supervised Learning     Same                      Same                                             –
Transfer Learning       Same/Different            Same/Different                                   Yes
Domain Adaptation       Same                      Different                                        Yes, unlabeled, or limited labels
Domain Generalization   Same                      Different                                        No
As mentioned before, many ML approaches suffer from a degradation of the performance in real world scenarios due to a shift between training and test data [[
Note that domain adaptation in ML is equivalent to the calibration and adjustment of conventional measurement systems. Both for ML methods and conventional measurements, the deviation between the system output and a known target in a few calibration measurements is used to adjust the output accordingly. This is typically done after a change in the environment (domain change) of the sensor system. Because both application examples shown in this paper can be interpreted as domain adaptation tasks, the rest of this paper will focus on domain adaptation.
In this section the results of evaluations for FESR/FESC and DNN models are reported side by side to allow easier comparison.
Although the target and other variables in this dataset are discrete numbers (due to restrictions concerning DoE), they represent continuous variables, and a model should generalize over the complete ranges. The published dataset [[
Figure 4 Prediction results for the ZeMA dataset; lines show a linear function fitted to the training (red points) and test (blue points) predictions. To obtain a better visual representation, a jitter plot is used. (a) Results from the trained FESR stack. (b) Results from a trained CNN selected based on the validation loss.
In the earlier sections we illustrated the domain shift in the ZeMA dataset at the feature level. In this section we show the effect of this phenomenon when we train a model under this condition. The results are from two families of algorithms, FESR and deep learning models.
Figure 5 The output of the NAS algorithm for the ZeMA dataset (a). A convolutional block in this network consists of a convolution layer, a batch normalization layer and a ReLU layer (b).
Starting with the FESR model, we trained a stack of selected methods for the defined task as described in Section 2.1.1. For the selected stack the FE method is StatMom, the FS method is Pearson correlation, and finally PLSR is the last method of the stack. The results of the predictions for the training and test data are plotted in Figure 4a. As there are just four discrete values in the targets, a scatter plot with jittering is used to provide a better view of overlapping data points – otherwise all samples would occur in four vertical lines and would be less distinguishable. Although the slope of the fitted linear line is similar for training and test data, there is a clear shift between them. The change in temperature causes an offset error of approx. 2 %. This is equivalent to a conventional sensor system that suffers from a small cross-sensitivity to temperature. The root mean square error (RMSE) increases from 1.53 % (validation data) to 2.45 % (test data). The reason for this deviation is the shift of the distributions visualized in Figure 2; as the algorithms are not aware of the distribution of the test data, the shifts are not compensated. In the following we compare the results of a trained deep network for the same task.
Figure 6 Final trial of the NAS algorithm. In this plot the validation data are a randomly selected 20 % of the training set. The test data is from a different distribution, i. e., a different operating temperature. Each point is a trained network, (a) ZeMA use case, (b) CWRU use case.
Alternatively, we searched for a DNN architecture to fulfill the same task. The selected DNN is a 9-layer CNN as the outcome of the NAS algorithm with the architecture and parameters as reported in Figure 5 and Table 7, respectively, with the HPs ranges for the first and last trials of the search algorithm given in Table 3. The final ranges for the parameters are values that led to the best networks (with lowest validation losses) in earlier trials. Figure 6 shows the final trial of the NAS progress, each point in the plots is a trained model with the color representing the iteration number of the model from blue to yellow. Since the objective function of this process is the validation loss, the architecture corresponding to the lowest value was selected as the final model. However, the test RMSE of the resulting model is not as low as the validation RMSE, with validation and test errors of 1.15 % and 9.75 %, respectively. To explain why the trained network generalized so poorly on the test data, the predictions of the network for both training and test data are visualized in Figure 4b, also allowing direct comparison to the FESR model.
Table 7 Summary of parameters of the selected CNN after performing the NAS.
Layers         Filter Size (H × W)   Number of filters   Stride
Conv Block 1   1 × 20                8                   1 × 3
Conv Block 2   1 × 4                 8                   1 × 2
Conv Block 3   1 × 4                 16                  1 × 2
Conv Block 4   1 × 4                 24                  1 × 2
Conv Block 5   1 × 4                 32                  1 × 2
Conv Block 6   1 × 4                 40                  1 × 2
Conv Block 7   1 × 4                 48                  1 × 2
Conv Block 8   1 × 4                 56                  1 × 2
Conv Block 9   1 × 4                 64                  1 × 2
Fully Conn 1   81                    –                   –
Dropout 50 %   –                     –                   –
Fully Conn 2   1                     –                   –
The deviation between the features of the source and target domains leads to a shift in the final predictions. While the selected network performs accurately on the validation data, which are drawn from the training distribution, it has difficulty generalizing to the test data. As is evident in Figure 4b, the test data are divided into two groups, with one having only a slight shift from the training data but the second being significantly shifted away, leading to approx. 10 % error in the predictions. These two groups are also visible at the feature level in Figure 2a, where the test data consist of two separate groups. To allow a better visual representation of this problem, the prediction results for the test data are plotted explicitly in Figure 7. The slope of the fitted line is almost identical for both groups, but there is a clear offset between them. Note that this problem would not be visible if a simple random choice of test and training data had been used instead. Therefore, the validation scenario must be designed carefully to ensure that it covers cross-domain situations.
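The importance of this validation design can be demonstrated with a toy experiment (all data synthetic; the domain-dependent offset loosely mimics the cooler-induced shift): a random split reports a much lower error than a split that holds out a whole domain.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic data: one feature correlated with the target plus a
# domain-dependent offset (e.g. three "cooler states")
domains = np.repeat([0, 1, 2], 100)
target = rng.uniform(72, 100, 300)  # e.g. valve characteristic in %
X = (target + 10 * domains + rng.normal(0, 1, 300))[:, None]

def rmse_of_linear_fit(train, test):
    # Fit a linear model on the training indices, evaluate on the test indices
    a, b = np.polyfit(X[train, 0], target[train], 1)
    return np.sqrt(np.mean((a * X[test, 0] + b - target[test]) ** 2))

# Random split: test observations come from the same domains as the training data
idx = rng.permutation(300)
rmse_random = rmse_of_linear_fit(idx[:200], idx[200:])

# Leave-one-domain-out: test on a domain that was never seen during training
rmse_loio = rmse_of_linear_fit(np.where(domains != 2)[0], np.where(domains == 2)[0])
```

With this synthetic offset, the random split reports a considerably lower RMSE than the leave-one-domain-out split, mirroring the over-optimistic validation results discussed above.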
Figure 7 Predictions on the test dataset by the CNN. The "test group 1" are observations with a low error similar to the training data, while the "test group 2" are data with a significant shift regarding the target and thus high error.
Figure 8 The FESR model predictions after recalibration (a). The CNN predictions after recalibration (b).
As shown in the last section, shifts in the dataset can significantly degrade the performance of a trained model on test data, especially if these represent a different domain. To reduce this problem and improve the results, calibration and adjustment are required. Calibration is performed using the test data of a single class (here: observations with 100 % performance) to simulate the real-world application of the previously trained model to a new machine that is working at 100 % but in an environment with a different temperature. As the simplest form of adjustment, the measured offset is removed in post-processing. Figure 8 shows the results after recalibration for both tested models; quantitative results are reported in Table 8. Recalibration for the ANN model is done only for the second test group (in Figure 7) that had a dominant shift with regard to the training data. While the results for both models improve with domain adaptation, FESR clearly yields the superior result with a test RMSE of 1.58, which is almost as low as the validation RMSE, while the RMSE of the CNN, although reduced by a factor of 3, is still almost twice as high at 3.34.
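A minimal sketch of this offset-based calibration and adjustment (all numbers synthetic; the 2.1 % offset below is illustrative, not the measured value from the study):

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical predictions of an already-trained model on the new domain
# for the calibration class with known target (valve at 100 %)
y_true_cal = np.full(20, 100.0)
y_pred_cal = y_true_cal - 2.1 + rng.normal(0, 0.3, 20)  # systematic offset + noise

# Calibration: measure the mean deviation on the known-target observations
offset = (y_pred_cal - y_true_cal).mean()

# Adjustment: remove the measured offset from all predictions on the new domain
y_pred_test = np.array([70.1, 78.0, 87.9, 98.0])
y_adjusted = y_pred_test - offset
```

This is the direct analogue of a one-point recalibration of a conventional sensor: a few measurements with known target determine the offset, which is then subtracted from all subsequent outputs.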
Table 8 Error rates for ZeMA dataset before and after recalibration.
Model   Validation RMSE   Test RMSE   Test RMSE after recalibration
FESR    1.53              2.45        1.58
CNN     1.15              9.74        3.34
Figure 9 LDA projection of the features in the CWRU use case (a). PCA plot of the embedded features from the network in the last convolution layer; the graph shows the first and second principal components (PCs) of the features (b).
Figure 10 The WaveNet-based model with the lowest validation loss in the NAS algorithm (a). The WaveNet block (b) and a convolution block (c) that are used in the architecture.
In the same way as for the HS use case, we first chose a stack of FESC methods that works best for this task. As mentioned above, the FE method is TFEx (cf. Section 2.2.1), with Relieff used for FS and finally LDA and the Mahalanobis distance for classification. The accuracy on the test data is 99 %, which is exceptionally good. To check whether the model compensated the shift for the test set, we visualize the projected features after the LDA. Figure 9a shows the results of the projection, which show a small shift between training and test data for the damaged samples, but a significant shift for the healthy state (damage type "None"). However, the projections of those observations are still sufficiently far away from the other groups to be classified correctly. Also, it should be noted that the shifts are not in the same direction for all target groups, due at least in part to the fact that the targets are categorical and can therefore not be sorted in a logical order.
As mentioned above, we expect relevant features also from the frequency domain for this use case; therefore a network architecture that previously showed superior results for raw audio and vibration signals, WaveNet, is used. An HP search for the WaveNet-based network in accordance with Table 4 was conducted and resulted in the network shown in Figure 10 with HPs as described in Table 9. Similar to the earlier use case the validation accuracy of many networks is 100 %, but selecting a network that generalizes well to the test set is challenging and still an open question [[
Table 9 Summary of parameters used for the WaveNet-based network.
Layers          Filter Size (H × W)   Number of filters   Stride   Dilation Factor
Conv Block 1    1 × 50                80                  1 × 3    1 × 1
WaveNet Block   1 × 5                 80                  1 × 1    5^(BlockNumber−1)
Conv Block 2    1 × 4                 80                  1 × 2    1 × 1
Pooling 1       1 × 4                 –                   1 × 4    –
Conv Block 3    1 × 8                 80                  1 × 1    1 × 1
Conv Block 4    1 × 8                 80                  1 × 1    1 × 1
Pooling 2       1 × 8                 –                   1 × 8    –
Final Conv      1 × 1                 4                   1 × 1    1 × 1
Figure 11 LDA projection of the features in the CWRU use case after recalibration (a). Embedded features from the network in the last convolution layer after recalibration; the graph shows the first and second PCs of the features (b).
Although the test accuracy of the trained FESC stack is almost perfect (98.8 ± 0.8 %), we still apply calibration and adjustment to compare the results. For this use case the shifts of the target groups differ for each individual class; therefore, using a single class to calibrate the test set is not sufficient. This is evident in Figure 9a: if we move the test data for the healthy state to the mean value of the training set and then apply the same shift to the other classes, the observed shifts for those classes increase considerably. One solution is to apply standardization using a small portion of the test set from all classes. Thus, 20 % of the test data from each class was used for this form of calibration and adjustment; the labels of the recalibration data are not needed. Figure 11 shows the results after standardizing the training and test data for both the FESC stack and the ANN model. Quantitative results are presented in Table 10; because of the stochastic evaluation procedure, the mean and standard deviation of 10 different runs are reported. Similar to the HS use case, a significant improvement is achieved for both ML approaches with the proposed domain adaptation; however, the performance of the FESC approach under the domain shift is significantly better than that of the deep network. Furthermore, it proved to be more robust to the domain shift even before domain adaptation, i. e., it might be considered an example of domain generalization.
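The standardization-based adjustment can be sketched as follows (synthetic feature matrices; in the procedure described above, 20 % of the unlabeled test data from each class is used — here a simple subset stands in for that):

```python
import numpy as np

rng = np.random.default_rng(4)

# Synthetic features: the target domain is shifted and rescaled
# with respect to the source domain
X_train = rng.normal(0.0, 1.0, (300, 8))
X_test = 1.5 * rng.normal(0.0, 1.0, (100, 8)) + 3.0

# Standardize each domain with its own statistics; only a small,
# unlabeled portion of the test data (here 20 observations) is needed
mu_s, sd_s = X_train.mean(axis=0), X_train.std(axis=0)
sub = X_test[:20]
mu_t, sd_t = sub.mean(axis=0), sub.std(axis=0)

X_train_std = (X_train - mu_s) / sd_s
X_test_std = (X_test - mu_t) / sd_t  # now roughly matches the training statistics
```

Because each domain is normalized with its own mean and standard deviation, no labels are required; the trained classifier then sees test features on the same scale as the training features.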
Table 10 Accuracy of the models for the CWRU dataset, before and after recalibration.
Model           Validation Accuracy %   Test Accuracy %   Test Accuracy % after recalibration
FESC            100                     98.8              99.7 ± 0.3
WaveNet-style   100                     81                92.5 ± 0.5
In this paper DNNs were compared with conventional methods based on feature extraction and selection in scenarios with distribution shifts caused by changing ambient or experimental conditions. By visualizing the data at different levels, it was shown how shifts in the raw data can propagate through a model and cause shifts in the predictions. As shifts in the data distribution are inevitable in many real-life scenarios, this issue needs to be considered when building a comprehensive ML model, i. e., in model selection, the validation scenario, and the adaptation of the training process. In the presented scenarios the conventional FESC/R approaches show better results than the ANN solutions. Although finding a DNN that correctly predicts the training data is not difficult using NAS algorithms, selecting a network that generalizes to the test data is highly challenging in a cross-domain situation. We also presented two simple domain adaptation techniques to improve the results of trained models. This showed that domain adaptation can be formulated as recalibration, especially for regression use cases, achieving good results for both ML approaches, but again with significant advantages for the conventional approach. For classification tasks this offset-based recalibration is not as straightforward due to the categorical nature of the target data and did not show significant improvement. Again, the conventional approach proved to be more robust against distribution shifts and achieved better performance after recalibration by normalization. Moreover, in the CWRU use case the FESC method achieves near-perfect accuracy in a cross-domain scenario even before recalibration and can thus be considered an example of domain generalization.
For future work, further investigation is suggested into why FESC/R performs better than DNNs in cross-domain scenarios, which could help to improve ANN architectures and make them more robust for real-world applications. One could assume that this results from the implicit extraction of useful information from the data during the feature extraction and selection steps, reducing the task complexity and making the results more stable with respect to possible changes in the input data. On the other hand, the classical approach can be boosted by explicitly introducing non-linearities based on polynomial expansion of the features in combination with linear classification/regression algorithms as recently suggested [[
By Payman Goodarzi; Andreas Schütze and Tizian Schneider
Payman Goodarzi studied Embedded Systems at Saarland University and received his Master of Science degree in March 2020 with a thesis on the interpretability of neural networks. Since that time, he has been working at the Lab for Measurement Technology (LMT) of Saarland University and at the Centre for Mechatronics and Automation Technology (ZeMA) as a scientific researcher. His research interests include ML and deep learning for condition monitoring of technical systems.
Andreas Schütze received his diploma in physics from RWTH Aachen in 1990 and his doctorate in Applied Physics from Justus-Liebig-Universität in Gießen in 1994 with a thesis on microsensors and sensor systems for the detection of reducing and oxidizing gases. From 1994 until 1998 he worked for VDI/VDE-IT, Teltow, Germany, mainly in the fields of microsystems technology. From 1998 until 2000 he was professor for Sensors and Microsystem Technology at the University of Applied Sciences in Krefeld, Germany. Since April 2000 he is professor for Measurement Technology in the Department Systems Engineering at Saarland University, Saarbrücken, Germany and head of the Laboratory for Measurement Technology (LMT). His research interests include smart gas sensor systems as well as data engineering methods for industrial applications.
Tizian Schneider studied Microtechnologies and Nanostructures at Saarland University and received his Master of Science degree in January 2016. Since that time, he has been working at the Lab for Measurement Technology (LMT) of Saarland University and at the Centre for Mechatronics and Automation Technology (ZeMA) leading the research group Data Engineering & Smart Sensors. His research interests include ML methods for condition monitoring of technical systems, automatic ML model building and interpretable AI.