Classification

To keep things simple for beginners, the library ships with default parameters that make it easier to achieve acceptable results. The basic requirement, however, is to provide a dataset, the input features (columns or indices), the output label (columns or indices) and, optionally, a model name.

1. PROBLEM DESCRIPTION

In this example, we take a look at a dataset from an electric drive. The drive has intact and defective components, which results in 11 different classes with different conditions. Each condition has been measured several times under 12 different operating conditions, that is, at different speeds, load moments and load forces. The current signals are measured with a current probe and an oscilloscope on two phases. Here, the output distinguishes the class (condition) of the drive component.

2. MODEL DESCRIPTION & DEVELOPMENT

The following sections describe the TwinAPI.SimuLearn.ClassificationML module with each subsection explaining the commands:

2.1. Setting up the model

2.2. Preprocessing the data

2.3. Training the model

2.4. Predicting test data

2.5. Saving the Model

2.1. Setting up the model

To access the information within the dataset, we create an object of the TwinAPI.SimuLearn.ClassificationML class and call it ‘model_classification’. We choose x_features and y_labels, which are the inputs and the output respectively. We can also get information on the dataset by setting plot to either ‘basic’ or ‘detailed’, which gives us different plots of the dataset. verbose controls the output verbosity and ranges from 0 to 2, from basic information on the dataset up to a summary report at the end of model training.

from TwinAPI.SimuLearn import ClassificationML


model_classification = ClassificationML()
model_classification.setup(data="datasets/sensorless_drive_diagnosis.csv", x_features=['1:48'], y_labels=['49'], verbose=1)
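
The column selectors above can be read as ranges of column numbers. The following is only a minimal sketch of that reading, not the library's own parser: the helper name expand_column_spec is hypothetical, and it assumes that a spec such as '1:48' is inclusive on both ends and that the numbers are kept exactly as written.

```python
def expand_column_spec(specs):
    """Expand entries such as '1:48' or '49' into a flat list of column numbers.

    Assumption (not confirmed by the library documentation): ranges are
    inclusive on both ends and the numbers are used exactly as written.
    """
    columns = []
    for spec in specs:
        if ":" in spec:
            start, end = (int(part) for part in spec.split(":"))
            columns.extend(range(start, end + 1))
        else:
            columns.append(int(spec))
    return columns


x_columns = expand_column_spec(['1:48'])  # the 48 input features
y_columns = expand_column_spec(['49'])    # the single class label column
print(len(x_columns), x_columns[0], x_columns[-1])
print(y_columns)
```

Under these assumptions, the call in this section selects 48 input columns and one label column.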

2.2. Preprocessing the data

By defining the parameters of the preprocess method, we can preprocess the dataset according to a preset of functions. Here, we choose ‘standardizer’ and ‘PCA’ (to reduce the number of input dimensions).

model_classification.preprocess(normalize=True, normalize_method=['standardizer', 'PCA'])
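
To illustrate what a standardizing step usually does, here is a minimal sketch in plain Python. It assumes that ‘standardizer’ means the common z-score scaling (zero mean, unit variance per column); how the library implements it internally is not shown in this tutorial, and the function name standardize_columns and the toy values are hypothetical. The PCA step is not covered by this sketch.

```python
from statistics import mean, pstdev


def standardize_columns(rows):
    """Rescale every column to zero mean and unit variance (z-score)."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # A constant column has zero spread; fall back to 1.0 to avoid dividing by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, stds)]
        for row in rows
    ]


# Toy data: two features on very different scales.
samples = [[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]]
scaled = standardize_columns(samples)
for row in scaled:
    print(row)
```

After scaling, both columns have the same spread, so a feature measured in large units no longer dominates one measured in small units.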

2.3. Training the model

Here we have a plethora of choices among the machine learning classification algorithms in the TwinAPI.SimuLearn library. We choose ‘ExtraTree Classifier’ as our classification algorithm. We can also choose how the dataset is split; here we choose a ‘kfold’ split with a ‘stratified kfold’ selection, which suits imbalanced datasets because every fold keeps roughly the same class proportions as the full dataset. The n_iter_cv value sets the number of iterations for the cross-validation (used while splitting the dataset and training the model). Changing n_jobs is only recommended if computation is slow: the more processors are used, the faster the output is obtained. For a classification algorithm, scoring_method defaults to ‘balanced_accuracy’, a metric provided by the sklearn library; since we have more than two classes, it gives an even result across classes for the test prediction.

model_classification.train(user_model="ExtraTree Classifier", n_jobs=-1, cross_validation_mode='kfold split', split_mode='stratified kfold', n_iter_cv=6, scoring_method='balanced_accuracy')
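
The idea behind a stratified k-fold split can be sketched in a few lines of plain Python. This is an illustration of the general technique only, not the library's implementation: the function name stratified_kfold_indices and the toy labels are hypothetical, and the sketch does not shuffle the samples.

```python
from collections import Counter, defaultdict


def stratified_kfold_indices(labels, n_splits):
    """Assign sample indices to folds so that each fold keeps the class proportions."""
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        # Deal the members of each class round-robin across the folds.
        for position, index in enumerate(indices):
            folds[position % n_splits].append(index)
    return folds


# Toy imbalanced labels: 9 samples of one class, 3 of another.
labels = ["healthy"] * 9 + ["faulty"] * 3
folds = stratified_kfold_indices(labels, n_splits=3)

for test_fold in folds:
    # Every other fold forms the training set for this iteration.
    train = [i for fold in folds if fold is not test_fold for i in fold]
    print(len(train), dict(Counter(labels[i] for i in test_fold)))
```

Each of the three folds contains three samples of the majority class and one of the minority class, so no fold is left without examples of the rare class.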

2.4. Predicting test data

We can assess our model’s accuracy by providing a prediction dataset that the model has not seen during training. We provide a list of inputs in the same format as the training dataset.
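
When comparing predictions on unseen data with the true labels, balanced accuracy is the average of the recall obtained on each class. The sketch below computes it by hand on toy values; the function name and the labels are hypothetical and are not taken from the drive dataset.

```python
from collections import Counter


def balanced_accuracy(y_true, y_pred):
    """Average the recall obtained on each class present in y_true."""
    totals = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(correct[label] / totals[label] for label in totals) / len(totals)


# Toy labels: class 1 is frequent, class 2 is rare.
y_true = [1, 1, 1, 1, 2, 2]
y_pred = [1, 1, 1, 1, 2, 1]

plain = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"plain accuracy:    {plain:.3f}")
print(f"balanced accuracy: {balanced_accuracy(y_true, y_pred):.3f}")
```

Plain accuracy is 5/6 here, while balanced accuracy is 0.75, because the single mistake falls on the rare class and costs half of that class's recall.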

2.5. Saving the Model

If we are satisfied with the results, we can save our model using the savemodel method. It creates a JSON file in the backend.

model_classification.savemodel('my_classification_model')

3. SUMMARY

In this tutorial, we learned how to set up and solve a classification problem.

4. SOURCE CODE

from TwinAPI.SimuLearn import ClassificationML

# initiate the class
model_classification = ClassificationML()

# setting up the model
model_classification.setup(data="datasets/sensorless_drive_diagnosis.csv",
                            x_features=['1:48'],
                            y_labels=['49'],
                            verbose=1)

# preprocess the data
model_classification.preprocess(normalize=True, normalize_method=['standardizer', 'PCA'])

# train the model
model_classification.train(user_model="ExtraTree Classifier",
                            n_jobs=-1,
                            split_mode='stratified kfold',
                            cross_validation_mode='kfold split',
                            n_iter_cv=6,
                            scoring_method='balanced_accuracy')

# sample prediction
model_classification.predict()

# save the model
model_classification.savemodel(model_name='my_classification_model')

Keywords: Classification, Predictive, Machine, Faults

Dataset Reference: [https://archive.ics.uci.edu/ml/datasets/Dataset+for+Sensorless+Drive+Diagnosis] Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.