Classification
To reduce complexity for beginners, the library is equipped with default parameters that make it easier to achieve acceptable results. At a minimum, however, we need to provide a dataset, the input features (columns or indices), the output label (columns or indices) and, optionally, a model name.
1. PROBLEM DESCRIPTION
In this example, we look at a dataset from an electric drive. The drive has intact and defective components, which results in 11 different classes, each corresponding to a different condition. Each condition has been measured several times under 12 different operating conditions, that is, at different speeds, load moments and load forces. The current signals are measured on two phases with a current probe and an oscilloscope. Here, the output label is the class, i.e. the condition of the drive components.
2. MODEL DESCRIPTION & DEVELOPMENT
The following sections describe the TwinAPI.SimuLearn.ClassificationML module, with each subsection explaining one of its commands:
2.1. Setting up the model
To access the information within the dataset, we instantiate an object of the TwinAPI.SimuLearn.ClassificationML class and call it ‘model_classification’.
We choose x_features and y_labels, which are the inputs and the output, respectively. We can also get information on the dataset: setting the plot option to either ‘basic’ or ‘detailed’ produces different plots of the dataset.
The verbose parameter sets the output verbosity. It ranges from 0 to 2, from basic information on the dataset up to a summary report at the end of model training.
from TwinAPI.SimuLearn import ClassificationML

model_classification = ClassificationML()
model_classification.setup(data="datasets/sensorless_drive_diagnosis.csv",
                           x_features=['1:48'],
                           y_labels=['49'],
                           verbose=1)
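The column selectors passed to setup are strings. Assuming they are 1-based, inclusive column numbers, which matches the UCI file layout of 48 feature columns followed by the class label in column 49, they resolve to the zero-based indices below. parse_column_spec is a hypothetical helper written only for illustration; it is not part of TwinAPI.SimuLearn.

```python
def parse_column_spec(specs):
    """Convert 1-based, inclusive column specs such as ['1:48'] or ['49']
    into a sorted list of 0-based column indices (hypothetical helper)."""
    indices = set()
    for spec in specs:
        if ":" in spec:
            start, stop = (int(part) for part in spec.split(":"))
            # 1-based inclusive range -> 0-based indices start-1 .. stop-1
            indices.update(range(start - 1, stop))
        else:
            indices.add(int(spec) - 1)
    return sorted(indices)


x_cols = parse_column_spec(['1:48'])
y_cols = parse_column_spec(['49'])
print(len(x_cols), x_cols[0], x_cols[-1])  # 48 0 47
print(y_cols)                               # [48]
```

Under this assumption, the inputs and the label never overlap, so the label column cannot leak into the features.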
2.2. Preprocessing the data
Through the parameters of the preprocess method, we can preprocess the dataset with a preset of functions. Here, we choose ‘standardizer’ and ‘PCA’ (principal component analysis, which reduces the number of input features).
model_classification.preprocess(normalize=True, normalize_method=['standardizer', 'PCA'])
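In generic terms, a standardizer rescales every feature to zero mean and unit variance, and PCA then projects the standardized features onto a smaller number of uncorrelated principal components ordered by the variance they explain. The NumPy sketch below illustrates both steps on random stand-in data. It is not TwinAPI's implementation; how the library chooses the number of components is not stated here, so n_components=10 is an arbitrary assumption.

```python
import numpy as np


def standardize(X):
    """Scale every column to zero mean and unit variance."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns unscaled instead of dividing by zero
    return (X - mean) / std


def pca(X, n_components):
    """Project centred data onto its first n_components principal axes via SVD."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by decreasing singular value
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained_variance = S**2 / (X.shape[0] - 1)
    ratio = explained_variance / explained_variance.sum()
    return Xc @ Vt[:n_components].T, ratio[:n_components]


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 48))  # stand-in for the 48 drive features
X_std = standardize(X)
X_pca, ratio = pca(X_std, n_components=10)
print(X_pca.shape)  # (200, 10)
```

Standardizing first matters because PCA is driven by variance: without it, features measured on larger scales would dominate the principal components.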
2.3. Training the model
The TwinAPI.SimuLearn library offers a wide choice of machine learning classification algorithms. We choose ‘ExtraTree Classifier’ as our classification algorithm.
We can also choose how the dataset is split. Here, we choose the ‘kfold split’ cross-validation mode with ‘stratified kfold’ splitting, which preserves the class proportions in every fold and is therefore well suited to imbalanced datasets.
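Stratification means that every fold keeps roughly the same class proportions as the full dataset, so no fold ends up without a rare class. A simplified, non-shuffling version of the idea can be sketched in pure Python as follows; it is an illustration only, not TwinAPI's or scikit-learn's implementation, and the labels are made up.

```python
from collections import defaultdict


def stratified_kfold(labels, n_splits):
    """Return n_splits lists of test indices whose class proportions
    mirror those of the whole dataset (no shuffling)."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        # Deal each class's samples round-robin so every fold gets its share
        for position, idx in enumerate(indices):
            folds[position % n_splits].append(idx)
    return [sorted(fold) for fold in folds]


labels = ["intact"] * 12 + ["defect"] * 6  # imbalanced: 2:1
for fold in stratified_kfold(labels, n_splits=3):
    print([labels[i] for i in fold].count("defect"), len(fold))  # 2 6 in every fold
```

Every fold receives 4 ‘intact’ and 2 ‘defect’ samples, the same 2:1 ratio as the full label list.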
The n_iter_cv value sets the number of cross-validation iterations, which are used while splitting the dataset and training the model.
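In k-fold cross-validation, the data are split into k folds; the model is trained k times, each time on k-1 folds, and scored on the remaining fold, and the k scores are averaged. Assuming n_iter_cv plays the role of k (the tutorial does not state this explicitly), the loop looks like the pure-Python sketch below. A hypothetical majority-class baseline stands in for the ExtraTree classifier, and plain accuracy is used as the score.

```python
from collections import Counter


def kfold_indices(n_samples, n_splits):
    """Split range(n_samples) into n_splits contiguous, unshuffled test folds."""
    base, extra = divmod(n_samples, n_splits)
    folds, start = [], 0
    for i in range(n_splits):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate(X, y, fit, predict, n_splits):
    """Train on k-1 folds, score accuracy on the held-out fold, repeat k times."""
    scores = []
    for test_idx in kfold_indices(len(X), n_splits):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(model, X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return scores


# Hypothetical baseline model: always predict the most frequent training label
def fit_majority(X_train, y_train):
    return Counter(y_train).most_common(1)[0][0]


def predict_majority(model, x):
    return model


X = list(range(12))
y = [0] * 8 + [1] * 4  # sorted labels, minority class at the end
scores = cross_validate(X, y, fit_majority, predict_majority, n_splits=6)
print(scores)                     # [1.0, 1.0, 1.0, 1.0, 0.0, 0.0]
print(sum(scores) / len(scores))  # mean cross-validation score
```

Because the toy labels are sorted and the folds are contiguous and unshuffled, the last two folds contain only the minority class and score 0.0. This kind of imbalance between folds is what stratified splitting avoids.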
Changing n_jobs is only recommended if computation is slow: using more processors can deliver results faster (in scikit-learn, n_jobs=-1 means using all available processors). For classification algorithms, the default scoring_method is ‘balanced_accuracy’, provided by the sklearn library. It averages the recall obtained on each class, so that all 11 classes count equally in the score of the test predictions.
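Balanced accuracy is the average of the recall obtained on each class, which is how scikit-learn's balanced_accuracy_score defines it with its default adjusted=False. The pure-Python sketch below contrasts it with plain accuracy on a small, made-up label set in which a classifier that mostly predicts the majority class looks good on plain accuracy but not on balanced accuracy.

```python
def accuracy(y_true, y_pred):
    """Fraction of all samples that were predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recall: correct predictions of a class divided by
    the number of true samples of that class."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hits / len(idx))
    return sum(recalls) / len(recalls)


y_true = [0, 0, 0, 0, 0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 0, 0, 0, 0, 0, 0, 0, 2]
print(accuracy(y_true, y_pred))           # 0.7
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

The per-class recalls are 1.0, 0.0 and 0.5, so the missed minority class pulls the balanced score down to 0.5 even though 7 of 10 predictions are correct.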
model_classification.train(user_model="ExtraTree Classifier",
                           n_jobs=-1,
                           cross_validation_mode='kfold split',
                           split_mode='stratified kfold',
                           n_iter_cv=6,
                           scoring_method='balanced_accuracy')
2.4. Predicting test data
We can assess the model’s accuracy by providing a prediction dataset that the model has not seen during training. The prediction inputs must have the same features as the training dataset.
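The idea of hold-out evaluation is that the scored samples are never used for fitting and that the test inputs have the same features as the training inputs. This pure-Python sketch shows the pattern with a hypothetical nearest-centroid model and made-up two-feature data; it does not reproduce the library's predict() method, whose arguments are not documented in this tutorial.

```python
import math


def fit_centroids(X, y):
    """Hypothetical stand-in model: the mean feature vector of each class."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict_centroids(centroids, rows):
    """Assign each row to the class with the nearest centroid."""
    n_features = len(next(iter(centroids.values())))
    preds = []
    for row in rows:
        # Unseen inputs must have the same features as the training data
        if len(row) != n_features:
            raise ValueError(f"expected {n_features} features, got {len(row)}")
        preds.append(min(centroids, key=lambda c: math.dist(row, centroids[c])))
    return preds


X_train = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9]]
y_train = ["intact", "intact", "defect", "defect"]
X_test = [[0.1, 0.2], [4.8, 5.0]]  # held out: never used for fitting
y_test = ["intact", "defect"]

model = fit_centroids(X_train, y_train)
y_hat = predict_centroids(model, X_test)
print(y_hat)  # ['intact', 'defect']
print(sum(p == t for p, t in zip(y_hat, y_test)) / len(y_test))  # 1.0
```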
2.5. Saving the Model
If we are satisfied with the results, we can save the model using the savemodel method, which creates a JSON file in the backend.
model_classification.savemodel('my_classification_model')
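The tutorial only states that savemodel writes a JSON file; the file's contents are not described. As a rough illustration of how JSON can persist such information, the sketch below stores the settings used in this tutorial with Python's standard json module and reads them back. The dictionary layout and the temporary file location are assumptions, not the format that savemodel produces.

```python
import json
import os
import tempfile

# Hypothetical record of the settings used in this tutorial; the real file
# written by savemodel may have a completely different structure.
settings = {
    "model_name": "my_classification_model",
    "user_model": "ExtraTree Classifier",
    "preprocess": {"normalize": True, "normalize_method": ["standardizer", "PCA"]},
    "train": {
        "cross_validation_mode": "kfold split",
        "split_mode": "stratified kfold",
        "n_iter_cv": 6,
        "scoring_method": "balanced_accuracy",
    },
}

path = os.path.join(tempfile.gettempdir(), settings["model_name"] + ".json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(settings, f, indent=2)

with open(path, encoding="utf-8") as f:
    restored = json.load(f)

print(restored == settings)  # True
```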
3. SUMMARY
In this tutorial, we learned how to set up and solve a classification problem.
4. SOURCE CODE
from TwinAPI.SimuLearn.MLibrary import ClassificationML

# initiate the class
model_classification = ClassificationML()

# setting up the model
model_classification.setup(data="datasets/sensorless_drive_diagnosis.csv",
                           x_features=['1:48'],
                           y_labels=['49'],
                           verbose=1)

# preprocess the data
model_classification.preprocess(normalize=True, normalize_method=['standardizer', 'PCA'])

# train the model
model_classification.train(user_model="ExtraTree Classifier",
                           n_jobs=-1,
                           split_mode='stratified kfold',
                           cross_validation_mode='kfold split',
                           n_iter_cv=6,
                           scoring_method='balanced_accuracy')

# sample prediction
model_classification.predict()

# save the model
model_classification.savemodel('my_classification_model')
Keywords: Classification, Predictive, Machine, Faults
Dataset reference: [https://archive.ics.uci.edu/ml/datasets/Dataset+for+Sensorless+Drive+Diagnosis] Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.