AutoML

For advanced use, the user is given several parameters that allow complete control over the output. At a minimum, the user must provide a dataset, the input features (columns or indices), the output label (columns or indices) and, optionally, a model name.

1. PROBLEM DESCRIPTION

In this example, we look at a dataset from the steel plant of DAEWOO Steel Co. Ltd in Gwangyang, South Korea, which is available in the UCI Machine Learning Repository. The plant produces several types of coils, steel plates and iron plates, and its electricity consumption data are held in a cloud-based system. The dataset comprises the following variables:

- Industry Energy Consumption (continuous, kWh)
- Lagging Current Reactive Power (continuous, kVarh)
- Leading Current Reactive Power (continuous, kVarh)
- tCO2 (CO2) (continuous, ppm)
- Lagging Current Power Factor (continuous)
- Leading Current Power Factor (continuous)
- Number of Seconds from Midnight (continuous)
- Week Status
- Day of Week
- Load Type

Here, the Energy Consumption (kWh) is the output, and it is affected by several of the input variables.
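The output is closely tied to some of the inputs. Assuming the dataset's lagging power factor follows the standard definition PF = P / sqrt(P² + Q²), with P the active energy (kWh), Q the lagging reactive energy (kVarh) and PF expressed in percent, P can be estimated from Q and PF alone. The sketch below checks this against the sample row used in the prediction step (23.51 kVarh, 92.89, expected 58.97 kWh); the helper names are our own and are not part of TwinAPI.

```python
import math


def power_factor(active_kwh: float, reactive_kvarh: float) -> float:
    """Power factor in percent: active energy divided by apparent energy."""
    apparent = math.hypot(active_kwh, reactive_kvarh)  # sqrt(P**2 + Q**2)
    return 100.0 * active_kwh / apparent


def active_energy(reactive_kvarh: float, pf_percent: float) -> float:
    """Invert the definition above: P = Q * pf / sqrt(1 - pf**2)."""
    pf = pf_percent / 100.0
    if not 0.0 < pf < 1.0:
        # At 100 % the reactive part is zero and carries no information about P.
        raise ValueError("power factor must lie strictly between 0 and 100 %")
    return reactive_kvarh * pf / math.sqrt(1.0 - pf * pf)


# Values taken from the sample prediction row used later in this tutorial.
print(round(active_energy(23.51, 92.89), 2))   # 58.97
print(round(power_factor(58.97, 23.51), 2))    # 92.89
```

The inversion only works while the lagging power factor is below 100 %, which is why the function rejects that case rather than dividing by zero.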

2. MODEL DESCRIPTION & DEVELOPMENT

The following subsections describe the TwinAPI.SimuLearn.AutoML module, each explaining the commands for one step:

2.1. Setting up the model

2.2. Training the model

2.3. Predicting test data

2.4. Saving the Model

2.1. Setting up the model

To access the information in the dataset, we create an instance of the TwinAPI.SimuLearn.AutoML class called 'model_auto'. In setup, we choose x_features and y_labels, which are the inputs and the output respectively. The plot parameter can be set to either 'basic' or 'detailed' to produce different plots of the dataset. The verbose parameter controls the output verbosity; it takes values from 0 to 2, ranging from basic information on the dataset to a summary report at the end of training.

from TwinAPI.SimuLearn import AutoML


model_auto = AutoML()
model_auto.setup(data="datasets/steel_industry_data.csv",
                 x_features=['3:8', '10:11'],  # input columns
                 y_labels=['2'],               # output column (Usage kWh)
                 verbose=2,
                 plot='basic')

Figure: Input parameters w.r.t. the output (Usage kWh)
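The column specifications passed to setup can be read as ranges of column positions. The sketch below assumes 1-based, inclusive ranges and that the CSV's first column holds the timestamp; that reading is consistent with this example, because columns 3 to 8 plus 10 to 11 give exactly the eight values of the sample prediction row, and column 2 is then Usage kWh. The helper expand_columns and the column labels (paraphrased from the dataset description, not copied from the CSV header) are hypothetical; this is not TwinAPI's own parser.

```python
# Hypothetical column labels, in assumed CSV order (column 1 = timestamp).
COLUMNS = [
    "date",
    "Usage_kWh",
    "Lagging reactive power (kVarh)",
    "Leading reactive power (kVarh)",
    "CO2 (tCO2)",
    "Lagging power factor",
    "Leading power factor",
    "Seconds from midnight",
    "Week status",
    "Day of week",
    "Load type",
]


def expand_columns(specs: list[str]) -> list[int]:
    """Expand specs such as '3:8' or '2' into 1-based, inclusive positions."""
    positions: list[int] = []
    for spec in specs:
        if ":" in spec:
            start, stop = (int(part) for part in spec.split(":"))
            positions.extend(range(start, stop + 1))  # include the upper bound
        else:
            positions.append(int(spec))
    return positions


x_cols = expand_columns(['3:8', '10:11'])
y_cols = expand_columns(['2'])
feature_names = [COLUMNS[i - 1] for i in x_cols]  # 1-based -> 0-based index
target_names = [COLUMNS[i - 1] for i in y_cols]

# Pair the sample prediction row with the feature each value is assumed to fill.
row = [23.51, 0, 0.03, 92.89, 100, 42300, 'Friday', 'Maximum_Load']
for name, value in zip(feature_names, row):
    print(f"{name}: {value}")
print("target:", target_names[0])
```

Under this reading, column 9 (Week status) is left out of the inputs, and pairing names with values like this is a quick way to check that a prediction row is in the expected order.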

2.2. Training the model

In the TwinAPI.SimuLearn.AutoML module, the user can select either of two ML use cases, regression or classification. The user can also view a leaderboard to assess how the selected model ranks against the other candidate models. If the user does not know which kind of problem the dataset poses, omitting the user_mlcase parameter makes AutoML select the most suitable use case automatically.

model_auto.train(scoring_method='auto',leader_board=True)
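The tutorial does not say how AutoML chooses a use case when user_mlcase is omitted. Purely as an illustration, one common heuristic inspects the target column: non-numeric labels, or a small number of distinct whole-number values, suggest classification, and anything else is treated as regression. The function guess_ml_case and its max_classes threshold are hypothetical and are not TwinAPI's actual rule.

```python
from numbers import Real


def guess_ml_case(targets: list, max_classes: int = 20) -> str:
    """Hypothetical heuristic: return 'classification' or 'regression'."""
    # Any non-numeric label (e.g. 'Maximum_Load') points to classification;
    # bool is a subclass of int, so it is treated as a label too.
    if any(isinstance(t, bool) or not isinstance(t, Real) for t in targets):
        return "classification"
    # A handful of distinct whole numbers usually encodes class ids.
    distinct = set(targets)
    if len(distinct) <= max_classes and all(float(t).is_integer() for t in distinct):
        return "classification"
    # Otherwise treat the target as a continuous quantity, such as Usage kWh.
    return "regression"


print(guess_ml_case([58.97, 3.17, 4.0]))              # regression
print(guess_ml_case(['Light_Load', 'Maximum_Load']))  # classification
print(guess_ml_case([0, 1, 1, 0]))                    # classification
```

A fixed threshold like this is easy to fool (for example, integer counts with few values), which is one reason to pass user_mlcase explicitly when the problem type is known.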

2.3. Predicting test data

We can assess the model's accuracy by giving it a prediction sample that it did not see during training. The sample is a list of input values with the same features as the training dataset.

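# one unseen input row with the same features as the training data
prediction = [23.51, 0, 0.03, 92.89, 100, 42300, 'Friday', 'Maximum_Load']
predicted = model_auto.predict(prediction_set=prediction)
print('Predicted value:', predicted)
print('Expected value: 58.97')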
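A single row shows whether one prediction is close, but accuracy is usually summarised over several unseen rows. Assuming you have collected the predicted values for such rows alongside their known targets, the sketch below computes mean absolute error (MAE) and root-mean-square error (RMSE) in plain Python. The numbers in the usage lines are placeholders, not output from the model in this tutorial.

```python
import math


def mae(expected: list[float], predicted: list[float]) -> float:
    """Mean absolute error between paired expected and predicted values."""
    if len(expected) != len(predicted) or not expected:
        raise ValueError("need two non-empty lists of equal length")
    return sum(abs(e - p) for e, p in zip(expected, predicted)) / len(expected)


def rmse(expected: list[float], predicted: list[float]) -> float:
    """Root-mean-square error; penalises large misses more than MAE does."""
    if len(expected) != len(predicted) or not expected:
        raise ValueError("need two non-empty lists of equal length")
    mse = sum((e - p) ** 2 for e, p in zip(expected, predicted)) / len(expected)
    return math.sqrt(mse)


# Placeholder numbers for illustration only (not real model output).
expected = [4.0, 10.0]
predicted = [5.0, 7.0]
print(mae(expected, predicted))   # 2.0
print(rmse(expected, predicted))  # about 2.236
```

Both metrics are in the units of the target (kWh here); RMSE is larger than MAE whenever the errors differ in size, which makes it the more sensitive of the two to occasional large misses.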

Note

Once the model has been trained and saved, we can reload it later for future predictions by passing load_model=True together with the model_name it was saved under.

from TwinAPI.SimuLearn.MLibrary import AutoML
model_auto = AutoML()
prediction = model_auto.predict(load_model=True,
                                model_name='my_auto_model',
                                prediction_set=[23.51, 0, 0.03, 92.89, 100, 42300, 'Friday', 'Maximum_Load'])

2.4. Saving the Model

If we are satisfied with the results, we can save the model with the savemodel method, which creates a JSON file in the backend.

model_auto.savemodel('my_auto_model')
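The tutorial only states that savemodel writes a JSON file and that predict(load_model=True, model_name=...) reads a model back by name. The sketch below illustrates name-based JSON persistence with Python's standard json module; the file naming ('<model_name>.json'), the stored fields and both helper functions are hypothetical and do not describe TwinAPI's actual file format.

```python
import json
import tempfile
from pathlib import Path


def save_model_info(model_name: str, info: dict, folder: Path) -> Path:
    """Write metadata to '<model_name>.json' in folder (hypothetical layout)."""
    path = folder / f"{model_name}.json"
    path.write_text(json.dumps(info, indent=2), encoding="utf-8")
    return path


def load_model_info(model_name: str, folder: Path) -> dict:
    """Read the metadata back by name; raises FileNotFoundError if absent."""
    path = folder / f"{model_name}.json"
    return json.loads(path.read_text(encoding="utf-8"))


# Round trip in a throwaway directory; the field names are hypothetical.
info = {"ml_case": "regression", "x_features": ["3:8", "10:11"], "y_labels": ["2"]}
with tempfile.TemporaryDirectory() as tmp:
    save_model_info("my_auto_model", info, Path(tmp))
    restored = load_model_info("my_auto_model", Path(tmp))
print(restored == info)  # True
```

JSON suits plain metadata such as feature specifications; fitted model parameters would need their own serialisable representation, which this sketch does not attempt.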

3. SUMMARY

In this tutorial, we learned how to set up and solve a regression problem using AutoML. We also saw how the plot parameter selects the plots produced for the dataset.

4. SOURCE CODE

from TwinAPI.SimuLearn import AutoML

# create an instance of the AutoML class
model_auto = AutoML()

# setting up the model
model_auto.setup(data="datasets/steel_industry_data.csv",
                x_features=['3:8', '10:11'],
                y_labels=['2'],
                verbose=1,
                plot='basic')

# train the model
model_auto.train(cross_validation_mode='timeseries split',
                exclude=['Linear Regressor', 'Support Vector Regressor'],
                n_jobs=-1,
                scoring_method='auto',
                leader_board=True)

# sample prediction
prediction = [23.51, 0, 0.03, 92.89, 100, 42300, 'Friday', 'Maximum_Load']
predicted = model_auto.predict(prediction_set=prediction)
print('Predicted value:', predicted)
print('Expected value: 58.97')

# save the model
model_auto.savemodel('auto_model')

Keywords: AutoML, Predictive, Machine, Faults

Dataset reference: [https://archive.ics.uci.edu/ml/datasets/Steel+Industry+Energy+Consumption+Dataset] Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.