Regression

To give advanced users complete control over the output, the module exposes several parameters. At a minimum, the user must provide a dataset, the input features (column names or indices), the output label (column name or index) and, optionally, a model name.

1. PROBLEM DESCRIPTION

In this example we look at a dataset of 9568 data points taken from a public dataset repository. The data were collected from a Combined Cycle Power Plant (CCPP) over 6 years (2006-2011), while the plant was set to work at full load. The features are hourly averages of the ambient variables Temperature (T), Ambient Pressure (AP), Relative Humidity (RH) and Exhaust Vacuum (V), which are used to predict the net hourly electrical energy output (EP) of the plant. A CCPP is composed of gas turbines (GT), steam turbines (ST) and heat recovery steam generators. Here, the electrical energy output is the quantity to predict, and it is affected by the four input features.

2. MODEL DESCRIPTION & DEVELOPMENT

The following sections describe the TwinAPI.SimuLearn.RegressionML module with each subsection explaining the commands:

2.1. Setting up the model

2.2. Preprocessing the data

2.3. Training the model

2.4. Predicting test data

2.5. Saving the Model

2.1. Setting up the model

To work with the dataset, we create an object of the TwinAPI.SimuLearn.RegressionML class and call it ‘model_regression’. We choose x_features and y_labels, which are the inputs and the output respectively. We can also inspect the dataset: setting plot to either ‘basic’ or ‘detailed’ produces different plots of the data. The verbose parameter controls output verbosity and takes values from 0 to 2, ranging from basic information on the dataset to a summary report at the end of model training.

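from TwinAPI.SimuLearn import RegressionML

# Create the regression object, then point it at the dataset.
# x_features selects the input columns and y_labels the output column.
model_regression = RegressionML()
model_regression.setup(data="datasets/gas_steam_turbined.csv", x_features=['1:4'], y_labels=['5'], verbose=1, plot='detailed')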
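To make the column selectors more concrete, here is a minimal sketch of how strings such as '1:4' and '5' could be expanded into column positions. The tutorial does not show how TwinAPI parses them, so this assumes 1-based positions and inclusive ranges; parse_selector and select_columns are hypothetical helpers, not part of the library. The sample row is assembled from the prediction inputs and the expected value used later in this tutorial.

```python
def parse_selector(selector):
    """Expand a selector such as '1:4' or '5' into 0-based column indices.

    Assumption (not confirmed by the tutorial): positions are 1-based
    and ranges include both endpoints.
    """
    if ":" in selector:
        start, stop = (int(part) for part in selector.split(":"))
        return list(range(start - 1, stop))
    return [int(selector) - 1]


def select_columns(row, selectors):
    """Pick the values of one data row named by a list of selectors."""
    indices = [i for s in selectors for i in parse_selector(s)]
    return [row[i] for i in indices]


# One row laid out as four inputs followed by one output.
row = [24.61, 69.68, 1012.06, 92.47, 438.51]
x = select_columns(row, ["1:4"])  # the four input features
y = select_columns(row, ["5"])    # the output label
print(x, y)
```

Under these assumptions, x holds the first four values of the row and y holds the fifth.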

2.2. Preprocessing the data

The preprocess method lets us preprocess the dataset with one of a preset list of functions, selected through its parameters. Here, we choose ‘standardizer’ as the normalization method.

model_regression.preprocess(normalize=True, normalize_method='standardizer')
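The tutorial does not define what ‘standardizer’ does internally. Assuming it is the usual z-score standardization (subtract the column mean, divide by the column standard deviation), the effect on a single feature column can be sketched as follows. The standardize helper and the sample values are illustrative only.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale a column to zero mean and unit (population) standard deviation.

    Assumption: 'standardizer' corresponds to z-score scaling; this is a
    sketch, not TwinAPI's implementation.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no spread; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Illustrative values, not taken from the dataset.
temperatures = [10.0, 20.0, 30.0]
scaled = standardize(temperatures)
print(scaled)
```

After scaling, every feature is on a comparable scale regardless of its original units, which matters for a dataset that mixes temperature, pressure and humidity.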

2.3. Training the model

A wide range of machine learning regression algorithms is available in the TwinAPI.SimuLearn library. We choose ‘Decision Tree Regressor’ as our regression algorithm. We can also choose how the dataset is split; here, to demonstrate the options, we choose ‘test-train split’ with a test_size of 0.35. The n_jobs parameter only needs changing if computation is slow: using more processors can produce results faster. The scoring_method for a regression algorithm defaults to ‘r2’, provided by the sklearn library.

model_regression.train(user_model="Decision Tree Regressor", split_mode='test-train split', test_size=0.35, scoring_method='r2', n_jobs=-1)
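For intuition, the split and the R² score can be sketched in plain Python. This is neither TwinAPI's nor sklearn's implementation: split_rows and r2_score are hypothetical helpers, and both the interpretation of test_size as a fraction of the rows and the rounding of the test-set size are assumptions. The R² formula itself is the standard one, 1 minus the ratio of the residual sum of squares to the total sum of squares.

```python
import random


def split_rows(rows, test_size, seed=0):
    """Shuffle the rows and hold out a fraction of them for testing."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_size)  # rounding rule is an assumption
    return shuffled[n_test:], shuffled[:n_test]


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Row indices stand in for the 9568 data points of the dataset.
train, test = split_rows(list(range(9568)), test_size=0.35)
print(len(train), len(test))  # prints "6219 3349"

print(r2_score([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # perfect fit -> 1.0
print(r2_score([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))  # predicting the mean -> 0.0
```

An R² of 1 means the predictions match the targets exactly, while 0 means the model does no better than always predicting the mean of the targets.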

2.4. Predicting test data

We can assess the model’s accuracy by giving it prediction data that it did not see during training. The inputs are passed as a list laid out like the input features of the training dataset.

prediction = [24.61, 69.68, 1012.06, 92.47]
model_regression.predict(prediction_set=prediction)
print('Expected value: 438.51')
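One simple way to judge a single prediction is to compare it with the expected value. The sketch below assumes the predicted value is available as a plain number; the tutorial does not state what predict returns, so the value 440.00 is a made-up placeholder rather than a real model output, and prediction_error is a hypothetical helper.

```python
def prediction_error(predicted, expected):
    """Return the absolute error and the relative error in percent."""
    absolute = abs(predicted - expected)
    relative_percent = 100.0 * absolute / abs(expected)
    return absolute, relative_percent


expected = 438.51   # expected value from the tutorial
predicted = 440.00  # hypothetical model output, NOT a real result

abs_err, rel_err = prediction_error(predicted, expected)
print(f"absolute error: {abs_err:.2f}, relative error: {rel_err:.2f}%")
```

A single point gives only a rough impression; the score reported during training over the whole test set is the more reliable measure.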

Note

After defining and saving our model, we can later load the trained model to make further predictions by passing load_model and model_name to the predict method.

from TwinAPI.SimuLearn.MLibrary import RegressionML
model_regression = RegressionML()
prediction = model_regression.predict(load_model=True, model_name='my_regression_model', prediction_set=[24.61, 69.68, 1012.06, 92.47])

2.5. Saving the Model

If we are satisfied with the results, we can save the model using the savemodel method. It creates a ‘json’ file in the backend.

model_regression.savemodel('my_regression_model')

3. SUMMARY

In this tutorial, we learned how to set up and solve a regression problem.

4. SOURCE CODE

from TwinAPI.SimuLearn import RegressionML

# initiate the class
model_regression = RegressionML()

# setting up the model
model_regression.setup(data="datasets/gas_steam_turbined.csv",
                       x_features=['1:4'],
                       y_labels=['5'],
                       verbose=1,
                       plot='detailed')

# preprocess the data
model_regression.preprocess(normalize=True, normalize_method='standardizer')

# train the model
model_regression.train(user_model="Decision Tree Regressor",
                       split_mode='test-train split',
                       test_size=0.35,
                       scoring_method='r2',
                       n_jobs=-1)

# sample prediction
prediction = [24.61, 69.68, 1012.06, 92.47]
model_regression.predict(prediction_set=prediction)
print('Expected value: 438.51')

# save the model
model_regression.savemodel('my_regression_model')

Keywords: Regression, Gas Steam Turbine, Decision Tree

Dataset Reference: [https://archive.ics.uci.edu/ml/datasets/combined+cycle+power+plant] Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.