Regression

To reduce complexity for beginners, the library ships with default parameters that make it easier to achieve acceptable results. The basic requirement is to provide a dataset, the input features (columns or indices), the output labels (columns or indices), and, optionally, a model name.

1. PROBLEM DESCRIPTION

In this example, we look at a synthetic dataset based on a sinusoidal wave. The dataset contains integer input values and, as outputs, the sine of each input value.
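For readers who want a comparable file to experiment with, the following is a minimal sketch of how such a dataset could be generated. It is not the procedure used to build the tutorial's dataset: the column names ‘1’ and ‘2’ are inferred from the setup call in Section 2.1, while the sample count and the helper names make_sine_rows and rows_to_csv are assumptions made for illustration.

```python
import csv
import io
import math


def make_sine_rows(n_samples=100):
    """Return (x, sin(x)) pairs for the integer inputs 0 .. n_samples - 1."""
    return [(x, math.sin(x)) for x in range(n_samples)]


def rows_to_csv(rows, header=("1", "2")):
    """Serialize the rows as CSV text with a header line (column names assumed)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


rows = make_sine_rows()
csv_text = rows_to_csv(rows)
```

Writing csv_text to a file gives a two-column table that can be passed to a regression workflow in the same way as the dataset used in this tutorial.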

2. MODEL DESCRIPTION & DEVELOPMENT

The following sections describe the TwinAPI.SimuLearn.RegressionML module with each subsection explaining the commands:

2.1. Setting up the model

2.2. Preprocessing the data

2.3. Training the model

2.4. Predicting test data

2.5. Saving the Model

2.1. Setting up the model

To access the information in the dataset, we create an object of the TwinAPI.SimuLearn.RegressionML class and call it ‘model_regression’. We choose x_features and y_labels, which are the inputs and outputs respectively. We can also get information on the dataset by setting plot to either ‘basic’ or ‘detailed’, each of which produces different plots of the dataset.

from TwinAPI.SimuLearn import RegressionML

model_regression = RegressionML()
model_regression.setup(data="datasets/sine.csv", x_features=['1'], y_labels=['2'])

2.2. Preprocessing the data

By default, the preprocess method sets normalize to ‘True’ and normalize_method to ‘standardizer’. A basic user doesn’t need to change these settings.

model_regression.preprocess()
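The library's internals are not shown in this tutorial. Assuming ‘standardizer’ refers to the usual z-score scaling (subtract the mean, divide by the standard deviation), the transformation can be sketched in plain Python as follows. The function name standardize and the sample values are hypothetical and are not part of TwinAPI.

```python
import statistics


def standardize(values):
    """Scale values to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # A constant column carries no spread; map it to zeros.
        return [0.0 for _ in values]
    return [(value - mean) / std for value in values]


# Illustrative input only, not taken from the tutorial dataset.
scaled = standardize([1.0, 2.0, 3.0, 4.0, 5.0])
```

After scaling, the values are centered on zero, which keeps features with large numeric ranges from dominating those with small ones.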

2.3. Training the model

Here we can choose from a wide range of machine learning regression algorithms in the TwinAPI.SimuLearn library. We chose ‘ExtraTree Regressor’ as our regression algorithm.

model_regression.train(user_model="ExtraTree Regressor")

2.4. Predicting test data

We can assess our model’s accuracy by providing a prediction dataset that the model did not see during training. We provide a list of inputs similar to those in the training dataset.

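import numpy as np

# use a NumPy array so the arithmetic below is applied element-wise
p = np.array([5, 23.012, 34.54, 56.428, 67.21, 88.91, 92.3, 100])
# expected outputs for the inputs above
yp = (p - (100 * np.sin(p))) / np.sqrt(2)
y_prediction = []

# predict one input at a time and collect the results
for each in p:
    y_prediction.append(*model_regression.predict(prediction_set=[each]))
print(f'Expected prediction set {[*zip(p, yp)]}')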
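The snippet above collects the expected values and the model predictions but does not compare them. One common way to quantify the gap is the mean absolute error or the root mean squared error, sketched below in plain Python. The helper names and the numbers are placeholders chosen for illustration; they are not output from the model and not part of TwinAPI.

```python
import math


def mean_absolute_error(expected, predicted):
    """Average absolute difference between expected and predicted values."""
    if len(expected) != len(predicted):
        raise ValueError("expected and predicted must have the same length")
    return sum(abs(e - p) for e, p in zip(expected, predicted)) / len(expected)


def root_mean_squared_error(expected, predicted):
    """Square root of the average squared difference."""
    if len(expected) != len(predicted):
        raise ValueError("expected and predicted must have the same length")
    squared = sum((e - p) ** 2 for e, p in zip(expected, predicted))
    return math.sqrt(squared / len(expected))


# Placeholder values, not real model output.
expected = [1.0, 2.0, 3.0]
predicted = [1.5, 2.0, 2.0]
mae = mean_absolute_error(expected, predicted)
rmse = root_mean_squared_error(expected, predicted)
```

In the tutorial's setting, the expected list would hold the values in yp and the predicted list the values in y_prediction; lower scores indicate predictions closer to the expected outputs.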

Note

After defining and saving our model, we can later load the trained model to make future predictions by providing load_model and model_name.

from TwinAPI.SimuLearn.MLibrary import RegressionML
model_regression = RegressionML()
prediction = model_regression.predict(load_model=True, model_name='my_regression_model')

2.5. Saving the Model

If we are satisfied with the results, we can save our model using the savemodel method. It creates a ‘json’ file in the backend.

model_regression.savemodel('my_regression_model')

3. SUMMARY

In this tutorial, we learned how to set up and solve a regression problem.

4. SOURCE CODE

from TwinAPI.SimuLearn.MLibrary import RegressionML
import numpy as np

# initiate the class
model_regression = RegressionML()

# setting up the model
model_regression.setup(data="datasets/sine.csv",
                       x_features=['1'],
                       y_labels=['2'],
                       plot='detailed')

# preprocess the data
model_regression.preprocess()

# train the model
model_regression.train(user_model="ExtraTree Regressor")

# sample prediction (NumPy array so the arithmetic is applied element-wise)
p = np.array([5, 23.012, 34.54, 56.428, 67.21, 88.91, 92.3, 100])
yp = (p - (100 * np.sin(p))) / np.sqrt(2)
y_prediction = []

for each in p:
    y_prediction.append(*model_regression.predict(prediction_set=[each]))
print(f'Expected prediction set {[*zip(p, yp)]}')

# save the model
model_regression.modelsave(model_name='my_regression_model')

Keywords: Regression, Sine, Support Vector

Dataset source: Simularge synthetic dataset (2022)