Regression
- class TwinAPI.SimuLearn.MLibrary.RegressionML
A regression problem is one in which the output is a numerical value rather than a category. For example, a dataset might be used to calculate the Reynolds number from the input variables velocity, diameter, and kinematic viscosity. The output values are numerical and may or may not be close to one another.
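To make the Reynolds number example concrete, the sketch below generates a synthetic regression dataset using Re = V·D/ν. The column names, value ranges, and helper functions (reynolds_number, make_reynolds_dataset, to_csv) are illustrative assumptions and are not part of TwinAPI.

```python
import csv
import io
import random


def reynolds_number(velocity: float, diameter: float, kinematic_viscosity: float) -> float:
    """Re = V * D / nu, with V in m/s, D in m and nu in m^2/s (dimensionless result)."""
    return velocity * diameter / kinematic_viscosity


def make_reynolds_dataset(n_rows: int, seed: int = 0) -> list[dict[str, float]]:
    """Sample random inputs and attach the Reynolds number as the numeric target."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        velocity = rng.uniform(0.1, 5.0)               # m/s (illustrative range)
        diameter = rng.uniform(0.01, 0.5)              # m (illustrative range)
        kinematic_viscosity = rng.uniform(1e-6, 1e-5)  # m^2/s (illustrative range)
        rows.append({
            "velocity": velocity,
            "diameter": diameter,
            "kinematic_viscosity": kinematic_viscosity,
            "reynolds_number": reynolds_number(velocity, diameter, kinematic_viscosity),
        })
    return rows


def to_csv(rows: list[dict[str, float]]) -> str:
    """Serialise the rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Writing to_csv(make_reynolds_dataset(200)) to a file such as 'input.csv' gives a dataset whose first three columns are inputs and whose last column is the regression target.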
- setup(data: Optional[Union[list, numpy.ndarray, pandas.core.frame.DataFrame, str]] = None, x_features: Optional[Union[list, numpy.ndarray, pandas.core.frame.DataFrame]] = None, y_labels: Optional[Union[list, numpy.ndarray, pandas.core.frame.DataFrame]] = None, verbose: int = 0, header: bool = False)
This function loads the dataset and selects the x features and y labels for the training experiment. All parameters are optional; some parameters apply only to specific ML cases.
Example
from TwinAPI.SimuLearn.MLibrary import RegressionML
model_regression = RegressionML()
exp_x, exp_y = model_regression.setup(data='input.csv', x_features='[0:5, 4, x1:x5]', y_labels='[12]')
- data: Union[list, np.ndarray, pd.DataFrame, str], default = None
Input dataset for the training experiment. Accepts a list, a NumPy ndarray, or a pandas DataFrame. A path to a ‘csv’, ‘xls’, or ‘xlsx’ file can also be provided as a string. A ‘csv’ file or a DataFrame generally works best.
- x_features: Union[list, np.ndarray, pd.DataFrame], default = None
Input x features. If the data parameter is None, provide the features directly as a list, NumPy ndarray, or pandas DataFrame. If the data parameter is a pandas DataFrame or a ‘csv’ file, provide column names or column indices in list notation. Names are case-sensitive and must match the original dataset. Example: ‘[0:5, 4, x1:x5]’.
- y_labels: Union[list, np.ndarray, pd.DataFrame], default = None
Output y labels. If the data parameter is None, provide the labels directly as a list, NumPy ndarray, or pandas DataFrame. If the data parameter is a pandas DataFrame or a ‘csv’ file, provide column names or column indices in list notation. Names are case-sensitive and must match the original dataset. Example: ‘[0:5, 4, y1:y5]’.
- verbose: int, default = 0
Verbosity of the results. Accepts integer values from 0 to 2:
‘0’ reports only the trained model name and the prediction score.
‘1’ reports the trained model name and several prediction scores.
‘2’ provides a JSON file containing all of the above information.
- header: bool, default = False
When set to False, the input header is ignored; when set to True, it is removed. Setting it to False is generally better, because the header is removed automatically during training.
- Returns
Tuple of the selected X and Y values as a dictionary.
- preprocess(normalize: bool = True, normalize_method: Optional[Union[list, str]] = 'standardizer', transformer: bool = False, transformer_method: str = 'onehot encoder', fix_imbalance: bool = False, imbalance_method: str = 'SMOTE', preprocess_arguments: Optional[dict] = None)
This function applies the user-selected preprocessing steps to the selected x features and y labels. All parameters are optional; by default, only normalization is enabled (normalize is True and normalize_method is ‘standardizer’).
Example
from TwinAPI.SimuLearn.MLibrary import RegressionML
model_regression = RegressionML()
preprocess_x, preprocess_y = model_regression.preprocess()
- normalize: bool, default = True
When True (the default), normalization is applied as a step of the training pipeline.
- normalize_method: Optional[Union[list, str]], default = “standardizer”
When normalize is True, selects one of the following preprocessing modules provided by sklearn. Accepted values are:
‘binarizer’
‘minmax’
‘normalizer’
‘standardizer’
‘PCA’
‘truncated SVD’
‘select KBest’
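As a rough illustration of what two of these options compute, the pure-Python sketch below applies z-score standardization and min-max scaling to a single column. It assumes that ‘standardizer’ and ‘minmax’ correspond to sklearn's StandardScaler and MinMaxScaler (the mapping is an assumption); the function names are hypothetical.

```python
from statistics import fmean, pstdev


def standardize(column: list[float]) -> list[float]:
    """z-score scaling: subtract the mean, divide by the population standard deviation."""
    mean = fmean(column)
    std = pstdev(column)
    if std == 0.0:
        # Constant column: only centre it, to avoid dividing by zero.
        return [x - mean for x in column]
    return [(x - mean) / std for x in column]


def min_max_scale(column: list[float]) -> list[float]:
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(column), max(column)
    span = hi - lo
    if span == 0.0:
        # Constant column: every value maps to 0.
        return [0.0 for _ in column]
    return [(x - lo) / span for x in column]
```

For example, min_max_scale([2.0, 4.0, 6.0]) returns [0.0, 0.5, 1.0], while standardize([1.0, 2.0, 3.0]) returns values centred on 0 with unit population variance.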
- transformer: bool, default = False
When set to True, enables the encoding method selected by transformer_method. Used only when the data contains string or categorical values, in classification or regression.
- transformer_method: str, default = “onehot encoder”
When transformer is True, selects one of the following preprocessing modules provided by sklearn. Accepted values are:
‘label encoder’
‘onehot encoder’
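The difference between the two encoders is easiest to see on a single categorical column. The pure-Python sketch below assumes these options correspond to sklearn's LabelEncoder and OneHotEncoder, both of which order categories by sorted value; the helper names are hypothetical.

```python
def label_encode(values: list[str]) -> tuple[list[int], list[str]]:
    """Map each category to an integer code, using sorted category order."""
    classes = sorted(set(values))
    index = {category: code for code, category in enumerate(classes)}
    return [index[v] for v in values], classes


def one_hot_encode(values: list[str]) -> tuple[list[list[int]], list[str]]:
    """Turn each category into a 0/1 indicator row with one column per category."""
    classes = sorted(set(values))
    index = {category: column for column, category in enumerate(classes)}
    rows = []
    for v in values:
        row = [0] * len(classes)
        row[index[v]] = 1
        rows.append(row)
    return rows, classes
```

For ["water", "oil", "air", "oil"], label_encode gives the codes [2, 1, 0, 1], which imply an artificial order (air < oil < water) that a regressor may treat as numeric. one_hot_encode avoids that order at the cost of one column per category.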
- fix_imbalance: bool, default = False
When set to True, resamples the data with the method selected by imbalance_method to correct an imbalanced dataset.
- imbalance_method: str, default = “SMOTE”
When fix_imbalance is True, selects one of the following resampling modules provided by imblearn. Accepted values are:
‘SMOTE’
‘random undersampling’
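The two imbalance methods work in opposite directions: random undersampling discards rows from the larger classes, while SMOTE synthesizes new minority rows by interpolating between a minority sample and one of its nearest minority-class neighbours. The sketch below mirrors the default behaviour of imblearn's RandomUnderSampler (every class is reduced to the size of the smallest one) and shows only SMOTE's interpolation step; the neighbour search is omitted, and the function names are hypothetical.

```python
import random
from collections import defaultdict
from typing import Hashable


def random_undersample(
    X: list[list[float]], y: list[Hashable], seed: int = 0
) -> tuple[list[list[float]], list[Hashable]]:
    """Randomly drop rows so every class keeps as many rows as the smallest class."""
    rng = random.Random(seed)
    by_class: dict[Hashable, list[int]] = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    n_min = min(len(rows) for rows in by_class.values())
    keep: list[int] = []
    for rows in by_class.values():
        keep.extend(rng.sample(rows, n_min))
    keep.sort()  # preserve the original row order
    return [X[i] for i in keep], [y[i] for i in keep]


def smote_interpolate(
    sample: list[float], neighbour: list[float], rng: random.Random
) -> list[float]:
    """SMOTE's core step: a random point on the segment from sample to neighbour."""
    gap = rng.random()  # uniform in [0, 1)
    return [a + gap * (b - a) for a, b in zip(sample, neighbour)]
```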
- preprocess_arguments: Optional[dict], default = None
Passes additional parameters to the selected normalize, transformer, or imbalance method, provided the method accepts them. Accepts only a dictionary whose keys are parameter names and whose values are the parameter values.
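One plausible way such a dictionary can be forwarded is to unpack it as keyword arguments after checking that each key is a parameter the target accepts. The sketch below is hypothetical: forward_arguments and MinMaxScalerStub are illustrative names, and whether TwinAPI validates keys this way is not documented here.

```python
import inspect
from typing import Any, Callable, Optional


def forward_arguments(target: Callable[..., Any], arguments: Optional[dict[str, Any]] = None) -> Any:
    """Call target(**arguments) after checking every key is a parameter target accepts."""
    arguments = arguments or {}
    parameters = inspect.signature(target).parameters
    # A target with **kwargs accepts any keyword, so skip the name check in that case.
    takes_any_keyword = any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in parameters.values()
    )
    unknown = sorted(set(arguments) - set(parameters))
    if unknown and not takes_any_keyword:
        raise TypeError(f"{target.__name__} does not accept: {', '.join(unknown)}")
    return target(**arguments)


class MinMaxScalerStub:
    """Hypothetical stand-in for a scaler whose output range is configurable."""

    def __init__(self, feature_range: tuple[float, float] = (0.0, 1.0)) -> None:
        self.feature_range = feature_range
```

With this approach, {"feature_range": (-1.0, 1.0)} configures the stub, while a misspelled key such as "feature_rnge" fails early with a clear error instead of being silently ignored.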
- Returns
Tuple of preprocessed X features and Y labels.
- train(user_model: Optional[str] = None, scoring_method: str = 'r2', split_mode: str = 'kfold split', test_size: float = 0.25, cross_validation_mode: str = 'kfold', n_iter_cv: int = 5, optimize: bool = False, optimizer_method: str = 'grid_search', n_jobs: int = 3, turbo_mode: bool = True, fit_arguments: Optional[dict] = None)
This function trains the model with the given set of parameters. For classification and regression, user_model is required, while all other parameters are optional. Some parameters apply only to specific ML cases.
Example
from TwinAPI.SimuLearn.MLibrary import RegressionML
model_regression = RegressionML()
model_regression.train(user_model='Random Forest Regressor')
- user_model: Optional[str], default = None
Estimator ID string for the ML case; ignored in the Auto case. Accepted values for regression are:
‘Random Forest Regressor’
‘Decision Tree Regressor’
‘Support Vector Regressor’
‘Linear Support Vector Regressor’
‘Linear Regressor’
‘ExtraTrees Regressor’
‘GradientBoostingRegressor’
‘AdaBoostRegressor’
‘Stochastic Gradient Descent Regressor’
- scoring_method: str, default = ‘r2’
Scoring method used to compute the test prediction scores. Uses the sklearn scorer names. Accepted values are:
‘r2’
‘neg_mean_absolute_error’
‘neg_mean_squared_error’
‘neg_mean_absolute_percentage_error’
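The sketch below computes the four scores in pure Python for one set of predictions. It follows sklearn's convention that every scorer is "greater is better", which is why the error metrics are negated; the MAPE term is a fraction rather than a percentage, as in sklearn. The function name score_all is hypothetical.

```python
import sys


def score_all(y_true: list[float], y_pred: list[float]) -> dict[str, float]:
    """Compute r2 and the negated error scores for one set of predictions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(e ** 2 for e in errors)                # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    # Note: r2 is undefined (division by zero here) when all y_true values are equal.
    eps = sys.float_info.epsilon                        # avoids dividing by a zero target
    mape = sum(abs(e) / max(abs(t), eps) for e, t in zip(errors, y_true)) / n
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "neg_mean_absolute_error": -sum(abs(e) for e in errors) / n,
        "neg_mean_squared_error": -ss_res / n,
        "neg_mean_absolute_percentage_error": -mape,
    }
```

For y_true = [3.0, 5.0, 2.0, 7.0] and y_pred = [2.5, 5.0, 2.0, 8.0], the mean absolute error is 0.375 and the mean squared error is 0.3125, so the corresponding scores are -0.375 and -0.3125.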
- split_mode: str, default = ‘kfold split’
Strategy for selecting the train and test data. Accepted values are:
‘test-train split’
‘kfold split’
- test_size: float, default = 0.25
Fraction of the data held out as the test set in a test-train split. For example, 0.25 gives train:test = 0.75:0.25.
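A test-train split with a fractional test_size can be sketched as a reproducible shuffle of row indices, with the test count rounded up as sklearn's train_test_split does. The function name and the seed parameter are hypothetical; TwinAPI's own shuffling behaviour is not described here.

```python
import math
import random


def train_test_split_indices(
    n_samples: int, test_size: float = 0.25, seed: int = 0
) -> tuple[list[int], list[int]]:
    """Shuffle row indices and hold out ceil(test_size * n_samples) of them for testing."""
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must lie strictly between 0 and 1")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)       # reproducible shuffle
    n_test = math.ceil(test_size * n_samples)  # round the test share up
    return sorted(indices[n_test:]), sorted(indices[:n_test])
```

With 100 rows and the default 0.25, this yields 75 training rows and 25 test rows; with 10 rows, the rounding gives 7 and 3.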
- cross_validation_mode: str, default = ‘kfold’
Choice of cross validation strategy. Possible values are:
‘kfold’
‘stratified kfold’
‘leave-one out’
‘shuffle split’
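The ‘kfold’ strategy can be sketched as follows, using the unshuffled convention of sklearn's KFold, in which the first n_samples % n_splits folds receive one extra sample. Setting n_splits equal to the number of samples gives leave-one-out. The function name kfold_indices is hypothetical, and whether TwinAPI shuffles before splitting is not stated in this reference.

```python
from typing import Iterator


def kfold_indices(
    n_samples: int, n_splits: int = 5
) -> Iterator[tuple[list[int], list[int]]]:
    """Yield (train_indices, test_indices) for each of n_splits contiguous folds."""
    if not 2 <= n_splits <= n_samples:
        raise ValueError("n_splits must be between 2 and n_samples")
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        size = base + (1 if fold < extra else 0)  # the first `extra` folds are one larger
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size
```

Every sample appears in exactly one test fold, so each row is used once for evaluation and n_splits - 1 times for training.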
- n_iter_cv: int, default = 5
Number of iterations for cross-validation model selection. Higher values increase processing time.
- optimize: bool, default = False
When set to True, hyperparameter optimization is applied to the model.
- optimizer_method: str, default = ‘grid_search’
When optimize is set to True, selects the hyperparameter optimization method used to train the model with the best set of parameters for the estimator.
Note
Optimization may not always produce the best results.
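The idea behind the default ‘grid_search’ method is an exhaustive search: every combination of candidate parameter values is scored, and the best one is kept. In the sketch below, the score callable stands in for a cross-validated model score (higher is better, matching the scorer convention above); the function name and the toy scoring function are hypothetical.

```python
from itertools import product
from typing import Any, Callable, Optional


def grid_search(
    param_grid: dict[str, list[Any]],
    score: Callable[..., float],
) -> tuple[Optional[dict[str, Any]], float]:
    """Try every combination in param_grid and return the one with the highest score."""
    names = list(param_grid)
    best_params: Optional[dict[str, Any]] = None
    best_score = float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        current = score(**params)  # e.g. a mean cross-validation score
        if current > best_score:   # ties keep the earlier combination
            best_params, best_score = params, current
    return best_params, best_score
```

The number of evaluations is the product of the grid sizes (3 × 2 = 6 for a grid such as {"max_depth": [2, 4, 8], "n_estimators": [100, 200]}), and each evaluation may itself involve several cross-validation fits, which is why grid search can be slow.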
- n_jobs: int, default = 3
The number of jobs to run in parallel, for functions that support parallel processing. -1 uses all processors; 1 runs all functions on a single processor.
- turbo_mode: bool, default = True
When set to False, an additional iteration of optimization is carried out in order to avoid overfitting the model.
- fit_arguments: dict, default = None
Passes additional parameters to the selected estimator, provided the estimator accepts them. Accepts only a dictionary whose keys are parameter names and whose values are the parameter values.
- Returns
Tuple of the trained model ID, the trained model function, and the test scores.
Warning
Changing turbo_mode to False may result in very long training times.
- predict(load_model: bool = False, model_name: Optional[str] = None, prediction_set: Optional[Any] = None)
This function predicts new data using the trained model, or using a previously saved model loaded by name when load_model is True.
Example
from TwinAPI.SimuLearn.MLibrary import RegressionML
model_regression = RegressionML()
prediction = model_regression.predict(load_model=True, model_name='trained_model', prediction_set=[1])
- load_model: bool, default = False
When set to True, the function searches for the user-provided trained model named by model_name.
- model_name: Optional[str], default = None
If load_model is set to True, the name of the model to load.
- prediction_set: Optional[Any], default = None
Prediction dataset: an integer or a list of values to be predicted.
- Returns
List of predictions.
- modelsave(model_name: Optional[Union[int, str]] = None)
This function saves the trained model.
Example
from TwinAPI.SimuLearn.MLibrary import RegressionML
model_regression = RegressionML()
model_regression.modelsave('trained_model')
- model_name: Optional[Union[int, str]], default = None
Name for the trained model.
- Returns
Json file.