Auto ML

class TwinAPI.SimuLearn.MLibrary.AutoML

AutoML is an easy-to-use automatic machine learning class that handles both regression and classification problems.

setup(data: Optional[Union[list, numpy.ndarray, pandas.core.frame.DataFrame, str]] = None, x_features: Optional[Union[list, numpy.ndarray, pandas.core.frame.DataFrame]] = None, y_labels: Optional[Union[list, numpy.ndarray, pandas.core.frame.DataFrame]] = None, verbose: int = 0, header: bool = False)

This function loads the dataset and selects the input features (X) and output labels (Y) for the training experiment. All parameters are optional.

Example

from TwinAPI.SimuLearn.MLibrary import AutoML
model_auto = AutoML()
exp_x, exp_y = model_auto.setup(data = 'input.csv', x_features = '[0:5, 4, x1:x5]', y_labels = '[12]')

data: Union[list, np.ndarray, pd.DataFrame, str], default = None

Input dataset for the training experiment. Accepts a list, a numpy ndarray or a pandas DataFrame. The user can also provide the path to a ‘csv’, ‘xls’ or ‘xlsx’ file as a string. A ‘csv’ file or a DataFrame generally works best.

x_features: Union[list, np.ndarray, pd.DataFrame], default = None

Names of the input x features. If the data parameter is None, the user can provide a list, a numpy ndarray or a pandas DataFrame as input. If the data parameter is a pandas DataFrame or a ‘csv’ file, this parameter takes column names or column indices, given in the form of a list. Column names are case-sensitive and must match the original dataset. Example: ‘[0:5, 4, x1:x5]’.

y_labels: Union[list, np.ndarray, pd.DataFrame], default = None

Names of the output y labels. If the data parameter is None, the user can provide a list, a numpy ndarray or a pandas DataFrame as input. If the data parameter is a pandas DataFrame or a ‘csv’ file, this parameter takes column names or column indices, given in the form of a list. Column names are case-sensitive and must match the original dataset. Example: ‘[0:5, 4, y1:y5]’.
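The selector string used in the examples mixes index ranges, single indices and name ranges. The exact parsing rules are not documented here, so the following is only a sketch of how such a string could be split into tokens. `parse_selector` is a hypothetical helper, not part of TwinAPI, and it deliberately does not decide whether range ends are inclusive.

```python
from typing import List, Tuple, Union

Token = Tuple[Union[str, int], ...]


def parse_selector(selector: str) -> List[Token]:
    """Split a selector such as '[0:5, 4, x1:x5]' into typed tokens.

    Hypothetical helper for illustration only; TwinAPI's own parsing may differ.
    """
    body = selector.strip()
    # Drop the surrounding brackets if present.
    if body.startswith("[") and body.endswith("]"):
        body = body[1:-1]

    tokens: List[Token] = []
    for part in body.split(","):
        part = part.strip()
        if not part:
            continue
        if ":" in part:
            start, end = (p.strip() for p in part.split(":", 1))
            if start.isdigit() and end.isdigit():
                tokens.append(("index_range", int(start), int(end)))
            else:
                tokens.append(("name_range", start, end))
        elif part.isdigit():
            tokens.append(("index", int(part)))
        else:
            tokens.append(("name", part))
    return tokens


print(parse_selector("[0:5, 4, x1:x5]"))
```

Keeping the tokens typed makes it easy to resolve them against a DataFrame's columns in a later step, whichever range convention the library actually uses.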

verbose: int, default = 0

Verbosity of the results. Accepts integer values from 0 to 2. Defaults to 0.

‘0’ reports only the training model name and the prediction score.
‘1’ reports the training model name and several prediction scores.
‘2’ produces a JSON file with all of the above information.

header: bool, default = False

When set to False, the input header is ignored; when set to True, it is removed. Setting it to False is generally better, because the model removes the header automatically during training.

Returns: Tuple of selected X and Y values as dictionary.

train(user_mlcase: Optional[str] = None, include: Optional[Union[list, str]] = None, exclude: Optional[Union[list, str]] = None, n_jobs: int = 3, turbo_mode: bool = True, optimize: bool = False, split_mode: str = 'kfold split', test_size: float = 0.25, n_iter_cv: int = 5, cross_validation_mode: str = 'kfold', optimizer_method: str = 'grid_search', scoring_method: str = 'auto', leader_board: bool = False)

This function trains the model for a given set of parameters.

Example

from TwinAPI.SimuLearn.MLibrary import AutoML
model_auto = AutoML()
model_auto.train()

user_mlcase: Optional[str], default = None

If no value is given, the ML case is determined automatically.

scoring_method: str, default = ‘auto’

Scoring method used to evaluate predictions. Follows the sklearn scorer terminology. Accepted values are:

‘auto’
‘accuracy’
‘roc_auc’
‘recall’
‘precision’
‘f1’
‘balanced_accuracy’
‘f1_weighted’
‘r2’
‘neg_mean_absolute_error’
‘neg_mean_squared_error’
‘neg_mean_absolute_percentage_error’
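The ‘neg_’ prefix follows the sklearn convention that a higher score is always better, so error metrics are negated. A minimal pure-Python sketch of two of the listed scores is given below; the helper functions are illustrative stand-ins, not TwinAPI or sklearn code.

```python
from typing import Sequence


def neg_mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Negated mean absolute error: 0.0 is perfect, more negative is worse."""
    errors = [abs(t - p) for t, p in zip(y_true, y_pred)]
    return -sum(errors) / len(errors)


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1.0 is perfect.

    Assumes y_true is not constant (otherwise the denominator is zero).
    """
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.0, 5.0, 8.0, 9.0]
print(neg_mean_absolute_error(y_true, y_pred))
print(r2(y_true, y_pred))
```

Because both values grow as the predictions improve, a leaderboard can rank estimators the same way regardless of which score is selected.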

split_mode: str, default = ‘kfold split’

Selects how the data is divided into training and test sets. Accepted values are:

‘test-train split’
‘kfold split’

test_size: float, default = 0.25

Test size for the test-train split, given as a fraction of the dataset. Example: with the default of 0.25, train:test = 0.75:0.25.
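The ratio can be illustrated with a small pure-Python split. This is a sketch only: `split_rows`, the shuffling and the `seed` argument are assumptions made for the example, not the library's implementation.

```python
import random
from typing import List, Sequence, Tuple


def split_rows(rows: Sequence, test_size: float = 0.25, seed: int = 0) -> Tuple[List, List]:
    """Shuffle the rows and hold out a `test_size` fraction for testing."""
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be strictly between 0 and 1")
    indices = list(range(len(rows)))
    # A seeded generator keeps the split reproducible.
    random.Random(seed).shuffle(indices)
    n_test = round(len(rows) * test_size)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return [rows[i] for i in train_idx], [rows[i] for i in test_idx]


train_rows, test_rows = split_rows(list(range(8)), test_size=0.25)
print(len(train_rows), len(test_rows))
```

With 8 rows and test_size = 0.25, two rows are held out for testing and six remain for training.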

cross_validation_mode: str, default = ‘kfold’

Choice of cross validation strategy. Possible values are:

‘kfold’
‘stratified kfold’
‘leave-one out’
‘shuffle split’

n_iter_cv: int, default = 5

Number of iterations used for cross-validation during model selection. The higher the number, the longer the processing time.
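To show why the processing time grows with this value, here is a sketch of how ‘kfold’ cross-validation partitions the data. It assumes that n_iter_cv corresponds to the number of folds, which the text does not state explicitly; `kfold_indices` is a hypothetical helper written for this example.

```python
from typing import Iterator, List, Tuple


def kfold_indices(n_samples: int, n_folds: int = 5) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for each fold, without shuffling."""
    if not 2 <= n_folds <= n_samples:
        raise ValueError("n_folds must be between 2 and n_samples")
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        # The first `extra` folds receive one additional sample.
        stop = start + base + (1 if fold < extra else 0)
        test_idx = list(range(start, stop))
        train_idx = list(range(0, start)) + list(range(stop, n_samples))
        yield train_idx, test_idx
        start = stop


folds = list(kfold_indices(n_samples=10, n_folds=5))
for train_idx, test_idx in folds:
    print(len(train_idx), test_idx)
```

Each fold requires one full fit of the estimator, so under this assumption five folds cost roughly five training runs per candidate model.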

optimize: bool, default = False

When set to True, hyperparameter optimization is applied to the model.

optimizer_method: str, default = ‘grid_search’

Hyperparameter optimization method, used only when optimize is set to True. It searches for the set of parameters that gives the best result for the estimator and trains the model with them.

Note

Optimization does not always produce better results.
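As an illustration of what a ‘grid_search’ optimizer does in general, the sketch below evaluates every combination in a parameter grid and keeps the best one. The scoring function, the parameter names and `grid_search` itself are made up for the example; they do not describe TwinAPI internals.

```python
from itertools import product
from typing import Any, Callable, Dict, List, Tuple


def grid_search(
    score_fn: Callable[..., float], param_grid: Dict[str, List[Any]]
) -> Tuple[Dict[str, Any], float]:
    """Evaluate every parameter combination and return the best one with its score."""
    names = list(param_grid)
    best_params: Dict[str, Any] = {}
    best_score = float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(depth: int, rate: float) -> float:
    # Stand-in for a cross-validated score; it peaks at depth=3, rate=0.1.
    return -((depth - 3) ** 2) - (rate - 0.1) ** 2


best_params, best_score = grid_search(
    toy_score, {"depth": [1, 3, 5], "rate": [0.01, 0.1, 1.0]}
)
print(best_params, best_score)
```

The search can only find the best point on the grid it is given, which is one reason optimization is not guaranteed to improve on the default parameters.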

n_jobs: int, default = 3: The number of jobs to run in parallel (for functions that support parallel processing). -1 means using all processors. To run all functions on single processor.
turbo_mode: bool, default = True: When set to False, another iteration of optimization is carried out in order to avoid overfitting the model.
fit_arguments: dict, default = None: Allows users to pass parameters to any of the selected estimators, provided that the parameter is accepted by the function. Accepts only a dictionary with parameter names as keys and parameter values as values.
leader_board: bool, default = False: Only available in the Auto ML use case. When set to True, prints a set of estimators and their scores.
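The condition that a fit_arguments entry must be accepted by the function can be checked in advance with the standard `inspect` module. The sketch below is an assumption about how such a check could look; `accepted_arguments` and `toy_estimator` are hypothetical names, not TwinAPI functions.

```python
import inspect
from typing import Any, Callable, Dict


def accepted_arguments(func: Callable[..., Any], fit_arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the entries of `fit_arguments` that `func` can receive by keyword."""
    parameters = inspect.signature(func).parameters
    # A **kwargs parameter means every keyword is accepted.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in parameters.values()):
        return dict(fit_arguments)
    keyword_kinds = (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    return {
        name: value
        for name, value in fit_arguments.items()
        if name in parameters and parameters[name].kind in keyword_kinds
    }


def toy_estimator(max_depth: int = 3, learning_rate: float = 0.1) -> None:
    """Stand-in for an estimator constructor."""


print(accepted_arguments(toy_estimator, {"max_depth": 5, "unknown_option": True}))
```

Filtering before the call avoids a TypeError from an unexpected keyword when the same dictionary is reused across several estimators.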

Returns: Tuple of trained model ID, trained model function, test scores and leaderboard.

Warning

Setting turbo_mode to False may result in very long training times.
For multi-class classification, only ‘accuracy’, ‘balanced_accuracy’ and ‘f1_weighted’ are accepted as scoring_method; any other value results in an error.
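A check like the following can catch an unsupported scoring method before a long training run starts. It is a sketch under stated assumptions: the scorers are grouped by their usual sklearn meaning, the ML case labels ‘classification’ and ‘regression’ are placeholders (the accepted user_mlcase values are not listed here), and `check_scoring` is a hypothetical helper.

```python
CLASSIFICATION_SCORERS = {
    "accuracy", "roc_auc", "recall", "precision",
    "f1", "balanced_accuracy", "f1_weighted",
}
REGRESSION_SCORERS = {
    "r2", "neg_mean_absolute_error",
    "neg_mean_squared_error", "neg_mean_absolute_percentage_error",
}
# Restriction stated in the warning above for multi-class problems.
MULTICLASS_SCORERS = {"accuracy", "balanced_accuracy", "f1_weighted"}


def check_scoring(scoring_method: str, ml_case: str, n_classes: int = 2) -> str:
    """Return `scoring_method` if it suits the ML case, else raise ValueError."""
    if scoring_method == "auto":
        return scoring_method
    if ml_case == "regression":
        allowed = REGRESSION_SCORERS
    elif ml_case == "classification":
        allowed = MULTICLASS_SCORERS if n_classes > 2 else CLASSIFICATION_SCORERS
    else:
        raise ValueError(f"unknown ML case: {ml_case!r}")
    if scoring_method not in allowed:
        raise ValueError(
            f"{scoring_method!r} is not accepted here; choose one of {sorted(allowed)}"
        )
    return scoring_method


print(check_scoring("f1_weighted", "classification", n_classes=3))
```

Leaving scoring_method at ‘auto’ sidesteps the restriction entirely, since the choice is then left to the library.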

predict(load_model: bool = False, model_name: Optional[str] = None, prediction_set: Optional[Any] = None)

This function predicts new data using either the model trained in the current session or a previously saved trained model that is loaded by name.

Example

from TwinAPI.SimuLearn.MLibrary import AutoML
model_auto = AutoML()
prediction = model_auto.predict(load_model=True, model_name='trained_model', prediction_set=[1])

load_model: bool, default = False: When set to True, the function looks for a saved trained model specified by the user.
model_name: Optional[str], default = None: Name of the model to load. Used only when load_model is set to True.
prediction_set: Optional[Any], default = None: Prediction dataset: an integer or a list of values to be predicted.

Returns: List of predictions.

modelsave(model_name: Optional[Union[int, str]] = None)

This function saves the trained model.

Example

from TwinAPI.SimuLearn.MLibrary import AutoML
model_auto = AutoML()
model_auto.modelsave('trained_model')

model_name: Optional[Union[int, str]], default = None: Name for the trained model.

Returns: JSON file.