Auto ML

class TwinAPI.SimuLearn.MLibrary.AutoML

The AutoML class handles both regression and classification problems through a single interface. It is an easy-to-use automatic machine learning library.

setup(data: Optional[Union[list, numpy.ndarray, pandas.core.frame.DataFrame, str]] = None, x_features: Optional[Union[list, numpy.ndarray, pandas.core.frame.DataFrame]] = None, y_labels: Optional[Union[list, numpy.ndarray, pandas.core.frame.DataFrame]] = None, verbose: int = 0, header: bool = False)

This function loads the input dataset and selects the X features and Y labels that will be used for training.

Example

from TwinAPI.SimuLearn.MLibrary import AutoML
model_auto = AutoML()
exp_x, exp_y = model_auto.setup(data='input.csv', x_features='[0:5, 4, x1:x5]', y_labels='[12]')

data: Union[list, np.ndarray, pd.DataFrame, str], default = None

Input dataset for the training experiment. Accepts a list, a NumPy ndarray, or a pandas DataFrame. A path to a ‘csv’, ‘xls’, or ‘xlsx’ file can also be given as a string. A ‘csv’ file or a DataFrame generally works best.

x_features: Union[list, np.ndarray, pd.DataFrame], default = None

Names of the input X features. If data is None, the features themselves can be passed directly as a list, NumPy ndarray, or pandas DataFrame. If data is a pandas DataFrame or a ‘csv’ file, this parameter instead selects columns by name or by index, given in list form. Column names are case-sensitive and must match the original dataset. Example: ‘[0:5, 4, x1:x5]’.

y_labels: Union[list, np.ndarray, pd.DataFrame], default = None

Names of the output Y labels. If data is None, the labels themselves can be passed directly as a list, NumPy ndarray, or pandas DataFrame. If data is a pandas DataFrame or a ‘csv’ file, this parameter instead selects columns by name or by index, given in list form. Column names are case-sensitive and must match the original dataset. Example: ‘[0:5, 4, y1:y5]’.
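The selection-string format is only shown by example. The sketch below shows one plausible way to resolve such a string against a list of column names. It is not the AutoML implementation, and the function name parse_selection is hypothetical. The sketch makes three assumptions. Integer ranges such as 0:5 are half-open, like Python slices. Name ranges such as x1:x5 include both ends, like pandas .loc. Duplicates (such as the 4 that overlaps 0:5 in the example) are dropped.

```python
def parse_selection(spec, columns):
    """Resolve a selection string such as '[0:5, 4, x1:x5]' to column names.

    Hypothetical sketch; assumed semantics (not confirmed by the AutoML docs):
      * 'a:b' with integers is a half-open positional range, like a Python slice;
      * 'name1:name2' is an inclusive name range, like pandas .loc;
      * names are case-sensitive and duplicates are dropped in first-seen order.
    """
    columns = list(columns)
    selected = []
    for token in spec.strip().strip("[]").split(","):
        token = token.strip()
        if not token:
            continue
        if ":" in token:
            start, stop = (part.strip() for part in token.split(":", 1))
            if start.isdigit() and stop.isdigit():
                # Positional range: end index excluded
                selected.extend(columns[int(start):int(stop)])
            else:
                # Name range: both ends included
                for name in (start, stop):
                    if name not in columns:
                        raise KeyError(f"unknown column: {name!r}")
                i, j = columns.index(start), columns.index(stop)
                selected.extend(columns[i:j + 1])
        elif token.isdigit():
            selected.append(columns[int(token)])  # single positional index
        elif token in columns:
            selected.append(token)  # single column name (exact case)
        else:
            raise KeyError(f"unknown column: {token!r}")
    # dict.fromkeys keeps first-seen order while removing duplicates
    return list(dict.fromkeys(selected))


columns = ["x1", "x2", "x3", "x4", "x5", "x6", "y1"]
print(parse_selection("[0:2, 4, x3:x4]", columns))  # ['x1', 'x2', 'x5', 'x3', 'x4']
print(parse_selection("[y1]", columns))             # ['y1']
```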

verbose: int, default = 0

Verbosity level of the output. Accepts an integer from 0 to 2:

  • ‘0’ reports only the trained model name and its prediction score.

  • ‘1’ reports the trained model name and several prediction scores.

  • ‘2’ produces a JSON file containing all of the above information.

header: bool, default = False

When set to True, the header row of the input is removed; when set to False, it is ignored. Leaving it at False is generally better, because the header is removed automatically during training.

Returns

Tuple of the selected X and Y values, as dictionaries.

train(user_mlcase: Optional[str] = None, include: Optional[Union[list, str]] = None, exclude: Optional[Union[list, str]] = None, n_jobs: int = 3, turbo_mode: bool = True, optimize: bool = False, split_mode: str = 'kfold split', test_size: float = 0.25, n_iter_cv: int = 5, cross_validation_mode: str = 'kfold', optimizer_method: str = 'grid_search', scoring_method: str = 'auto', leader_board: bool = False)

This function trains the model for a given set of parameters.

Example

from TwinAPI.SimuLearn.MLibrary import AutoML
model_auto = AutoML()
exp_x, exp_y = model_auto.setup(data='input.csv', x_features='[0:5, 4, x1:x5]', y_labels='[12]')
model_auto.train()

user_mlcase: Optional[str], default = None

If no value is given, the ML case (classification or regression) is determined automatically.

scoring_method: str, default = ‘auto’

Scoring metric used to compute the test prediction scores. Values follow the scikit-learn scorer naming conventions. Accepted values are:

  • ‘auto’

  • ‘accuracy’

  • ‘roc_auc’

  • ‘recall’

  • ‘precision’

  • ‘f1’

  • ‘balanced_accuracy’

  • ‘f1_weighted’

  • ‘r2’

  • ‘neg_mean_absolute_error’

  • ‘neg_mean_squared_error’

  • ‘neg_mean_absolute_percentage_error’
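These names are scikit-learn scorer strings. In scikit-learn, error-based scorers carry a neg_ prefix and return the negated error. This means a larger value is always better, so model selection can always maximize the score. The plain-Python sketch below illustrates three of the metrics and that sign convention. It is not the AutoML implementation.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that exactly match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def neg_mean_squared_error(y_true, y_pred):
    """Mean squared error, negated so that larger is better (sklearn convention)."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return -mse


y_true = [1.0, 2.0, 3.0, 4.0]
good, bad = [1.1, 2.0, 2.9, 4.0], [2.0, 2.0, 2.0, 2.0]
# With the negated error, the better model gets the higher score
assert neg_mean_squared_error(y_true, good) > neg_mean_squared_error(y_true, bad)
```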

split_mode: str, default = ‘kfold split’

Strategy for splitting the data into training and test sets. Possible values are:

  • ‘test-train split’

  • ‘kfold split’

test_size: float, default = 0.25

Fraction of the data held out for testing when split_mode is ‘test-train split’. For example, the default of 0.25 gives a train:test ratio of 0.75:0.25.
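With the default test_size of 0.25 and 100 rows, 75 rows go to training and 25 to testing. The sketch below shows a shuffled hold-out split in plain Python. It rounds the test count up, as scikit-learn's train_test_split does. This documentation does not say whether AutoML shuffles or rounds the same way, and the function here is illustrative only.

```python
import math
import random


def train_test_split(rows, test_size=0.25, seed=0):
    """Shuffle rows, then hold out ceil(test_size * n) of them for testing."""
    if not 0 < test_size < 1:
        raise ValueError("test_size must be strictly between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = math.ceil(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


train, test = train_test_split(range(100), test_size=0.25)
print(len(train), len(test))  # 75 25
```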

cross_validation_mode: str, default = ‘kfold’

Choice of cross validation strategy. Possible values are:

  • ‘kfold’

  • ‘stratified kfold’

  • ‘leave-one out’

  • ‘shuffle split’
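In plain k-fold cross-validation, the data is cut into k folds. Each fold serves once as the test set while the remaining folds form the training set. The other strategies vary this scheme:

  • Leave-one-out is the special case where k equals the number of samples.

  • Stratified k-fold also keeps class proportions similar in every fold.

  • Shuffle split draws independent random train/test splits instead of partitioning the data.

The sketch below generates unshuffled k-fold indices and sizes the folds the same way scikit-learn's KFold does. The number of folds is a parameter of the sketch; this documentation does not state how AutoML derives it (for example, from n_iter_cv).

```python
def kfold_indices(n_samples, n_splits=5):
    """Yield (train_indices, test_indices) for unshuffled k-fold CV."""
    if not 2 <= n_splits <= n_samples:
        raise ValueError("n_splits must be between 2 and n_samples")
    # Folds differ in size by at most one: the first n_samples % n_splits
    # folds get one extra sample (same rule as scikit-learn's KFold).
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        stop = start + base + (1 if fold < extra else 0)
        test = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, test
        start = stop


for train, test in kfold_indices(10, n_splits=3):
    print(test)
# [0, 1, 2, 3]
# [4, 5, 6]
# [7, 8, 9]
```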

n_iter_cv: int, default = 5

Number of cross-validation iterations used for model selection. Higher values increase processing time.

optimize: bool, default = False

When set to True, hyperparameter optimization is applied to the model using the method given by optimizer_method.

optimizer_method: str, default = ‘grid_search’

Hyperparameter optimization method used when optimize is True. It searches for the set of parameters that gives the best score for the estimator.

Note

  • Optimization does not always improve results.
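Grid search evaluates every combination of candidate hyperparameter values and keeps the best-scoring one, so its cost is the product of the grid sizes. The minimal sketch below assumes a caller-supplied score_fn (hypothetical) that trains and scores a model for one parameter set. score_fn returns a value where higher is better, matching the scikit-learn scorer convention. The sketch also shows why the note above holds: the search can only find the best point within the grid it is given.

```python
from itertools import product


def grid_search(score_fn, param_grid):
    """Try every combination in param_grid; return (best_params, best_score).

    score_fn(params) must return a score where higher is better.
    """
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Toy objective standing in for "train and cross-validate a model"
def toy_score(params):
    return -(params["depth"] - 4) ** 2 - (params["rate"] - 0.1) ** 2


best, score = grid_search(toy_score, {"depth": [2, 4, 8], "rate": [0.01, 0.1, 1.0]})
print(best)  # {'depth': 4, 'rate': 0.1}
```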

n_jobs: int, default = 3

The number of jobs to run in parallel, for functions that support parallel processing. A value of -1 uses all processors; set it to 1 to run all functions on a single processor.
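The value -1 follows the convention used by scikit-learn and joblib. The sketch below shows one way to turn n_jobs into a worker count and spread work over a pool. resolve_n_jobs and run_parallel are hypothetical helpers, not part of AutoML. The sketch uses threads to stay short and portable; CPU-bound pure-Python work would normally use a process pool instead, because of the global interpreter lock.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def resolve_n_jobs(n_jobs):
    """Map an n_jobs value to a worker count (-1 = all processors)."""
    if n_jobs == -1:
        return os.cpu_count() or 1  # cpu_count() may return None
    if n_jobs < 1:
        raise ValueError("n_jobs must be -1 or a positive integer")
    return n_jobs


def run_parallel(fn, items, n_jobs=3):
    """Apply fn to every item using up to resolve_n_jobs(n_jobs) threads."""
    with ThreadPoolExecutor(max_workers=resolve_n_jobs(n_jobs)) as pool:
        return list(pool.map(fn, items))  # map() preserves input order


print(run_parallel(abs, [-1, -2, 3], n_jobs=2))  # [1, 2, 3]
```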

turbo_mode: bool, default = True

When set to False, an additional round of optimization is carried out to avoid overfitting the model.

fit_arguments: dict, default = None

Extra parameters passed to the selected estimator, provided the estimator accepts them. Must be a dictionary mapping parameter names to parameter values.
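The documentation does not say whether unknown keys are dropped or rejected. The hypothetical helper below shows how such a check can be made with Python's inspect.signature. It splits a fit_arguments dictionary into the keys that a given callable (an estimator constructor or its fit method) accepts and the keys it does not. ToyEstimator is a stand-in, not a real AutoML estimator.

```python
import inspect


def split_fit_arguments(target, fit_arguments):
    """Return (accepted, rejected_keys) for calling target(**accepted)."""
    if not isinstance(fit_arguments, dict):
        raise TypeError("fit_arguments must be a dict of {parameter_name: value}")
    params = inspect.signature(target).parameters.values()
    # A **kwargs parameter accepts any keyword
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params):
        return dict(fit_arguments), []
    names = {p.name for p in params}
    accepted = {k: v for k, v in fit_arguments.items() if k in names}
    rejected = sorted(set(fit_arguments) - set(accepted))
    return accepted, rejected


class ToyEstimator:  # hypothetical estimator used only for illustration
    def __init__(self, alpha=1.0, fit_intercept=True):
        self.alpha = alpha
        self.fit_intercept = fit_intercept


accepted, rejected = split_fit_arguments(ToyEstimator, {"alpha": 0.5, "max_depth": 3})
print(accepted, rejected)  # {'alpha': 0.5} ['max_depth']
```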

leader_board: bool, default = False

Only available in Auto ML use case. When set to True, prints a set of estimators and their scores.

Returns

Tuple of trained model ID, trained model function, test scores and leaderboard.

Warning

  • Setting turbo_mode to False may result in very long training times.

  • For multi-class classification, only ‘accuracy’, ‘balanced_accuracy’, and ‘f1_weighted’ are accepted as scoring_method; any other value raises an error.
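This restriction matches scikit-learn, where the plain 'f1', 'precision', 'recall', and 'roc_auc' scorers assume a binary target and raise an error on multi-class labels. A check like the hypothetical one below can catch the problem before training starts. It assumes that 'auto' selects a compatible metric on its own, which the documentation does not state.

```python
MULTICLASS_SCORERS = {"accuracy", "balanced_accuracy", "f1_weighted"}


def check_scoring(y_labels, scoring_method):
    """Raise early if scoring_method cannot handle a multi-class target."""
    n_classes = len(set(y_labels))
    # Assumption: 'auto' picks a compatible metric by itself, so it is allowed
    if n_classes > 2 and scoring_method not in MULTICLASS_SCORERS | {"auto"}:
        raise ValueError(
            f"scoring_method={scoring_method!r} is not supported for a "
            f"{n_classes}-class target; use one of {sorted(MULTICLASS_SCORERS)}"
        )
    return scoring_method


check_scoring(["a", "b", "a"], "f1")           # binary target: accepted
check_scoring(["a", "b", "c"], "f1_weighted")  # multi-class: accepted
```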

predict(load_model: bool = False, model_name: Optional[str] = None, prediction_set: Optional[Any] = None)

This function makes predictions on new data using either the model trained in the current session or a previously saved model loaded with load_model.

Example

from TwinAPI.SimuLearn.MLibrary import AutoML
model_auto = AutoML()
prediction = model_auto.predict(load_model=True, model_name='trained_model', prediction_set=[1])

load_model: bool, default = False

When set to True, the function loads a previously saved trained model, identified by model_name, instead of using the model from the current session.

model_name: Optional[str], default = None

Name of the saved model to load. Used only when load_model is True.

prediction_set: Optional[Any], default = None

Data to predict on: a dataset, a single integer, or a list of values.

Returns

List of predictions.

modelsave(model_name: Optional[Union[int, str]] = None)

This function saves the trained model.

Example

from TwinAPI.SimuLearn.MLibrary import AutoML
model_auto = AutoML()
model_auto.modelsave('trained_model')

model_name: Optional[Union[int, str]], default = None

Name for the trained model.

Returns

A JSON file.