Auto ML
- class TwinAPI.SimuLearn.MLibrary.AutoML
AutoML is an easy-to-use automatic machine learning class that handles both regression and classification problems.
- setup(data: Optional[Union[list, numpy.ndarray, pandas.core.frame.DataFrame, str]] = None, x_features: Optional[Union[list, numpy.ndarray, pandas.core.frame.DataFrame]] = None, y_labels: Optional[Union[list, numpy.ndarray, pandas.core.frame.DataFrame]] = None, verbose: int = 0, header: bool = False)
This function prepares the data for a training experiment by selecting the input features and output labels. In the case of classification and regression, user_model is required, while all other parameters are optional. Some parameters apply only to a specific ML case.
Example
    from TwinAPI.SimuLearn.MLibrary import AutoML
    model_auto = AutoML()
    exp_x, exp_y = model_auto.setup(data='input.csv', x_features='[0:5, 4, x1:x5]', y_labels='[12]')
- data: Union[list, np.ndarray, pd.DataFrame, str], default = None
Input dataset for the training experiment. Accepts a list, a NumPy ndarray or a pandas DataFrame. The user can also provide the path to a ‘csv’, ‘xls’ or ‘xlsx’ file as a string. A ‘csv’ file or a DataFrame generally works best.
- x_features: Union[list, np.ndarray, pd.DataFrame], default = None
Names of the input x features. If the data parameter is set to None, the user can provide a list, a NumPy ndarray or a pandas DataFrame as input. If the data parameter is a pandas DataFrame or a ‘csv’ file, this parameter takes column names or column indices, provided as a list. Names are case-sensitive and must match the original dataset. Example: ‘[0:5, 4, x1:x5]’.
- y_labels: Union[list, np.ndarray, pd.DataFrame], default = None
Names of the output y labels. If the data parameter is set to None, the user can provide a list, a NumPy ndarray or a pandas DataFrame as input. If the data parameter is a pandas DataFrame or a ‘csv’ file, this parameter takes column names or column indices, provided as a list. Names are case-sensitive and must match the original dataset. Example: ‘[0:5, 4, y1:y5]’.
- verbose: int, default = 0
Verbosity of the results. Accepts integer values from 0 to 2.
‘0’ reports only the training model name and the prediction score.
‘1’ reports the training model name and several prediction scores.
‘2’ produces a JSON file with all of the above information.
- header: bool, default = False
When set to False, the input header is ignored; otherwise it is removed. Setting it to False is generally better, as the model automatically removes the header when training.
- Returns
Tuple of selected X and Y values as dictionary.
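The selector strings used for x_features and y_labels (for example ‘[0:5, 4, x1:x5]’) mix index ranges, single indices and column-name ranges. The following is only a rough sketch of how such a string could be expanded into column indices; it is not the library's implementation. The function name expand_selector is hypothetical, and the range semantics (half-open integer ranges, inclusive name ranges, duplicates dropped) are assumptions that the documentation does not confirm.

```python
from typing import List, Sequence


def expand_selector(spec: str, columns: Sequence[str]) -> List[int]:
    """Expand a selector such as '[0:5, 4, x1:x5]' into column indices.

    Assumptions (not confirmed by the AutoML documentation):
    - integer ranges 'a:b' are half-open, like Python slices;
    - name ranges 'x1:x5' include both end columns;
    - duplicates are dropped and first-seen order is kept.
    """
    names = list(columns)

    def resolve(token: str) -> int:
        # A token is either a column index or an exact (case-sensitive) name.
        if token.isdigit():
            return int(token)
        return names.index(token)  # raises ValueError for unknown names

    selected: List[int] = []
    for part in spec.strip().strip("[]").split(","):
        part = part.strip()
        if not part:
            continue
        if ":" in part:
            left, right = (p.strip() for p in part.split(":", 1))
            start, stop = resolve(left), resolve(right)
            if not right.isdigit():
                stop += 1  # name ranges are assumed to be inclusive
            indices = list(range(start, stop))
        else:
            indices = [resolve(part)]
        for i in indices:
            if i not in selected:
                selected.append(i)
    return selected


# Hypothetical column layout, used only for illustration.
columns = ["x1", "x2", "x3", "x4", "x5", "x6", "y1"]
print(expand_selector("[0:2, 4, x3:x5]", columns))  # prints [0, 1, 4, 2, 3]
```

Because names are matched exactly, a selector such as ‘X1:X5’ would fail against lowercase column names, which is consistent with the case-matching requirement stated for both parameters.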
- train(user_mlcase: Optional[str] = None, include: Optional[Union[list, str]] = None, exclude: Optional[Union[list, str]] = None, n_jobs: int = 3, turbo_mode: bool = True, optimize: bool = False, split_mode: str = 'kfold split', test_size: float = 0.25, n_iter_cv: str = 5, cross_validation_mode: str = 'kfold', optimizer_method: str = 'grid_search', scoring_method: str = 'auto', leader_board: bool = False)
This function trains the model for a given set of parameters.
Example
    from TwinAPI.SimuLearn.MLibrary import AutoML
    model_auto = AutoML()
    model_auto.train()
- user_mlcase: Optional[str], default = None
If no value is given, the ML case is determined automatically.
- scoring_method: str, default = ‘auto’
Scoring method used to evaluate predictions. Follows the sklearn scorer terminology. Accepted values are:
‘auto’
‘accuracy’
‘roc_auc’
‘recall’
‘precision’
‘f1’
‘balanced_accuracy’
‘f1_weighted’
‘r2’
‘neg_mean_absolute_error’
‘neg_mean_squared_error’
‘neg_mean_absolute_percentage_error’
- split_mode: str, default = ‘kfold split’
Selects how the data is divided into training and test sets. Accepted values are:
‘test-train split’
‘kfold split’
- test_size: float, default = 0.25
Test size for the test-train split: the fraction of the data held out for testing. Example: with the default of 0.25, train:test = 0.75:0.25.
- cross_validation_mode: str, default = ‘kfold’
Choice of cross validation strategy. Possible values are:
‘kfold’
‘stratified kfold’
‘leave-one out’
‘shuffle split’
- n_iter_cv: int, default = 5
Number of iterations for cross-validation model selection. The higher the number, the longer the processing time.
- optimize: bool, default = False
When set to True, hyperparameter optimization is applied to the model.
- optimizer_method: str, default = ‘grid_search’
When optimize is set to True, selects the hyperparameter optimization method used to train the model with the best set of parameters for the estimator.
Note
Optimization does not always produce the best results.
- n_jobs: int, default = 3
The number of jobs to run in parallel (for functions that support parallel processing). -1 means using all processors. To run all functions on single processor.
- turbo_mode: bool, default = True
When set to False, another iteration of optimization is carried out in order to avoid overfitting the model.
- fit_arguments: dict, default = None
Allows users to pass parameters to any of the selected estimators, provided that the parameter is accepted by the estimator. Accepts only a dictionary with parameter names as keys and parameter values as values.
- leader_board: bool, default = False
Only available in Auto ML use case. When set to True, prints a set of estimators and their scores.
- Returns
Tuple of trained model ID, trained model function, test scores and leaderboard.
Warning
Changing turbo_mode to False may result in very high training times. For multi-class classification, only the ‘accuracy’, ‘balanced_accuracy’ and ‘f1_weighted’ scoring values are accepted; any other value results in an error.
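The split_mode, test_size and n_iter_cv parameters described above correspond to two common ways of dividing data. The sketch below illustrates both with the standard library only. It is not the AutoML implementation: the function names and the seed argument are hypothetical, and treating n_iter_cv as the number of folds is an assumption.

```python
import random
from typing import List, Tuple


def train_test_indices(n_samples: int, test_size: float = 0.25,
                       seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle row indices and hold out a `test_size` fraction for testing."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_size)
    return indices[n_test:], indices[:n_test]


def kfold_indices(n_samples: int,
                  n_folds: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Split row indices into contiguous folds; each fold is the test set once."""
    indices = list(range(n_samples))
    # Spread the remainder so that fold sizes differ by at most one.
    sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
             for i in range(n_folds)]
    folds, start = [], 0
    for size in sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, test))
        start += size
    return folds


train, test = train_test_indices(100, test_size=0.25)
print(len(train), len(test))  # prints 75 25

for train_idx, test_idx in kfold_indices(10, n_folds=5):
    print(test_idx)  # each row index appears in exactly one test fold
```

With a single test-train split, the score depends on one held-out quarter of the data; with k-fold, every row is used for testing once, which costs more training runs. This is why a larger n_iter_cv increases processing time.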
- predict(load_model: bool = False, model_name: Optional[str] = None, prediction_set: Optional[Any] = None)
This function predicts new data using either the model trained in the current session or a previously saved model loaded by name.
Example
    from TwinAPI.SimuLearn.MLibrary import AutoML
    model_auto = AutoML()
    prediction = model_auto.predict(load_model=True, model_name='trained_model', prediction_set=[1])
- load_model: bool, default = False
When set to True, the function searches for a trained model provided by the user.
- model_name: Optional[str], default = None
If load_model is set to True, the name of the model to load.
- prediction_set: Optional[Any], default = None
Prediction dataset: an integer or a list of values to be predicted.
- Returns
List of predictions.
- modelsave(model_name: Optional[Union[int, str]] = None)
This function saves the trained model.
Example
    from TwinAPI.SimuLearn.MLibrary import AutoML
    model_auto = AutoML()
    model_auto.modelsave('trained_model')
- model_name: Optional[Union[int, str]], default = None
Name for the trained model.
- Returns
JSON file.