Regression
Regression module.
- remote_sensing_processor.regression.generate_tiles(x, y, output, tile_size=128, shuffle=False, split=None, filter_nodata='x', x_dtype=None, y_dtype=None, x_nodata=None, y_nodata=None)[source]
Cut rasters into tiles.
- Parameters:
x (list of paths as strings) – Rasters to use as training data.
y (dict or list of dicts) – Target variable or multiple target variables, given as a dict or a list of dicts. Can be set to None if no target value is needed. Each dict should contain: name: the name of the target variable, used later to refer to it. path: raster or vector file to use as the target variable. burn_value (optional): a field to use for the burn-in value. The field should be numeric.
output (path as a string (optional)) – Path where the generated dataset is saved. Data is saved in the .rspds format (a custom dataset format based on WebDataset).
tile_size (int (default = 128)) – Size of tiles to generate (tile_size x tile_size).
shuffle (bool (default = False)) – Whether to shuffle the samples randomly.
split (dict (optional)) – Splitting data in subsets. Is a dict, where keys are the names of split subsets and values are numbers defining proportions of every subset. For example, {“train”: 3, “validation”: 1, “test”: 1} will generate 3 subsets (train, validation, and test) in proportion 3 to 1 to 1.
filter_nodata (str (default = "x")) – How the nodata values should be treated. None: do not filter nodata. “x”: filter out pixels that are nodata in x. “y”: filter out pixels that are nodata in y. “x_or_y”: filter out pixels that are nodata in x or y. “x_and_y”: filter out pixels that are nodata in x and y.
x_dtype (dtype definition as a string (optional)) – If you run out of memory, you can try converting your x data to a less memory-consuming dtype.
y_dtype (dtype definition as a string (optional)) – If you run out of memory, you can try converting your y data to a less memory-consuming dtype.
x_nodata (int or float (optional)) – You can define which value in x raster corresponds to nodata and areas that contain nodata in x raster will be ignored while training and testing. Tiles that contain only nodata in both x and y will be omitted. If not defined, then the most common nodata value amongst x files will be used. If there are no nodata values, will be set to 0.
y_nodata (int or float (optional)) – You can define which value will be used to fill nodata. If there are polygons with the same value as y_nodata, they will be ignored while training and testing. Tiles that contain only nodata in both x and y will be omitted. If not defined, then it will be set to 0.
- Returns:
Path to the output dataset.
- Return type:
pathlib.Path
Examples
>>> import remote_sensing_processor as rsp
>>> x = ["/home/rsp_test/mosaics/sentinel/sentinel.json", "/home/rsp_test/mosaics/dem/dem.tif"]
>>> y = [
...     {"name": "nitrogen", "path": "/home/rsp_test/mosaics/nitrogen.tif"},
...     {"name": "phosphorus", "path": "/home/rsp_test/vectors/phosphorus.gpkg", "burn_value": "P"},
... ]
>>> out_file = "/home/rsp_test/model/chem_dataset.rspds"
>>> out_dataset = rsp.regression.generate_tiles(
...     x,
...     y,
...     out_file,
...     tile_size=256,
...     shuffle=True,
...     split={"train": 3, "val": 1, "test": 1},
... )
>>> print(out_dataset)
PosixPath('/home/rsp_test/model/chem_dataset.rspds')
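The arithmetic behind the split proportions can be sketched in plain Python. This is only an illustration of how weights such as 3 : 1 : 1 translate into tile counts; the helper name split_counts and the largest-remainder rounding are assumptions, and RSP itself may assign tiles differently.

```python
def split_counts(n_tiles: int, split: dict[str, float]) -> dict[str, int]:
    """Distribute n_tiles across subsets in proportion to the split weights.

    Hypothetical helper for illustration only; not part of RSP.
    """
    total = sum(split.values())
    exact = {name: n_tiles * weight / total for name, weight in split.items()}
    counts = {name: int(value) for name, value in exact.items()}
    # Hand out the tiles lost to truncation, largest fractional remainder first.
    leftover = n_tiles - sum(counts.values())
    by_remainder = sorted(split, key=lambda name: exact[name] - counts[name], reverse=True)
    for name in by_remainder[:leftover]:
        counts[name] += 1
    return counts


print(split_counts(1000, {"train": 3, "val": 1, "test": 1}))
# {'train': 600, 'val': 200, 'test': 200}
```

The subset names are free-form: whatever keys are used in split are the names later passed as sub to train(), test() and generate_map().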
- remote_sensing_processor.regression.train(train_datasets, val_datasets, model_file, model, backbone=None, checkpoint=None, weights=None, epochs=None, loss=None, metrics=None, batch_size=32, repeat=1, augment=False, lr=0.001, generate_features=False, num_workers=0, precision=None, **kwargs)[source]
Trains regression model.
- Parameters:
train_datasets (dict or list of dicts) – Dataset generated by generate_tiles() function that will be used to train the model. Each dataset can contain 3 elements: path: a path to a dataset. Required parameter. sub: subdataset name, list of subdataset names or ‘all’. Required parameter. y: if there is more than one target variable in dataset, then the name of the variable that should be used for training should be defined. Optional parameter. You can provide a list of datasets to train model on multiple datasets.
val_datasets (dict or list of dicts or None) – Dataset generated by generate_tiles() function that will be used to validate the model. Each dataset can contain 3 elements: path: a path to a dataset. Required parameter. sub: subdataset name, list of subdataset names or ‘all’. Required parameter. y: if there is more than one target variable in dataset, then the name of the variable that should be used for validation should be defined. Optional parameter. You can provide a list of datasets to validate model on multiple datasets. Can be set to None if no validation is needed.
model_file (path as a string) – Checkpoint file where model will be saved after training. File extension must be *.ckpt for neural networks and *.joblib for scikit-learn models.
model (str or torch.nn or sklearn model) – Name of model architecture, pytorch regression model or sklearn regression model.
backbone (str (optional)) – Backbone, solver or kernel of a model, if multiple backbones are supported.
checkpoint (path as a string (optional)) – Checkpoint file (*.ckpt or *.joblib) of a pre-trained model to fine-tune.
weights (str (optional)) – Name of pre-trained weights to fine-tune. Only works for neural networks.
epochs (dict (optional)) – Dict of values that set the number of training epochs and the early stopping parameters for deep learning models. max_epochs (int): the maximum number of epochs. early_stopping (bool): whether early stopping is enabled. min_delta (float): minimum change in the monitored quantity to qualify as an improvement. Optional parameter. patience (int): number of epochs with no improvement after which training will be stopped. Optional parameter. If you only want to initialize a model for future testing or prediction, set max_epochs to 0. If not set, max_epochs = 5 and early_stopping with default parameters are used. epochs has no effect for Scikit-Learn models. Please set num_iter, tol and other epochs-related parameters via **kwargs.
loss (str or torch.nn (optional)) – Loss function that will be used during the training. The default one is MSE or default loss for HuggingFace Transformers models. You can use any custom loss function, but it must inherit torch.nn.modules.loss._Loss.
metrics (dict or list of dicts (optional)) – Metrics that will be used to evaluate model performance and logged. Can be a single dict or a list of dicts. Each dict corresponds to one metric. name (str): name of a metric. If name is one of the supported metrics, it will be automatically loaded and used. log (str): logging level, one of ‘epoch’ (log the metric only at the end of each epoch), ‘step’ (log on each training step) or ‘verbose’ (log on each step and show alongside the progress bar). metric (Metric): your custom metric object. Optional parameter. You can use any custom metrics, but they must inherit torchmetrics.metric.Metric. If not set, accuracy and mean IoU are verbose logged and precision and recall are logged after each epoch.
batch_size (int (default = 32)) – Number of training samples used in one iteration. Only works for neural networks.
repeat (int (default = 1)) – Increase size of a dataset by repeating it n times. Can be useful if dataset is very small.
augment (bool or sequence of str (default = False)) – Apply augmentations to dataset. Only works for neural networks. No augmentations applied if set to False. If set to True then the default augmentations (RandomResizedCrop, RandomHorizontalFlip) are applied. You can pass your own sequence of augmentations, they will be applied to data in the given order. You can use any custom augmentations, but they must inherit torchvision.transforms.v2.Transform.
lr (float (default = 1e-3)) – Learning rate of a model. A lower value usually results in better model convergence, but much slower training. lr has no effect for Scikit-Learn models. Please set learning_rate_init, alpha and other lr-related parameters via **kwargs.
generate_features (bool (default = False)) – If set to True, intensity, gradient intensity and local structure features will be generated, as described here. Can result in better segmentation quality, but can also significantly increase training time. Only works for scikit-learn models.
num_workers (int or 'auto' (default = 0)) – Number of parallel workers that will load the data. Set ‘auto’ to let RSP choose the optimal number of workers, set 0 to disable multiprocessing. Can increase training speed, but can also cause errors (e.g. pickling errors).
precision (str (optional)) – Precision that will be used in the training process. Lower precision requires less memory, but can sometimes cause errors. More info can be found here.
**kwargs – Additional keyword arguments that are used to initialize model. They are different for every model, so read the documentation.
- Returns:
Trained model.
- Return type:
torch.nn model or SklearnModel
Examples
>>> import remote_sensing_processor as rsp
>>> x = ["/home/rsp_test/mosaics/sentinel/", "/home/rsp_test/mosaics/dem/dem.tif"]
>>> y = [
...     {"name": "nitrogen", "path": "/home/rsp_test/mosaics/nitrogen.tif"},
...     {"name": "phosphorus", "path": "/home/rsp_test/mosaics/phosphorus.tif"},
... ]
>>> out_file = "/home/rsp_test/model/chem_dataset.rspds"
>>> dataset_path = rsp.regression.generate_tiles(
...     x,
...     y,
...     out_file,
...     tile_size=256,
...     shuffle=True,
...     split={"train": 3, "val": 1, "test": 1},
... )
>>> # We will train model to predict nitrogen content
>>> train_ds = {"path": dataset_path, "sub": "train", "y": "nitrogen"}
>>> val_ds = {"path": dataset_path, "sub": "val", "y": "nitrogen"}
>>> model = rsp.regression.train(
...     train_ds,
...     val_ds,
...     model="UperNet",
...     backbone="ConvNeXTV2",
...     model_file="/home/rsp_test/model/upernet.ckpt",
...     epochs={"max_epochs": 10, "early_stopping": False},
...     batch_size=32,
... )
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
  | Name    | Type                           | Params
-----------------------------------------------------------
0 | model   | UperNetForSemanticSegmentation | 59.8 M
1 | loss_fn | CrossEntropyLoss               | 0
-----------------------------------------------------------
59.8 M    Trainable params
0         Non-trainable params
59.8 M    Total params
239.395   Total estimated model params size (MB)
Epoch 9: 100% ############################################# 223/223 [1:56:20<00:00, 31.30s/it, v_num=54, train_loss_step=0.326, train_acc_step=0.871, train_auroc_step=0.796, train_iou_step=0.655, val_loss_step=0.324, val_acc_step=0.869, val_auroc_step=0.620, val_iou_step=0.678, val_loss_epoch=0.334, val_acc_epoch=0.807, val_auroc_epoch=0.795, val_iou_epoch=0.688, train_loss_epoch=0.349, train_acc_epoch=0.842, train_auroc_epoch=0.797, train_iou_epoch=0.648]
`Trainer.fit` stopped: `max_epochs=10` reached.
>>> ds_mo = "/home/rsp_test/model/montana.rspds"
>>> ds_id = "/home/rsp_test/model/idaho.rspds"
>>> # Training on two different datasets - one from Montana and one from Idaho
>>> train_ds = [
...     {"path": ds_mo, "sub": ["area_1", "area_2"]},
...     {"path": ds_id, "sub": ["area_3", "area_6", "area_8"]},
... ]
>>> val_ds = [
...     {"path": ds_mo, "sub": ["area_3", "area_4"]},
...     {"path": ds_id, "sub": ["area_1"]},
... ]
>>> model = rsp.regression.train(
...     train_ds,
...     val_ds,
...     model="UperNet",
...     backbone="ConvNeXTV2",
...     model_file="/home/rsp_test/model/upernet.ckpt",
...     epochs={"max_epochs": 100, "early_stopping": False},
...     batch_size=32,
... )
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
  | Name    | Type                           | Params
-----------------------------------------------------------
0 | model   | UperNetForSemanticSegmentation | 59.8 M
1 | loss_fn | CrossEntropyLoss               | 0
-----------------------------------------------------------
59.8 M    Trainable params
0         Non-trainable params
59.8 M    Total params
239.395   Total estimated model params size (MB)
Epoch 99: 100% ############################################# 223/223 [1:56:20<00:00, 31.30s/it, v_num=54, train_loss_step=0.326, train_acc_step=0.871, train_auroc_step=0.796, train_iou_step=0.655, val_loss_step=0.324, val_acc_step=0.869, val_auroc_step=0.620, val_iou_step=0.678, val_loss_epoch=0.334, val_acc_epoch=0.807, val_auroc_epoch=0.795, val_iou_epoch=0.688, train_loss_epoch=0.349, train_acc_epoch=0.842, train_auroc_epoch=0.797, train_iou_epoch=0.648]
`Trainer.fit` stopped: `max_epochs=100` reached.
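The epochs and metrics arguments are plain dicts, so they can be assembled and checked before training starts. The sketch below builds them from the keys documented above and uses metric and loss names from the lists further down this page; the helper names make_epochs, make_metric and train_nitrogen are hypothetical, and the final function is only defined, not called, because it needs RSP and a real dataset.

```python
LOG_LEVELS = ("epoch", "step", "verbose")


def make_epochs(max_epochs, early_stopping=True, min_delta=None, patience=None):
    """Build the `epochs` dict using only the keys documented for train()."""
    epochs = {"max_epochs": max_epochs, "early_stopping": early_stopping}
    # min_delta and patience are optional, so include them only when set.
    if min_delta is not None:
        epochs["min_delta"] = min_delta
    if patience is not None:
        epochs["patience"] = patience
    return epochs


def make_metric(name, log="epoch"):
    """Build one metric dict; `log` must be one of the documented levels."""
    if log not in LOG_LEVELS:
        raise ValueError(f"log must be one of {LOG_LEVELS}, got {log!r}")
    return {"name": name, "log": log}


epochs = make_epochs(50, early_stopping=True, min_delta=0.001, patience=5)
metrics = [make_metric("rmse", log="verbose"), make_metric("r2"), make_metric("mae")]


def train_nitrogen(dataset_path, model_file):
    """Not called here: requires remote_sensing_processor and a tiled dataset."""
    import remote_sensing_processor as rsp

    return rsp.regression.train(
        {"path": dataset_path, "sub": "train", "y": "nitrogen"},
        {"path": dataset_path, "sub": "val", "y": "nitrogen"},
        model_file=model_file,
        model="UperNet",
        backbone="ConvNeXTV2",
        epochs=epochs,
        loss="mse",
        metrics=metrics,
    )


print(epochs)
print(metrics)
```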
- remote_sensing_processor.regression.test(test_datasets, model, metrics=None, batch_size=32, num_workers=0)[source]
Tests regression model.
- Parameters:
test_datasets (dict or list of dicts) – Dataset generated by generate_tiles() function that will be used to test the model. Each dataset can contain 3 elements: path (path as str): a path to a dataset. Required parameter. sub (str): subdataset name, list of subdataset names or ‘all’. Required parameter. y (str): if there is more than one target variable in dataset, then the name of the variable that should be used for testing should be defined. Optional parameter. You can provide a list of datasets to test model on multiple datasets.
model (torch.nn model or SklearnModel or path to a model file) – Model to test. You can pass the model object returned by train() function or file (*.ckpt or *.joblib) where model is stored.
metrics (dict or list of dicts (optional)) – Metrics that will be used to evaluate model performance and logged. Can be a single dict or a list of dicts. Each dict corresponds to one metric. name (str): name of a metric. If name is one of the supported metrics, it will be automatically loaded and used. log (str): logging level, one of ‘epoch’ (log the metric only at the end of each epoch), ‘step’ (log on each step) or ‘verbose’ (log on each step and show alongside the progress bar). metric (Metric): your custom metric object. Optional parameter. You can use any custom metrics, but they must inherit torchmetrics.metric.Metric. If not set, the metrics used in the training process are evaluated.
batch_size (int (default = 32)) – Number of samples used in one iteration. Only works for neural networks.
num_workers (int or 'auto' (default = 0)) – Number of parallel workers that will load the data. Set ‘auto’ to let RSP choose the optimal number of workers, set 0 to disable multiprocessing. Can speed up data loading, but can also cause errors (e.g. pickling errors).
Examples
>>> import remote_sensing_processor as rsp
>>> x, y, out_file = ...
>>> ds = rsp.regression.generate_tiles(
...     x,
...     y,
...     out_file,
...     tile_size=256,
...     shuffle=True,
...     split={"train": 3, "val": 1, "test": 1},
... )
>>> model = rsp.regression.train(
...     {"path": ds, "sub": "train"},
...     {"path": ds, "sub": "val"},
...     model="UperNet",
...     backbone="ConvNeXTV2",
...     model_file="/home/rsp_test/model/upernet.ckpt",
...     batch_size=32,
... )
>>> rsp.regression.test({"path": ds, "sub": "test"}, model=model, batch_size=32)
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃        Test metric        ┃       DataLoader 0        ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│      test_acc_epoch       │    0.8231202960014343     │
│     test_auroc_epoch      │    0.7588028311729431     │
│      test_iou_epoch       │    0.69323649406433105    │
│      test_loss_epoch      │    0.40799811482429504    │
│   test_precision_epoch    │    0.8231202960014343     │
│     test_recall_epoch     │    0.8231202960014343     │
└───────────────────────────┴───────────────────────────┘
- remote_sensing_processor.regression.generate_map(dataset, model, output, reference_dataset=None, batch_size=32, num_workers=0, write_stac=True)[source]
Create a map using pre-trained model.
- Parameters:
dataset (dict) – Dataset generated by generate_tiles() function that will be used for prediction. Dataset can contain 3 elements: path (path as str): a path to a dataset. Required parameter. sub (str): subdataset name, list of subdataset names or ‘all’. Optional parameter. If not defined, prediction for the whole dataset will be performed. y (str): if there is more than one target variable in dataset, then the name of the variable that should be used for original data reconstruction should be defined. Optional parameter.
model (torch.nn model or SklearnModel or path to a model file) – Pre-trained model to predict target values. You can pass the model object returned by train() function or file (*.ckpt or *.joblib) where model is stored.
output (path as a string) – Path where to write an output map.
reference_dataset (dict (optional)) – Dataset generated by generate_tiles() function that will be used to reconstruct original values and nodata if the prediction dataset has no target variable (‘y’). Dataset can contain 2 elements: path: a path to a dataset. y: if there is more than one target variable in the dataset, then the name of the variable that should be used for reconstruction should be defined.
batch_size (int (default = 32)) – Number of samples used in one iteration. Only works for neural networks.
num_workers (int or 'auto' (default = 0)) – Number of parallel workers that will load the data. Set ‘auto’ to let RSP choose the optimal number of workers, set 0 to disable multiprocessing. It can speed up data loading, but can also cause errors (e.g., pickling errors).
write_stac (bool (default = True)) – If True, then output metadata is saved to a STAC file.
- Returns:
Path where output raster is saved.
- Return type:
pathlib.Path
Examples
>>> import remote_sensing_processor as rsp
>>> x, y, out_file = ...
>>> ds = rsp.regression.generate_tiles(
...     x,
...     y,
...     out_file,
...     tile_size=256,
...     shuffle=True,
...     split={"train": 3, "val": 1, "test": 1},
... )
>>> model = rsp.regression.train(
...     {"path": ds, "sub": "train"},
...     {"path": ds, "sub": "val"},
...     model="UperNet",
...     backbone="ConvNeXTV2",
...     model_file="/home/rsp_test/model/upernet.ckpt",
...     batch_size=32,
... )
>>> output_map = "/home/rsp_test/prediction.tif"
>>> rsp.regression.generate_map({"path": ds, "y": "nitrogen"}, model, output_map)
Predicting: 100% #################### 372/372 [32:16, 1.6s/it]

>>> ds = {"path": "/home/rsp_test/model/ds.rspds"}
>>> model = "/home/rsp_test/model/upernet.ckpt"
>>> output_map = "/home/rsp_test/prediction.tif"
>>> rsp.regression.generate_map(ds, model, output_map)
Predicting: 100% #################### 372/372 [32:16, 1.6s/it]

>>> # Train model on data from Montana
>>> x_montana_files = "/home/rsp_test/mosaics/landsat_montana/landsat.json"
>>> y_montana_files = {"name": "nitrogen", "path": "/home/rsp_test/mosaics/chem_montana/nitrogen.tif"}
>>> ds_montana = rsp.regression.generate_tiles(
...     x_montana_files, y_montana_files, tile_size=256, shuffle=True, split={"train": 3, "val": 1, "test": 1}
... )
>>> train_ds = {"path": ds_montana, "sub": "train"}
>>> val_ds = {"path": ds_montana, "sub": "val"}
>>> model_montana = rsp.regression.train(
...     train_ds,
...     val_ds,
...     model="UperNet",
...     backbone="ConvNeXTV2",
...     model_file="/home/rsp_test/model/upernet.ckpt",
...     epochs={"max_epochs": 10, "early_stopping": False},
...     batch_size=32,
... )
>>> # Use model to map crop nitrogen content in Idaho
>>> x_idaho_files = "/home/rsp_test/mosaics/landsat_idaho/landsat.json"
>>> ds_idaho = rsp.regression.generate_tiles(x_idaho_files, None, tile_size=256)
>>> output_map = "/home/rsp_test/prediction_idaho.tif"
>>> pred_ds = {"path": ds_idaho}
>>> ref_ds = {"path": ds_montana, "y": "nitrogen"}
>>> rsp.regression.generate_map(pred_ds, model_montana, output_map, reference_dataset=ref_ds)
Predicting: 100% #################### 372/372 [32:16, 1.6s/it]
- remote_sensing_processor.regression.band_importance(dataset, model, num_init_images=100, num_images=500, batch_size=32, num_workers=0)[source]
Explain the band importance for a pre-trained model using SHAP.
- Parameters:
dataset (dict) – Dataset generated by generate_tiles() function that will be used for prediction. Dataset can contain 3 elements: path: a path to a dataset. sub: subdataset name, list of subdataset names or ‘all’. If not defined, prediction for the whole dataset will be performed. y: if there is more than one target variable in dataset, then the name of the variable that should be used for original data reconstruction should be defined.
model (torch.nn model or SklearnModel or path to a model file) – Pre-trained model to predict target values. You can pass the model object returned by train() function or file (*.ckpt or *.joblib) where model is stored.
num_init_images (int (default = 100)) – Number of images that will be used to initialize the SHAP explainer. We strongly recommend using very small num_init_images with sklearn models.
num_images (int (default = 500)) – Number of images that will be used to explain band importance. We strongly recommend using very small num_images with sklearn models.
batch_size (int (default = 32)) – Number of samples used in one iteration. Only works for neural networks.
num_workers (int or 'auto' (default = 0)) – Number of parallel workers that will load the data. Set ‘auto’ to let RSP choose the optimal number of workers, set 0 to disable multiprocessing. It can speed up data loading, but can also cause errors (e.g. pickling errors).
Examples
>>> import remote_sensing_processor as rsp
>>> x, y, out_file = ...
>>> ds = rsp.regression.generate_tiles(
...     x,
...     y,
...     out_file,
...     tile_size=256,
...     shuffle=True,
...     split={"train": 3, "val": 1, "test": 1},
... )
>>> model = rsp.regression.train(
...     {"path": ds, "sub": "train"},
...     {"path": ds, "sub": "val"},
...     model="UperNet",
...     backbone="ConvNeXTV2",
...     model_file="/home/rsp_test/model/upernet.ckpt",
...     batch_size=32,
... )
>>> rsp.regression.band_importance({"path": ds, "y": "nitrogen"}, model)
PartitionExplainer explainer: 100%|██████████████████████████████████████████▉| 499/500 [42:07<00:05, 5.07s/it]
Landsat-B1: 0.0162
Landsat-B2: 0.0493
Landsat-B3: 0.0875
Landsat-B4: 0.0243
Landsat-B5: 0.0319
Landsat-B7: 0.0194
NDVI: 0.0353
NBR: 0.0281
slope: 0.0134
curvature: 0.0239
aspect: 0.0311
dem-norm: 0.0236

>>> ds = {"path": "/home/rsp_test/model/ds.rspds"}
>>> model = "/home/rsp_test/model/xgboost.joblib"
>>> rsp.regression.band_importance(ds, model, num_init_images=1, num_images=1)
PartitionExplainer explainer: 16385it [1:12:01, 3.78it/s]
coastal: 0.0266
blue: 0.0266
green: 0.0253
red: 0.0542
rededge071: 0.0194
rededge075: 0.0194
rededge078: 0.0196
nir: 0.0034
nir08: 0.0111
nir09: 0.0758
swir16: 0.2894
swir22: 0.0309
NDVI: 0.0483
canopyheight_norm: 0.0005
dem_norm: 0.0741
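The printed importances are easier to read once ranked. The return value of band_importance is not documented in this section, so the sketch below assumes nothing about it: the dict is typed in by hand from the printed output of the second example, and the "share" is simply each value divided by the sum of all values.

```python
# Values copied by hand from the printed output of the XGBoost example above.
importance = {
    "coastal": 0.0266, "blue": 0.0266, "green": 0.0253, "red": 0.0542,
    "rededge071": 0.0194, "rededge075": 0.0194, "rededge078": 0.0196,
    "nir": 0.0034, "nir08": 0.0111, "nir09": 0.0758, "swir16": 0.2894,
    "swir22": 0.0309, "NDVI": 0.0483, "canopyheight_norm": 0.0005,
    "dem_norm": 0.0741,
}

# Sort bands from most to least important.
ranked = sorted(importance.items(), key=lambda item: item[1], reverse=True)
total = sum(importance.values())

for band, value in ranked[:3]:
    print(f"{band}: {value:.4f} ({value / total:.1%} of the summed importance)")
```

For this example the three highest-ranked bands are swir16, nir09 and dem_norm.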
List of available NN models
| Model Name | Backbone | Reference |
|---|---|---|
| BEiT | Not available | |
| Data2Vec | Not available | |
| DPT | Not available | |
| MobileNetV2 | Not available | |
| MobileViT | Not available | |
| MobileViTV2 | Not available | |
| SegFormer | Not available | |
| UperNet | See Transformers backbones | |
| DeepLabV3 | "MobileNet_V3_Large", "ResNet50", "ResNet101" | |
| FCN | "ResNet50", "ResNet101" | |
| LRASPP | Not available | |
| UNet | | |
| UNet++ | | |
| FPN | | |
| PSPNet | | |
| DeepLabV3_smp | | |
| DeepLabV3+ | | |
| Linknet | | |
| MAnet | | |
| PAN | | |
| UperNet_smp | | |
| SegFormer_smp | | |
| DPT_smp | | |
| FarSeg | "resnet18", "resnet34", "resnet50", "resnet101" | |
Transformers backbones are:
BEiT
BiT
ConvNeXT
ConvNeXTV2
DiNAT
DINOV2
DINOV2WithRegisters
DINOV3ViT
DINOV3ConvNeXT
FocalNet
HGNet-V2
Hiera
LW-DETR
MaskFormer-Swin
Pixio
PVTV2
ResNet
RT-DETR-ResNet
Swin
SwinV2
ViTDet
Any TIMM backbone (experimental support)
You can fine-tune a pre-trained model by defining weights. For models from Transformers, you can get available weights from the Hugging Face Hub; for Torchvision models, just set weights = True.
rsp.regression.train also saves CSV and TensorBoard logs in the directory where the checkpoint file is saved.
The DiNAT backbone requires the natten library, which is not available on Windows and Mac and is not available via Conda. RSP supports the DiNAT backbone, but you need to install natten in your Python environment manually.
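Because natten has to be installed manually, it can be worth checking for it before requesting the DiNAT backbone. The sketch below uses only the standard library; the helper name natten_available and the choice of ConvNeXTV2 as a fallback are assumptions made for illustration.

```python
import importlib.util


def natten_available() -> bool:
    """Return True if the natten package can be found in this environment."""
    return importlib.util.find_spec("natten") is not None


# Fall back to a backbone from the list above that does not need natten.
backbone = "DiNAT" if natten_available() else "ConvNeXTV2"
print(backbone)
```

The resulting string can then be passed as the backbone argument of train().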
List of available Scikit-learn models
| Model Name | Kernel/solver | Warm start | Reference |
|---|---|---|---|
| Linear Regression | Not available | Not supported | |
| Ridge | Not available | Not supported | |
| Bayesian Ridge | Not available | Not supported | |
| Lasso | Not available | Supported | |
| Multitask Lasso | Not available | Supported | |
| Lars | Not available | Not supported | |
| LassoLars | Not available | Not supported | |
| LassoLarsIC | Not available | Not supported | |
| ElasticNet | Not available | Not supported | |
| Multitask ElasticNet | Not available | Supported | |
| Orthogonal Matching Pursuit | Not available | Not supported | |
| ARD | Not available | Not supported | |
| Huber | Not available | Supported | |
| RANSAC | Not available | Not supported | |
| Theil-Sen | Not available | Not supported | |
| Gamma | "lbfgs", "newton-cholesky" | Supported | |
| Poisson | "lbfgs", "newton-cholesky" | Supported | |
| Tweedie | "lbfgs", "newton-cholesky" | Supported | |
| SGD | "squared_error", "huber", "epsilon_insensitive", "squared_epsilon_insensitive" | Supported | |
| Nearest Neighbors | Not available | Not supported | |
| Radius Neighbors | Not available | Not supported | |
| SVM | "rbf", "linear", "poly", "sigmoid" | Not supported | |
| Gaussian Process | Not available | Not supported | |
| Decision Tree | Not available | Not supported | |
| Extra Tree | Not available | Not supported | |
| Random Forest | Not available | Supported | |
| Extra Trees | Not available | Supported | |
| AdaBoost | Not available | Not supported | |
| Gradient Boosting | Not available | Supported | |
| Multilayer Perceptron | "adam", "sgd", "lbfgs" | Supported | |
| XGBoost | Not available | Not supported | |
| XGB Random Forest | Not available | Not supported | |
The model kernel or solver is set with the backbone argument.
Models that support warm start can be fine-tuned from a pre-trained model using the checkpoint argument.
Some models can have issues while saving, especially when trained on big datasets. Others can train for a very long time (like SVM) or run into memory issues (like Gaussian Process) on big datasets. We therefore recommend using Scikit-learn models only for small datasets.
For Random Forest and Extra Trees models, RSP sets max_depth to 6 by default, because scikit-learn's own default is unlimited depth and training could be very slow. To train with unlimited tree depth, set max_depth = None.
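The rules above can be checked before a long training run starts. The sketch below assumes that the strings in the Model Name column are the values accepted by the model argument, which this section does not state explicitly; the helper names sklearn_train_args and train_sklearn are hypothetical, and the RSP call is wrapped in a function that is defined but not called.

```python
# Models marked "Supported" in the Warm start column of the table above.
WARM_START_MODELS = {
    "Lasso", "Multitask Lasso", "Multitask ElasticNet", "Huber", "Gamma",
    "Poisson", "Tweedie", "SGD", "Random Forest", "Extra Trees",
    "Gradient Boosting", "Multilayer Perceptron",
}


def sklearn_train_args(model, model_file, backbone=None, checkpoint=None, **model_kwargs):
    """Collect keyword arguments for train() and reject invalid combinations."""
    if not str(model_file).endswith(".joblib"):
        raise ValueError("scikit-learn models must be saved to a *.joblib file")
    if checkpoint is not None and model not in WARM_START_MODELS:
        raise ValueError(f"{model} does not support warm start")
    args = {"model": model, "model_file": model_file, **model_kwargs}
    if backbone is not None:
        args["backbone"] = backbone  # kernel or solver, per the table above
    if checkpoint is not None:
        args["checkpoint"] = checkpoint
    return args


# Random Forest with unlimited tree depth instead of the RSP default of 6.
rf_args = sklearn_train_args("Random Forest", "/home/rsp_test/model/rf.joblib", max_depth=None)
# SVM with a linear kernel selected through the backbone argument.
svm_args = sklearn_train_args("SVM", "/home/rsp_test/model/svm.joblib", backbone="linear")


def train_sklearn(train_ds, val_ds, args):
    """Not called here: requires remote_sensing_processor and a tiled dataset."""
    import remote_sensing_processor as rsp

    return rsp.regression.train(train_ds, val_ds, **args)


print(rf_args)
print(svm_args)
```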
List of available losses
| Loss | Reference |
|---|---|
| mse | |
| mae | |
You can also use a custom loss, which is useful if you want to initialize a loss with custom parameters.
Any custom loss can be passed, with one restriction: it must inherit torch.nn.modules.loss._Loss.
List of available metrics
| Metric | Additional parameters | Reference |
|---|---|---|
| concordance_correlation_coefficient | None | |
| cosine_similarity | None | |
| critical_success_index | threshold=0.5 | |
| explained_variance | None | |
| kendall_rank_correlation_coefficient_a | variant="a" | |
| kendall_rank_correlation_coefficient_b | variant="b" | |
| kendall_rank_correlation_coefficient_c | variant="c" | |
| kl_divergence | None | |
| log_cosh_error | None | |
| mae | None | |
| mape | None | |
| mse | squared=True | |
| msle | None | |
| manhattan_distance | p=1 | |
| euclidean_distance | p=2 | |
| minkowski_distance_3 | p=3 | |
| minkowski_distance_10 | p=10 | |
| minkowski_distance_100 | p=100 | |
| nrmse | None | |
| pearson_correlation_coefficient | None | |
| r2 | None | |
| rse | squared=True | |
| rmse | squared=False | |
| rrse | squared=False | |
| spearman_correlation_coefficient | None | |
| smape | None | |
| tweedie_deviance_score | None | |
| weighted_mape | None | |

You can also use any custom metric for evaluation, with one restriction: it must inherit torchmetrics.metric.Metric.
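For orientation, the textbook definitions behind four of the metric names (mse, rmse, mae and r2) can be written in a few lines of plain Python. This is a sketch of the standard formulas only; the function name regression_metrics is hypothetical, and the metric implementations that RSP loads may differ in details such as batching or numerical handling.

```python
import math


def regression_metrics(y_true, y_pred):
    """Compute MSE, RMSE, MAE and R^2 from two equal-length sequences."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)  # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    mse = ss_res / n
    return {
        "mse": mse,
        "rmse": math.sqrt(mse),
        "mae": sum(abs(e) for e in errors) / n,
        "r2": 1.0 - ss_res / ss_tot,
    }


print(regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0]))
```

Note that rmse is the square root of mse, which matches the squared=True and squared=False parameters listed for the two names in the table.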
Supported augmentations
| Augmentation | Additional parameters | Reference |
|---|---|---|
| ScaleJitter | None | |
| RandomResizedCrop | antialias=True | |
| RandomHorizontalFlip | p=0.5 | |
| RandomVerticalFlip | p=0.5 | |
| RandomZoomOut | None | |
| RandomRotation | degrees=90 | |
| RandomAffine | degrees=90, translate=(0.5, 0.5), shear=0.5 | |
| RandomPerspective | None | |
| ElasticTransform | None | |
| GaussianBlur | kernel_size=(5, 9) | |
If you just pass augment=True, RSP will use the default sequence of augmentations: ("RandomResizedCrop", "RandomHorizontalFlip").
You can pass your own sequence of augmentations; they will be applied to the data in the given order.
The sequence can contain supported augmentation names, custom augmentations, or both.
Custom augmentations must inherit torchvision.transforms.v2.Transform.
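The documented behaviour of the augment argument can be summarised in a small plain-Python sketch. The helper resolve_augment is hypothetical and only mirrors the rules stated above; it is not how RSP resolves augmentations internally, and custom transform objects are passed through unchecked.

```python
# Names from the table of supported augmentations above.
SUPPORTED_AUGMENTATIONS = (
    "ScaleJitter", "RandomResizedCrop", "RandomHorizontalFlip",
    "RandomVerticalFlip", "RandomZoomOut", "RandomRotation", "RandomAffine",
    "RandomPerspective", "ElasticTransform", "GaussianBlur",
)
DEFAULT_AUGMENTATIONS = ("RandomResizedCrop", "RandomHorizontalFlip")


def resolve_augment(augment):
    """Return the augmentations that would be applied, in order."""
    if augment is False:
        return ()
    if augment is True:
        return DEFAULT_AUGMENTATIONS
    # Only string entries are validated; other objects are assumed to be
    # custom transforms supplied by the caller.
    unknown = [a for a in augment if isinstance(a, str) and a not in SUPPORTED_AUGMENTATIONS]
    if unknown:
        raise ValueError(f"unsupported augmentation names: {unknown}")
    return tuple(augment)


augment = ("RandomHorizontalFlip", "RandomVerticalFlip", "RandomRotation")
print(resolve_augment(augment))
# ('RandomHorizontalFlip', 'RandomVerticalFlip', 'RandomRotation')
```

The same tuple would be passed directly as augment=augment to train().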