rain.nodes.sklearn package#

Submodules#

rain.nodes.sklearn.cluster module#

Copyright (C) 2023 Università degli Studi di Camerino and Sigma S.p.A. Authors: Alessandro Antinori, Rosario Capparuccia, Riccardo Coltrinari, Flavio Corradini, Marco Piangerelli, Barbara Re, Marco Scarpetta

This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License along with this program. If not, see <https://www.gnu.org/licenses/>.

class rain.nodes.sklearn.cluster.SimpleKMeans(node_id: str, execute: list, n_clusters: int = 8)[source]#

Bases: SklearnClusterer

A clustering node that wraps ‘sklearn.cluster.KMeans’.

Input:

fitted_model (sklearn.base.BaseEstimator) – A previously fitted model.
dataset (pandas.DataFrame) – The dataset to be used by the estimator.
score_targets (pandas.DataFrame) – The dataset that will be used as targets (labels) to perform the scoring.

Output:

fitted_model (sklearn.base.BaseEstimator) – The model that results from the fit of the estimator.
predictions (pandas.DataFrame) – The predictions that result from the predict.
score_value (float) – The score value that results from the scoring.
transformed_dataset (pandas.DataFrame) – The dataset that results from the transform.
labels (pandas.DataFrame) – Labels of each point. It corresponds to the ‘labels_’ attribute of the sklearn KMeans.

Parameters:

node_id (str) – Id of the node.
execute ([fit, predict, score, transform]) – List of strings to specify the methods to execute. The allowed strings are those from the _method attribute.
n_clusters (int) – The number of clusters to form as well as the number of centroids to generate.

dataset = None#

execute()[source]#: Expose the main functionality: depending on the node, the computation is done using a specific Python library and its functions.

fitted_model = None#

labels = None#

predictions = None#

score_targets = None#

score_value = None#

transformed_dataset = None#

rain.nodes.sklearn.decomposition module#

class rain.nodes.sklearn.decomposition.SklearnPCA(node_id: str, execute: list, n_components=None, *, whiten=False, svd_solver='auto', tol=0.0, iterated_power='auto', random_state=None)[source]#

Bases: SklearnEstimator, ScorerMixin, TransformerMixin

Node representation of a sklearn PCA estimator that uses the ‘sklearn.decomposition.PCA’.

Input:

fitted_model (sklearn.base.BaseEstimator) – A previously fitted model.
dataset (pandas.DataFrame) – The dataset to be used by the estimator.
score_targets (pandas.DataFrame) – The dataset that will be used as targets (labels) to perform the scoring.

Output:

fitted_model (sklearn.base.BaseEstimator) – The model that results from the fit of the estimator.
score_value (float) – The score value that results from the scoring.
transformed_dataset (pandas.DataFrame) – The dataset that results from the transform.

Parameters:

node_id (str) – Id of the node.
execute ([fit, score, transform]) – List of strings to specify the methods to execute. The allowed strings are those from the _method attribute.
n_components (int, default=None) – Number of components to keep.
whiten (bool, default=False) – When True, the components_ vectors are multiplied by the square root of n_samples and then divided by the singular values to ensure uncorrelated outputs with unit component-wise variances.
svd_solver ({auto, full, arpack, randomized}, default=auto) – SVD solver to use.
tol (float, default=0.0) – Tolerance for singular values computed when svd_solver == ‘arpack’. Must be non-negative.
iterated_power (int or ‘auto’, default=‘auto’) – Number of iterations for the power method used when svd_solver == ‘randomized’. Must be non-negative.
random_state (int, default=None) – Used when the ‘arpack’ or ‘randomized’ solvers are used. Pass an int for reproducible results across multiple function calls.

dataset = None#

fitted_model = None#

score_targets = None#

score_value = None#

transformed_dataset = None#

rain.nodes.sklearn.functions module#

class rain.nodes.sklearn.functions.DaviesBouldinScore(node_id: str)[source]#

Bases: SklearnFunction

Computes the Davies-Bouldin score using the ‘sklearn.metrics.davies_bouldin_score’. The score is defined as the average similarity measure of each cluster with its most similar cluster, where similarity is the ratio of within-cluster distances to between-cluster distances. Thus, clusters which are farther apart and less dispersed will result in a better score. The minimum score is zero, with lower values indicating better clustering.

Input:

samples_dataset (pandas.DataFrame) – The dataset containing the samples.
labels (pandas.DataFrame) – The dataset containing the cluster label assigned to each sample.

Output:

score (float) – The Davies-Bouldin score value.

Parameters:

node_id (str) – Id of the node.

execute()[source]#: Expose the main functionality: depending on the node, the computation is done using a specific Python library and its functions.

labels = None#

samples_dataset = None#

score = None#
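
The definition given for DaviesBouldinScore can be made concrete with a small standard-library sketch. It is not the code used by the node or by sklearn; the helper names `centroid` and `davies_bouldin` are hypothetical, and the inputs are plain lists rather than pandas DataFrames. For each cluster it takes the worst-case ratio of summed within-cluster scatter to centroid distance, then averages those ratios.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length points."""
    dims = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dims)]


def davies_bouldin(samples, labels):
    """Illustrative Davies-Bouldin index (hypothetical helper, not the rain API)."""
    # Group the samples by their cluster label.
    clusters = {}
    for point, label in zip(samples, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required")

    centroids = {k: centroid(pts) for k, pts in clusters.items()}
    # Scatter of a cluster: mean Euclidean distance of its points to its centroid.
    scatter = {
        k: sum(math.dist(p, centroids[k]) for p in pts) / len(pts)
        for k, pts in clusters.items()
    }

    total = 0.0
    for i in clusters:
        # Similarity to the most similar other cluster
        # (this sketch assumes distinct centroids, so the distance is non-zero).
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in clusters
            if j != i
        )
    return total / len(clusters)


# Two clusters ten units apart: the tighter pair yields the lower (better) score.
loose = davies_bouldin([(0, 0), (0, 2), (10, 0), (10, 2)], [0, 0, 1, 1])
tight = davies_bouldin([(0, 0.5), (0, 1.5), (10, 0.5), (10, 1.5)], [0, 0, 1, 1])
print(loose, tight)
```

In the example, shrinking the spread inside each cluster while keeping the centroids fixed halves the score, which matches the statement that less dispersed, well separated clusters score better.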

class rain.nodes.sklearn.functions.TrainTestDatasetSplit(node_id: str, test_size: Optional[float] = None, train_size: Optional[float] = None, random_state: Optional[int] = None, shuffle: bool = True)[source]#

Bases: SklearnFunction

Node that uses ‘sklearn.model_selection.train_test_split’ to split a dataset into two parts.

Input:

dataset (pandas.DataFrame) – The dataset to split.

Output:

train_dataset (pandas.DataFrame) – The training dataset.
test_dataset (pandas.DataFrame) – The test dataset.

Parameters:

node_id (str) – Id of the node.
test_size (float, default=None) – Fraction of the dataset to include in the test dataset (e.g. 0.3 is 30%).
train_size (float, default=None) – Fraction of the dataset to include in the train dataset (e.g. 0.7 is 70%).
random_state (int, default=None) – Seed for the random generation.
shuffle (bool, default=True) – Whether to shuffle the dataset before the splitting.

dataset = None#

execute()[source]#: Expose the main functionality: depending on the node, the computation is done using a specific Python library and its functions.

test_dataset = None#

train_dataset = None#
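
How test_size, random_state and shuffle interact can be sketched with the standard library alone. The function `split_rows` below is a hypothetical illustration working on plain lists; it does not reproduce sklearn's shuffling or the node's DataFrame handling, and the 0.25 default is only this sketch's choice.

```python
import math
import random


def split_rows(rows, test_size=0.25, random_state=None, shuffle=True):
    """Split rows into (train, test); illustrative only, not the rain API."""
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be a fraction strictly between 0 and 1")
    indices = list(range(len(rows)))
    if shuffle:
        # A seeded generator makes the permutation reproducible.
        random.Random(random_state).shuffle(indices)
    n_test = math.ceil(test_size * len(rows))
    n_train = len(rows) - n_test
    train_idx, test_idx = indices[:n_train], indices[n_train:]
    return [rows[i] for i in train_idx], [rows[i] for i in test_idx]


rows = list(range(8))
train, test = split_rows(rows, test_size=0.25, random_state=0)
print(len(train), len(test))  # 6 2
```

With shuffle disabled the sketch keeps the original order, so the test part is simply the tail of the input.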

class rain.nodes.sklearn.functions.TrainTestSampleTargetSplit(node_id: str, test_size: Optional[float] = None, train_size: Optional[float] = None, random_state: Optional[int] = None, shuffle: bool = True)[source]#

Bases: SklearnFunction

Node that uses ‘sklearn.model_selection.train_test_split’ to split two datasets into four parts. It is useful for classification, where the sample and the target datasets must be split in the same way so that each sample stays paired with its label.

Input:

sample_dataset (pandas.DataFrame) – The dataset containing the samples.
target_dataset (pandas.DataFrame) – The dataset containing the target labels.

Output:

sample_train_dataset (pandas.DataFrame) – The training dataset containing the samples.
sample_test_dataset (pandas.DataFrame) – The test dataset containing the samples.
target_train_dataset (pandas.DataFrame) – The training dataset containing the target labels.
target_test_dataset (pandas.DataFrame) – The test dataset containing the target labels.

Parameters:

node_id (str) – Id of the node.
test_size (float, default=None) – Fraction of the dataset to include in the test dataset (e.g. 0.3 is 30%).
train_size (float, default=None) – Fraction of the dataset to include in the train dataset (e.g. 0.7 is 70%).
random_state (int, default=None) – Seed for the random generation.
shuffle (bool, default=True) – Whether to shuffle the dataset before the splitting.

execute()[source]#: Expose the main functionality: depending on the node, the computation is done using a specific Python library and its functions.

sample_dataset = None#

sample_test_dataset = None#

sample_train_dataset = None#

target_dataset = None#

target_test_dataset = None#

target_train_dataset = None#
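
The key property of TrainTestSampleTargetSplit is that samples and targets are partitioned with the same row selection. A minimal standard-library sketch of that idea follows; `split_pairs` is a hypothetical helper on plain lists, not the node's implementation, and its return order mirrors the four outputs listed above.

```python
import math
import random


def split_pairs(samples, targets, test_size=0.25, random_state=None, shuffle=True):
    """Split samples and targets with one shared permutation (illustrative only)."""
    if len(samples) != len(targets):
        raise ValueError("samples and targets must have the same number of rows")
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be a fraction strictly between 0 and 1")
    indices = list(range(len(samples)))
    if shuffle:
        random.Random(random_state).shuffle(indices)
    n_train = len(indices) - math.ceil(test_size * len(indices))
    train_idx, test_idx = indices[:n_train], indices[n_train:]

    def pick(data, idx):
        # Select rows by position so both datasets follow the same order.
        return [data[i] for i in idx]

    return (
        pick(samples, train_idx),
        pick(samples, test_idx),
        pick(targets, train_idx),
        pick(targets, test_idx),
    )


samples = [[i, i * 10] for i in range(8)]
targets = [i % 2 for i in range(8)]
s_train, s_test, t_train, t_test = split_pairs(samples, targets, random_state=42)
# Each sample still sits next to its own label after the split.
print(all(row[0] % 2 == label for row, label in zip(s_train, t_train)))
```

Because a single index permutation drives all four outputs, shuffling cannot separate a sample from its target.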

rain.nodes.sklearn.loaders module#

class rain.nodes.sklearn.loaders.IrisDatasetLoader(node_id: str, separate_target: bool = False)[source]#

Bases: InputNode

Loads the iris dataset as a pandas DataFrame using the ‘sklearn.datasets.load_iris’.

Output:

dataset (pandas.DataFrame) – The iris dataset.
target (pandas.DataFrame) – The target labels of the iris dataset. Populated only when separate_target is enabled.

Parameters:

node_id (str) – Id of the node.
separate_target (bool, default=False) – Whether to return the target labels in the separate output ‘target’.

dataset = None#

execute()[source]#: Expose the main functionality: depending on the node, the computation is done using a specific Python library and its functions.

target = None#

rain.nodes.sklearn.node_structure module#

class rain.nodes.sklearn.node_structure.PredictorMixin[source]#

Bases: object

Mixin class to add a prediction functionality to an estimator.

predict()[source]#

class rain.nodes.sklearn.node_structure.ScorerMixin[source]#

Bases: object

Mixin class to add a scoring functionality to an estimator.

score()[source]#

class rain.nodes.sklearn.node_structure.SklearnClassifier(node_id: str, execute: list)[source]#

Bases: SklearnEstimator, PredictorMixin, ScorerMixin

Base class for all the nodes that use an sklearn classifier.

Input:

fitted_model (sklearn.base.BaseEstimator) – A previously fitted model.
dataset (pandas.DataFrame) – The dataset to be used by the estimator.
fit_targets (pandas.DataFrame) – The dataset that will be used as targets (labels) to perform the fit of the classifier.
score_targets (pandas.DataFrame) – The dataset that will be used as targets (labels) to perform the scoring.

Output:

fitted_model (sklearn.base.BaseEstimator) – The model that results from the fit of the estimator.
predictions (pandas.DataFrame) – The predictions that result from the predict.
score_value (float) – The score value that results from the scoring.

Parameters:

node_id (str) – Id of the node.
execute ([fit, predict, score]) – List of strings to specify the methods to execute. The allowed strings are those from the _method attribute.

dataset = None#

fit()[source]#

fit_targets = None#

fitted_model = None#

predictions = None#

score_targets = None#

score_value = None#

class rain.nodes.sklearn.node_structure.SklearnClusterer(node_id: str, execute: list)[source]#

Bases: SklearnEstimator, PredictorMixin, ScorerMixin, TransformerMixin

Base class for all the nodes that use an sklearn clusterer.

Input:

fitted_model (sklearn.base.BaseEstimator) – A previously fitted model.
dataset (pandas.DataFrame) – The dataset to be used by the estimator.
score_targets (pandas.DataFrame) – The dataset that will be used as targets (labels) to perform the scoring.

Output:

fitted_model (sklearn.base.BaseEstimator) – The model that results from the fit of the estimator.
predictions (pandas.DataFrame) – The predictions that result from the predict.
score_value (float) – The score value that results from the scoring.
transformed_dataset (pandas.DataFrame) – The dataset that results from the transform.

Parameters:

node_id (str) – Id of the node.
execute ([fit, predict, score, transform]) – List of strings to specify the methods to execute. The allowed strings are those from the _method attribute.

dataset = None#

fitted_model = None#

predictions = None#

score_targets = None#

score_value = None#

transformed_dataset = None#

class rain.nodes.sklearn.node_structure.SklearnEstimator(node_id: str, execute: list)[source]#

Bases: SklearnNode

Base class for all the nodes that use an sklearn Estimator.

Input:

fitted_model (sklearn.base.BaseEstimator) – A previously fitted model.
dataset (pandas.DataFrame) – The dataset that will be used to perform the different methods on.

Output:

fitted_model (sklearn.base.BaseEstimator) – The model that results from the fit of the estimator.

Parameters:

node_id (str) – Id of the node.
execute ([fit]) – List of strings to specify the methods to execute. The allowed strings are those from the _method attribute.

dataset = None#

execute()[source]#: Expose the main functionality: depending on the node, the computation is done using a specific Python library and its functions.

fit()[source]#

fitted_model = None#

class rain.nodes.sklearn.node_structure.SklearnFunction(node_id: str)[source]#

Bases: SklearnNode

Base class for all the nodes that use an sklearn function.

abstract execute()[source]#: Expose the main functionality: depending on the node, the computation is done using a specific Python library and its functions.

class rain.nodes.sklearn.node_structure.SklearnNode(node_id)[source]#

Bases: ComputationalNode

Base class for all the nodes that use the sklearn library.

abstract execute()[source]#: Expose the main functionality: depending on the node, the computation is done using a specific Python library and its functions.

class rain.nodes.sklearn.node_structure.TransformerMixin[source]#

Bases: object

Mixin class to add a transformer functionality to an estimator.

transform()[source]#
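
The way the mixins combine with an estimator base class and an execute list can be pictured with a toy model. Everything below is hypothetical: the class names `ToyEstimator` and `ToyPredictor`, the `allowed` tuple (standing in for the _method attribute, whose real structure is not shown on this page) and the trivial "model" are assumptions made for illustration, and the sketch runs the methods in the order requested, which may differ from what the library does.

```python
class ToyEstimator:
    """Hypothetical stand-in for an estimator node; not the rain API."""

    allowed = ("fit",)

    def __init__(self, node_id, execute):
        unknown = [name for name in execute if name not in self.allowed]
        if unknown:
            raise ValueError(f"methods not allowed for this node: {unknown}")
        self.node_id = node_id
        self._to_run = list(execute)
        self.dataset = None
        self.fitted_model = None

    def fit(self):
        # The toy "model" is just the mean of the dataset.
        self.fitted_model = sum(self.dataset) / len(self.dataset)

    def execute(self):
        # Call each requested method by name, in the order given.
        for name in self._to_run:
            getattr(self, name)()


class PredictorMixin:
    """Adds a predict step that relies on the host class's attributes."""

    def predict(self):
        self.predictions = [1 if x > self.fitted_model else 0 for x in self.dataset]


class ToyPredictor(ToyEstimator, PredictorMixin):
    allowed = ("fit", "predict")

    def __init__(self, node_id, execute):
        super().__init__(node_id, execute)
        self.predictions = None


node = ToyPredictor("toy1", ["fit", "predict"])
node.dataset = [1.0, 2.0, 6.0]
node.execute()
print(node.fitted_model, node.predictions)  # 3.0 [0, 0, 1]
```

The mixin contributes only behaviour, while the base class owns validation and dispatch, so adding a capability to a node type amounts to listing one more base class and one more allowed method name.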

rain.nodes.sklearn.svm module#

class rain.nodes.sklearn.svm.SklearnLinearSVC(node_id: str, execute: list, penalty: str = 'l2', loss: str = 'squared_hinge', dual: bool = True, tol: float = 0.0001, C: float = 1.0, multi_class: str = 'ovr', fit_intercept: bool = True, intercept_scaling: int = 1, class_weight: Optional[float] = None, verbose: int = 0, random_state: Optional[str] = None, max_iter: int = 1000)[source]#

Bases: SklearnClassifier

Node that uses the ‘sklearn.svm.LinearSVC’ classifier.

Input:

fitted_model (sklearn.base.BaseEstimator) – A previously fitted model.
dataset (pandas.DataFrame) – The dataset to be used by the estimator.
fit_targets (pandas.DataFrame) – The dataset that will be used as targets (labels) to perform the fit of the classifier.
score_targets (pandas.DataFrame) – The dataset that will be used as targets (labels) to perform the scoring.

Output:

fitted_model (sklearn.base.BaseEstimator) – The model that results from the fit of the estimator.
predictions (pandas.DataFrame) – The predictions that result from the predict.
score_value (float) – The score value that results from the scoring.

Parameters:

node_id (str) – Id of the node.
execute ([fit, predict, score]) – List of strings to specify the methods to execute. The allowed strings are those from the _method attribute.
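penalty (str, default='l2') – Norm used in the penalization: 'l1' or 'l2'.
loss (str, default='squared_hinge') – Loss function: 'hinge' or 'squared_hinge'.
dual (bool, default=True) – Whether to solve the dual (True) or the primal (False) optimization problem.
tol (float, default=0.0001) – Tolerance for the stopping criterion.
C (float, default=1.0) – Regularization parameter. The strength of the regularization is inversely proportional to C. Must be strictly positive.
multi_class (str, default='ovr') – Multi-class strategy: 'ovr' or 'crammer_singer'.
fit_intercept (bool, default=True) – Whether to fit an intercept.
intercept_scaling (int, default=1) – Scaling applied to the synthetic intercept feature when fit_intercept is True.
class_weight (float, default=None) – Weights associated with the classes.
verbose (int, default=0) – Verbosity level.
random_state (str, default=None) – Seed for the pseudo-random number generation.
max_iter (int, default=1000) – Maximum number of iterations to run.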

dataset = None#

fit_targets = None#

fitted_model = None#

predictions = None#

score_targets = None#

score_value = None#

Module contents#
