rain.nodes.sklearn package
Submodules
rain.nodes.sklearn.cluster module
Copyright (C) 2023 Università degli Studi di Camerino and Sigma S.p.A. Authors: Alessandro Antinori, Rosario Capparuccia, Riccardo Coltrinari, Flavio Corradini, Marco Piangerelli, Barbara Re, Marco Scarpetta
This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.
You should have received a copy of the GNU Affero General Public License along with this program. If not, see <https://www.gnu.org/licenses/>.
- class rain.nodes.sklearn.cluster.SimpleKMeans(node_id: str, execute: list, n_clusters: int = 8)
Bases: SklearnClusterer
A clusterer node that wraps ‘sklearn.cluster.KMeans’.
- Input:
fitted_model (sklearn.base.BaseEstimator) – A previously fitted model.
dataset (pandas.DataFrame) – The dataset to be used by the estimator.
score_targets (pandas.DataFrame) – The dataset that will be used as targets (labels) to perform the scoring.
- Output:
fitted_model (sklearn.base.BaseEstimator) – The model that results from the fit of the estimator.
predictions (pandas.DataFrame) – The predictions that result from the predict.
score_value (float) – The score value that results from the scoring.
transformed_dataset (pandas.DataFrame) – The dataset that results from the transform.
labels (pandas.DataFrame) – Labels of each point. It corresponds to the ‘labels_’ attribute of the sklearn KMeans.
- Parameters:
node_id (str) – Id of the node.
execute ([fit, predict, score, transform]) – List of strings to specify the methods to execute. The allowed strings are those from the _method attribute.
n_clusters (int) – The number of clusters to form as well as the number of centroids to generate.
- dataset = None
- execute()
Expose the main functionality: depending on the node, the computation is done using a specific Python library and its function/s.
- fitted_model = None
- labels = None
- predictions = None
- score_targets = None
- score_value = None
- transformed_dataset = None
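The outputs listed for SimpleKMeans map onto methods and attributes of ‘sklearn.cluster.KMeans’. The following is a minimal sketch that calls sklearn directly rather than through the rain node; the toy data and the n_init and random_state arguments are assumptions made for illustration.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Toy data (assumption): two well-separated groups of 2-D points.
dataset = pd.DataFrame(
    {
        "x": [0.0, 0.1, 0.2, 5.0, 5.1, 5.2],
        "y": [0.0, 0.1, 0.0, 5.0, 5.1, 5.0],
    }
)

# n_clusters mirrors the node parameter of the same name.
model = KMeans(n_clusters=2, n_init=10, random_state=0)

# "fit": returns the fitted estimator.
fitted_model = model.fit(dataset)

# "predict": index of the closest centroid for each row.
predictions = pd.DataFrame(fitted_model.predict(dataset))

# "score": opposite of the K-means objective (negative inertia).
score_value = fitted_model.score(dataset)

# "transform": distance of each row to each centroid.
transformed_dataset = pd.DataFrame(fitted_model.transform(dataset))

# labels_: cluster label assigned to each training row during fit.
labels = pd.DataFrame(fitted_model.labels_)
```

How the node converts these results to DataFrames internally is not shown in this reference, so the pd.DataFrame wrapping above is only one plausible choice.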
rain.nodes.sklearn.decomposition module
- class rain.nodes.sklearn.decomposition.SklearnPCA(node_id: str, execute: list, n_components=None, *, whiten=False, svd_solver='auto', tol=0.0, iterated_power='auto', random_state=None)
Bases: SklearnEstimator, ScorerMixin, TransformerMixin
Node representation of a sklearn PCA estimator that uses ‘sklearn.decomposition.PCA’.
- Input:
fitted_model (sklearn.base.BaseEstimator) – A previously fitted model.
dataset (pandas.DataFrame) – The dataset to be used by the estimator.
score_targets (pandas.DataFrame) – The dataset that will be used as targets (labels) to perform the scoring.
- Output:
fitted_model (sklearn.base.BaseEstimator) – The model that results from the fit of the estimator.
score_value (float) – The score value that results from the scoring.
transformed_dataset (pandas.DataFrame) – The dataset that results from the transform.
- Parameters:
node_id (str) – Id of the node.
execute ([fit, score, transform]) – List of strings to specify the methods to execute. The allowed strings are those from the _method attribute.
n_components (int) – Number of components to keep.
whiten (bool) – When True (False by default) the components_ vectors are multiplied by the square root of n_samples and then divided by the singular values to ensure uncorrelated outputs with unit component-wise variances.
svd_solver ({auto, full, arpack, randomized}, default=auto) – SVD solver to use.
tol (float) – Tolerance for singular values computed when svd_solver == ‘arpack’. Must be non-negative (the default is 0.0).
iterated_power (int or ‘auto’) – Number of iterations for the power method computed when svd_solver == ‘randomized’. Must be non-negative.
random_state (int) – Used when the ‘arpack’ or ‘randomized’ solvers are used. Pass an int for reproducible results across multiple function calls.
- dataset = None
- fitted_model = None
- score_targets = None
- score_value = None
- transformed_dataset = None
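As a rough illustration of the fit, transform and score steps, the sketch below uses ‘sklearn.decomposition.PCA’ directly rather than the rain node. The toy data and the choice of svd_solver='full' are assumptions.

```python
import pandas as pd
from sklearn.decomposition import PCA

# Toy data (assumption): three numeric features, five samples.
dataset = pd.DataFrame(
    {
        "a": [1.0, 2.0, 3.0, 4.0, 5.0],
        "b": [2.1, 3.9, 6.2, 8.0, 9.8],
        "c": [0.5, 0.4, 0.6, 0.5, 0.4],
    }
)

# The keyword arguments mirror the node parameters of the same names.
pca = PCA(n_components=2, whiten=False, svd_solver="full")

# "fit": learn the principal components.
fitted_model = pca.fit(dataset)

# "transform": project the samples onto the kept components.
transformed_dataset = pd.DataFrame(fitted_model.transform(dataset))

# "score": average log-likelihood of the samples under the fitted model.
score_value = fitted_model.score(dataset)
```

Keeping n_components below the number of features leaves some variance for the noise estimate that the score computation relies on.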
rain.nodes.sklearn.functions module
- class rain.nodes.sklearn.functions.DaviesBouldinScore(node_id: str)
Bases: SklearnFunction
Computes the Davies-Bouldin score using ‘sklearn.metrics.davies_bouldin_score’. The score is defined as the average similarity measure of each cluster with its most similar cluster, where similarity is the ratio of within-cluster distances to between-cluster distances. Thus, clusters that are farther apart and less dispersed result in a better score. The minimum score is zero, with lower values indicating better clustering.
- Input:
samples_dataset (pandas.DataFrame) – The dataset containing the samples.
labels (pandas.DataFrame) – The dataset containing the cluster label of each sample.
- Output:
score (float) – The Davies-Bouldin score value.
- Parameters:
node_id (str) – Id of the node.
- execute()
Expose the main functionality: depending on the node, the computation is done using a specific Python library and its function/s.
- labels = None
- samples_dataset = None
- score = None
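The definition above can be checked by hand on a small case. The sketch below calls ‘sklearn.metrics.davies_bouldin_score’ directly; the data and the "cluster" column name are assumptions chosen so that the arithmetic is easy to follow.

```python
import pandas as pd
from sklearn.metrics import davies_bouldin_score

# Toy data (assumption): two clusters of two points each.
samples_dataset = pd.DataFrame(
    {"x": [0.0, 0.0, 10.0, 10.0], "y": [0.0, 2.0, 0.0, 2.0]}
)
labels = pd.DataFrame({"cluster": [0, 0, 1, 1]})

# Each cluster has a mean distance of 1 to its centroid, and the
# centroids (0, 1) and (10, 1) are 10 apart, so the similarity of
# the two clusters is (1 + 1) / 10.
score = davies_bouldin_score(samples_dataset, labels["cluster"])
```

With only two clusters, each one's most similar cluster is the other, so the average equals that single ratio.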
- class rain.nodes.sklearn.functions.TrainTestDatasetSplit(node_id: str, test_size: Optional[float] = None, train_size: Optional[float] = None, random_state: Optional[int] = None, shuffle: bool = True)
Bases: SklearnFunction
Node that uses ‘sklearn.model_selection.train_test_split’ to split a dataset into two parts.
- Input:
dataset (pandas.DataFrame) – The dataset to split.
- Output:
train_dataset (pandas.DataFrame) – The training dataset.
test_dataset (pandas.DataFrame) – The test dataset.
- Parameters:
node_id (str) – Id of the node.
test_size (float, default=None) – Fraction of the dataset to include in the test split (e.g. 0.3 is 30%).
train_size (float, default=None) – Fraction of the dataset to include in the train split (e.g. 0.7 is 70%).
random_state (int, default=None) – Seed for the random number generator used when shuffling.
shuffle (bool, default=True) – Whether to shuffle the dataset before splitting.
- dataset = None
- execute()
Expose the main functionality: depending on the node, the computation is done using a specific Python library and its function/s.
- test_dataset = None
- train_dataset = None
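A minimal sketch of the underlying call, using ‘sklearn.model_selection.train_test_split’ directly rather than the rain node. The toy DataFrame and the chosen fraction are assumptions.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy data (assumption): 20 rows, two columns.
dataset = pd.DataFrame({"feature": range(20), "value": range(100, 120)})

# The keyword arguments mirror the node parameters of the same names.
train_dataset, test_dataset = train_test_split(
    dataset,
    test_size=0.25,
    train_size=None,
    random_state=42,
    shuffle=True,
)
```

With test_size=0.25 and 20 rows, 5 rows go to the test split and the remaining 15 to the train split. The two parts share no rows.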
- class rain.nodes.sklearn.functions.TrainTestSampleTargetSplit(node_id: str, test_size: Optional[float] = None, train_size: Optional[float] = None, random_state: Optional[int] = None, shuffle: bool = True)
Bases: SklearnFunction
Node that uses ‘sklearn.model_selection.train_test_split’ to split two datasets into four parts. It is useful for classification, where the sample and target datasets must be split on the same rows.
- Input:
sample_dataset (pandas.DataFrame) – The dataset containing the samples.
target_dataset (pandas.DataFrame) – The dataset containing the target labels.
- Output:
sample_train_dataset (pandas.DataFrame) – The training dataset containing the samples.
sample_test_dataset (pandas.DataFrame) – The test dataset containing the samples.
target_train_dataset (pandas.DataFrame) – The training dataset containing the target labels.
target_test_dataset (pandas.DataFrame) – The test dataset containing the target labels.
- Parameters:
node_id (str) – Id of the node.
test_size (float, default=None) – Fraction of the dataset to include in the test split (e.g. 0.3 is 30%).
train_size (float, default=None) – Fraction of the dataset to include in the train split (e.g. 0.7 is 70%).
random_state (int, default=None) – Seed for the random number generator used when shuffling.
shuffle (bool, default=True) – Whether to shuffle the dataset before splitting.
- execute()
Expose the main functionality: depending on the node, the computation is done using a specific Python library and its function/s.
- sample_dataset = None
- sample_test_dataset = None
- sample_train_dataset = None
- target_dataset = None
- target_test_dataset = None
- target_train_dataset = None
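When two datasets are passed to ‘sklearn.model_selection.train_test_split’ in one call, both are split on the same rows. The sketch below calls sklearn directly; the toy data is an assumption, and the variable names simply echo the node outputs.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy data (assumption): 20 samples and their labels, sharing one index.
sample_dataset = pd.DataFrame({"f1": range(20), "f2": range(20, 40)})
target_dataset = pd.DataFrame({"label": [i % 2 for i in range(20)]})

# train_test_split returns a train/test pair for each input, in order.
(
    sample_train_dataset,
    sample_test_dataset,
    target_train_dataset,
    target_test_dataset,
) = train_test_split(
    sample_dataset,
    target_dataset,
    test_size=0.25,
    random_state=0,
    shuffle=True,
)
```

Because the shuffle is applied once to both inputs, each sample row stays paired with its label in the train and test parts.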
rain.nodes.sklearn.loaders module
- class rain.nodes.sklearn.loaders.IrisDatasetLoader(node_id: str, separate_target: bool = False)
Bases: InputNode
Loads the iris dataset as a pandas DataFrame using ‘sklearn.datasets.load_iris’.
- Output:
dataset (pandas.DataFrame) – The iris dataset.
target (pandas.DataFrame) – If separate_target is enabled then it will contain the target labels for the iris dataset.
- Parameters:
node_id (str) – Id of the node.
separate_target (bool, default=False) – Whether to return the target labels in the separate ‘target’ output.
- dataset = None
- execute()
Expose the main functionality: depending on the node, the computation is done using a specific Python library and its function/s.
- target = None
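A minimal sketch of loading the iris data as DataFrames with ‘sklearn.datasets.load_iris’ directly. How the node arranges the columns when separate_target is False is not stated in this reference; treating that case as "features and target in one frame" is an assumption.

```python
from sklearn.datasets import load_iris

# as_frame=True makes sklearn return pandas objects.
iris = load_iris(as_frame=True)

# Features and target in a single DataFrame
# (assumed to resemble the separate_target=False case).
full_frame = iris.frame

# Features and target as two DataFrames
# (assumed to resemble the separate_target=True case).
dataset = iris.data
target = iris.target.to_frame()
```

The iris dataset has 150 samples with 4 features each, so full_frame carries one extra column for the target.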
rain.nodes.sklearn.node_structure module
- class rain.nodes.sklearn.node_structure.PredictorMixin
Bases: object
Mixin class that adds prediction functionality to an estimator.
- class rain.nodes.sklearn.node_structure.ScorerMixin
Bases: object
Mixin class that adds scoring functionality to an estimator.
- class rain.nodes.sklearn.node_structure.SklearnClassifier(node_id: str, execute: list)
Bases: SklearnEstimator, PredictorMixin, ScorerMixin
Base class for all the nodes that use an sklearn classifier.
- Input:
fitted_model (sklearn.base.BaseEstimator) – A previously fitted model.
dataset (pandas.DataFrame) – The dataset to be used by the estimator.
fit_targets (pandas.DataFrame) – The dataset that will be used as targets (labels) to perform the fit of the classifier.
score_targets (pandas.DataFrame) – The dataset that will be used as targets (labels) to perform the scoring.
- Output:
fitted_model (sklearn.base.BaseEstimator) – The model that results from the fit of the estimator.
predictions (pandas.DataFrame) – The predictions that result from the predict.
score_value (float) – The score value that results from the scoring.
- Parameters:
node_id (str) – Id of the node.
execute ([fit, predict, score]) – List of strings to specify the methods to execute. The allowed strings are those from the _method attribute.
- dataset = None
- fit_targets = None
- fitted_model = None
- predictions = None
- score_targets = None
- score_value = None
- class rain.nodes.sklearn.node_structure.SklearnClusterer(node_id: str, execute: list)
Bases: SklearnEstimator, PredictorMixin, ScorerMixin, TransformerMixin
Base class for all the nodes that use an sklearn clusterer.
- Input:
fitted_model (sklearn.base.BaseEstimator) – A previously fitted model.
dataset (pandas.DataFrame) – The dataset to be used by the estimator.
score_targets (pandas.DataFrame) – The dataset that will be used as targets (labels) to perform the scoring.
- Output:
fitted_model (sklearn.base.BaseEstimator) – The model that results from the fit of the estimator.
predictions (pandas.DataFrame) – The predictions that result from the predict.
score_value (float) – The score value that results from the scoring.
transformed_dataset (pandas.DataFrame) – The dataset that results from the transform.
- Parameters:
node_id (str) – Id of the node.
execute ([fit, predict, score, transform]) – List of strings to specify the methods to execute. The allowed strings are those from the _method attribute.
- dataset = None
- fitted_model = None
- predictions = None
- score_targets = None
- score_value = None
- transformed_dataset = None
- class rain.nodes.sklearn.node_structure.SklearnEstimator(node_id: str, execute: list)
Bases: SklearnNode
Base class for all the nodes that use an sklearn Estimator.
- Input:
fitted_model (sklearn.base.BaseEstimator) – A previously fitted model.
dataset (pandas.DataFrame) – The dataset that will be used to perform the different methods on.
- Output:
fitted_model (sklearn.base.BaseEstimator) – The model that results from the fit of the estimator.
- Parameters:
node_id (str) – Id of the node.
execute ([fit]) – List of strings to specify the methods to execute. The allowed strings are those from the _method attribute.
- dataset = None
- execute()
Expose the main functionality: depending on the node, the computation is done using a specific Python library and its function/s.
- fitted_model = None
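The execute parameter is described as a list of method names checked against an allowed set, with mixins contributing further methods. The toy classes below sketch that general pattern in plain Python. They are hypothetical stand-ins, not the rain implementation: every class name, the shape of the _method attribute, and the calls list are assumptions.

```python
class ToyEstimatorNode:
    """Hypothetical stand-in for an estimator node (not the rain class)."""

    # Assumption: the allowed method names are kept in a tuple.
    _method = ("fit",)

    def __init__(self, node_id: str, execute: list):
        # Reject any requested method that this class does not allow.
        unknown = [name for name in execute if name not in self._method]
        if unknown:
            raise ValueError(f"Unsupported methods: {unknown}")
        self.node_id = node_id
        self._to_run = list(execute)
        self.calls = []  # records which methods ran, for illustration only

    def fit(self):
        self.calls.append("fit")

    def execute(self):
        # Run the requested methods in the order they were given.
        for name in self._to_run:
            getattr(self, name)()


class ToyScorerMixin:
    """Hypothetical mixin that contributes a score method."""

    def score(self):
        self.calls.append("score")


class ToyScoringNode(ToyEstimatorNode, ToyScorerMixin):
    # The subclass widens the allowed set with the mixin's method.
    _method = ToyEstimatorNode._method + ("score",)


node = ToyScoringNode("n1", ["fit", "score"])
node.execute()
```

In this sketch, asking the base class for "score" raises a ValueError, while the subclass that includes the mixin accepts it.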
- class rain.nodes.sklearn.node_structure.SklearnFunction(node_id: str)
Bases: SklearnNode
Base class for all the nodes that use an sklearn function.
- class rain.nodes.sklearn.node_structure.SklearnNode(node_id)
Bases: ComputationalNode
Base class for all the nodes that use the sklearn library.
rain.nodes.sklearn.svm module
- class rain.nodes.sklearn.svm.SklearnLinearSVC(node_id: str, execute: list, penalty: str = 'l2', loss: str = 'squared_hinge', dual: bool = True, tol: float = 0.0001, C: float = 1.0, multi_class: str = 'ovr', fit_intercept: bool = True, intercept_scaling: int = 1, class_weight: Optional[float] = None, verbose: int = 0, random_state: Optional[str] = None, max_iter: int = 1000)
Bases: SklearnClassifier
Node that uses the ‘sklearn.svm.LinearSVC’ classifier.
- Input:
fitted_model (sklearn.base.BaseEstimator) – A previously fitted model.
dataset (pandas.DataFrame) – The dataset to be used by the estimator.
fit_targets (pandas.DataFrame) – The dataset that will be used as targets (labels) to perform the fit of the classifier.
score_targets (pandas.DataFrame) – The dataset that will be used as targets (labels) to perform the scoring.
- Output:
fitted_model (sklearn.base.BaseEstimator) – The model that results from the fit of the estimator.
predictions (pandas.DataFrame) – The predictions that result from the predict.
score_value (float) – The score value that results from the scoring.
- Parameters:
node_id (str) – Id of the node.
execute ([fit, predict, score]) – List of strings to specify the methods to execute. The allowed strings are those from the _method attribute.
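penalty (str, default='l2') – Norm used in the penalization (‘l1’ or ‘l2’).
loss (str, default='squared_hinge') – Loss function (‘hinge’ or ‘squared_hinge’).
dual (bool, default=True) – Whether to solve the dual rather than the primal optimization problem.
tol (float, default=0.0001) – Tolerance for the stopping criterion.
C (float, default=1.0) – Regularization parameter. The strength of the regularization is inversely proportional to C.
multi_class (str, default='ovr') – Multi-class strategy (‘ovr’ or ‘crammer_singer’).
fit_intercept (bool, default=True) – Whether to fit an intercept.
intercept_scaling (int, default=1) – Scaling applied to the synthetic intercept feature when fit_intercept is True.
class_weight (float, default=None) – Class weights forwarded to the classifier.
verbose (int, default=0) – Verbosity level.
random_state (str, default=None) – Seed for the pseudo-random number generator.
max_iter (int, default=1000) – Maximum number of iterations to run.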
- dataset = None
- fit_targets = None
- fitted_model = None
- predictions = None
- score_targets = None
- score_value = None
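The fit, predict and score steps listed above correspond to methods of ‘sklearn.svm.LinearSVC’. The sketch below calls sklearn directly rather than the rain node. The toy data and the "label" column name are assumptions, and only a subset of the parameters is passed.

```python
import pandas as pd
from sklearn.svm import LinearSVC

# Toy data (assumption): two linearly separable groups of 2-D points.
dataset = pd.DataFrame(
    {
        "x": [0.0, 0.2, 0.1, 3.0, 3.2, 3.1],
        "y": [0.0, 0.1, 0.3, 3.0, 3.1, 2.9],
    }
)
fit_targets = pd.DataFrame({"label": [0, 0, 0, 1, 1, 1]})
score_targets = fit_targets  # reuse the training labels for scoring

# The keyword arguments mirror node parameters of the same names.
classifier = LinearSVC(
    penalty="l2",
    loss="squared_hinge",
    dual=True,
    tol=1e-4,
    C=1.0,
    max_iter=1000,
    random_state=0,
)

# "fit": train on the samples and their labels.
fitted_model = classifier.fit(dataset, fit_targets["label"])

# "predict": predicted class for each row.
predictions = pd.DataFrame(fitted_model.predict(dataset))

# "score": mean accuracy against the score targets.
score_value = fitted_model.score(dataset, score_targets["label"])
```

Scoring on the training data, as done here for brevity, overstates accuracy; a held-out split would normally supply dataset and score_targets for the score step.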
Module contents