rain.nodes.pysad package
Submodules
rain.nodes.pysad.node_structure module
Copyright (C) 2023 Università degli Studi di Camerino and Sigma S.p.A. Authors: Alessandro Antinori, Rosario Capparuccia, Riccardo Coltrinari, Flavio Corradini, Marco Piangerelli, Barbara Re, Marco Scarpetta
This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.
You should have received a copy of the GNU Affero General Public License along with this program. If not, see <https://www.gnu.org/licenses/>.
- class rain.nodes.pysad.node_structure.PySadNode(node_id: str)
Bases: ComputationalNode
Node that performs operations using the PySad library, without input/output constraints.
- Parameters:
node_id (str) – Unique identifier of the node in the DataFlow.
- class rain.nodes.pysad.node_structure.PySadPredictor(node_id: str)
Bases: PySadNode
Class representing a PySad predictor: it uses the given model and dataset to obtain the predictions.
- Input:
dataset (pd.DataFrame) – A Pandas DataFrame.
model (pickle) – A model in pickle format.
- Output:
predictions (pd.DataFrame) – The DataFrame containing the predictions.
- Parameters:
node_id (str) – Id of the node.
- dataset = None
- abstract execute()
Expose the main functionality: depending on the node, the computation is performed with a specific Python library and its functions.
- model = None
- predictions = None
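The input/output contract of a predictor node can be illustrated with a minimal, self-contained sketch. Everything in it is hypothetical: `SketchNode` and `MeanDistancePredictor` are not rain classes, plain lists stand in for the pandas DataFrames, and the pickled "model" is just a dict holding a mean. It only shows the pattern of a node that receives a pickled model as an attribute, restores it in `execute()`, and stores its output in `predictions`.

```python
import pickle
from abc import ABC, abstractmethod
from typing import Dict, List, Optional


class SketchNode(ABC):
    """Hypothetical minimal node: inputs and outputs are attributes, execute() does the work."""

    def __init__(self, node_id: str) -> None:
        self.node_id = node_id

    @abstractmethod
    def execute(self) -> None:
        ...


class MeanDistancePredictor(SketchNode):
    """Hypothetical predictor: the pickled model is a dict with a learned mean,
    and each prediction is the distance of a value from that mean."""

    def __init__(self, node_id: str) -> None:
        super().__init__(node_id)
        self.dataset: Optional[List[float]] = None      # stands in for the input pd.DataFrame
        self.model: Optional[bytes] = None              # pickled model, e.g. from a trainer node
        self.predictions: Optional[List[float]] = None  # output attribute

    def execute(self) -> None:
        # pickle.loads can run arbitrary code: only unpickle models from trusted sources.
        model: Dict[str, float] = pickle.loads(self.model)
        self.predictions = [abs(x - model["mean"]) for x in self.dataset]


node = MeanDistancePredictor("predictor")
node.dataset = [1.0, 2.0, 10.0]
node.model = pickle.dumps({"mean": 2.0})
node.execute()
print(node.predictions)  # [1.0, 0.0, 8.0]
```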
- class rain.nodes.pysad.node_structure.PySadTrainer(node_id: str)
Bases: PySadNode
Class representing a PySad trainer: it trains a model on the given dataset.
- Input:
dataset (pd.DataFrame) – A Pandas DataFrame.
labels (pd.Series) – A Pandas Series containing the labels.
- Output:
model (pickle) – The trained model in pickle format.
auroc (float) – The AUROC associated with the trained model.
- Parameters:
node_id (str) – Id of the node.
- auroc = None
- dataset = None
- abstract execute()
Expose the main functionality: depending on the node, the computation is performed with a specific Python library and its functions.
- labels = None
- model = None
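The `auroc` output summarizes how well the anomaly scores rank labelled anomalies above normal points. Below is a minimal sketch of the metric, assuming that label 1 marks an anomaly and that higher scores mean "more anomalous". It uses the textbook pairwise (Mann-Whitney) definition and is not necessarily how the rain nodes compute the value. A library routine such as scikit-learn's `roc_auc_score` returns the same quantity.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random anomaly (label 1) outscores a random normal
    point (label 0); ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# A streaming detector scores each instance as it arrives; the AUROC is then
# computed once over the collected scores and the ground-truth labels.
labels = [0, 0, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.2, 0.9]
print(auroc(labels, scores))  # 5 of the 6 (anomaly, normal) pairs are ranked correctly: 5/6
```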
- class rain.nodes.pysad.node_structure.PySadTransformer(node_id: str)
Bases: PySadNode
Class representing a PySad transformer: it manipulates the given dataset and returns a modified version of it.
- Input:
dataset (pd.DataFrame) – A Pandas DataFrame.
- Output:
dataset (pd.DataFrame) – A Pandas DataFrame.
- Parameters:
node_id (str) – Id of the node.
- dataset = None
rain.nodes.pysad.trainer module
- class rain.nodes.pysad.trainer.HalfSpaceTree(node_id: str, data, window_size: int = 100, num_trees: int = 25, initial_window_x: Optional[ndarray] = None, max_depth: int = 15)
Bases: PySadTrainer
Node that trains a model using the Half-Space Trees algorithm.
- Input:
dataset (pd.DataFrame) – A Pandas DataFrame containing the features.
labels (pd.Series) – A Pandas Series containing the labels.
- Output:
model (pickle) – The trained model in pickle format.
auroc (float) – The AUROC metric of the trained model.
- Parameters:
node_id (str) – Id of the node.
window_size (int, default=100) – The size of the window.
num_trees (int, default=25) – The number of trees.
initial_window_x (np.ndarray, default=None) – The initial window to fit during the initial calibration period. If not None, the model is simply fitted on these instances.
max_depth (int, default=15) – The maximum depth of the trees.
- auroc = None
- dataset = None
- execute()
Expose the main functionality: depending on the node, the computation is performed with a specific Python library and its functions.
- labels = None
- model = None
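To make window_size, num_trees and max_depth concrete, here is a simplified pure-Python sketch of the Half-Space Trees idea. Random trees repeatedly halve a randomly perturbed work space. Each node counts how many points of a reference window and of the latest window fall into it, and the two windows are swapped every window_size points.

The sketch rests on several assumptions:
- `HalfSpaceTreesSketch` is a hypothetical name.
- The per-feature bounds `feature_mins`/`feature_maxes` are passed explicitly. How the rain node derives them (for example from its `data` argument) is not documented here.
- The size limit of 0.1 × window_size is an assumed value.
- A *low* mass score signals an anomaly. A library implementation may use the opposite convention.
- initial_window_x is not modelled.

```python
import random
from typing import List, Optional, Sequence


class HSNode:
    """One node of a half-space tree, with mass counters for two windows."""

    def __init__(self, depth: int) -> None:
        self.depth = depth
        self.r = 0  # mass recorded in the reference window
        self.l = 0  # mass being recorded in the latest window
        self.dim: Optional[int] = None  # None marks a leaf
        self.split = 0.0
        self.left: Optional["HSNode"] = None
        self.right: Optional["HSNode"] = None


def build_tree(mins: List[float], maxs: List[float], depth: int,
               max_depth: int, rng: random.Random) -> HSNode:
    node = HSNode(depth)
    if depth < max_depth:
        q = rng.randrange(len(mins))                          # random split dimension
        node.dim, node.split = q, (mins[q] + maxs[q]) / 2.0   # halve the work space
        left_maxs, right_mins = maxs[:], mins[:]
        left_maxs[q] = right_mins[q] = node.split
        node.left = build_tree(mins, left_maxs, depth + 1, max_depth, rng)
        node.right = build_tree(right_mins, maxs, depth + 1, max_depth, rng)
    return node


def child(node: HSNode, x: Sequence[float]) -> HSNode:
    return node.left if x[node.dim] < node.split else node.right


def swap_windows(node: HSNode) -> None:
    """The latest window becomes the reference window; a fresh latest window starts."""
    node.r, node.l = node.l, 0
    if node.dim is not None:
        swap_windows(node.left)
        swap_windows(node.right)


class HalfSpaceTreesSketch:
    def __init__(self, feature_mins: Sequence[float], feature_maxes: Sequence[float],
                 window_size: int = 100, num_trees: int = 25, max_depth: int = 15,
                 seed: int = 0) -> None:
        rng = random.Random(seed)
        self.window_size = window_size
        self.size_limit = 0.1 * window_size  # assumed value
        self.seen = 0
        self.trees = []
        for _ in range(num_trees):
            mins, maxs = [], []
            for lo, hi in zip(feature_mins, feature_maxes):
                s = rng.uniform(lo, hi)           # random work-space centre per feature
                half = 2.0 * max(s - lo, hi - s)  # work space is wider than the data range
                mins.append(s - half)
                maxs.append(s + half)
            self.trees.append(build_tree(mins, maxs, 0, max_depth, rng))

    def score(self, x: Sequence[float]) -> float:
        """Mass score against the reference window: LOW values suggest an anomaly."""
        total = 0
        for node in self.trees:
            while node.dim is not None and node.r > self.size_limit:
                node = child(node, x)
            total += node.r * 2 ** node.depth
        return total

    def fit_score_partial(self, x: Sequence[float]) -> float:
        s = self.score(x)
        for node in self.trees:  # record x in the latest window of every tree
            while True:
                node.l += 1
                if node.dim is None:
                    break
                node = child(node, x)
        self.seen += 1
        if self.seen % self.window_size == 0:
            for tree in self.trees:
                swap_windows(tree)
        return s


model = HalfSpaceTreesSketch([0.0], [1.0], window_size=50, num_trees=10, max_depth=8)
for _ in range(50):  # one full window of identical "normal" points
    model.fit_score_partial([0.5])
print(model.score([0.5]), model.score([0.95]))  # 128000 0
```

Every reference point sits at 0.5 in the usage example, so the dense point reaches a leaf in every tree (10 trees × 50 points × 2^8 = 128000). The point 0.95 falls into an empty region and scores 0. The trees are built eagerly, so the documented default max_depth=15 would create 2^16 - 1 nodes per tree. For that reason the example uses a smaller depth.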
- class rain.nodes.pysad.trainer.IForestASD(node_id: str, window_size: int = 2048)
Bases: PySadTrainer
Node that trains a model using the IForestASD algorithm.
- Input:
dataset (pd.DataFrame) – A Pandas DataFrame containing the features.
labels (pd.Series) – A Pandas Series containing the labels.
- Output:
model (pickle) – The trained model in pickle format.
auroc (float) – The AUROC metric of the trained model.
- Parameters:
node_id (str) – Id of the node.
window_size (int, default=2048) – The size of the reference window and of its sliding step.
- auroc = None
- dataset = None
- execute()
Expose the main functionality: depending on the node, the computation is performed with a specific Python library and its functions.
- labels = None
- model = None
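Here is a rough sketch of how a window of window_size points can drive an isolation forest on a stream. Each point is scored by the forest fitted on the previous full window. Once the current window is full, the forest is refitted on it and the window slides by its whole length.

This is a simplified illustration, not the PySAD or rain implementation:
- `WindowedIForestSketch` is a hypothetical name.
- Each tree is grown on the whole window instead of a random subsample.
- The sketch refits after every window unconditionally.

```python
import math
import random
from typing import List, Optional, Sequence

Point = Sequence[float]
EULER_GAMMA = 0.5772156649015329


def c(n: int) -> float:
    """Average path length of an unsuccessful BST search over n points (normalising term)."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_itree(points: List[Point], depth: int, max_depth: int, rng: random.Random) -> tuple:
    """Isolation tree as nested tuples: ("leaf", size) or ("split", dim, value, left, right)."""
    if depth >= max_depth or len(points) <= 1:
        return ("leaf", len(points))
    dim = rng.randrange(len(points[0]))
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return ("leaf", len(points))
    value = rng.uniform(lo, hi)  # random cut inside the data range
    left = [p for p in points if p[dim] < value]
    right = [p for p in points if p[dim] >= value]
    return ("split", dim, value,
            build_itree(left, depth + 1, max_depth, rng),
            build_itree(right, depth + 1, max_depth, rng))


def path_length(tree: tuple, x: Point) -> float:
    depth = 0
    while tree[0] == "split":
        _, dim, value, left, right = tree
        tree = left if x[dim] < value else right
        depth += 1
    return depth + c(tree[1])  # correction for leaves that were not fully expanded


class WindowedIForestSketch:
    def __init__(self, window_size: int = 2048, n_trees: int = 100, seed: int = 0) -> None:
        if window_size < 2:
            raise ValueError("window_size must be at least 2")
        self.window_size = window_size
        self.n_trees = n_trees
        self.rng = random.Random(seed)
        self.window: List[Point] = []
        self.trees: List[tuple] = []
        self.norm = 1.0

    def _fit(self, points: List[Point]) -> None:
        max_depth = math.ceil(math.log2(len(points)))  # usual isolation-forest height limit
        self.trees = [build_itree(points, 0, max_depth, self.rng) for _ in range(self.n_trees)]
        self.norm = c(len(points))

    def score(self, x: Point) -> Optional[float]:
        """Isolation score in (0, 1]; values close to 1 suggest an anomaly."""
        if not self.trees:
            return None  # no reference window fitted yet
        mean_depth = sum(path_length(t, x) for t in self.trees) / len(self.trees)
        return 2.0 ** (-mean_depth / self.norm)

    def fit_score_partial(self, x: Point) -> Optional[float]:
        s = self.score(x)  # score with the forest of the previous window
        self.window.append(x)
        if len(self.window) == self.window_size:
            self._fit(self.window)  # the full window becomes the reference
            self.window = []        # and the window slides by its whole length
        return s


rng = random.Random(42)
det = WindowedIForestSketch(window_size=64, n_trees=50)
first = [det.fit_score_partial([rng.gauss(0, 1), rng.gauss(0, 1)]) for _ in range(64)]
print(first[-1], det.score([8.0, 8.0]))
```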
- class rain.nodes.pysad.trainer.XStream(node_id: str, window_size: int = 25, num_components: int = 100, n_chains: int = 100, depth: int = 25)
Bases: PySadTrainer
Node that trains a model using the xStream algorithm.
- Input:
dataset (pd.DataFrame) – A Pandas DataFrame containing the features.
labels (pd.Series) – A Pandas Series containing the labels.
- Output:
model (pickle) – The trained model in pickle format.
auroc (float) – The AUROC metric of the trained model.
- Parameters:
node_id (str) – Id of the node.
window_size (int, default=25) – The size (and the sliding length) of the reference window.
num_components (int, default=100) – The number of components for the StreamHash projection.
n_chains (int, default=100) – The number of half-space chains.
depth (int, default=25) – The maximum depth for the chains.
- auroc = None
- dataset = None
- execute()
Expose the main functionality: depending on the node, the computation is performed with a specific Python library and its functions.
- labels = None
- model = None
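xStream first maps each instance into num_components dimensions with StreamHash, a hashing-based sparse random projection. It then counts the projected points in n_chains half-space chains of the given depth. The sketch below covers only the projection step.

It makes the following assumptions:
- Features are given as a name-to-value dict.
- SHA-256 provides the hash.
- The weights are +1, 0 and -1 with probabilities 1/6, 2/3 and 1/6 (Achlioptas-style), scaled by sqrt(3/num_components).

The exact hash and constants used by PySAD may differ.

```python
import hashlib
import math
from typing import Dict, List


def _hash_sign(feature: str, component: int) -> int:
    """Deterministically map (feature, component) to +1, 0 or -1
    with probabilities of roughly 1/6, 2/3 and 1/6."""
    digest = hashlib.sha256(f"{feature}:{component}".encode()).digest()
    bucket = int.from_bytes(digest[:8], "big") % 6
    if bucket == 0:
        return 1
    if bucket == 1:
        return -1
    return 0


def streamhash_project(x: Dict[str, float], num_components: int = 100) -> List[float]:
    """Sparse random projection of a {feature name: value} dict into num_components dims."""
    scale = math.sqrt(3.0 / num_components)
    out = []
    for k in range(num_components):
        total = 0.0
        for feature, value in x.items():
            total += _hash_sign(feature, k) * value  # weight is recomputed, never stored
        out.append(scale * total)
    return out


y = streamhash_project({"a": 1.0, "b": -2.0})
print(len(y), y[:3])
```

The weights are derived from the feature name rather than stored in a matrix. As a result, a feature that first appears mid-stream is projected without refitting anything.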
rain.nodes.pysad.transformer module
- class rain.nodes.pysad.transformer.ConformalProbabilityCalibrator(node_id: str, windowed: bool = True, window_size: int = 300)
Bases: PySadNode
This class provides an interface to convert the scores into probabilities through conformal prediction.
- Input:
scores (pd.DataFrame) – A Pandas DataFrame containing the scores.
- Output:
scores (pd.DataFrame) – A Pandas DataFrame containing the calibrated scores (probabilities).
- Parameters:
node_id (str) – Id of the node.
windowed (bool, default=True) – Whether the probability calibrator is windowed, i.e., whether it forgets scores that are older than window_size.
window_size (int, default=300) – The size of the window of past scores used for calibration. Ignored if windowed is False.
- execute()
Expose the main functionality: depending on the node, the computation is performed with a specific Python library and its functions.
- scores = None
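Here is a minimal sketch of conformal calibration. It assumes that the calibrated value is the empirical p-value of the new score, i.e., the fraction of remembered scores (including the new one) that are at least as large. `ConformalCalibratorSketch` is a hypothetical name, and this reference does not state whether the node returns this value or its complement. The windowed flag switches between a bounded deque of window_size scores and an unbounded list.

```python
from collections import deque
from typing import Deque, List, Union


class ConformalCalibratorSketch:
    """Turns each anomaly score into an empirical conformal p-value over past scores."""

    def __init__(self, windowed: bool = True, window_size: int = 300) -> None:
        # Windowed: a bounded deque silently drops the oldest score once it is full.
        self.scores: Union[Deque[float], List[float]] = (
            deque(maxlen=window_size) if windowed else []
        )

    def fit_transform_partial(self, score: float) -> float:
        self.scores.append(score)
        at_least = sum(1 for s in self.scores if s >= score)
        # Fraction of remembered scores at least as large: small values mean an unusual score.
        return at_least / len(self.scores)


stream = [1.0, 2.0, 3.0, 4.0, 10.0, 0.5]
windowed = ConformalCalibratorSketch(window_size=4)
print([windowed.fit_transform_partial(s) for s in stream])
# [1.0, 0.5, 0.3333333333333333, 0.25, 0.25, 1.0]
```

With window_size=4, the score 10.0 is compared only with 2.0, 3.0 and 4.0, because 1.0 has already been forgotten. This gives 1/4, whereas an unwindowed calibrator would give 1/5.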
- class rain.nodes.pysad.transformer.GaussianTailProbabilityCalibrator(node_id: str, running_statistics: bool = True, window_size: int = 300)
Bases: PySadNode
This class provides an interface to convert the scores into probabilities via the Q-function, i.e., the tail function of the Gaussian distribution.
- Input:
scores (pd.DataFrame) – A Pandas DataFrame containing the scores.
- Output:
scores (pd.DataFrame) – A Pandas DataFrame containing the calibrated scores (probabilities).
- Parameters:
node_id (str) – Id of the node.
running_statistics (bool, default=True) – Whether to calculate the mean and variance over a running window.
window_size (int, default=300) – The size of the window for the running average and standard deviation. Ignored if running_statistics is False.
- execute()
Expose the main functionality: depending on the node, the computation is performed with a specific Python library and its functions.
- scores = None
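For a standard normal Z, the Q-function is Q(z) = P(Z > z) = erfc(z/√2)/2. Here z standardizes the score with the mean and standard deviation of recent scores.

This minimal sketch makes the following assumptions:
- The calibrated probability is 1 - Q(z), so unusually high scores map close to 1.
- The current score is included in the statistics.
- A constant window falls back to a standard deviation of 1.0.

`GaussianTailCalibratorSketch` is a hypothetical name.

```python
import math
from collections import deque
from typing import Deque


def q_function(z: float) -> float:
    """Tail probability of the standard normal distribution, Q(z) = P(Z > z)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


class GaussianTailCalibratorSketch:
    def __init__(self, running_statistics: bool = True, window_size: int = 300) -> None:
        # maxlen=None keeps every score, i.e. statistics over the whole stream.
        self.scores: Deque[float] = deque(maxlen=window_size if running_statistics else None)

    def fit_transform_partial(self, score: float) -> float:
        self.scores.append(score)
        n = len(self.scores)
        mean = sum(self.scores) / n
        std = math.sqrt(sum((s - mean) ** 2 for s in self.scores) / n)
        if std == 0.0:
            std = 1.0  # assumption: avoid dividing by zero while all scores are equal
        # 1 - Q(z): scores far above the running mean are mapped close to 1.
        return 1.0 - q_function((score - mean) / std)


cal = GaussianTailCalibratorSketch(window_size=5)
probs = [cal.fit_transform_partial(s) for s in [2.0, 2.0, 2.0, 2.0, 10.0]]
print(probs)  # the first four are 0.5; the last is about 0.977
```

When 10.0 arrives, the window mean is 3.6 and the standard deviation is 3.2. This gives z = 2 and 1 - Q(2) ≈ 0.977.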
- class rain.nodes.pysad.transformer.InstanceUnitNormScaler(node_id: str, pow: int = 2)
Bases: PySadTransformer
A scaler that rescales each instance's feature vector to norm 1, i.e., turns it into a unit vector.
- Input:
dataset (pd.DataFrame) – A Pandas DataFrame.
- Output:
dataset (pd.DataFrame) – A Pandas DataFrame.
- Parameters:
node_id (str) – Id of the node.
pow (float, default=2) – The power for which the norm is calculated; pow=2 gives the Euclidean norm.
- dataset = None
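For one instance x, the scaler computes x / ||x||_p with ||x||_p = (Σ|x_i|^p)^(1/p). Below is a minimal sketch on a plain list, which stands in for one DataFrame row. `unit_norm` is a hypothetical helper. Returning all-zero vectors unchanged is an assumption, since their norm is 0.

```python
from typing import List, Sequence


def unit_norm(x: Sequence[float], p: float = 2) -> List[float]:
    """Scale one instance so that its p-norm equals 1; zero vectors are returned unchanged."""
    norm = sum(abs(v) ** p for v in x) ** (1.0 / p)
    if norm == 0.0:
        return list(x)
    return [v / norm for v in x]


print(unit_norm([3.0, 4.0]))       # Euclidean norm 5: [0.6, 0.8]
print(unit_norm([3.0, 4.0], p=1))  # Manhattan norm 7: [3/7, 4/7]
```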
Module contents