rain.nodes.pysad package#

Submodules#

rain.nodes.pysad.node_structure module#

Copyright (C) 2023 Università degli Studi di Camerino and Sigma S.p.A. Authors: Alessandro Antinori, Rosario Capparuccia, Riccardo Coltrinari, Flavio Corradini, Marco Piangerelli, Barbara Re, Marco Scarpetta

This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License along with this program. If not, see <https://www.gnu.org/licenses/>.

class rain.nodes.pysad.node_structure.PySadNode(node_id: str)[source]#

Bases: ComputationalNode

Node that performs operations using the PySad library, without input/output constraints.

Parameters:

node_id (str) – Unique identifier of the node in the DataFlow.

abstract execute()[source]#

Expose the main functionality: depending on the node, the computation is done using a specific Python library and its function/s.

class rain.nodes.pysad.node_structure.PySadPredictor(node_id: str)[source]#

Bases: PySadNode

Class representing a PySad predictor. It uses the given model and dataset to obtain the predictions.

Input:
  • dataset (pd.DataFrame) – A Pandas DataFrame.

  • model (pickle) – A model in pickle format.

Output:

predictions (pd.DataFrame) – The DataFrame containing the predictions.

Parameters:

node_id (str) – Id of the node.

dataset = None#
abstract execute()[source]#

Expose the main functionality: depending on the node, the computation is done using a specific Python library and its function/s.

model = None#
predictions = None#
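
The predictor contract above (pickled model and dataset in, predictions out) can be illustrated with a minimal, dependency-free sketch. Everything here is an assumption made for illustration: the "model" is a plain dict rather than a real PySad model, the dataset is a list of rows rather than a DataFrame, and `predict` is a hypothetical helper, not part of rain.

```python
import pickle

# Toy "model": a plain dict standing in for a trained PySad model (assumption).
model_bytes = pickle.dumps({"threshold": 10.0})

# Rows standing in for the input DataFrame (assumption).
dataset = [[1.0, 2.0], [8.0, 7.0], [0.5, 0.5]]


def predict(model_bytes: bytes, rows):
    """Unpickle the model and flag rows whose feature sum exceeds its threshold."""
    model = pickle.loads(model_bytes)
    return [int(sum(row) > model["threshold"]) for row in rows]


predictions = predict(model_bytes, dataset)
print(predictions)
```

A concrete subclass would perform the equivalent steps inside execute(), reading the dataset and model attributes and assigning the result to predictions.
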
class rain.nodes.pysad.node_structure.PySadTrainer(node_id: str)[source]#

Bases: PySadNode

Class representing a PySad trainer. It trains a model on a given dataset.

Input:
  • dataset (pd.DataFrame) – A Pandas DataFrame.

  • labels (pd.Series) – A Pandas Series containing the labels.

Output:
  • model (pickle) – The trained model in pickle format.

  • auroc (float) – The AUROC associated to the trained model.

Parameters:

node_id (str) – Id of the node.

auroc = None#
dataset = None#
abstract execute()[source]#

Expose the main functionality: depending on the node, the computation is done using a specific Python library and its function/s.

labels = None#
model = None#
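
The auroc output is the area under the ROC curve, which equals the probability that a randomly chosen anomalous instance receives a higher score than a randomly chosen normal one. The sketch below computes it by direct pairwise comparison. It is an illustration only: the function name `auroc`, the 0/1 label encoding, and the sample values are assumptions, and rain may compute the metric differently.

```python
def auroc(labels, scores):
    """Fraction of (anomaly, normal) pairs ranked correctly; ties count as 0.5."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one instance of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical labels (1 = anomaly) and anomaly scores.
labels = [0, 0, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.2, 0.9]
value = auroc(labels, scores)
print(round(value, 4))
```

The pairwise form is quadratic in the number of instances, so it is meant for reading rather than for large datasets.
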
class rain.nodes.pysad.node_structure.PySadTransformer(node_id: str)[source]#

Bases: PySadNode

Class representing a PySad transformer. It manipulates a given dataset and returns a modified version of it.

Input:

dataset (pd.DataFrame) – A Pandas DataFrame.

Output:

dataset (pd.DataFrame) – A Pandas DataFrame.

Parameters:

node_id (str) – Id of the node.

dataset = None#
abstract execute()[source]#

Expose the main functionality: depending on the node, the computation is done using a specific Python library and its function/s.
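
The base classes in this module share one pattern: inputs and outputs are class attributes initialised to None, and each concrete node implements the abstract execute() method. A minimal sketch of that pattern follows. `ComputationalNode` and `PySadTransformer` are simplified stand-ins written for this example, not the rain classes, and `RowDoubler` is a hypothetical node that works on a list of rows instead of a DataFrame.

```python
from abc import ABC, abstractmethod


class ComputationalNode(ABC):
    """Stand-in for rain's ComputationalNode (assumption: it only stores node_id)."""

    def __init__(self, node_id: str):
        self.node_id = node_id

    @abstractmethod
    def execute(self):
        """Run the node's computation."""


class PySadTransformer(ComputationalNode):
    """Stand-in base class: 'dataset' serves as both input and output."""

    dataset = None


class RowDoubler(PySadTransformer):
    """Hypothetical concrete node that doubles every value in the dataset."""

    def execute(self):
        self.dataset = [[2 * value for value in row] for row in self.dataset]


node = RowDoubler("doubler_1")
node.dataset = [[1, 2], [3, 4]]
node.execute()
print(node.dataset)
```

Because execute() is abstract, the stand-in base classes cannot be instantiated directly; only subclasses that implement it can.
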

rain.nodes.pysad.trainer module#

class rain.nodes.pysad.trainer.HalfSpaceTree(node_id: str, data, window_size: int = 100, num_trees: int = 25, initial_window_x: Optional[ndarray] = None, max_depth: int = 15)[source]#

Bases: PySadTrainer

Node that trains a model using the HalfSpaceTree algorithm.

Input:
  • dataset (pd.DataFrame) – A Pandas DataFrame containing the features.

  • labels (pd.Series) – A Pandas Series containing the labels.

Output:
  • model (pickle) – The trained model in pickle format.

  • auroc (float) – The AUROC metric of the trained model.

Parameters:
  • node_id (str) – Id of the node.

  • window_size (int, default=100) – The size of the window.

  • num_trees (int, default=25) – The number of trees.

  • initial_window_x (np.ndarray, default=None) – The initial window used for the initial calibration period. If not None, the model is fitted to these instances.

  • max_depth (int, default=15) – The maximum depth of the trees.

auroc = None#
dataset = None#
execute()[source]#

Expose the main functionality: depending on the node, the computation is done using a specific Python library and its function/s.

labels = None#
model = None#
class rain.nodes.pysad.trainer.IForestASD(node_id: str, window_size: int = 2048)[source]#

Bases: PySadTrainer

Node that trains a model using the IForestASD algorithm.

Input:
  • dataset (pd.DataFrame) – A Pandas DataFrame containing the features.

  • labels (pd.Series) – A Pandas Series containing the labels.

Output:
  • model (pickle) – The trained model in pickle format.

  • auroc (float) – The AUROC metric of the trained model.

Parameters:
  • node_id (str) – Id of the node.

  • window_size (int, default=2048) – The size (and the sliding length) of the reference window.

auroc = None#
dataset = None#
execute()[source]#

Expose the main functionality: depending on the node, the computation is done using a specific Python library and its function/s.

labels = None#
model = None#
class rain.nodes.pysad.trainer.XStream(node_id: str, window_size: int = 25, num_components: int = 100, n_chains: int = 100, depth: int = 25)[source]#

Bases: PySadTrainer

Node that trains a model using the xStream algorithm.

Input:
  • dataset (pd.DataFrame) – A Pandas DataFrame containing the features.

  • labels (pd.Series) – A Pandas Series containing the labels.

Output:
  • model (pickle) – The trained model in pickle format.

  • auroc (float) – The AUROC metric of the trained model.

Parameters:
  • node_id (str) – Id of the node.

  • window_size (int, default=25) – The size (and the sliding length) of the reference window.

  • num_components (int, default=100) – The number of components for streamhash projection.

  • n_chains (int, default=100) – The number of half-space chains.

  • depth (int, default=25) – The maximum depth for the chains.

auroc = None#
dataset = None#
execute()[source]#

Expose the main functionality: depending on the node, the computation is done using a specific Python library and its function/s.

labels = None#
model = None#

rain.nodes.pysad.transformer module#

class rain.nodes.pysad.transformer.ConformalProbabilityCalibrator(node_id: str, windowed: bool = True, window_size: int = 300)[source]#

Bases: PySadNode

This class provides an interface to convert the scores into probabilities through conformal prediction.

Input:

scores (pd.DataFrame) – A Pandas DataFrame containing the scores.

Output:

scores (pd.DataFrame) – A Pandas DataFrame containing the scores converted into probabilities.

Parameters:
  • node_id (str) – Id of the node.

  • windowed (bool, default=True) – Whether the probability calibrator is windowed, so that it forgets scores older than window_size.

  • window_size (int, default=300) – The size of the window for the running average and standard deviation.

execute()[source]#

Expose the main functionality: depending on the node, the computation is done using a specific Python library and its function/s.

scores = None#
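
One common conformal-style rule maps a new score to the fraction of remembered scores that do not exceed it. The sketch below applies that rule with an optional sliding window. It is not the rain or PySad implementation: the class name, the `fit_partial` and `transform_partial` method names, and the choice to return 0.0 when no history exists are all assumptions made for illustration.

```python
from collections import deque


class ConformalCalibratorSketch:
    """Illustrative score-to-probability calibrator (not the library class)."""

    def __init__(self, windowed: bool = True, window_size: int = 300):
        # A bounded deque drops the oldest score once window_size is reached;
        # maxlen=None keeps the full history.
        self.window = deque(maxlen=window_size if windowed else None)

    def fit_partial(self, score: float) -> None:
        """Remember one observed score."""
        self.window.append(score)

    def transform_partial(self, score: float) -> float:
        """Return the fraction of remembered scores that are <= score."""
        if not self.window:
            return 0.0  # assumption: no history means no evidence of anomaly
        return sum(1 for past in self.window if past <= score) / len(self.window)


windowed = ConformalCalibratorSketch(windowed=True, window_size=3)
unbounded = ConformalCalibratorSketch(windowed=False)
for s in [1.0, 2.0, 3.0, 4.0]:
    windowed.fit_partial(s)
    unbounded.fit_partial(s)

windowed_prob = windowed.transform_partial(3.5)    # history is [2.0, 3.0, 4.0]
unbounded_prob = unbounded.transform_partial(3.5)  # history is [1.0, 2.0, 3.0, 4.0]
print(windowed_prob, unbounded_prob)
```

With window_size=3 the oldest score has been forgotten, so the two calibrators return different probabilities for the same input.
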
class rain.nodes.pysad.transformer.GaussianTailProbabilityCalibrator(node_id: str, running_statistics: bool = True, window_size: int = 300)[source]#

Bases: PySadNode

This class provides an interface to convert the scores into probabilities via the Q-function, i.e., the tail function of the Gaussian distribution.

Input:

scores (pd.DataFrame) – A Pandas DataFrame containing the scores.

Output:

scores (pd.DataFrame) – A Pandas DataFrame containing the scores converted into probabilities.

Parameters:
  • node_id (str) – Id of the node.

  • running_statistics (bool, default=True) – Whether to calculate the mean and variance over a running window.

  • window_size (int, default=300) – The size of the window for the running average and standard deviation. Ignored if the running_statistics parameter is False.

execute()[source]#

Expose the main functionality: depending on the node, the computation is done using a specific Python library and its function/s.

scores = None#
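
The Q-function is the tail probability of the standard normal distribution, Q(z) = 0.5 · erfc(z / √2). The sketch below standardises a score with the mean and standard deviation of the remembered scores and reports 1 − Q(z), so that unusually high scores map to probabilities near 1. The class name, method names, the 1 − Q(z) convention, and the fallback value 0.5 are assumptions for illustration, not the rain or PySad code.

```python
import math
from collections import deque
from statistics import mean, pstdev


def q_function(z: float) -> float:
    """Gaussian tail probability P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


class GaussianTailCalibratorSketch:
    """Illustrative calibrator based on the Q-function (not the library class)."""

    def __init__(self, running_statistics: bool = True, window_size: int = 300):
        # With running statistics only the last window_size scores are kept;
        # otherwise the statistics cover the full history.
        self.scores = deque(maxlen=window_size if running_statistics else None)

    def fit_partial(self, score: float) -> None:
        """Remember one observed score."""
        self.scores.append(score)

    def transform_partial(self, score: float) -> float:
        """Map a score to 1 - Q(z), where z is the standardised score."""
        if len(self.scores) < 2:
            return 0.5  # assumption: too little history to standardise
        std = pstdev(self.scores)
        if std == 0:
            return 0.5  # assumption: constant history carries no information
        return 1.0 - q_function((score - mean(self.scores)) / std)


calibrator = GaussianTailCalibratorSketch(running_statistics=True, window_size=5)
for s in [1, 2, 3, 4, 5]:
    calibrator.fit_partial(s)

typical = calibrator.transform_partial(3)   # score equal to the window mean
extreme = calibrator.transform_partial(10)  # score far above the window mean
print(typical, extreme)
```
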
class rain.nodes.pysad.transformer.InstanceUnitNormScaler(node_id: str, pow: int = 2)[source]#

Bases: PySadTransformer

A scaler that makes the instance feature vector’s norm equal to 1, i.e., the unit vector.

Input:

dataset (pd.DataFrame) – A Pandas DataFrame.

Output:

dataset (pd.DataFrame) – A Pandas DataFrame.

Parameters:
  • node_id (str) – Id of the node.

  • pow (float, default=2) – The power for which the norm is calculated. pow=2 corresponds to the Euclidean norm.

dataset = None#
execute()[source]#

Expose the main functionality: depending on the node, the computation is done using a specific Python library and its function/s.
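
Scaling an instance to unit norm means dividing each feature by the vector's p-norm, (Σ|xᵢ|^pow)^(1/pow). The following sketch shows the arithmetic on a plain list; the helper name `unit_norm` and the decision to leave all-zero rows unchanged are assumptions, not behaviour documented for this node.

```python
def unit_norm(row, pow=2):
    """Divide a feature vector by its p-norm so that the result has norm 1.

    The parameter is named 'pow' to mirror the node's parameter; it shadows
    the built-in pow() only inside this function.
    """
    norm = sum(abs(value) ** pow for value in row) ** (1.0 / pow)
    if norm == 0:
        return list(row)  # assumption: an all-zero row is returned unchanged
    return [value / norm for value in row]


euclidean = unit_norm([3.0, 4.0])         # Euclidean norm of [3, 4] is 5
manhattan = unit_norm([3.0, 4.0], pow=1)  # 1-norm of [3, 4] is 7
print(euclidean, manhattan)
```
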

Module contents#
