flexmeasures.data.models.forecasting.pipelines.base

Classes

class flexmeasures.data.models.forecasting.pipelines.base.BasePipeline(sensors: dict[str, int], regressors: list[str], future_regressors: list[str], target: str, n_hours_to_predict: int, max_forecast_horizon: int, forecast_frequency: int, event_starts_after: datetime | None = None, event_ends_before: datetime | None = None, predict_start: datetime | None = None, predict_end: datetime | None = None)

Base class for Train and Predict pipelines.

This class handles loading and preprocessing time series data for training or prediction, including missing value handling and splitting into regressors (X) and target (y).

Parameters:

- sensors (dict[str, int]): Dictionary mapping sensor names to sensor IDs.
- regressors (list[str]): Names of sensors used as features.
- target (str): Name of the target sensor.
- n_hours_to_predict (int): Number of hours to predict into the future.
- max_forecast_horizon (int): Maximum forecasting horizon.
- event_starts_after (datetime | None): Earliest event_start to include.
- event_ends_before (datetime | None): Latest event_start to include.

__init__(sensors: dict[str, int], regressors: list[str], future_regressors: list[str], target: str, n_hours_to_predict: int, max_forecast_horizon: int, forecast_frequency: int, event_starts_after: datetime | None = None, event_ends_before: datetime | None = None, predict_start: datetime | None = None, predict_end: datetime | None = None) → None

_split_covariates_data(X_past_regressors_df, X_future_regressors_df, target_dataframe, split_timestamp, target_start, target_end, forecast_end) → list[TimeSeries]

Splits past covariates, future covariates, and target data at a given timestamp.

- Past covariates include data available before split_timestamp.
- Future covariates include forecasted values available before split_timestamp and extending up to max_forecast_horizon_in_hours.
- Target data includes values up to split_timestamp for model training.

Notes:

- Past covariates include only known historical values (i.e., belief time is after event time).
- Future covariates include forecasts made before split_timestamp; only the latest available belief is selected for each future event time.

Example:

Given:

split_timestamp = "2024-01-10 00:00:00"
Forecast horizon: 4 hours
Past covariates: Observed values before split_timestamp
Future covariates: Forecasts made before split_timestamp for the next 4 hours

The function returns:

past_covariates → Values before 2024-01-10 00:00:00
future_covariates → Forecasted values ending at 2024-01-10 04:00:00
target_data → Target values up to 2024-01-10 00:00:00
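The example above can be sketched with plain pandas. This is a minimal illustration, not the FlexMeasures implementation: the helper split_at_timestamp, the column names event_start, belief_time and value, and the exact boundary handling (strictly before for past covariates, inclusive for the target and the forecast end) are all assumptions made for the sketch.

```python
import pandas as pd


def split_at_timestamp(past_df, future_df, target_df, split_timestamp, horizon_hours):
    """Hypothetical helper (not part of FlexMeasures) that mirrors the documented split."""
    split_ts = pd.Timestamp(split_timestamp)
    forecast_end = split_ts + pd.Timedelta(hours=horizon_hours)

    # Past covariates: events strictly before the split (assumed boundary).
    past = past_df[past_df["event_start"] < split_ts]

    # Future covariates: forecasts made before the split, for events up to the forecast end.
    made_before_split = future_df["belief_time"] < split_ts
    within_horizon = future_df["event_start"] <= forecast_end
    future = future_df[made_before_split & within_horizon]

    # Target: values up to and including the split (assumed boundary).
    target = target_df[target_df["event_start"] <= split_ts]
    return past, future, target


# Toy hourly data around the split timestamp used in the example above.
hours = pd.date_range("2024-01-09 20:00", "2024-01-10 08:00", freq=pd.Timedelta(hours=1))
observed = pd.DataFrame({"event_start": hours, "value": range(len(hours))})
forecasts = pd.DataFrame(
    {
        "event_start": hours,
        "belief_time": pd.Timestamp("2024-01-09 12:00"),  # all forecasts made at noon the day before
        "value": range(len(hours)),
    }
)

past, future, target = split_at_timestamp(
    observed, forecasts, observed, "2024-01-10 00:00:00", horizon_hours=4
)
print(past["event_start"].max(), future["event_start"].max(), target["event_start"].max())
```

The actual method returns Darts TimeSeries objects rather than DataFrames; the sketch only shows where the three time boundaries fall.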

detect_and_fill_missing_values(df: pd.DataFrame, sensor_names: str | list[str], start: datetime, end: datetime, interpolate_kwargs: dict = None, fill: float = 0.0) → TimeSeries

Detects and fills missing values in a time series using the Darts MissingValuesFiller transformer.

This method interpolates missing values in the time series using the pd.DataFrame.interpolate() method.

Parameters:

- df (pd.DataFrame): The input dataframe containing time series data with a "time" column.
- sensor_names (str | list[str]): The name(s) of the sensor(s) (used for logging).
- start (datetime): The desired start time of the time series.
- end (datetime): The desired end time of the time series.
- interpolate_kwargs (dict, optional): Additional keyword arguments passed to MissingValuesFiller, which internally calls pd.DataFrame.interpolate(). For more details, see the Darts documentation.
- fill (float): Value used to fill gaps in case there is no data at all.

Returns: - TimeSeries: The time series with missing values filled.

Raises:

- ValueError: If the input dataframe is empty.

A warning is logged (via logging.warning) if missing values are detected and filled using pd.DataFrame.interpolate().
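The gap-filling behaviour described for this method can be approximated with pandas alone. The sketch below is an illustration under assumptions, not the library code: it skips the Darts MissingValuesFiller wrapper, and the helper name fill_gaps, the "value" column, the freq argument and the inclusive end are hypothetical.

```python
import pandas as pd


def fill_gaps(df, start, end, freq, fill=0.0, **interpolate_kwargs):
    """Hypothetical pandas-only stand-in for the documented gap filling."""
    if df.empty:
        # Mirrors the documented ValueError for empty input.
        raise ValueError("Input dataframe is empty.")

    # Reindex onto the complete time range so that missing steps become NaN.
    full_index = pd.date_range(start, end, freq=freq, name="time")
    series = df.set_index("time")["value"].reindex(full_index)

    # Interpolate the gaps, then use `fill` for anything interpolation could not reach.
    series = series.interpolate(**interpolate_kwargs)
    return series.fillna(fill)


# Two observations with a two-hour gap in between, and one missing step at the end.
raw = pd.DataFrame(
    {
        "time": pd.to_datetime(["2024-01-10 00:00", "2024-01-10 03:00"]),
        "value": [1.0, 4.0],
    }
)
filled = fill_gaps(
    raw,
    start="2024-01-10 00:00",
    end="2024-01-10 04:00",
    freq=pd.Timedelta(hours=1),
)
print(filled)
```

Extra keyword arguments such as method or limit are forwarded to interpolate() in the same spirit as interpolate_kwargs.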

load_data_all_beliefs() → DataFrame

This function fetches data for each sensor. If a sensor is listed as a future regressor, it fetches all available beliefs (including forecasts).

Returns: - pd.DataFrame: A DataFrame containing all the data from each sensor.

split_data_all_beliefs(df: DataFrame, is_predict_pipeline: bool = False) → tuple

Splits the input DataFrame into past covariates, future covariates, and target series for each prediction belief_time.

This function ensures that:

- Past covariates contain realized (actual) and forecast data, based on the latest available beliefs before the prediction event_start, about events up to the prediction belief_time.
- Future covariates consist of:
  - Forecasted values up to max_forecast_horizon with a belief_time before the prediction belief_time.
  - Realized data (i.e., data with the most recent belief_time) for event_starts that occur before the prediction belief_time.
- The target series is extracted for each prediction belief_time.
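Selecting the latest available belief for each prediction belief_time can be sketched as follows. This is an assumption-laden illustration rather than the pipeline's code: the helpers latest_beliefs_before and split_per_belief_time, the column names, and the choice of strict versus inclusive boundaries are hypothetical.

```python
import pandas as pd


def latest_beliefs_before(df, belief_time):
    """Keep, per event_start, the most recent belief formed before belief_time."""
    known = df[df["belief_time"] < belief_time]
    latest = known.sort_values("belief_time").drop_duplicates("event_start", keep="last")
    return latest.sort_values("event_start").reset_index(drop=True)


def split_per_belief_time(df, belief_times, max_forecast_horizon):
    """Hypothetical helper: one past/future pair per prediction belief_time."""
    past_list, future_list = [], []
    for bt in belief_times:
        latest = latest_beliefs_before(df, bt)
        # Past covariates: events before the prediction belief_time.
        past_list.append(latest[latest["event_start"] < bt])
        # Future covariates: realized data plus forecasts up to the horizon (assumed inclusive).
        future_list.append(latest[latest["event_start"] <= bt + max_forecast_horizon])
    return past_list, future_list


ts = pd.Timestamp
beliefs = pd.DataFrame(
    [
        # (event_start, belief_time, value)
        (ts("2024-01-09 22:00"), ts("2024-01-09 12:00"), 10.0),  # forecast
        (ts("2024-01-09 22:00"), ts("2024-01-09 22:15"), 11.0),  # realized later
        (ts("2024-01-09 23:00"), ts("2024-01-09 12:00"), 10.0),
        (ts("2024-01-09 23:00"), ts("2024-01-09 23:15"), 12.0),
        (ts("2024-01-10 00:00"), ts("2024-01-09 12:00"), 10.0),
        (ts("2024-01-10 00:00"), ts("2024-01-10 00:30"), 99.0),  # not yet known at midnight
        (ts("2024-01-10 01:00"), ts("2024-01-09 12:00"), 10.0),
    ],
    columns=["event_start", "belief_time", "value"],
)

past_list, future_list = split_per_belief_time(
    beliefs,
    belief_times=[ts("2024-01-10 00:00"), ts("2024-01-10 01:00")],
    max_forecast_horizon=pd.Timedelta(hours=1),
)
print(past_list[0]["value"].tolist())
print(future_list[0]["value"].tolist())
```

Filtering on belief_time before deduplicating is what keeps information that only became available after the prediction belief_time out of each split.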

Returns:

tuple:

- past_covariates_list (List[TimeSeries] or None): List of TimeSeries, each containing past data up to the corresponding prediction belief_time.
- future_covariates_list (List[TimeSeries] or None): List of TimeSeries, each containing future data up to the prediction belief_time.
- target_list (List[TimeSeries]): List of TimeSeries, each containing the target values up to the respective prediction belief_time.