Missingness Patterns

ImputeGAP introduces a new taxonomy of missingness patterns tailored to time series, going beyond the traditional MAR and MNAR categories, which were not designed for temporal data.

Setup

Note

(N, M) : number of timestamps, number of series
W : user-defined offset window at the beginning of the series, which is never contaminated (default = 25)
R : user-defined % of missing values (default = 20%)
S : user-defined % of contaminated series (default = 20%)
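As a rough illustration of how these four quantities interact, the sketch below turns W, R and S into integer counts. It is not part of ImputeGAP: `pattern_budget` is a hypothetical helper, W is assumed to be a percentage of N (consistent with the bound R ∈ [1%, (100-W)%] used throughout this page), and rounding the number of contaminated series up is also an assumption.

```python
import math


def pattern_budget(n, m, w=25, r=20, s=20):
    """Translate the percentages W, R, S into integer counts (hypothetical helper)."""
    if not 1 <= r <= 100 - w:
        raise ValueError("R must lie in [1, 100 - W]")
    return {
        # first timestamp index that may be removed (assumes W is a % of N)
        "first_index": n * w // 100,
        # number of values removed from each contaminated series
        "missing_per_series": n * r // 100,
        # number of contaminated series, rounded up (an assumption)
        "contaminated_series": max(1, math.ceil(m * s / 100)),
    }


budget = pattern_budget(n=1000, m=20)
```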

MONO-BLOCK

One missing block per series

Aligned

Missing blocks start at the same selected positions and have the same fixed size across the chosen series, resulting in aligned missing intervals.

Note

R ∈ [1%, (100-W)%]
The size of a single missing block varies between 1% and (100 - W)% of N.
Every missing block starts at the same position, W, and extends until the block size is reached. Series are contaminated from the top of the dataset downward, until S% of the series are affected.
GenGap.aligned(ts.data, rate_dataset=1, rate_series=0.4, offset=25)
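To make the aligned pattern concrete, here is a minimal NumPy sketch. It is an illustration only, not the ImputeGAP implementation: `aligned_sketch` is a hypothetical function, the array is assumed to hold one series per row, and the rates and the offset are assumed to be fractions of the series length.

```python
import numpy as np


def aligned_sketch(data, rate_dataset=0.2, rate_series=0.2, offset=0.25):
    """Mask the same interval in the first `rate_dataset` share of series."""
    out = np.array(data, dtype=float)           # copy; rows are series (assumption)
    m, n = out.shape
    n_series = max(1, round(m * rate_dataset))  # series to contaminate, from the top
    start = round(n * offset)                   # protected window ends here
    size = max(1, round(n * rate_series))       # block length, identical everywhere
    if start + size > n:
        raise ValueError("block does not fit after the offset window")
    out[:n_series, start:start + size] = np.nan
    return out
```

Because the slice is identical for every selected row, the NaN intervals line up column by column.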

Disjoint

Each missing block begins where the previous one ends, so the missing intervals are consecutive and do not overlap.

Note

R ∈ [1%, (100-W)%]
The size of a single missing block varies between 1% and (100 - W)% of N.
The starting position of the first missing block begins at W.
GenGap.disjoint(ts.data, rate_series=0.4, offset=25)
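The disjoint rule can be sketched in a few lines of NumPy. This is an assumption-laden illustration rather than the library's code: `disjoint_sketch` is a hypothetical name, series are assumed to be stored one per row, and `rate_series` and `offset` are treated as fractions of the series length.

```python
import numpy as np


def disjoint_sketch(data, rate_series=0.2, offset=0.25):
    """Give each series one block that starts where the previous block ended."""
    out = np.array(data, dtype=float)      # copy; rows are series (assumption)
    m, n = out.shape
    size = max(1, round(n * rate_series))  # block length
    start = round(n * offset)              # first block starts after the offset window
    for i in range(m):
        if start >= n:                     # ran past the end of the series: stop
            break
        end = min(start + size, n)
        out[i, start:end] = np.nan
        start = end                        # next series continues from here
    return out
```

With this layout no timestamp is missing in more than one series.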

Overlap

Each missing block overlaps with the previous one.

Note

R ∈ [1%, (100-W)%]
The size of a single missing block varies between 1% and (100 - W)% of N.
The starting position of the first missing block begins at W.
The overlap is controlled by the variable shift.
This pattern continues until the limit or N is reached.
GenGap.overlap(ts.data, rate_series=0.4, offset=25, shift=0.1)
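The following sketch shows one way the overlap pattern could be produced. It is not the ImputeGAP implementation: `overlap_sketch` is a hypothetical function, series are assumed to be rows, and `shift` is assumed to be the fraction of the series length by which each new block is pulled back into the previous one.

```python
import numpy as np


def overlap_sketch(data, rate_series=0.2, offset=0.25, shift=0.05):
    """Give each series one block that overlaps the previous series' block."""
    out = np.array(data, dtype=float)      # copy; rows are series (assumption)
    m, n = out.shape
    size = max(1, round(n * rate_series))  # block length
    back = round(n * shift)                # overlap length (assumed meaning of shift)
    if not 0 < back < size:
        raise ValueError("shift must produce an overlap shorter than the block")
    start = round(n * offset)
    for i in range(m):
        end = min(start + size, n)
        out[i, start:end] = np.nan
        if end >= n:                       # reached the end of the series: stop
            break
        start = end - back                 # step back so the next block overlaps
    return out
```

Each block advances by `size - back` timestamps, so consecutive series share exactly `back` missing positions until the end of the series is reached.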

Scattered

The starting position of each missing block is chosen at random; all missing blocks share the same size.

Note

R ∈ [1%, (100-W)%]
The size of a single missing block varies between 1% and (100 - W)% of N.
The starting position of each block is drawn at random, and the block extends from there until the block size is reached. Series are contaminated from the top of the dataset downward, until S% of the series are affected.
GenGap.scattered(ts.data, rate_dataset=1, rate_series=0.4, offset=25)
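A minimal sketch of the scattered pattern, under stated assumptions: `scattered_sketch` and its `seed` argument are hypothetical, series are stored one per row, rates and the offset are fractions of the series length, and the random start is drawn so that the block always fits between the offset window and the end of the series.

```python
import numpy as np


def scattered_sketch(data, rate_dataset=0.2, rate_series=0.2, offset=0.25, seed=42):
    """Place one fixed-size block at a random position in each selected series."""
    rng = np.random.default_rng(seed)
    out = np.array(data, dtype=float)           # copy; rows are series (assumption)
    m, n = out.shape
    n_series = max(1, round(m * rate_dataset))  # series to contaminate, from the top
    size = max(1, round(n * rate_series))       # block length, identical everywhere
    first = round(n * offset)                   # earliest allowed start
    if first + size > n:
        raise ValueError("block does not fit after the offset window")
    for i in range(n_series):
        start = rng.integers(first, n - size + 1)  # upper bound is exclusive
        out[i, start:start + size] = np.nan
    return out
```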

MULTI-BLOCK

Multiple missing blocks per series

MCAR

Missing blocks have the same size and are introduced completely at random. The affected time series are selected at random.

Note

R ∈ [1%, (100-W)%]
Data blocks of the same size are removed from arbitrary series at a random position between W and N, until the total number of missing values per series is reached.
GenGap.mcar(ts.data, rate_dataset=1, rate_series=0.2, offset=25, seed=False, block_size=20)
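The sketch below illustrates the MCAR idea with plain NumPy. It is a simplification, not the library's algorithm: `mcar_sketch` is a hypothetical function, series are assumed to be rows, rates and the offset are fractions, and block starts are drawn from a fixed grid so that blocks never overlap (the real implementation may handle collisions differently).

```python
import numpy as np


def mcar_sketch(data, rate_dataset=0.2, rate_series=0.2, offset=0.25,
                block_size=2, seed=42):
    """Remove equal-size blocks at random positions from randomly chosen series."""
    rng = np.random.default_rng(seed)
    out = np.array(data, dtype=float)           # copy; rows are series (assumption)
    m, n = out.shape
    n_series = max(1, round(m * rate_dataset))
    chosen = rng.choice(m, size=n_series, replace=False)  # random series
    first = round(n * offset)
    n_blocks = max(1, round(n * rate_series) // block_size)
    # Candidate starts on a grid: a simplification that rules out overlaps.
    slots = np.arange(first, n - block_size + 1, block_size)
    if n_blocks > len(slots):
        raise ValueError("not enough room for the requested missing rate")
    for i in chosen:
        for s in rng.choice(slots, size=n_blocks, replace=False):
            out[i, s:s + block_size] = np.nan
    return out
```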

Block Distribution

Missing data follows a probability distribution: each position has a certain chance of being missing.

Note

R ∈ [1%, (100-W)%]
Data is removed according to a user-supplied distribution that assigns a removal probability to every value of the series. Series are contaminated from the top of the dataset downward, until S% of the series are affected.
GenGap.gaussian(ts.data, rate_dataset=1, rate_series=0.4, offset=25, selected_mean="position", std_dev=0.2)
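As an illustration of distribution-driven removal, the sketch below weights each eligible position with a bell curve and samples the missing positions from those weights. Everything here is an assumption made for the example: `gaussian_sketch` is a hypothetical function, the curve is centred on the middle of the eligible range, `std_dev` is read as a fraction of that range, and series are stored one per row. It does not reproduce how ImputeGAP interprets `selected_mean`.

```python
import numpy as np


def gaussian_sketch(data, rate_dataset=0.2, rate_series=0.2, offset=0.25,
                    std_dev=0.2, seed=42):
    """Remove values at positions sampled from bell-shaped weights."""
    rng = np.random.default_rng(seed)
    out = np.array(data, dtype=float)           # copy; rows are series (assumption)
    m, n = out.shape
    n_series = max(1, round(m * rate_dataset))  # series to contaminate, from the top
    positions = np.arange(round(n * offset), n) # eligible timestamps
    n_missing = round(n * rate_series)
    if n_missing > len(positions):
        raise ValueError("missing rate exceeds the eligible range")
    centre = positions.mean()                   # assumed centre of the distribution
    width = std_dev * len(positions)            # assumed scaling of std_dev
    weights = np.exp(-0.5 * ((positions - centre) / width) ** 2)
    probs = weights / weights.sum()             # normalise to probabilities
    for i in range(n_series):
        picked = rng.choice(positions, size=n_missing, replace=False, p=probs)
        out[i, picked] = np.nan
    return out
```

Positions near the centre are removed more often than positions near the edges, so the missing values cluster without forming a single contiguous block.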

To configure the block distribution pattern, please refer to this page.