Missingness Patterns

ImputeGAP introduces a new taxonomy of missingness patterns tailored to time series, going beyond the traditional MAR and MNAR categories, which were not designed for temporal data.

Setup

Note

  • (N, M) : number of timestamps, number of series

  • W : user-defined offset window in the beginning of the series (default = 25)

  • R : user-defined % of missing values (default = 20%)

  • S : user-defined % of contaminated series (default = 20%)


MONO-BLOCK

One missing block per series


Aligned

Missing blocks start at the same selected positions and have the same fixed size across the chosen series, resulting in aligned missing intervals.

Note

  • R [1%, (100-W)%]

  • The size of a single missing block varies between 1% and (100 - W)% of N.

  • The starting position is the same and begins at W and progresses until the size of the missing block is reached, affecting the first series from the top up to S% of the dataset.

  • GenGap.aligned(ts.data, rate_dataset=1, rate_series=0.4, offset=25)


ImputeGAP Aligned Pattern

Disjoint

Each missing block begins where the previous one ends, so the missing intervals are consecutive and do not overlap.

Note

  • R [1%, (100-W)%]

  • The size of a single missing block varies between 1% and (100 - W)% of N.

  • The starting position of the first missing block begins at W.

  • GenGap.disjoint(ts.data, rate_series=0.4, offset=25)


ImputeGAP Disjoint Pattern

Overlap

Each missing block overlaps with the previous one.

Note

  • R [1%, (100-W)%]

  • The size of a single missing block varies between 1% and (100 - W)% of N.

  • The starting position of the first missing block begins at W.

  • The overlap is controlled by the variable shift.

  • This pattern continues until the limit or N is reached.

  • GenGap.overlap(ts.data, rate_series=0.4, offset=25, shift=0.1)


ImputeGAP Overlap Pattern

Scattered

The starting position of the missing block is chosen at random, all missing blocks share the same size.

Note

  • R [1%, (100-W)%]

  • The size of a single missing block varies between 1% and (100 - W)% of N.

  • The starting position is random, then progresses until the size of the missing block is reached, affecting the first series from the top up to S% of the dataset.

  • GenGap.scattered(ts.data, rate_dataset=1, rate_series=0.4, offset=25)


ImputeGAP Scatter Pattern

MULTI-BLOCK

Multiple missing blocks per series


MCAR

Missing blocks have the same size and are introduced completely at random. The affected time series are selected at random.

Note

  • R [1%, (100-W)%]

  • Data blocks of the same size are removed from arbitrary series at a random position between W and N, until the total number of missing values per series is reached.

  • GenGap.mcar(ts.data, rate_dataset=1, rate_series=0.2, offset=25, seed=False, block_size=20)


ImputeGAP Aligned Pattern

Block Distribution

Missing data follows a probability distribution, each position has a certain chance of being missing.

Note

  • R [1%, (100-W)%]

  • Data is removed following a distribution given by the user for every values of the series, affecting the first series from the top up to S% of the dataset.

  • GenGap.gaussian(ts.data, rate_dataset=1, rate_series=0.4, offset=25, selected_mean="position", std_dev=0.2)

To configure the block distribution pattern, please refer to this page.


ImputeGAP Distribution Pattern