Using the fastai2 `Datasets` to make a time series dataset.

Dataloader

class TSDataLoader

TSDataLoader(dataset, horizon, lookback=72, step=1, min_seq_len=None, max_std=2, bs=64, shuffle=False, num_workers=None, verbose=False, do_setup=True, pin_memory=False, timeout=0, batch_size=None, drop_last=False, indexed=None, n=None, device=None, wif=None, before_iter=None, after_item=None, before_batch=None, after_batch=None, after_iter=None, create_batches=None, create_item=None, create_batch=None, retain=None, get_idxs=None, sample=None, shuffle_fn=None, do_batch=None) :: TfmdDL

Transformed DataLoader

horizon,lookback = 2,5
ints = L(np.arange(9.)[None,:],np.arange(9.,14)[None,:]).map(tensor)
ints
(#2) [tensor([[0., 1., 2., 3., 4., 5., 6., 7., 8.]]),tensor([[ 9., 10., 11., 12., 13.]])]
dl = TSDataLoader(ints, horizon, lookback, step=2, norm=False)
list(dl)
Need to pad 1/2 time series due to length.
[(TSTensorSeq([[[ 0.,  1.,  2.,  3.,  4.]],
  
          [[ 2.,  3.,  4.,  5.,  6.]],
  
          [[10., 10.,  9., 10., 11.]]]),
  TSTensorSeqy([[[ 0.,  1.,  2.,  3.,  4.,  5.,  6.]],
  
          [[ 2.,  3.,  4.,  5.,  6.,  7.,  8.]],
  
          [[10., 10., 10., 10., 11., 12., 13.]]]))]

The first sequence (0 to 8) is transformed into two items. The first has x from 0 to 4 and y from 0 to 6. The second is shifted by two, because step == 2. The second sequence (which produces the third item) is not long enough and is therefore padded with the mean of x (10). Note that both x and y are padded with the mean of x.
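The windowing and padding described above can be sketched in plain NumPy. This is only an illustration of the idea, not the `TSDataLoader` implementation: `make_windows` is a hypothetical helper, and the left-padding rule (pad with the mean of everything except the last `horizon` values) is an assumption inferred from the description, so the padded values may differ in detail from the library output shown above.

```python
import numpy as np

def make_windows(series, horizon, lookback, step=1):
    """Hypothetical helper: split a 1-D series into (x, y) windows.

    x is the first `lookback` values of each window and y is the whole
    window (lookback + horizon values), mirroring the example above.
    """
    series = np.asarray(series, dtype=float)
    window = lookback + horizon
    if len(series) < window:
        # Assumption: left-pad with the mean of the part that becomes x,
        # i.e. everything except the last `horizon` values.
        pad_value = series[:-horizon].mean()
        pad = np.full(window - len(series), pad_value)
        series = np.concatenate([pad, series])
    items = []
    # Slide the window over the series, moving `step` positions each time.
    for start in range(0, len(series) - window + 1, step):
        chunk = series[start:start + window]
        items.append((chunk[:lookback], chunk))
    return items

long_items = make_windows(np.arange(9.), horizon=2, lookback=5, step=2)
short_items = make_windows(np.arange(9., 14), horizon=2, lookback=5, step=2)

for x, y in long_items + short_items:
    print(x, y)
```

With `step=2`, the nine-value series yields two windows starting at positions 0 and 2, while the five-value series is padded up to seven values and yields a single window.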

Showing

Integration Example

from fastseq.core import *
from fastai2.basics import *
path = untar_data(URLs.m4_daily)
df_train = pd.read_csv(path/'train.csv',nrows=300)
df_test = pd.read_csv(path/'val.csv')
df_test.head()
   V1       V2       V3       V4      V5       V6      V7       V8       V9      V10      V11      V12      V13      V14      V15
0  D1  2039.20  2035.00  2051.80  2061.8  2063.50  2069.5  2054.00  2057.00  2062.80  2066.40  2067.40  2071.40  2083.80  2080.60
1  D2  2986.00  3001.20  2975.90  2996.1  2981.90  2985.5  2975.80  2956.20  2964.70  2989.00  2991.40  3024.90  3070.80  3076.90
2  D3  1120.70  1117.90  1115.10  1112.3  1109.50  1106.7  1103.90  1101.10  1098.30  1095.50  1092.70  1089.90  1087.10  1084.30
3  D4  1190.00  1162.00  1134.00  1106.0  1078.00  1050.0  1022.00   994.00   966.00   938.00   910.00  1428.00  1400.00  1372.00
4  D5  5904.67  5917.05  5922.58  5928.8  5935.29  6002.8  6009.47  6014.82  6020.19  6072.49  6077.72  6080.23  6082.75  6108.07
horizon = 14
lookback = 14*3

items = ts_lists(df_train.iloc[:,1:].values)
splits = RandomSplitter()(items)

dl = TSDataLoader(items, horizon = horizon, lookback = lookback, step=5)
dl.show_batch()