---
license: cdla-permissive-2.0
---

# Model Card for TTM

TTM, also known as TinyTimeMixer, is a family of compact pre-trained models for Time-Series Forecasting, open-sourced by IBM Research.
**With less than 1 million parameters, TTM introduces the notion of the first-ever “tiny” pre-trained models for Time-Series Forecasting.** 

TTM outperforms several popular benchmark models that demand billions of parameters in zero-shot and few-shot forecasting. TTM is pre-trained on diverse public time-series datasets and 
can be easily fine-tuned on your target data. Refer to our [paper](https://arxiv.org/pdf/2401.03955.pdf) for more details.

**Note that zero-shot, fine-tuning, and inference tasks with TTM can easily be executed on a single-GPU machine, or even on a laptop!**


## Benchmark Highlights:

- TTM outperforms pre-trained GPT4TS (NeurIPS 23) by 7-12% in few-shot forecasting.
- TTM outperforms pre-trained LLMTime (NeurIPS 23) by 24% in zero-shot forecasting.
- TTM outperforms pre-trained SimMTM (NeurIPS 23) by 17% in few-shot forecasting.
- Zero-shot results of TTM often surpass the few-shot results of many SOTA approaches, including
  PatchTST (ICLR 23), PatchTSMixer (KDD 23), TimesNet (ICLR 23), DLinear (AAAI 23), and FEDFormer (ICML 22).
- TTM (1024-96, released in this model card) also outperforms pre-trained MOIRAI at FL = 96 by ...
- TTM quick fine-tuning also outperforms the strong statistical baselines (statistical ensemble and S-Naive) on the
  M4-hourly dataset, which pre-trained TS models have found hard to outperform.
- TTM takes only a few seconds for zero-shot forecasting/inference and a few minutes for fine-tuning on a single-GPU machine.
  

## Model Description

TTM falls under the category of “focused pre-trained models”, wherein each pre-trained TTM is tailored for a particular forecasting 
setting (governed by the context length and forecast length). Instead of building one massive model supporting all forecasting settings, 
we opt for the approach of constructing smaller pre-trained models, each focusing on a specific forecasting setting, thereby 
yielding more accurate results. Furthermore, this approach ensures that our models remain extremely small and exceptionally fast, 
facilitating easy deployment without demanding significant resources. 

Hence, in this model card, we plan to release several pre-trained 
TTMs that can cater to many common forecasting settings in practice. Additionally, we have released our source code along with 
our pretraining scripts that users can utilize to pretrain models on their own. Pretraining TTMs is very easy and fast, taking 
only 3-6 hours using 6 A100 GPUs, as opposed to several days or weeks in traditional approaches.

## Model Releases (along with the branch name where the models are stored):

- 512-96: Given the last 512 time-points (i.e., the context length), this model can forecast up to the next 96 time-points (i.e., the forecast length).
  Recommended for minutely and hourly forecasts (e.g., resolutions of 5 min, 10 min, 15 min, 1 hour). (branch name: main) 

- 1024-96: Given the last 1024 time-points (i.e., the context length), this model can forecast up to the next 96 time-points (i.e., the forecast length).
  Recommended for minutely and hourly forecasts (e.g., resolutions of 5 min, 10 min, 15 min, 1 hour). (branch name: 1024-96-v1) 

- Stay tuned for more models! A sketch of loading a release by branch name follows this list.
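
As a minimal loading sketch, the branch names above map to the Hugging Face `revision` argument. The `TinyTimeMixerForPrediction` class comes from our `tsfm_public` source-code package; treat the exact import path as an assumption that may vary across library versions.

```python
# Minimal sketch: load a specific TTM release by its branch name.
# Assumes the tsfm_public package (from our source-code repository) is
# installed; the import path may differ between library versions.
from tsfm_public.models.tinytimemixer import TinyTimeMixerForPrediction

# "main" holds the 512-96 model; "1024-96-v1" holds the 1024-96 model.
model_512_96 = TinyTimeMixerForPrediction.from_pretrained("ibm/TTM", revision="main")
model_1024_96 = TinyTimeMixerForPrediction.from_pretrained("ibm/TTM", revision="1024-96-v1")
```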

## Model Details

For more details on TTM architecture and benchmarks, refer to our [paper](https://arxiv.org/pdf/2401.03955.pdf).

TTM-1 currently supports 2 modes:

 - **Zeroshot forecasting**: Directly apply the pre-trained model to your target data to get an initial forecast (with no training).

 - **Finetuned forecasting**: Finetune the pre-trained model on a subset of your target data to further improve the forecast.

**Since TTM models are extremely small and fast, it is very easy to fine-tune the model on your available target data in a few minutes 
to get more accurate forecasts.**
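
As an illustration of the zero-shot mode, here is a minimal sketch that applies the pre-trained 512-96 model directly to a standard-scaled context window. The `past_values` input name and the `prediction_outputs` field follow our `tsfm_public` source-code package; treat the exact forward signature as an assumption.

```python
# Minimal zero-shot sketch, assuming the tsfm_public package is installed.
import torch

from tsfm_public.models.tinytimemixer import TinyTimeMixerForPrediction

# Load the pre-trained 512-96 model; zero-shot means no training on target data.
model = TinyTimeMixerForPrediction.from_pretrained("ibm/TTM", revision="main")
model.eval()

# Toy input: batch of 1, context length 512, 3 channels (variates).
# Real data must be standard-scaled first (see "Recommended Use" below).
past_values = torch.randn(1, 512, 3)

with torch.no_grad():
    output = model(past_values=past_values)

forecast = output.prediction_outputs  # expected shape: (1, 96, 3)
print(forecast.shape)
```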

The current release supports multivariate forecasting via both channel-independence and channel-mixing approaches. 
Decoder channel-mixing can be enabled during fine-tuning to capture strong channel-correlation patterns across 
time-series variates, a critical capability lacking in existing counterparts; a fine-tuning sketch follows below.
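
As a sketch, fine-tuning with decoder channel-mixing enabled might look like the following. The `decoder_mode="mix_channel"` flag reflects the configuration option in our `tsfm_public` source-code package, and the dummy dataset is a hypothetical stand-in for windows of your target data; treat both as assumptions.

```python
# Fine-tuning sketch, assuming tsfm_public and transformers are installed.
import torch
from torch.utils.data import Dataset
from transformers import Trainer, TrainingArguments

from tsfm_public.models.tinytimemixer import TinyTimeMixerForPrediction


class DummyTSDataset(Dataset):
    """Hypothetical placeholder; replace with windows of your target data."""

    def __len__(self):
        return 128

    def __getitem__(self, idx):
        return {
            "past_values": torch.randn(512, 3),   # context window
            "future_values": torch.randn(96, 3),  # forecast targets
        }


# Enable decoder channel-mixing so the decoder can model cross-channel
# correlations during fine-tuning (the default is channel-independent).
model = TinyTimeMixerForPrediction.from_pretrained(
    "ibm/TTM", revision="main", decoder_mode="mix_channel"
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="ttm_finetuned",
        num_train_epochs=10,
        learning_rate=1e-3,
        per_device_train_batch_size=64,
    ),
    train_dataset=DummyTSDataset(),
)
trainer.train()
```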

In addition, TTM also supports exogenous infusion and categorical data inputs, which are not released as part of this version. 
Stay tuned for these extended features.

## Recommended Use
1. Users have to standard-scale their data before feeding it to the model (refer to TSP, our data processing utility, for data scaling); a minimal scaling sketch follows this list.
2. Enabling any upsampling or prepending zeros to virtually increase the context length is not recommended and will
   impact the model performance.
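
As a minimal scaling sketch, scikit-learn's `StandardScaler` is used below as an illustrative stand-in for TSP: fit the scaler on the training split only, and invert it to read forecasts back in the original units.

```python
# Standard-scaling sketch; TSP (our data processing utility) handles this
# for you, so treat this sklearn-based version as an illustrative stand-in.
import numpy as np
from sklearn.preprocessing import StandardScaler

train = np.random.randn(5000, 3)          # toy series, shape (time, channels)
scaler = StandardScaler().fit(train)      # fit on the training split only

context = scaler.transform(train[-512:])  # last 512 points -> model input
# ... run the model on `context` to get `forecast_scaled` of shape (96, 3) ...
forecast_scaled = np.random.randn(96, 3)  # placeholder for model output
forecast = scaler.inverse_transform(forecast_scaled)  # back to original units
```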


### Model Sources [optional]

<!-- Provide the basic links for the model. -->

- **Repository:** [More Information Needed]
- **Paper [optional]:** [More Information Needed]


## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

### Direct Use

<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->

[More Information Needed]

### Downstream Use [optional]

<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->

[More Information Needed]

## How to Get Started with the Model

[Point notebooks]

## Benchmarks

## Training Data

The TTM models were trained on a collection of datasets from the Monash Time Series Forecasting repository. The datasets used include:
 - Australian Electricity Demand: https://zenodo.org/records/4659727 
 - Australian Weather: https://zenodo.org/records/4654822 
 - Bitcoin dataset: https://zenodo.org/records/5122101 
 - KDD Cup 2018 dataset: https://zenodo.org/records/4656756 
 - London Smart Meters: https://zenodo.org/records/4656091 
 - Saugeen River Flow: https://zenodo.org/records/4656058
 - Solar Power: https://zenodo.org/records/4656027 
 - Sunspots: https://zenodo.org/records/4654722
 - Solar: https://zenodo.org/records/4656144 
 - US Births: https://zenodo.org/records/4656049 
 - Wind Farms Production data: https://zenodo.org/records/4654858 
 - Wind Power: https://zenodo.org/records/4656032


## Citation [optional]
Kindly cite the following paper if you use our model or its associated architectures/approaches in your 
work:

**BibTeX:**

@article{ekambaram2024ttms,
  title={TTMs: Fast Multi-level Tiny Time Mixers for Improved Zero-shot and Few-shot Forecasting of Multivariate Time Series},
  author={Ekambaram, Vijay and Jati, Arindam and Nguyen, Nam H and Dayama, Pankaj and Reddy, Chandra and Gifford, Wesley M and Kalagnanam, Jayant},
  journal={arXiv preprint arXiv:2401.03955},
  year={2024}
}

**APA:**

Ekambaram, V., Jati, A., Nguyen, N. H., Dayama, P., Reddy, C., Gifford, W. M., & Kalagnanam, J. (2024). TTMs: Fast Multi-level Tiny Time Mixers for Improved Zero-shot and Few-shot Forecasting of Multivariate Time Series. arXiv preprint arXiv:2401.03955.


## Model Card Authors [optional]

[More Information Needed]

## Model Card Contact

[More Information Needed]

## IBM Public Repository Disclosure: 

All content in this repository, including code, has been provided by IBM under the associated 
open-source software license, and IBM is under no obligation to provide enhancements, 
updates, or support. IBM developers produced this code as an 
open-source project (not as an IBM product), and IBM makes no assertions as to 
its level of quality or security, and will not be maintaining this code going forward.