Toto-Open-Base-1.0

Toto (Time Series Optimized Transformer for Observability) is a time-series foundation model designed for multi-variate time series forecasting, with an emphasis on observability metrics. Toto efficiently handles the high-dimensional, sparse, and non-stationary data commonly encountered in observability scenarios.

Overview of Toto-Open-Base-1.0 architecture.

⚡ Quick Start: Model Inference

Inference code is available on GitHub.

Installation

```bash
# Clone the repository
git clone https://github.com/DataDog/toto.git
cd toto

# Install dependencies
pip install -r requirements.txt
```

🚀 Inference Example

Here's how to quickly generate forecasts using Toto:

```python
import torch
from data.util.dataset import MaskedTimeseries
from inference.forecaster import TotoForecaster
from model.toto import Toto

DEVICE = 'cuda'

# Load pre-trained Toto model
toto = Toto.from_pretrained('Datadog/Toto-Open-Base-1.0').to(DEVICE)

# Optional: compile model for enhanced speed
toto.compile()

forecaster = TotoForecaster(toto.model)

# Example input series (7 variables, 4096 timesteps)
input_series = torch.randn(7, 4096).to(DEVICE)
timestamp_seconds = torch.zeros(7, 4096).to(DEVICE)
time_interval_seconds = torch.full((7,), 60*15).to(DEVICE)

inputs = MaskedTimeseries(
    series=input_series,
    padding_mask=torch.full_like(input_series, True, dtype=torch.bool),
    id_mask=torch.zeros_like(input_series),
    timestamp_seconds=timestamp_seconds,
    time_interval_seconds=time_interval_seconds,
)

# Generate forecasts for next 336 timesteps
forecast = forecaster.forecast(
    inputs,
    prediction_length=336,
    num_samples=256,
    samples_per_batch=256,
)

# Access results
mean_prediction = forecast.mean
prediction_samples = forecast.samples
lower_quantile = forecast.quantile(0.1)
upper_quantile = forecast.quantile(0.9)
```

For detailed inference instructions, refer to the inference tutorial notebook.
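The quantile calls in the example summarize the sampled trajectories as a prediction interval. As a rough illustration of that idea, and not Toto's actual implementation, the sketch below computes empirical quantiles from a handful of sample values for one variable at one future timestep. The helper name `empirical_quantile`, the linear-interpolation rule, and the sample values are all assumptions made for this example.

```python
def empirical_quantile(samples, q):
    """Return the q-th empirical quantile (0 <= q <= 1) using linear interpolation."""
    if not samples:
        raise ValueError("samples must be non-empty")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")
    ordered = sorted(samples)
    # Fractional position of the quantile within the sorted samples.
    pos = q * (len(ordered) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    frac = pos - lo
    return ordered[lo] * (1.0 - frac) + ordered[hi] * frac


# Hypothetical sampled values for one variable at one future timestep.
samples_at_t = [0.0, 1.0, 2.0, 3.0, 4.0]
lower = empirical_quantile(samples_at_t, 0.1)  # 10th percentile
upper = empirical_quantile(samples_at_t, 0.9)  # 90th percentile
mean = sum(samples_at_t) / len(samples_at_t)
```

With more samples the interval estimate becomes less noisy, which is one reason a forecast might draw a few hundred samples rather than a handful.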

Performance Recommendations

For optimal speed and reduced memory usage, install xFormers and flash-attention. Then, set `use_memory_efficient` to `True`.
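Before enabling the memory-efficient path, it can help to confirm that the optional packages are actually present in the environment. The following is a minimal sketch using only the standard library. It assumes the import names are `xformers` and `flash_attn`, and the helper `is_installed` is a name introduced here for illustration.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Import names assumed here: `xformers` for xFormers, `flash_attn` for flash-attention.
optional_accelerators = {
    name: is_installed(name) for name in ("xformers", "flash_attn")
}

for name, present in optional_accelerators.items():
    print(f"{name}: {'available' if present else 'not installed'}")
```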

💾 Available Checkpoints

Checkpoint	Parameters	Config	Size	Notes
Toto-Open-Base-1.0	151M	Config	605 MB	Initial release with SOTA performance
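The listed size is consistent with weights stored as 32-bit floats: 151 million parameters at 4 bytes each comes to roughly 604 MB. The storage precision is an assumption here, since the table does not state it. The small sketch below reproduces the arithmetic; `estimated_size_mb` is a hypothetical helper.

```python
def estimated_size_mb(num_parameters, bytes_per_parameter=4):
    """Estimate checkpoint size in megabytes (1 MB = 10**6 bytes), weights only."""
    return num_parameters * bytes_per_parameter / 1e6


fp32_mb = estimated_size_mb(151_000_000)     # assumes float32 weights
fp16_mb = estimated_size_mb(151_000_000, 2)  # hypothetical half-precision copy
print(f"float32: ~{fp32_mb:.0f} MB, float16: ~{fp16_mb:.0f} MB")
```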

✨ Key Features

Zero-Shot Forecasting
Multi-Variate Support
Decoder-Only Transformer Architecture
Probabilistic Predictions (Student-T mixture model)
Causal Patch-Wise Instance Normalization
Extensive Pretraining on Large-Scale Data
High-Dimensional Time Series Support
Tailored for Observability Metrics
State-of-the-Art Performance on GiftEval and BOOM
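To make the "Student-T mixture model" feature more concrete, here is a minimal sketch of how the log density of such a mixture can be evaluated in plain Python. It is an illustration of the general technique, not Toto's output head: the function names, the two-component setup, and all parameter values are assumptions chosen for the example.

```python
import math


def student_t_logpdf(x, df, loc, scale):
    """Log density of a location-scale Student-t distribution."""
    z = (x - loc) / scale
    return (
        math.lgamma((df + 1.0) / 2.0)
        - math.lgamma(df / 2.0)
        - 0.5 * math.log(df * math.pi)
        - math.log(scale)
        - (df + 1.0) / 2.0 * math.log1p(z * z / df)
    )


def mixture_logpdf(x, weights, dfs, locs, scales):
    """Log density of a Student-t mixture, combined with log-sum-exp for stability."""
    terms = [
        math.log(w) + student_t_logpdf(x, df, loc, scale)
        for w, df, loc, scale in zip(weights, dfs, locs, scales)
    ]
    peak = max(terms)
    return peak + math.log(sum(math.exp(t - peak) for t in terms))


# Hypothetical two-component mixture; in a real model the network would emit these.
weights = [0.7, 0.3]   # mixture weights, must sum to 1
dfs = [3.0, 10.0]      # degrees of freedom (lower means heavier tails)
locs = [0.0, 2.0]      # component locations
scales = [1.0, 0.5]    # component scales

log_density = mixture_logpdf(0.5, weights, dfs, locs, scales)
```

Heavy-tailed components let a mixture like this assign non-negligible probability to spikes and outliers, and several components can represent multi-modal behavior.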

📚 Training Data Summary

Observability Metrics: ~1 trillion points from Datadog internal systems (no customer data)
Public Datasets:
- GiftEval Pretrain
- Chronos datasets
Synthetic Data: ~1/3 of training data

🔗 Additional Resources

Research Paper (To add)
GitHub Repository
Blog Post
BOOM Dataset

📖 Citation

If you use Toto in your research or applications, please cite us using the following:

@misc{toto2025,
  title={This Time is Different: An Observability Perspective on Time Series Foundation Models},
  author={TODO},
  year={2025},
  eprint={arXiv:TODO},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}
