Toto-Open-Base-1.0

Toto (Time Series Optimized Transformer for Observability) is a time-series foundation model designed for multi-variate time series forecasting, with an emphasis on observability metrics. Toto efficiently handles the high-dimensional, sparse, and non-stationary data commonly encountered in observability scenarios.

Figure: Overview of the Toto-Open-Base-1.0 architecture.

⚡ Quick Start: Model Inference

Inference code is available on GitHub.

Installation

# Clone the repository
git clone https://github.com/DataDog/toto.git
cd toto

# Install dependencies
pip install -r requirements.txt

🚀 Inference Example

Here's how to quickly generate forecasts using Toto:

import torch
from data.util.dataset import MaskedTimeseries
from inference.forecaster import TotoForecaster
from model.toto import Toto

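# Use the GPU when one is available; otherwise fall back to the CPU
DEVICE = 'cuda' if torch.cuda.is_available() else 'cpu'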

# Load pre-trained Toto model
toto = Toto.from_pretrained('Datadog/Toto-Open-Base-1.0').to(DEVICE)

# Optional: compile model for enhanced speed
toto.compile()

forecaster = TotoForecaster(toto.model)

# Example input series (7 variables, 4096 timesteps)
input_series = torch.randn(7, 4096).to(DEVICE)
timestamp_seconds = torch.zeros(7, 4096).to(DEVICE)
time_interval_seconds = torch.full((7,), 60*15).to(DEVICE)

inputs = MaskedTimeseries(
    series=input_series,
    padding_mask=torch.full_like(input_series, True, dtype=torch.bool),
    id_mask=torch.zeros_like(input_series),
    timestamp_seconds=timestamp_seconds,
    time_interval_seconds=time_interval_seconds,
)

# Generate forecasts for next 336 timesteps
forecast = forecaster.forecast(
    inputs,
    prediction_length=336,
    num_samples=256,
    samples_per_batch=256,
)

# Access results
mean_prediction = forecast.mean
prediction_samples = forecast.samples
lower_quantile = forecast.quantile(0.1)
upper_quantile = forecast.quantile(0.9)

For detailed inference instructions, refer to the inference tutorial notebook.
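
The `mean` and `quantile` accessors in the example summarize a set of sampled trajectories. The sketch below shows the same idea on a plain Python list standing in for the samples of one variable at one forecast step. It is not Toto's implementation: the helper name `empirical_quantile`, the linear-interpolation rule, and the sample values are assumptions made for illustration.

```python
import math
import statistics


def empirical_quantile(samples, q):
    """Quantile of a 1-D sample list, interpolating linearly between order statistics."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")
    ordered = sorted(samples)
    pos = q * (len(ordered) - 1)  # fractional index into the sorted samples
    lo = math.floor(pos)
    hi = math.ceil(pos)
    frac = pos - lo
    return ordered[lo] * (1.0 - frac) + ordered[hi] * frac


# Hypothetical sampled values for one variable at one future timestep.
step_samples = [float(v) for v in range(11)]  # 0.0, 1.0, ..., 10.0

point_forecast = statistics.fmean(step_samples)  # sample mean
lower = empirical_quantile(step_samples, 0.1)    # lower edge of an 80% band
upper = empirical_quantile(step_samples, 0.9)    # upper edge of an 80% band
```

For tensors, `torch.quantile` applied along the sample dimension computes the same kind of summary; which dimension holds the samples in `forecast.samples` is not stated here, so check the tutorial notebook before relying on it.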

Performance Recommendations

  • For optimal speed and reduced memory usage, install xFormers and flash-attention. Then, set use_memory_efficient to True.


💾 Available Checkpoints

Checkpoint          | Parameters | Config | Size   | Notes
Toto-Open-Base-1.0  | 151M       | Config | 605 MB | Initial release with SOTA performance
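
The listed size is roughly what you would expect from the parameter count. A quick sanity check, assuming the weights are stored as 32-bit floats (4 bytes each) and that MB means decimal megabytes; neither assumption is stated in the table:

```python
# Rounded parameter count taken from the checkpoint table.
num_params = 151_000_000
# Assumption: each weight is stored as float32, i.e. 4 bytes.
bytes_per_param = 4

size_mb = num_params * bytes_per_param / 1_000_000  # decimal megabytes
print(f"approx. checkpoint size: {size_mb:.0f} MB")
```

The estimate comes out at 604 MB. The small gap to the listed 605 MB is plausibly due to the rounded parameter count and file metadata.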

✨ Key Features

  • Zero-Shot Forecasting
  • Multi-Variate Support
  • Decoder-Only Transformer Architecture
  • Probabilistic Predictions (Student-T mixture model)
  • Causal Patch-Wise Instance Normalization
  • Extensive Pretraining on Large-Scale Data
  • High-Dimensional Time Series Support
  • Tailored for Observability Metrics
  • State-of-the-Art Performance on GiftEval and BOOM
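
The "Student-T mixture model" item means that a forecast step is described by a weighted combination of Student-t distributions rather than by a single point value. As a rough illustration of that density only (this is not Toto's output head; the function names and the example weights, locations, scales, and degrees of freedom are made up), a mixture density can be evaluated like this:

```python
import math


def student_t_logpdf(x, df, loc=0.0, scale=1.0):
    """Log-density of a location-scale Student-t distribution."""
    z = (x - loc) / scale
    return (
        math.lgamma((df + 1.0) / 2.0)
        - math.lgamma(df / 2.0)
        - 0.5 * math.log(df * math.pi)
        - math.log(scale)
        - (df + 1.0) / 2.0 * math.log1p(z * z / df)
    )


def mixture_pdf(x, weights, dfs, locs, scales):
    """Mixture density: sum over k of weights[k] * t_pdf(x; dfs[k], locs[k], scales[k])."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("mixture weights must sum to 1")
    return sum(
        w * math.exp(student_t_logpdf(x, df, loc, scale))
        for w, df, loc, scale in zip(weights, dfs, locs, scales)
    )


# Hypothetical two-component mixture: a narrow mode near 0 and a wider one near 5.
weights = [0.7, 0.3]
dfs = [3.0, 5.0]
locs = [0.0, 5.0]
scales = [1.0, 2.0]

density_at_zero = mixture_pdf(0.0, weights, dfs, locs, scales)
```

Heavy-tailed components like these let the predictive distribution assign non-negligible probability to spikes and outliers, which is one reason a mixture of this kind suits bursty metrics.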

📚 Training Data Summary

  • Observability Metrics: ~1 trillion points from Datadog internal systems (no customer data)
  • Public Datasets:
  • Synthetic Data: ~1/3 of training data

🔗 Additional Resources


📖 Citation

If you use Toto in your research or applications, please cite us using the following:

@misc{toto2025,
  title={This Time is Different: An Observability Perspective on Time Series Foundation Models},
  author={TODO},
  year={2025},
  eprint={arXiv:TODO},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}