TS Medium

This is the medium version of the bilinear transformers trained on TinyStories. The primary purpose of this model is interpretability; most design choices were made with that in mind.

The code to run this custom model can be found here, along with many utility functions for weight-based interpretability.
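A minimal loading sketch is shown below. It assumes the repository registers the custom architecture through transformers' standard remote-code mechanism; the linked code is the authoritative interface and may expose different entry points.

```python
# Hedged sketch: load the model and generate a short continuation.
# Assumes the repo supports transformers' trust_remote_code loading path.
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("tdooms/ts-medium", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("tdooms/ts-medium")

prompt = "Once upon a time"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```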

Model Details

  • 30 million parameters
  • 6 layers
  • 8 attention heads
  • model dimension 512
  • bilinear MLP with expansion factor 4
  • context length of 256
  • trained for 1 epoch (~2.5B tokens)
  • rotary positional embedding
  • custom TinyStories tokenizer
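
For context, a bilinear MLP replaces the usual elementwise nonlinearity with an elementwise product of two linear projections, which is what makes the layer amenable to weight-based interpretability. The sketch below illustrates the general technique under the dimensions listed above (model dimension 512, expansion factor 4); it is an assumption-laden illustration, not the repository's exact implementation.

```python
import torch
import torch.nn as nn

class BilinearMLP(nn.Module):
    """Illustrative bilinear MLP: out = W_out((W x) * (V x)).

    Hidden width is d_model * expansion (512 * 4 = 2048 for this model).
    Sketch of the general technique, not the repository's exact code.
    """
    def __init__(self, d_model: int = 512, expansion: int = 4):
        super().__init__()
        d_hidden = d_model * expansion
        self.w = nn.Linear(d_model, d_hidden, bias=False)    # left projection
        self.v = nn.Linear(d_model, d_hidden, bias=False)    # right projection
        self.out = nn.Linear(d_hidden, d_model, bias=False)  # down projection

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The elementwise product of the two projections replaces the activation function.
        return self.out(self.w(x) * self.v(x))
```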