# Model Card for NoShuffle GPT-2

This model is part of a collection of models trained on the impossible languages of [Kallini et al. 2024](https://arxiv.org/abs/2401.06416).

It is a GPT-2 Small model trained from scratch on the *NoShuffle* language, the control condition in the paper's shuffle experiments, in which the token order of the underlying English text is left unaltered.

![languages.png](https://cdn-uploads.huggingface.co/production/uploads/6268bc06adb1c6525b3d5157/pBt38YYQL1gj8DqjyorWS.png)

## Model Details

- **Developed by:** Julie Kallini, Isabel Papadimitriou, Richard Futrell, Kyle Mahowald, Christopher Potts
- **Model type:** Causal Language Model
- **Language(s) (NLP):** English
- **GitHub Repository:** https://github.com/jkallini/mission-impossible-language-models
- **Paper:** https://arxiv.org/pdf/2401.06416

## Uses

This artefact is intended solely for the study of language learning and acquisition in computational models. It should not be used in any production setting.

## How to Get Started with the Model

Use the code below to get started with the model.

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Load model and tokenizer
model_id = "mission-impossible-lms/no-shuffle-gpt2"
model = GPT2LMHeadModel.from_pretrained(model_id)
tokenizer = GPT2Tokenizer.from_pretrained(model_id)

# Set up the prompt and encode it
prompt = "He clean"
inputs = tokenizer(prompt, return_tensors="pt")

# Generate text
output = model.generate(inputs.input_ids, max_length=20)

# Decode and print the generated text
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(generated_text)
```

## Training Details

### Training Data

This model was trained on the 100M-word BabyLM dataset. Before training, we transform the dataset into the corresponding impossible language, as described in our paper.

### Training Procedure

This model was trained for 3,000 gradient steps with a batch size of 2^19 (≈524K) tokens. We train with a learning rate that linearly warms up from 0 to 6e-4 over 300 steps; a minimal sketch of this schedule appears at the end of this card.

## Environmental Impact

- **Hardware Type:** NVIDIA RTX 3090 (24GB) + NVIDIA RTX A6000 (48GB) GPUs
- **Hours used:** ~24 hours

## Citation

**BibTeX:**

```bibtex
@inproceedings{kallini-etal-2024-mission,
    title = "Mission: Impossible Language Models",
    author = "Kallini, Julie  and
      Papadimitriou, Isabel  and
      Futrell, Richard  and
      Mahowald, Kyle  and
      Potts, Christopher",
    editor = "Ku, Lun-Wei  and
      Martins, Andre  and
      Srikumar, Vivek",
    booktitle = "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = aug,
    year = "2024",
    address = "Bangkok, Thailand",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.acl-long.787",
    doi = "10.18653/v1/2024.acl-long.787",
    pages = "14691--14714",
}
```

## Model Card Authors

Julie Kallini

## Model Card Contact

kallini@stanford.edu
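
## Appendix: Learning-Rate Schedule Sketch

For reference, the sketch below shows one way to implement the warmup schedule described under Training Procedure. It is not the authors' released training code: the peak learning rate, warmup length, and step count are taken from this card, while the use of PyTorch's `LambdaLR` and the constant post-warmup rate are assumptions of this sketch, since the card does not specify the decay.

```python
import torch
from torch.optim.lr_scheduler import LambdaLR

MAX_LR = 6e-4        # peak learning rate (from this card)
WARMUP_STEPS = 300   # linear warmup steps (from this card)
TOTAL_STEPS = 3_000  # total gradient steps (from this card)

# Hypothetical stand-in module; the actual run trained GPT-2 Small.
model = torch.nn.Linear(8, 8)
optimizer = torch.optim.AdamW(model.parameters(), lr=MAX_LR)

def lr_factor(step: int) -> float:
    """Multiplier on MAX_LR: ramps 0 -> 1 over WARMUP_STEPS, then holds.

    The card does not state the post-warmup behavior, so holding the
    rate constant is an assumption of this sketch.
    """
    if step < WARMUP_STEPS:
        return step / WARMUP_STEPS
    return 1.0

scheduler = LambdaLR(optimizer, lr_lambda=lr_factor)

for step in range(TOTAL_STEPS):
    # Forward pass, loss computation, and backward pass would go here.
    optimizer.step()
    scheduler.step()

print(f"Final learning rate: {scheduler.get_last_lr()[0]:.1e}")
```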