
Model Card for NoShuffle GPT-2

This is one model in a collection of models trained on the impossible languages of Kallini et al. 2024.

This model is a GPT-2 Small model trained from scratch on the NoShuffle language.

[Figure: languages.png]

Model Details

Uses

This artefact is solely intended for the study of language learning and acquisition in computational models. It should not be used in any production setting.

How to Get Started with the Model

Use the code below to get started with the model.

from transformers import GPT2LMHeadModel, GPT2Tokenizer
import torch

# Load model and tokenizer
model_id = "mission-impossible-lms/no-shuffle-gpt2"
model = GPT2LMHeadModel.from_pretrained(model_id)
tokenizer = GPT2Tokenizer.from_pretrained(model_id)

# Set up the prompt and encode it
prompt = "He clean"
inputs = tokenizer(prompt, return_tensors="pt")

# Generate text, passing the attention mask and an explicit pad token
output = model.generate(**inputs, max_length=20, pad_token_id=tokenizer.eos_token_id)

# Decode and print the generated text
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(generated_text)

Training Details

Training Data

This model was trained on the 100M-word BabyLM dataset. Before training, we transformed the dataset into the corresponding impossible language (here, NoShuffle), as described in our paper.

Training Procedure

This model was trained for 3,000 gradient steps with a batch size of 2^19 tokens. The learning rate warmed up linearly from 0 to 6e-4 over the first 300 steps.
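
As a rough illustration of these numbers, the sketch below computes the total token budget and a linear warmup schedule. The constants come from this card; the function name `learning_rate` and the choice to hold the rate constant after warmup are assumptions made for the example, since the card does not describe the post-warmup schedule.

```python
# Values stated in this model card.
TOTAL_STEPS = 3_000
TOKENS_PER_BATCH = 2 ** 19  # 524,288 tokens per gradient step
PEAK_LR = 6e-4
WARMUP_STEPS = 300


def learning_rate(step: int) -> float:
    """Linear warmup from 0 to PEAK_LR over WARMUP_STEPS.

    Assumption: the rate stays at PEAK_LR once warmup ends. The card
    does not say what happens after warmup, so treat this as a placeholder.
    """
    if step < 0:
        raise ValueError("step must be non-negative")
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    return PEAK_LR


# Total number of tokens processed over the whole run.
total_tokens = TOTAL_STEPS * TOKENS_PER_BATCH

if __name__ == "__main__":
    print(f"tokens per step: {TOKENS_PER_BATCH:,}")
    print(f"total tokens:    {total_tokens:,}")
    for step in (0, 150, 300, 2_999):
        print(f"step {step:>5}: lr = {learning_rate(step):.2e}")
```

Under these figures the run processes 3,000 × 2^19 = 1,572,864,000 tokens in total, and warmup covers the first 10% of the steps.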

Environmental Impact

  • Hardware Type: NVIDIA RTX 3090 (24GB) + NVIDIA RTX A6000 (48GB) GPUs.
  • Hours used: ~24 hours.

Citation

BibTeX:

@inproceedings{kallini-etal-2024-mission,
    title = "Mission: Impossible Language Models",
    author = "Kallini, Julie  and
      Papadimitriou, Isabel  and
      Futrell, Richard  and
      Mahowald, Kyle  and
      Potts, Christopher",
    editor = "Ku, Lun-Wei  and
      Martins, Andre  and
      Srikumar, Vivek",
    booktitle = "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = aug,
    year = "2024",
    address = "Bangkok, Thailand",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.acl-long.787",
    doi = "10.18653/v1/2024.acl-long.787",
    pages = "14691--14714",
}

Model Card Authors

Julie Kallini

Model Card Contact

[email protected]