# Model Card for NoShuffle GPT-2

This model is part of a collection of models trained on the impossible languages of [Kallini et al. 2024](https://arxiv.org/abs/2401.06416).

It is a GPT-2 Small model trained from scratch on the *NoShuffle* language, the control condition in the paper's shuffle experiments, in which the token order of the underlying English text is left unaltered.

![languages.png](https://cdn-uploads.huggingface.co/production/uploads/6268bc06adb1c6525b3d5157/pBt38YYQL1gj8DqjyorWS.png)

## Model Details

- **Developed by:** Julie Kallini, Isabel Papadimitriou, Richard Futrell, Kyle Mahowald, Christopher Potts
- **Model type:** Causal Language Model
- **Language(s) (NLP):** English
- **GitHub Repository:** https://github.com/jkallini/mission-impossible-language-models
- **Paper:** https://arxiv.org/pdf/2401.06416

## Uses

This artefact is intended solely for the study of language learning and acquisition in computational models. It should not be used in any production setting.

## How to Get Started with the Model

Use the code below to get started with the model.

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

# Load model and tokenizer
model_id = "mission-impossible-lms/no-shuffle-gpt2"
model = GPT2LMHeadModel.from_pretrained(model_id)
tokenizer = GPT2Tokenizer.from_pretrained(model_id)

# Set up the prompt and encode it
prompt = "He clean"
inputs = tokenizer(prompt, return_tensors="pt")

# Generate text
output = model.generate(inputs.input_ids, max_length=20)

# Decode and print the generated text
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(generated_text)
```

## Training Details

### Training Data

This model was trained on the 100M-word BabyLM dataset. Before training, we transform the dataset into the corresponding impossible language, as described in our paper.

### Training Procedure

This model was trained for 3,000 gradient steps with a batch size of 2^19 (≈524K) tokens. We train with a learning rate that linearly warms up from 0 to 6e-4 over 300 steps; a minimal sketch of this schedule appears at the end of this card.

## Environmental Impact

- **Hardware Type:** NVIDIA RTX 3090 (24GB) + NVIDIA RTX A6000 (48GB) GPUs
- **Hours used:** ~24 hours

## Citation

**BibTeX:**

```bibtex
@inproceedings{kallini-etal-2024-mission,
    title = "Mission: Impossible Language Models",
    author = "Kallini, Julie  and
      Papadimitriou, Isabel  and
      Futrell, Richard  and
      Mahowald, Kyle  and
      Potts, Christopher",
    editor = "Ku, Lun-Wei  and
      Martins, Andre  and
      Srikumar, Vivek",
    booktitle = "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = aug,
    year = "2024",
    address = "Bangkok, Thailand",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.acl-long.787",
    doi = "10.18653/v1/2024.acl-long.787",
    pages = "14691--14714",
}
```

## Model Card Authors

Julie Kallini

## Model Card Contact

kallini@stanford.edu
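
## Appendix: Learning-Rate Schedule Sketch

For reference, the sketch below shows one way to implement the warmup schedule described under Training Procedure. It is not the authors' released training code: the peak learning rate, warmup length, and step count are taken from this card, while the use of PyTorch's `LambdaLR` and the constant post-warmup rate are assumptions of this sketch, since the card does not specify the decay.

```python
import torch
from torch.optim.lr_scheduler import LambdaLR

MAX_LR = 6e-4        # peak learning rate (from this card)
WARMUP_STEPS = 300   # linear warmup steps (from this card)
TOTAL_STEPS = 3_000  # total gradient steps (from this card)

# Hypothetical stand-in module; the actual run trained GPT-2 Small.
model = torch.nn.Linear(8, 8)
optimizer = torch.optim.AdamW(model.parameters(), lr=MAX_LR)

def lr_factor(step: int) -> float:
    """Multiplier on MAX_LR: ramps 0 -> 1 over WARMUP_STEPS, then holds.

    The card does not state the post-warmup behavior, so holding the
    rate constant is an assumption of this sketch.
    """
    if step < WARMUP_STEPS:
        return step / WARMUP_STEPS
    return 1.0

scheduler = LambdaLR(optimizer, lr_lambda=lr_factor)

for step in range(TOTAL_STEPS):
    # Forward pass, loss computation, and backward pass would go here.
    optimizer.step()
    scheduler.step()

print(f"Final learning rate: {scheduler.get_last_lr()[0]:.1e}")
```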