Checkpoint for my implementation of Andrej Karpathy's nanoGPT. The model was trained on a single RTX 4090 for 48 hours.

For more details, check out my GitHub repository.

Usage:

```python
import torch
from transformers import GPT2Tokenizer
from gpt2 import GPT, GPTConfig

tokenizer = GPT2Tokenizer.from_pretrained('gpt2')

# Prefer CUDA, then Apple Silicon (MPS), otherwise fall back to CPU.
if torch.cuda.is_available():
    device = torch.device("cuda")
elif torch.backends.mps.is_available():
    device = torch.device("mps")
else:
    device = torch.device("cpu")

model = GPT.from_pretrained('nanoGPT2-124m.pth')
model = model.to(device)

prompt = "Hello, I'm a mathematics teacher"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
generated_ids = model.generate(input_ids.to(device))
# generate() returns a batch of shape (1, T); decode the first sequence.
print(tokenizer.decode(generated_ids[0]))
```
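For intuition about what `generate()` does at each step, here is a minimal, dependency-free sketch of top-k sampling, the filtering strategy nanoGPT-style generation loops typically apply to the model's logits before drawing the next token. This is an illustrative assumption about the sampling scheme, not the exact code inside this checkpoint's `generate()` method; `top_k_sample` is a hypothetical helper.

```python
import math
import random

def top_k_sample(logits, k, rng=random):
    """Sample one token id from `logits` (a plain list of floats, one per
    vocabulary entry), restricted to the k highest-scoring entries."""
    # Keep the k largest logits; mask the rest with -inf so softmax zeroes them.
    threshold = sorted(logits, reverse=True)[k - 1]
    masked = [x if x >= threshold else float("-inf") for x in logits]
    # Softmax over the surviving logits (subtract the max for stability).
    m = max(masked)
    exps = [math.exp(x - m) for x in masked]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Draw one index according to the resulting distribution.
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1
```

With `k=1` this reduces to greedy decoding (always the argmax token); larger `k` trades determinism for diversity.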