SmolGRPO-135M

This is a fine-tune of HuggingFaceTB/SmolLM2-135M-Instruct, trained with GRPO (Group Relative Policy Optimization) on mlabonne/smoltldr (2k samples). It is designed to summarize Reddit posts in about 50 characters.
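For readers new to GRPO: for each prompt it samples a group of completions, scores each one, and uses each completion's reward relative to the rest of its group as that completion's advantage. The sketch below illustrates this under an assumption: it uses a simple length reward (the negative distance from a 50-character target), which is an illustrative guess and not necessarily the exact reward used to train this model. `length_reward` and `group_advantages` are hypothetical names, and real implementations differ in details such as population vs. sample standard deviation.

```python
from statistics import mean, pstdev

TARGET_CHARS = 50  # assumed target length, taken from the "~50 characters" goal


def length_reward(completions, target=TARGET_CHARS):
    """Hypothetical reward: 0 at exactly `target` characters, more negative the further away."""
    return [-abs(target - len(c)) for c in completions]


def group_advantages(rewards, eps=1e-4):
    """GRPO-style advantage: standardize each reward against its own group's mean and std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every completion in the group scores the same
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    # Four sampled summaries for the same prompt form one group.
    group = ["a" * 50, "a" * 40, "a" * 80, "a" * 20]
    rewards = length_reward(group)
    print(rewards)  # [0, -10, -30, -30]
    print([round(a, 2) for a in group_advantages(rewards)])
```

Completions closer to the target get positive advantages and are reinforced; the rest are pushed down, with no separate value model needed.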

You can reproduce this training with this Colab notebook. Training takes about 40 minutes.

Takeaways from these experiments:

  • Adding a system prompt like "Summarize the following text concisely" doesn't help.
  • You can get faster convergence with a higher learning rate and fewer samples, but training then becomes prone to overshooting the target.
  • I tried many reward functions to experiment with reward shaping, but it didn't seem to help.
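To make "reward shaping" concrete, here are two hypothetical shaped variants of a plain linear length penalty. These are illustrative assumptions only; the card does not say which reward functions were actually tried. All function names and parameters (`slack`, `long_weight`) are made up for this sketch.

```python
TARGET_CHARS = 50  # assumed target length


def linear_penalty(text, target=TARGET_CHARS):
    # Baseline: reward drops by 1 for every character away from the target.
    return -abs(target - len(text))


def tolerance_band(text, target=TARGET_CHARS, slack=10):
    # Shaped variant: no penalty within +/- `slack` characters of the target.
    return -max(0, abs(target - len(text)) - slack)


def asymmetric_penalty(text, target=TARGET_CHARS, long_weight=2.0):
    # Shaped variant: each extra character costs `long_weight` times more than a missing one.
    diff = len(text) - target
    return -long_weight * diff if diff > 0 else float(diff)


if __name__ == "__main__":
    for n in (30, 45, 50, 60, 90):
        s = "x" * n
        print(n, linear_penalty(s), tolerance_band(s), asymmetric_penalty(s))
```

Shaping changes which completions in a group look better than their siblings. Because GRPO standardizes rewards within each group, a monotone reshaping that preserves the ranking of completions can have a smaller effect than one might expect.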

Example

Input:

SUBREDDIT: r/Advice

TITLE: I have big dreams and goals, but they are kind of cloudy.[m20]

POST: I live at home with my family right now and I don't go to school. I went to college for a year and decided to stop. My best friend convinced me that I don't need college to do what I want to do. Besides, I hated taking classes I wasn't interested in. The things I want to do in life (I know it seems like too much) included producing music, making a cartoon, making comics, designing clothes and shoes, and other smaller things related to that. I grew up with a good family with a father that had similar dreams. He niw has a job he's been working for 20+ years that he doesn't like. I too am afraid of falling into that path. I've been pretty down and frustrated and feeling things are quite impossible although I know there's always hope. I don't really want to do anything else, but I'm stuck. I'll be pretty down for a week, then by the next week I'm in high spirits with a game plan that always fails. I've been doing this for quite a while now. My parents are starting to get on my case now and when they ask what I'm gonna do in life I don't know how to respond. Maybe I should try looking for lessons in nyc to get me out of the house? I practice drawing and making music a lot, but I can never feel satisfied and feel like I'm moving in the right direction. Everything seems like a scary cycle of ups and downs. I have faith I can turn it around, but I just don't know how.

TL;DR:

Output:

I have big dreams and goals, but they are kind of cloudy. I'm going to try to figure out a plan to get out of that house, but I'm not sure what that plan will be.
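The input above follows a fixed layout: SUBREDDIT, TITLE, and POST fields, ending with a "TL;DR:" cue that the model completes. A minimal sketch for building such a prompt from a post is shown below. It assumes the fields are separated by blank lines as they appear in the example, but the exact whitespace in the mlabonne/smoltldr prompts may differ. `build_prompt` is a hypothetical helper, not part of any library.

```python
def build_prompt(subreddit: str, title: str, post: str) -> str:
    """Hypothetical helper: assemble a prompt in the SUBREDDIT / TITLE / POST / TL;DR layout."""
    sections = [
        f"SUBREDDIT: r/{subreddit}",
        f"TITLE: {title}",
        f"POST: {post}",
        "TL;DR:",  # the model's summary is generated after this cue
    ]
    # Assumption: one blank line between fields, as in the example above.
    return "\n\n".join(sections)


if __name__ == "__main__":
    prompt = build_prompt(
        "Advice",
        "I have big dreams and goals, but they are kind of cloudy.[m20]",
        "I live at home with my family right now and I don't go to school.",
    )
    print(prompt)
```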

Model size: 135M params
Tensor type: BF16 (Safetensors)

Model tree for mlabonne/SmolGRPO-135M: 1 quantized model.