LLMJapan committed
Commit 6d226e3 · verified · 1 Parent(s): ba87ec7

Update README.md

Files changed (1):
  1. README.md +15 -17
README.md CHANGED
@@ -24,20 +24,19 @@ python convert.py \
   -b 8.0 \
   -hb 8
 ```
-
-# Model Card for OlympicCoder-7B
-
-OlympicCoder-7B is a code model that achieves strong performance on competitive coding benchmarks such as LiveCodeBench and the 2024 International Olympiad in Informatics.
+# Model Card for OlympicCoder-32B
+
+OlympicCoder-32B is a code model that achieves very strong performance on competitive coding benchmarks such as LiveCodeBench and the 2024 International Olympiad in Informatics.
 
 * Repository: https://github.com/huggingface/open-r1
 * Blog post: https://huggingface.co/blog/open-r1/update-3
 
 ## Model description
 
-- **Model type:** A 7B parameter model fine-tuned on a decontaminated version of the codeforces dataset.
+- **Model type:** A 32B parameter model fine-tuned on a decontaminated version of the codeforces dataset.
 - **Language(s) (NLP):** Primarily English
 - **License:** apache-2.0
-- **Finetuned from model:** [Qwen/Qwen2.5-Coder-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct)
+- **Finetuned from model:** [Qwen/Qwen2.5-Coder-32B-Instruct](https://huggingface.co/Qwen/Qwen2.5-Coder-32B-Instruct)
 
 ## Evaluation
 
@@ -45,18 +44,16 @@ OlympicCoder-7B is a code model that achieves strong performance on competitive
 
 
 
+
 ## Usage
 Here's how you can run the model using the `pipeline()` function from 🤗 Transformers:
 
 ```python
 # pip install transformers
 # pip install accelerate
-
 import torch
 from transformers import pipeline
-
-pipe = pipeline("text-generation", model="open-r1/OlympicCoder-7B", torch_dtype=torch.bfloat16, device_map="auto")
-
+pipe = pipeline("text-generation", model="open-r1/OlympicCoder-32B", torch_dtype=torch.bfloat16, device_map="auto")
 # We use the tokenizer's chat template to format each message - see https://huggingface.co/docs/transformers/main/en/chat_templating
 messages = [
 {"role": "user", "content": "Write a python program to calculate the 10th Fibonacci number"},
@@ -70,22 +67,23 @@ print(outputs[0]["generated_text"])
 #<think>Okay, I need to write a Python program that calculates the 10th Fibonacci number. Hmm, the Fibonacci sequence starts with 0 and 1. Each subsequent number is the sum of the two preceding ones. So the sequence goes: 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, and so on. ...
 ```
 
-> [!WARNING]
-> To ensure that the model consistently outputs a long chain-of-thought, we have edited the chat template to prefill the first assistant turn with a `<think>` token. As a result, the outputs from this model will not show the opening `<think>` token if you use the model's `generate()` method. To apply reinforcement learning with a format reward, either prepend the `<think>` token to the model's completions or amend the chat template to remove the prefill.
+> [!IMPORTANT]
+> To ensure that the model consistently outputs a long chain-of-thought, we have edited the chat template to prefill the first assistant turn with a `<think>` token. As a result, the outputs from this model will not show the opening `<think>` token if you use the model's `generate()` method. To apply reinforcement learning with a format reward, either prepend the `<think>` token to the model's completions or amend the chat template to remove the prefill. Check out our [blog post](https://huggingface.co/blog/open-r1/update-3#lesson-4-prefill-with-think-to-consistently-enable-long-cot) for more details.
+
 
 ## Training procedure
 ### Training hyper-parameters
 
-The following hyperparameters were used during training:
+The following hyperparameters were used during training on 16 H100 nodes:
 
-- dataset: open-r1/codeforces-cots
+- dataset: open-r1/codeforces-cots_decontaminated
 - learning_rate: 4.0e-5
-- train_batch_size: 2
+- train_batch_size: 1
 - seed: 42
 - packing: false
-- distributed_type: deepspeed-zero-3
-- num_devices: 8
-- gradient_accumulation_steps: 8
+- distributed_type: fsdp
+- num_devices: 128
+- gradient_accumulation_steps: 1
 - total_train_batch_size: 16
 - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
 - lr_scheduler_type: cosine_with_min_lr
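The second hunk only swaps the checkpoint name in the `pipeline()` call; the generation lines that follow `messages` are unchanged, so the diff elides them. For readers following along, a self-contained sketch of the full flow is below. The chat-template call is the standard 🤗 Transformers API, but the sampling settings are illustrative assumptions, not values taken from this card.

```python
# Minimal end-to-end sketch of the usage section for the updated checkpoint.
# max_new_tokens and the sampling settings are illustrative assumptions.
import torch
from transformers import pipeline

pipe = pipeline("text-generation", model="open-r1/OlympicCoder-32B", torch_dtype=torch.bfloat16, device_map="auto")
messages = [
    {"role": "user", "content": "Write a python program to calculate the 10th Fibonacci number"},
]
# The edited chat template formats the conversation and prefills the assistant
# turn, so the completion begins inside the <think> block.
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipe(prompt, max_new_tokens=8000, do_sample=True, temperature=0.7)
print(outputs[0]["generated_text"])
```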
 
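The [!IMPORTANT] note added in the last hunk matters for anyone applying reinforcement learning on top of this model: because the chat template prefills the opening `<think>` tag, raw completions start mid-thought and a naive format check will always fail. Below is a minimal sketch of the first workaround the note suggests, prepending the tag before scoring; the function name and regex are illustrative, not taken from the open-r1 codebase.

```python
# Hypothetical format reward for completions produced by a chat template that
# prefills "<think>"; the regex and reward values are illustrative.
import re

# Accept a single closed <think>...</think> block followed by a final answer.
THINK_BLOCK = re.compile(r"^<think>.+?</think>\s*\S", re.DOTALL)

def format_reward(completion: str) -> float:
    # Restore the opening tag that the template injected for the model.
    completion = "<think>" + completion
    return 1.0 if THINK_BLOCK.match(completion) else 0.0

print(format_reward("reason step by step...</think>Here is the program."))  # 1.0
print(format_reward("reasoning that never closes the think block"))         # 0.0
```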