base_model:
  - deepcogito/cogito-v1-preview-qwen-32B

vllm (pretrained=/root/autodl-tmp/cogito-v1-preview-qwen-32B,add_bos_token=true,max_model_len=3096,dtype=bfloat16,tensor_parallel_size=4), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto
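The header line above is the run summary that lm-evaluation-harness prints. As a rough sketch of how such a line maps back to an `lm_eval` command, the snippet below rebuilds the argument list from the values shown. The helper name `build_lm_eval_argv` is hypothetical, and the task name `gsm8k` is inferred from the results table rather than stated in the header.

```python
def build_lm_eval_argv(model_path, task, limit, num_fewshot=None, batch_size="auto"):
    """Assemble an lm_eval command line (hypothetical helper, for illustration)."""
    # model_args mirrors the key=value pairs printed in the run header.
    model_args = ",".join([
        f"pretrained={model_path}",
        "add_bos_token=true",
        "max_model_len=3096",
        "dtype=bfloat16",
        "tensor_parallel_size=4",
    ])
    argv = [
        "lm_eval",
        "--model", "vllm",
        "--model_args", model_args,
        "--tasks", task,
        "--limit", str(limit),
        "--batch_size", str(batch_size),
    ]
    # num_fewshot is omitted when the header reports "num_fewshot: None".
    if num_fewshot is not None:
        argv += ["--num_fewshot", str(num_fewshot)]
    return argv


argv = build_lm_eval_argv(
    "/root/autodl-tmp/cogito-v1-preview-qwen-32B",
    task="gsm8k",
    limit=250,
    num_fewshot=5,
)
print(" ".join(argv))
```

The same helper covers the other runs by changing `limit`, `task`, and `model_path`; passing `num_fewshot=None` leaves the flag out, as in the MMLU runs.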

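| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|--------:|--------|-------:|--------|------:|-------:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | 0.900 | ± 0.0190 |
| | | strict-match | 5 | exact_match | 0.948 | ± 0.0141 |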
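The Stderr column can be sanity-checked against Value and the sample limit. The sketch below assumes the reported standard error is the sample standard deviation of the 0/1 scores divided by the square root of the sample count, which for a binary metric reduces to sqrt(p(1 - p) / (n - 1)). The function name `binary_stderr` is illustrative only.

```python
import math


def binary_stderr(accuracy, n):
    """Standard error of a mean of n binary (0/1) scores.

    Assumes the sample (n - 1) standard deviation is used.
    """
    return math.sqrt(accuracy * (1.0 - accuracy) / (n - 1))


# (Value, limit, reported Stderr) taken from the gsm8k rows with limit: 250.
reported = [
    (0.900, 250, 0.0190),
    (0.948, 250, 0.0141),
]

for accuracy, n, stderr in reported:
    recomputed = binary_stderr(accuracy, n)
    print(f"acc={accuracy:.3f} n={n} reported={stderr:.4f} recomputed={recomputed:.4f}")
```

Under that assumption, both recomputed values agree with the table to four decimal places.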

vllm (pretrained=/root/autodl-tmp/cogito-v1-preview-qwen-32B,add_bos_token=true,max_model_len=3096,dtype=bfloat16,tensor_parallel_size=4), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

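| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|--------:|--------|-------:|--------|------:|-------:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | 0.894 | ± 0.0138 |
| | | strict-match | 5 | exact_match | 0.930 | ± 0.0114 |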

vllm (pretrained=/root/autodl-tmp/cogito-v1-preview-qwen-32B,add_bos_token=true,max_model_len=3096,dtype=bfloat16,tensor_parallel_size=4), gen_kwargs: (None), limit: 5.0, num_fewshot: None, batch_size: 1

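| Groups | Version | Filter | n-shot | Metric | Value | Stderr |
|--------|--------:|--------|-------:|--------|------:|-------:|
| mmlu | 2 | none | | acc | 0.8947 | ± 0.0175 |
| - humanities | 2 | none | | acc | 0.9231 | ± 0.0308 |
| - other | 2 | none | | acc | 0.8769 | ± 0.0407 |
| - social sciences | 2 | none | | acc | 0.9167 | ± 0.0354 |
| - stem | 2 | none | | acc | 0.8737 | ± 0.0324 |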

vllm (pretrained=/root/autodl-tmp/cogito-v1-preview-qwen-32B-awq,add_bos_token=true,max_model_len=3096,dtype=bfloat16,tensor_parallel_size=4), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

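| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|--------:|--------|-------:|--------|------:|-------:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | 0.924 | ± 0.0168 |
| | | strict-match | 5 | exact_match | 0.936 | ± 0.0155 |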

vllm (pretrained=/root/autodl-tmp/cogito-v1-preview-qwen-32B-awq,add_bos_token=true,max_model_len=3096,dtype=bfloat16,tensor_parallel_size=4), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

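| Tasks | Version | Filter | n-shot | Metric | Value | Stderr |
|-------|--------:|--------|-------:|--------|------:|-------:|
| gsm8k | 3 | flexible-extract | 5 | exact_match | 0.920 | ± 0.0121 |
| | | strict-match | 5 | exact_match | 0.934 | ± 0.0111 |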
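To judge whether the AWQ and bfloat16 checkpoints differ on GSM8K at limit: 500, the reported values and standard errors can be combined into an approximate z-score. This is only a rough sketch: it treats the two runs as independent samples, although both used the same questions, so a paired comparison on per-sample outputs would be more appropriate. The names below are illustrative.

```python
import math


def z_score(value_a, stderr_a, value_b, stderr_b):
    """Approximate z-score for the difference of two independent estimates."""
    return (value_b - value_a) / math.sqrt(stderr_a ** 2 + stderr_b ** 2)


# (Value, Stderr) from the limit: 500 gsm8k rows: base checkpoint vs. AWQ checkpoint.
rows = {
    "flexible-extract": ((0.894, 0.0138), (0.920, 0.0121)),
    "strict-match": ((0.930, 0.0114), (0.934, 0.0111)),
}

for name, ((base, base_se), (awq, awq_se)) in rows.items():
    z = z_score(base, base_se, awq, awq_se)
    # |z| below 1.96 means the gap is within noise at the 95% level
    # under the independence assumption stated above.
    verdict = "within noise" if abs(z) < 1.96 else "differs"
    print(f"{name}: delta={awq - base:+.3f} z={z:.2f} ({verdict})")
```

With these numbers, neither filter shows a gap that exceeds the 1.96 threshold.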

vllm (pretrained=/root/autodl-tmp/cogito-v1-preview-qwen-32B-awq,add_bos_token=true,max_model_len=3096,dtype=bfloat16,tensor_parallel_size=4), gen_kwargs: (None), limit: 5.0, num_fewshot: None, batch_size: 1

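| Groups | Version | Filter | n-shot | Metric | Value | Stderr |
|--------|--------:|--------|-------:|--------|------:|-------:|
| mmlu | 2 | none | | acc | 0.8982 | ± 0.0170 |
| - humanities | 2 | none | | acc | 0.8769 | ± 0.0377 |
| - other | 2 | none | | acc | 0.8769 | ± 0.0407 |
| - social sciences | 2 | none | | acc | 0.9500 | ± 0.0289 |
| - stem | 2 | none | | acc | 0.8947 | ± 0.0288 |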