---
base_model:
- deepcogito/cogito-v1-preview-qwen-32B
---

vllm (pretrained=/root/autodl-tmp/cogito-v1-preview-qwen-32B,add_bos_token=true,max_model_len=3096,dtype=bfloat16,tensor_parallel_size=4), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.900|±  |0.0190|
|     |       |strict-match    |     5|exact_match|↑  |0.948|±  |0.0141|

vllm (pretrained=/root/autodl-tmp/cogito-v1-preview-qwen-32B,add_bos_token=true,max_model_len=3096,dtype=bfloat16,tensor_parallel_size=4), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.894|±  |0.0138|
|     |       |strict-match    |     5|exact_match|↑  |0.930|±  |0.0114|

vllm (pretrained=/root/autodl-tmp/cogito-v1-preview-qwen-32B,add_bos_token=true,max_model_len=3096,dtype=bfloat16,tensor_parallel_size=4), gen_kwargs: (None), limit: 5.0, num_fewshot: None, batch_size: 1

|      Groups      |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
|------------------|------:|------|------|------|---|-----:|---|-----:|
|mmlu              |      2|none  |      |acc   |↑  |0.8947|±  |0.0175|
| - humanities     |      2|none  |      |acc   |↑  |0.9231|±  |0.0308|
| - other          |      2|none  |      |acc   |↑  |0.8769|±  |0.0407|
| - social sciences|      2|none  |      |acc   |↑  |0.9167|±  |0.0354|
| - stem           |      2|none  |      |acc   |↑  |0.8737|±  |0.0324|

vllm (pretrained=/root/autodl-tmp/cogito-v1-preview-qwen-32B-awq,add_bos_token=true,max_model_len=3096,dtype=bfloat16,tensor_parallel_size=4), gen_kwargs: (None), limit: 250.0, num_fewshot: 5, batch_size: auto

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.924|±  |0.0168|
|     |       |strict-match    |     5|exact_match|↑  |0.936|±  |0.0155|

vllm (pretrained=/root/autodl-tmp/cogito-v1-preview-qwen-32B-awq,add_bos_token=true,max_model_len=3096,dtype=bfloat16,tensor_parallel_size=4), gen_kwargs: (None), limit: 500.0, num_fewshot: 5, batch_size: auto

|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value|   |Stderr|
|-----|------:|----------------|-----:|-----------|---|----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.920|±  |0.0121|
|     |       |strict-match    |     5|exact_match|↑  |0.934|±  |0.0111|

vllm (pretrained=/root/autodl-tmp/cogito-v1-preview-qwen-32B-awq,add_bos_token=true,max_model_len=3096,dtype=bfloat16,tensor_parallel_size=4), gen_kwargs: (None), limit: 5.0, num_fewshot: None, batch_size: 1

|      Groups      |Version|Filter|n-shot|Metric|   |Value |   |Stderr|
|------------------|------:|------|------|------|---|-----:|---|-----:|
|mmlu              |      2|none  |      |acc   |↑  |0.8982|±  |0.0170|
| - humanities     |      2|none  |      |acc   |↑  |0.8769|±  |0.0377|
| - other          |      2|none  |      |acc   |↑  |0.8769|±  |0.0407|
| - social sciences|      2|none  |      |acc   |↑  |0.9500|±  |0.0289|
| - stem           |      2|none  |      |acc   |↑  |0.8947|±  |0.0288|
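
The Stderr column in the GSM8K tables above can be sanity-checked against the accuracy and the sample limit. The following is a minimal sketch, not the harness's own code: it assumes each sample is scored 0 or 1 and that the standard error is derived from the sample variance with an `n - 1` denominator. `proportion_stderr` is a hypothetical helper introduced here for illustration.

```python
import math


def proportion_stderr(accuracy: float, n: int) -> float:
    """Standard error of the mean of n binary (0/1) scores.

    Assumption: the sample variance uses an (n - 1) denominator,
    which for 0/1 scores with mean p gives p * (1 - p) * n / (n - 1).
    Dividing by n and taking the square root yields the value below.
    """
    return math.sqrt(accuracy * (1.0 - accuracy) / (n - 1))


# (accuracy, limit, reported stderr) copied from the GSM8K tables above.
rows = [
    (0.900, 250, 0.0190),  # base, flexible-extract
    (0.948, 250, 0.0141),  # base, strict-match
    (0.894, 500, 0.0138),  # base, flexible-extract
    (0.930, 500, 0.0114),  # base, strict-match
    (0.924, 250, 0.0168),  # awq, flexible-extract
    (0.936, 250, 0.0155),  # awq, strict-match
    (0.920, 500, 0.0121),  # awq, flexible-extract
    (0.934, 500, 0.0111),  # awq, strict-match
]

for acc, n, reported in rows:
    computed = proportion_stderr(acc, n)
    print(f"acc={acc:.3f} n={n} computed={computed:.4f} reported={reported:.4f}")
```

Under these assumptions the computed values should agree with the reported ones to within rounding. The MMLU rows are left out because the group-level Stderr aggregates over subtasks and does not follow from a single sample count.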