Intel/Qwen3-30B-A3B-int4-AutoRound-inc

Model Details

This is an INT4 quantization of Qwen/Qwen3-30B-A3B, generated by intel/auto-round with group_size 128 and symmetric quantization.
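As a rough illustration of what "int4, group_size 128, symmetric" means for the stored weights, the sketch below quantizes a flat list of weights in groups, each with its own scale and no zero point. It uses plain round-to-nearest, which is not AutoRound's method (AutoRound tunes the rounding, as described in the paper cited below). The helper names and the [-8, 7] range with scale = max|w| / 7 are assumptions made for this example.

```python
def quantize_group_sym(weights, bits=4):
    """Symmetric round-to-nearest quantization of one group (illustrative only)."""
    qmax = 2 ** (bits - 1) - 1   # 7 for int4
    qmin = -(2 ** (bits - 1))    # -8 for int4
    max_abs = max(abs(w) for w in weights)
    # Symmetric: a single scale per group, zero point fixed at 0.
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = [min(qmax, max(qmin, round(w / scale))) for w in weights]
    return q, scale


def quantize_sym(weights, group_size=128, bits=4):
    """Split a flat weight list into groups and quantize each one separately."""
    return [
        quantize_group_sym(weights[start:start + group_size], bits)
        for start in range(0, len(weights), group_size)
    ]


def dequantize(groups):
    """Recover approximate float weights from (int values, scale) pairs."""
    return [q * scale for qs, scale in groups for q in qs]


# Two groups of 128 weights with very different magnitudes.
sample = [0.01 * (i % 50) for i in range(128)] + [-0.5 * (i % 5) for i in range(128)]
groups = quantize_sym(sample)
restored = dequantize(groups)
print("groups:", len(groups))
print("max abs error:", max(abs(a - b) for a, b in zip(sample, restored)))
```

Because each group of 128 weights carries its own scale, a group of small weights is not forced onto the coarse grid needed by a group of large ones.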

How To Use

INT4 Inference (CPU/CUDA/Intel GPU)

from transformers import AutoModelForCausalLM, AutoTokenizer
quantized_model_dir = "Intel/Qwen3-30B-A3B-int4-AutoRound-inc"

# load the tokenizer and the model
tokenizer = AutoTokenizer.from_pretrained(quantized_model_dir)
model = AutoModelForCausalLM.from_pretrained(
    quantized_model_dir,
    torch_dtype="auto",
    device_map="auto"
)

# prepare the model input
prompt = "Give me a short introduction to large language model."
messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
    enable_thinking=True,  # Switches between thinking and non-thinking modes. Default is True.
)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# generate the completion
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512,  # raise this to match the official Qwen3 usage; 512 can cut off long thinking
    do_sample=False,  # greedy decoding here; change this to match the official Qwen3 usage
)
# keep only the newly generated tokens (drop the prompt)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()

# parse the thinking content
try:
    # index just past the last occurrence of token 151668 (</think>)
    index = len(output_ids) - output_ids[::-1].index(151668)
except ValueError:
    # no </think> in the output: treat everything as final content
    index = 0

thinking_content = tokenizer.decode(output_ids[:index], skip_special_tokens=True).strip("\n")
content = tokenizer.decode(output_ids[index:], skip_special_tokens=True).strip("\n")

print("thinking content:", thinking_content)
print("content:", content)
##INT4:
# thinking content: <think>
# Okay, the user is asking for a short introduction to large language models. Let me start by defining what they are. I should mention that they're AI systems trained on vast amounts of text data. Then, I need to explain their purpose, like generating human-like text, answering questions, etc.
# I should highlight their key features: large size, which means they have a lot of parameters, and the training data, which is diverse. Maybe mention that they can perform various tasks without needing specific training for each one. Also, it's important to note that they're based on deep learning, specifically neural networks.
# I should also touch on their applications, like in chatbots, content creation, and data analysis. But I need to keep it concise. Maybe mention some examples, like GPT or BERT, but not too detailed. Also, a bit about their limitations, like potential biases or errors, but since it's a short intro, maybe just a brief mention.
# Wait, the user said "short introduction," so I need to be concise. Avoid going into too much technical detail. Make sure the language is simple and accessible. Check for any jargon that might need simplifying. Let me structure it: definition, how they work, key features, applications, and a note on their impact. That should cover it without being too lengthy.
# </think>
# content: A **large language model (LLM)** is an advanced artificial intelligence system designed to understand and generate human-like text by analyzing vast amounts of data. Trained on extensive text corpora, these models learn patterns, grammar, and context to perform tasks like answering questions, writing essays, coding, or even creating art. Their "large" scale refers to the massive number of parameters (millions or billions) that enable complex language understanding. LLMs are built using deep learning techniques, such as transformer architectures, and can adapt to diverse tasks without needing specific training for each one. They power applications like chatbots, virtual assistants, and content creation tools, revolutionizing how humans interact with technology. However, they also raise ethical considerations, such as bias and misinformation, requiring careful oversight.

##BF16:
# thinking content: <think>
# Okay, the user is asking for a short introduction to large language models. Let me start by recalling what I know about them. Large language models, or LLMs, are a type of AI that's trained on vast amounts of text data. They can generate human-like text, answer questions, and perform various language tasks.
# I should mention their size, like the number of parameters, which is a key factor. Maybe explain that they're built using deep learning, specifically neural networks. Also, they're trained on diverse data, which helps them understand different topics and languages.
# Applications are important too. They're used in chatbots, content creation, translation, and more. But I should also note some challenges, like the need for large computational resources and potential issues with bias or misinformation.
# Wait, the user might be a student or someone new to AI. I should keep it simple and avoid jargon. Maybe start with a definition, then key features, applications, and a note on challenges. Make sure it's concise but covers the essentials. Let me check if I'm missing anything. Oh, maybe mention that they can understand context and generate coherent responses. Also, examples like GPT or BERT could be helpful, but since the user asked for a short intro, maybe just refer to them as examples without going into detail. Alright, that should cover it.
# </think>
# content: A **large language model (LLM)** is an advanced artificial intelligence system trained on vast amounts of text data to understand and generate human-like language. These models use deep learning techniques, particularly neural networks, to analyze patterns in text, enabling them to perform tasks like answering questions, writing essays, translating languages, and even coding. Their "large" scale refers to the massive number of parameters (settings) they contain, allowing them to capture complex linguistic structures and context. LLMs like GPT or BERT are widely used in applications such as chatbots, content creation, and data analysis, though they also raise considerations around bias, ethics, and computational resources.

prompt = "9.11和9.8哪个数字大"  # "Which number is larger, 9.11 or 9.8?"
##INT4:
# thinking content: 
# content: <think>
# 嗯，用户问的是9.11和9.8哪个数字大。首先，我需要确认这两个数字的结构。9.11和9.8都是小数，对吧？不过可能用户写的时候有没有什么问题？比如9.11是不是可能被误解为9.11，而9.8是9.80？不过通常来说，小数点后的位数不同的话，应该按照数值大小来比较。
# 首先，我应该把这两个数都转换成相同的小数位数，或者直接比较它们的数值。比如，9.11和9.8，可以看成是9.11和9.80。这时候，比较整数部分都是9，所以要看小数部分。小数部分的话，第一位是1和8，对吧？因为9.11的小数部分是0.11，而9.80的小数部分是0.80。这时候，0.80比0.11大，所以9.80比9.11大，也就是9.8比9.11大。
# 不过，可能用户会疑惑，为什么小数点后第二位的11和80比较？或者有没有可能用户把9.11写成9.11，而9.8是9.8，这时候可能需要更仔细地分析。比如，9.8其实可以看作9.80，而9.11是9.11，所以比较的话，小数点后第一位是8和1，显然8比1大，所以9.8更大。
# 不过，也有可能用户对小数的比较不太熟悉，可能需要更详细的解释。比如，先比较整数部分，如果整数部分相同，再比较小数部分。这里整数部分都是9，所以继续比较小数部分。小数部分的话，第一位是1和8，所以直接比较第一位，8比1大，所以9.8更大。即使9.11的小数部分有两位，但第一位已经决定了大小，后面的位数不需要再比较了。
# 不过，也有可能用户会误以为9.11的小数部分是11，而9.8的小数部分是8，所以可能认为11比8大？但其实小数点后的每一位都是单独的位数，第一位是十分位，第二位是百分位。

##BF16:
# thinking content: 
# content: <think>
# Okay, let's see. The user is asking how many times the letter 'r' appears in the word "strawberry". Hmm, first I need to make sure I spell the word correctly. Strawberry... S-T-R-A-W-B-E-R-R-Y. Wait, is that right? Let me check again. S-T-R-A-W-B-E-R-R-Y. Yeah, that's correct. Now, I need to count the number of 'r's.
# Let me break it down letter by letter. Starting from the beginning:
# 1. S
# 2. T
# 3. R
# 4. A
# 5. W
# 6. B
# 7. E
# 8. R
# 9. R
# 10. Y
# So, the letters are S, T, R, A, W, B, E, R, R, Y. Now, looking for 'r's. The third letter is R, then the eighth is R, and the ninth is also R. So that's three 'r's? Wait, let me count again. Third letter: R (1), then the eighth: R (2), ninth: R (3). So three 'r's in total. But wait, sometimes people might miss a letter. Let me write them out:
# Position 3: R
# Position 8: R
# Position 9: R
# Yes, that's three. But wait, sometimes when people write "strawberry", they might have a different spelling? No, I think that's the standard. Let me confirm the spelling. Strawberry is spelled S-T-R-A-W-B-E-R-R-Y. So yes, the 'r's are at positions 3, 8, and 9. So three 'r's. But wait, maybe I'm miscounting. Let me write the word again:
# S T R A W B E R R Y
# Breaking it down:
# S (1)
# T (2)
# R (3)
# A (4)
# W (5)
# B (6)
# E (7)
# R (8)
# R (9)
# Y (10)
# So positions 3, 8, and 9 are 'r's. That's three. So the answer should be 3. But I want to make sure I'm not missing any. Let me check another way. Maybe write the word and underline the 'r's:
# S T **R** A W B E **R** **R** Y
# Yes, three

prompt = "How many r in word strawberry"
##INT4:
# thinking content: <think>
# Okay, let's see. The user is asking how many times the letter 'r' appears in the word "strawberry". Hmm, I need to check each letter one by one.
# First, I'll write out the word: S-T-R-A-W-B-E-R-R-Y. Let me count each letter. Starting from the beginning:
# S - that's the first letter, not an 'r'.
# T - second, also not.
# R - third letter, that's one 'r'.
# A - fourth, nope.
# W - fifth, no.
# B - sixth, no.
# E - seventh, no.
# R - eighth, that's the second 'r'.
# R - ninth, third 'r'.
# Y - tenth, no.
# Wait, so that's three 'r's? Let me double-check. S-T-R-A-W-B-E-R-R-Y. Yes, the third letter is R, then the eighth and ninth letters are both R. So that's three 'r's in total. But wait, sometimes people might miss a letter. Let me write it again:
# S (1), T (2), R (3), A (4), W (5), B (6), E (7), R (8), R (9), Y (10). So positions 3, 8, and 9. That's three 'r's. So the answer should be 3. But maybe I should check if there's any other 'r' I missed. Let me spell the word again: S-T-R-A-W-B-E-R-R-Y. No, there's no other 'r' in there. So the answer is three.
# </think>
# content: The word **"strawberry"** contains **3** instances of the letter **"r"**. 
# Here's the breakdown:
# - **S**  
# - **T**  
# - **R** (1st **r**)  
# - **A**  
# - **W**  
# - **B**  
# - **E**  
# - **R** (2nd **r**)  
# - **R** (3rd **r**)  
# - **Y**  
# **Answer:** 3.

##BF16:
# thinking content: 
# content: <think>
# Okay, let's see. The user is asking how many times the letter 'r' appears in the word "strawberry". Hmm, first I need to make sure I have the correct spelling of the word. Let me write it out: S-T-R-A-W-B-E-R-R-Y. Wait, is that right? Let me check again. Strawberry... yes, that's correct. Now, I need to count the number of 'r's in that spelling.

# Let me go through each letter one by one. Starting with the first letter: S. Not an 'r'. Next is T. Still not. Then R. Okay, that's one. Then A, W, B, E. So far, only one 'r'. Then comes the next letters: R. That's the second 'r'. Then another R. Wait, is there a third 'r'? Let me check again. The word is S-T-R-A-W-B-E-R-R-Y. So after the first R, there's a B, E, then two R's. So that's two R's? Wait, no. Let me count again. Let's break it down:

# 1. S
# 2. T
# 3. R (1st)
# 4. A
# 5. W
# 6. B
# 7. E
# 8. R (2nd)
# 9. R (3rd)
# 10. Y

# Wait, so the letters are S, T, R, A, W, B, E, R, R, Y. So the 'r's are at positions 3, 8, and 9. That's three 'r's. But wait, maybe I miscounted. Let me write it out again:

# S-T-R-A-W-B-E-R-R-Y. So after the first R (position 3), then the next letters are A, W, B, E, then R (position 8), then another R (position 9). So that's three R's. But sometimes people might miss the second R. Let me check again. The word is strawberry. Let me spell it again: S-T-R-A-W-B-E-R-R-Y. Yes, that's correct. So the R's are in the third, eighth, and ninth positions. Therefore, there are three 'r's. Wait, but I thought maybe the answer was two. Maybe I need to confirm. Let me check an online source or think of another way. Alternatively, maybe


prompt = "请简短介绍一下阿里巴巴公司"  # "Please give a brief introduction to Alibaba"
##INT4:
# thinking content: <think>
# 好的，用户让我简短介绍一下阿里巴巴公司。首先，我需要确定用户的需求是什么。可能他们需要一个快速的概述，所以得保持简洁。阿里巴巴成立于1999年，由马云和其他团队创立，这点很重要，必须提到。
# 接下来，公司主要的业务是什么？淘宝、天猫这些电商平台是核心，还有阿里云，这是他们的重要部分。可能用户对云计算也感兴趣，所以得包括进去。另外，支付宝作为支付工具，也是阿里巴巴生态系统的一部分，应该提到。
# 然后，公司的影响力和地位。作为全球最大的电商平台之一，可能需要强调其规模和市场地位。还有，阿里巴巴的子公司和业务扩展，比如菜鸟网络、阿里健康等，但可能用户只需要简短，所以可能不需要展开。
# 用户可能想知道阿里巴巴的总部在哪里，但问题中没有提到，所以可能不需要。不过，如果有必要，可以加上。另外，成立年份和创始人是关键信息，必须准确。
# 需要检查是否有遗漏的重要信息，比如上市情况，阿里巴巴在纽交所上市，但可能用户不需要这么详细。保持简短的话，可能不需要。最后，确保语言流畅，信息准确，没有错误。
# </think>
# content: 阿里巴巴集团（Alibaba Group）是全球领先的电子商务和科技公司，成立于1999年，由马云及其团队在中国杭州创立。其核心业务包括电子商务平台（如淘宝、天猫）、云计算服务（阿里云）、数字支付（支付宝）、物流（菜鸟网络）及金融科技等。阿里巴巴通过创新和技术驱动，已成为中国乃至全球最具影响力的互联网企业之一。

##BF16:
# thinking content: <think>
# 好的，用户让我简短介绍一下阿里巴巴公司。首先，我需要确定用户的需求是什么。他们可能对阿里巴巴不太了解，或者需要快速了解其基本情况。简短介绍的话，应该包括成立时间、创始人、总部地点、主要业务和影响力。
# 接下来，我得确认阿里巴巴的成立时间。我记得是1999年，由马云和他的团队创立。总部在杭州，这点没错。主要业务方面，阿里巴巴集团现在涵盖电商、金融、物流、云计算等多个领域。比如淘宝、天猫是电商平台，蚂蚁金服是金融，菜鸟网络是物流，阿里云是云计算。这些都需要提到。
# 另外，可能需要提到它的全球影响力，比如成为全球最大的电商平台之一，以及对中国经济的贡献。不过要保持简短，不能太详细。还要注意不要遗漏重要子公司或业务板块，但也不能太冗长。
# 用户可能还想知道阿里巴巴的上市情况，比如在纽交所上市的时间，但可能不需要太详细，因为问题只是简短介绍。另外，是否需要提到最近的动态？比如最近的业务调整或者新项目？不过简短介绍的话，可能不需要，保持基础信息即可。
# 需要检查是否有错误信息，比如成立年份是否正确，创始人是否正确，总部是否在杭州。确认无误后，组织语言，确保简洁明了，不超过两到三句话。可能还需要提到其子公司和业务范围，但用简短的列举方式，比如“涵盖电子商务、金融、物流、云计算等领域”。
# </think>
# content: 阿里巴巴集团（Alibaba Group）成立于1999年，由马云及其团队在杭州创立，是全球领先的电子商务和科技公司。其核心业务包括电商平台（如淘宝、天猫）、金融科技（蚂蚁集团）、物流（菜鸟网络）及云计算（阿里云）等，致力于通过数字技术推动全球商业发展，已成为中国最具影响力的互联网企业之一。
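
Several sample outputs above show an empty "thinking content" and a "content" field that starts with <think>. This is most likely the fallback branch of the parsing step: when generation stops at max_new_tokens=512 before the model emits </think>, index stays 0 and everything is treated as final content. The sketch below isolates that logic so it can be checked without loading the model. The function name split_thinking is hypothetical, and the token ID 151668 is taken from the snippet above.

```python
THINK_END_ID = 151668  # token ID of </think>, as used in the snippet above


def split_thinking(output_ids, think_end_id=THINK_END_ID):
    """Split generated token IDs at the last </think> token.

    Returns (thinking_ids, answer_ids). The </think> token itself stays in
    thinking_ids. If </think> never appears (for example, generation was
    truncated mid-thought), thinking_ids is empty and all tokens are
    returned as answer_ids.
    """
    try:
        # Position just past the last occurrence of think_end_id.
        index = len(output_ids) - output_ids[::-1].index(think_end_id)
    except ValueError:
        index = 0
    return output_ids[:index], output_ids[index:]


# Complete output: thinking tokens, </think>, then answer tokens.
print(split_thinking([11, 12, THINK_END_ID, 21, 22]))
# Truncated output: no </think>, so everything is reported as the answer.
print(split_thinking([11, 12, 13]))
```

Raising max_new_tokens is the simplest way to avoid the truncated case. A caller could also check whether thinking_ids is empty and label the output as unfinished reasoning.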

Generate the model

Here is the sample command to generate the model.

auto-round-best \
--model Qwen/Qwen3-30B-A3B \
--device 0 \
--group_size 128 \
--bits 4 \
--format 'auto_round' \
--output_dir "./tmp_autoround"

Ethical Considerations and Limitations

The model can produce factually incorrect output, and should not be relied on to produce factually accurate information. Because of the limitations of the pretrained model and the finetuning datasets, it is possible that this model could generate lewd, biased or otherwise offensive outputs.

Therefore, before deploying any applications of the model, developers should perform safety testing.

Caveats and Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model.

Here is a useful link to learn more about Intel's AI software:

Intel Neural Compressor: https://github.com/intel/neural-compressor

Disclaimer

The license on this model does not constitute legal advice. We are not responsible for the actions of third parties who use this model. Please consult an attorney before using this model for commercial purposes.

Cite

@article{cheng2023optimize,
  title={Optimize weight rounding via signed gradient descent for the quantization of llms},
  author={Cheng, Wenhua and Zhang, Weiwei and Shen, Haihao and Cai, Yiyang and He, Xin and Lv, Kaokao and Liu, Yi},
  journal={arXiv preprint arXiv:2309.05516},
  year={2023}
}

arXiv: https://arxiv.org/abs/2309.05516 | GitHub: https://github.com/intel/auto-round
