InternVL3-2B-Int8
This version of InternVL3-2B has been converted to run on the Axera NPU using w8a16 quantization.
Compatible with Pulsar2 version: 3.4
Conversion tool links:
If you are interested in the model conversion itself, you can try exporting the axmodel yourself, starting from the original repo: https://huggingface.co/OpenGVLab/InternVL3-2B
Supported Platforms
- AX650
| Chip | Image num | Image encoder (448×448) | TTFT | Decode speed (w8a16) |
|---|---|---|---|---|
| AX650N | 0 | 0 ms | 221 ms (128 tokens) | 11.50 tokens/sec |
| AX650N | 1 | 364 ms | 862 ms (384 tokens) | 11.50 tokens/sec |
| AX650N | 4 | 1456 ms | 4589 ms (1152 tokens) | 11.50 tokens/sec |
| AX650N | 8 | 2912 ms | 13904 ms (2176 tokens) | 11.50 tokens/sec |
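The token counts in the TTFT column line up with the prefill lengths seen in the logs further down: each 448×448 image appears to contribute 256 vision tokens, and prefill runs in 128-token slices, so the total is padded up to a slice boundary. A minimal sketch of that arithmetic (the 256-tokens-per-image and 128-token slice size are inferred from this table and the logs, not taken from the conversion tooling):

```python
# Rough prefill-length estimate matching the TTFT column above.
# Assumptions (inferred from this README's table and logs, not from Pulsar2):
#   - each 448x448 image contributes 256 vision tokens
#   - prefill runs in 128-token slices, so the length is padded up
def estimated_prefill_tokens(prompt_tokens: int, image_num: int,
                             tokens_per_image: int = 256,
                             slice_len: int = 128) -> int:
    raw = prompt_tokens + image_num * tokens_per_image
    slices = (raw + slice_len - 1) // slice_len  # round up to a whole slice
    return slices * slice_len

for n in (0, 1, 4, 8):
    print(n, estimated_prefill_tokens(128, n))  # 128, 384, 1152, 2176
```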
How to use
Download all files from this repository to the device.
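If the device has network access, the files can also be fetched with the `huggingface_hub` Python package; a minimal sketch (the local directory name is just an example):

```python
# Minimal sketch: download every file of this repo to a local directory.
# Requires `pip install huggingface_hub`; the local_dir value is arbitrary.
from huggingface_hub import snapshot_download

snapshot_download(repo_id="AXERA-TECH/InternVL3-2B", local_dir="InternVL3-2B")
```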
Using AX650 Board
root@ax650 ~/yongqiang/push_hugging_face/InternVL3-2B # tree -L 1
.
├── config.json
├── examples
├── infer.py
├── infer_video.py
├── internvl3_2b_axmodel
├── internvl3_2b_tokenizer
├── README.md
└── vit_axmodel
4 directories, 4 files
Inference on an AX650 host, such as the M4N-Dock (AXera-Pi Pro) or the AX650N DEMO board
Text Generation
input text:
Please calculate the derivative of the function [y=2x^2-2] and provide the reasoning process in markdown format.
log information:
root@ax650 ~/yongqiang/push_hugging_face/InternVL3-2B # python3 infer.py --hf_model internvl3_2b_tokenizer/ --axmodel_path internvl3_2b_axmodel/ --question "Please calculate the derivative of the function [y=2x^2-2] and provide the reasoning process in markdown format"
Init InferenceSession: 100%|██████████| 28/28 [00:16<00:00, 1.74it/s]
model load done!
prefill token_len: 85
slice_indexs is [0]
slice prefill done 0
Decode: 9%|█ | 232/2559 [00:19<05:14, 7.39it/s]
Decode: 17%|██ | 440/2559 [00:48<04:51, 7.26it/s]hit eos!
Decode: 17%|██ | 440/2559 [00:48<03:53, 9.06it/s]
Certainly! Let's calculate the derivative of the function \( y = 2x^2 - 2 \) using the rules of differentiation.
### Step-by-Step Reasoning:
1. **Identify the Function:**
   The given function is \( y = 2x^2 - 2 \).
2. **Differentiate Term by Term:**
   We will differentiate each term of the function separately.
   - **First Term: \( 2x^2 \)**
     - The derivative of \( x^n \) (where \( n \) is a constant) is \( nx^{n-1} \).
     - Here, \( n = 2 \).
     - Therefore, the derivative of \( 2x^2 \) is \( 2 \) times \( 2x^{2-1} \), which simplifies to \( 4x \).
   - **Second Term: \( -2 \)**
     - The derivative of a constant (a term without \( x \)) is \( 0 \).
     - Therefore, the derivative of \( -2 \) is \( 0 \).
3. **Combine the Derivatives:**
   - The derivative of the entire function is the sum of the derivatives of each term.
   - So, the derivative of \( y = 2x^2 - 2 \) is \( 4x + 0 \), which simplifies to \( 4x \).
### Final Answer:
The derivative of the function \( y = 2x^2 - 2 \) is \( 4x \).
### Summary:
The derivative of \( y = 2x^2 - 2 \) is \( 4x \).
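The answer is easy to verify off-device; here is a quick SymPy check (SymPy is not part of this repo, it is only used to confirm the result):

```python
# Independent check: d/dx (2x^2 - 2) = 4x
import sympy as sp

x = sp.symbols("x")
print(sp.diff(2 * x**2 - 2, x))  # prints 4*x
```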
Multimodal Understanding
input image
input text:
"Please describe this picture in detail."
log information:
root@ax650 ~/yongqiang/push_hugging_face/InternVL3-2B # python3 infer.py --hf_model internvl3_2b_tokenizer/ --axmodel_path internvl3_2b_axmodel/ --question "Please describe this picture in detail" -i examples/image_1.jpg --vit_model vit_axmodel/internvl3_2b_vit_slim.axmodel
[INFO] Available providers: ['AxEngineExecutionProvider']
Init InferenceSession: 0%| | 0/24 [00:00<?, ?it/s][INFO] Chip type: ChipType.MC50
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Engine version: 2.11.0a
Init InferenceSession: 100%|██████████| 28/28 [00:14<00:00, 1.92it/s]
model load done!
prefill token_len: 325
slice_indexs is [0, 1, 2]
slice prefill done 0
slice prefill done 1
slice prefill done 2
Decode: 13%|█ | 326/2559 [00:00<00:01, 1829.15it/s]
Decode: 19%|██ | 489/2559 [00:22<02:26, 14.17it/s]hit eos!
Decode: 20%|██ | 517/2559 [00:26<01:43, 19.71it/s]
**Image Description:**
The image depicts a giant panda in a naturalistic enclosure, likely within a zoo or wildlife sanctuary. The panda is prominently positioned in the foreground, surrounded by lush green bamboo plants. Its distinctive black and white fur is clearly visible, with the panda's face, ears, and limbs being black, while its body and the rest of its face are white. The panda appears to be eating bamboo, with its front paws holding a piece of bamboo close to its mouth. The panda's expression is calm and curious, with its eyes looking directly at the camera.
In the background, there is another panda partially obscured by the foliage and a wooden structure, possibly part of the enclosure's design. The ground is covered with a layer of mulch or wood chips, providing a naturalistic habitat for the pandas. The overall setting is serene and well-maintained, designed to mimic the panda's natural habitat while ensuring the animals' well-being.
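Before the language-model prefill, the image is encoded by the standalone ViT axmodel (`vit_axmodel/internvl3_2b_vit_slim.axmodel`). A minimal preprocessing sketch, assuming the stock InternVL recipe of a 448×448 resize with ImageNet mean/std normalization (check `infer.py` for the exact steps actually used):

```python
# Hypothetical preprocessing sketch. Assumes a 448x448 input and ImageNet
# normalization, as in upstream InternVL; infer.py may differ in detail.
import numpy as np
from PIL import Image

def preprocess(path: str, size: int = 448) -> np.ndarray:
    mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
    std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
    img = Image.open(path).convert("RGB").resize((size, size))
    x = np.asarray(img, dtype=np.float32) / 255.0
    x = (x - mean) / std
    return x.transpose(2, 0, 1)[None, ...]  # NCHW, shape (1, 3, 448, 448)

vit_input = preprocess("examples/image_1.jpg")
```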
input video
https://github.com/user-attachments/assets/2beffc73-d078-4c54-8282-7b7d845f39c9
input text:
"Please describe this video in detail."
log information:
root@ax650 ~/yongqiang/push_hugging_face/InternVL3-2B # python3 infer_video.py --hf_model internvl3_2b_tokenizer/ --axmodel_path internvl3_2b_axmodel/ --vit_model vit_axmodel/internvl3_2b_vit_slim.axmodel -i examples/red-panda.mp4 -q "Please describe this video in detail."
[INFO] Available providers: ['AxEngineExecutionProvider']
Input frame count: 8
preprocess image done!
[INFO] Chip type: ChipType.MC50
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Engine version: 2.11.0a
vit_output.shape is (1, 256, 1536), vit feature extract done!
Init InferenceSession: 100%|██████████| 28/28 [00:30<00:00, 1.07s/it]
model load done!
prefill token_len: 2159
slice_indexs is [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
slice prefill done 0
slice prefill done 1
slice prefill done 2
slice prefill done 3
slice prefill done 4
slice prefill done 5
slice prefill done 6
slice prefill done 7
slice prefill done 8
slice prefill done 9
slice prefill done 10
slice prefill done 11
slice prefill done 12
slice prefill done 13
slice prefill done 14
slice prefill done 15
slice prefill done 16
Decode: 88%|█████████ | 2240/2559 [00:11<00:02, 133.83it/s]hit eos!
Decode: 90%|█████████ | 2303/2559 [00:21<00:02, 108.19it/s]
The video features two red pandas in an outdoor enclosure with green grass and a wooden structure. One panda is perched on a branch, while the other stands on the ground. The standing panda is holding a bamboo stick with its paws, attempting to eat it. The environment appears to be a zoo or wildlife sanctuary. The lighting is natural daylight. The pandas have distinctive reddish-brown fur with black faces and white markings around their eyes. The bamboo sticks are brown and appear to be part of the enclosure's enrichment. The panda on the ground seems to be trying to reach the bamboo, while the one on the branch seems to be observing or waiting for its turn. There's no visible human interaction in these frames.
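The video path samples 8 frames (the log above reports an input frame count of 8) and feeds each one through the same ViT encoder; at roughly 256 vision tokens per frame plus the prompt, that accounts for the 2159-token prefill. A minimal frame-sampling sketch with OpenCV (`infer_video.py` may use a different decoder or sampling strategy):

```python
# Minimal sketch of uniform frame sampling from a video with OpenCV.
# Assumption: infer_video.py samples frames roughly like this; the actual
# decoder and sampling strategy may differ.
import cv2

def sample_frames(path: str, num_frames: int = 8):
    cap = cv2.VideoCapture(path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    step = (total - 1) / max(num_frames - 1, 1)
    frames = []
    for i in range(num_frames):
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(i * step))
        ok, frame = cap.read()
        if ok:
            frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    cap.release()
    return frames

frames = sample_frames("examples/red-panda.mp4", num_frames=8)
```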
Model tree for AXERA-TECH/InternVL3-2B
- Base model: OpenGVLab/InternVL3-2B-Pretrained
- Finetuned: OpenGVLab/InternVL3-2B-Instruct
- Finetuned: OpenGVLab/InternVL3-2B