InternVL3-2B-Int8
This version of InternVL3-2B has been converted to run on the Axera NPU using w8a16 quantization.
Compatible with Pulsar2 version: 3.4
Conversion tool links:
If you are interested in the model conversion itself, you can try exporting the axmodel yourself, starting from the original repo: https://huggingface.co/OpenGVLab/InternVL3-2B
Supported Platforms
- AX650
| Chip | Image num | Image encoder (448×448) | TTFT | Decode speed (w8a16) |
|---|---|---|---|---|
| AX650N | 0 | 0 ms | 221 ms (128 tokens) | 11.50 tokens/sec |
| AX650N | 1 | 364 ms | 862 ms (384 tokens) | 11.50 tokens/sec |
| AX650N | 4 | 1456 ms | 4589 ms (1152 tokens) | 11.50 tokens/sec |
| AX650N | 8 | 2912 ms | 13904 ms (2176 tokens) | 11.50 tokens/sec |
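The token counts in the TTFT column line up with the prefill lengths seen in the logs further down: each 448×448 image appears to contribute 256 vision tokens, and prefill runs in 128-token slices, so the total is padded up to a slice boundary. A minimal sketch of that arithmetic (the 256-tokens-per-image and 128-token slice size are inferred from this table and the logs, not taken from the conversion tooling):

```python
# Rough prefill-length estimate matching the TTFT column above.
# Assumptions (inferred from this README's table and logs, not from Pulsar2):
#   - each 448x448 image contributes 256 vision tokens
#   - prefill runs in 128-token slices, so the length is padded up
def estimated_prefill_tokens(prompt_tokens: int, image_num: int,
                             tokens_per_image: int = 256,
                             slice_len: int = 128) -> int:
    raw = prompt_tokens + image_num * tokens_per_image
    slices = (raw + slice_len - 1) // slice_len  # round up to a whole slice
    return slices * slice_len

for n in (0, 1, 4, 8):
    print(n, estimated_prefill_tokens(128, n))  # 128, 384, 1152, 2176
```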
How to use
Download all files from this repository to the device.
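If the device has network access, the files can also be fetched with the `huggingface_hub` Python package; a minimal sketch (the local directory name is just an example):

```python
# Minimal sketch: download every file of this repo to a local directory.
# Requires `pip install huggingface_hub`; the local_dir value is arbitrary.
from huggingface_hub import snapshot_download

snapshot_download(repo_id="AXERA-TECH/InternVL3-2B", local_dir="InternVL3-2B")
```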
Using AX650 Board
root@ax650 ~/yongqiang/push_hugging_face/InternVL3-2B # tree -L 1
.
├── config.json
├── examples
├── infer.py
├── infer_video.py
├── internvl3_2b_axmodel
├── internvl3_2b_tokenizer
├── README.md
└── vit_axmodel
4 directories, 4 files
Inference on an AX650 host, such as the M4N-Dock (AXera-Pi Pro) or the AX650N DEMO board
Text Generation
input text:
Please calculate the derivative of the function [y=2x^2-2] and provide the reasoning process in markdown format.
log information:
root@ax650 ~/yongqiang/push_hugging_face/InternVL3-2B # python3 infer.py --hf_model internvl3_2b_tokenizer/ --axmodel_path internvl3_2b_axmodel/ --question "Please calculate the derivative of the function [y=2x^2-2] and provide the reasoning process in markdown format"
Init InferenceSession: 100%|██████████| 28/28 [00:16<00:00, 1.74it/s]
model load done!
prefill token_len: 85
slice_indexs is [0]
slice prefill done 0
Decode: 9%|█ | 232/2559 [00:19<05:14, 7.39it/s]
Decode: 17%|██ | 440/2559 [00:48<04:51, 7.26it/s]hit eos!
Decode: 17%|██ | 440/2559 [00:48<03:53, 9.06it/s]
Certainly! Let's calculate the derivative of the function \( y = 2x^2 - 2 \) using the rules of differentiation.
### Step-by-Step Reasoning:
1. **Identify the Function:**
   The given function is \( y = 2x^2 - 2 \).
2. **Differentiate Term by Term:**
   We will differentiate each term of the function separately.
   - **First Term: \( 2x^2 \)**
     - The derivative of \( x^n \) (where \( n \) is a constant) is \( nx^{n-1} \).
     - Here, \( n = 2 \).
     - Therefore, the derivative of \( 2x^2 \) is \( 2 \) times \( 2x^{2-1} \), which simplifies to \( 4x \).
   - **Second Term: \( -2 \)**
     - The derivative of a constant (a term without \( x \)) is \( 0 \).
     - Therefore, the derivative of \( -2 \) is \( 0 \).
3. **Combine the Derivatives:**
   - The derivative of the entire function is the sum of the derivatives of each term.
   - So, the derivative of \( y = 2x^2 - 2 \) is \( 4x + 0 \), which simplifies to \( 4x \).
### Final Answer:
The derivative of the function \( y = 2x^2 - 2 \) is \( 4x \).
### Summary:
The derivative of \( y = 2x^2 - 2 \) is \( 4x \).
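The answer is easy to verify off-device; here is a quick SymPy check (SymPy is not part of this repo, it is only used to confirm the result):

```python
# Independent check: d/dx (2x^2 - 2) = 4x
import sympy as sp

x = sp.symbols("x")
print(sp.diff(2 * x**2 - 2, x))  # prints 4*x
```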
Multimodal Understanding
input image
input text:
"Please describe this picture in detail."
log information:
root@ax650 ~/yongqiang/push_hugging_face/InternVL3-2B # python3 infer.py --hf_model internvl3_2b_tokenizer/ --axmodel_path internvl3_2b_axmodel/ --question "Please describe this picture in detail" -i examples/image_1.jpg --vit_model vit_axmodel/internvl3_2b_vit_slim.axmodel
[INFO] Available providers: ['AxEngineExecutionProvider']
Init InferenceSession: 0%| | 0/24 [00:00<?, ?it/s][INFO] Chip type: ChipType.MC50
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Engine version: 2.11.0a
Init InferenceSession: 100%|██████████| 28/28 [00:14<00:00, 1.92it/s]
model load done!
prefill token_len: 325
slice_indexs is [0, 1, 2]
slice prefill done 0
slice prefill done 1
slice prefill done 2
Decode: 13%|█ | 326/2559 [00:00<00:01, 1829.15it/s]
Decode: 19%|██ | 489/2559 [00:22<02:26, 14.17it/s]hit eos!
Decode: 20%|██ | 517/2559 [00:26<01:43, 19.71it/s]
**Image Description:**
The image depicts a giant panda in a naturalistic enclosure, likely within a zoo or wildlife sanctuary. The panda is prominently positioned in the foreground, surrounded by lush green bamboo plants. Its distinctive black and white fur is clearly visible, with the panda's face, ears, and limbs being black, while its body and the rest of its face are white. The panda appears to be eating bamboo, with its front paws holding a piece of bamboo close to its mouth. The panda's expression is calm and curious, with its eyes looking directly at the camera.
In the background, there is another panda partially obscured by the foliage and a wooden structure, possibly part of the enclosure's design. The ground is covered with a layer of mulch or wood chips, providing a naturalistic habitat for the pandas. The overall setting is serene and well-maintained, designed to mimic the panda's natural habitat while ensuring the animals' well-being.
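Before the language-model prefill, the image is encoded by the standalone ViT axmodel (`vit_axmodel/internvl3_2b_vit_slim.axmodel`). A minimal preprocessing sketch, assuming the stock InternVL recipe of a 448×448 resize with ImageNet mean/std normalization (check `infer.py` for the exact steps actually used):

```python
# Hypothetical preprocessing sketch. Assumes a 448x448 input and ImageNet
# normalization, as in upstream InternVL; infer.py may differ in detail.
import numpy as np
from PIL import Image

def preprocess(path: str, size: int = 448) -> np.ndarray:
    mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
    std = np.array([0.229, 0.224, 0.225], dtype=np.float32)
    img = Image.open(path).convert("RGB").resize((size, size))
    x = np.asarray(img, dtype=np.float32) / 255.0
    x = (x - mean) / std
    return x.transpose(2, 0, 1)[None, ...]  # NCHW, shape (1, 3, 448, 448)

vit_input = preprocess("examples/image_1.jpg")
```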
input video
https://github.com/user-attachments/assets/2beffc73-d078-4c54-8282-7b7d845f39c9
input text:
"Please describe this video in detail."
log information:
root@ax650 ~/yongqiang/push_hugging_face/InternVL3-2B # python3 infer_video.py --hf_model internvl3_2b_tokenizer/ --axmodel_path internvl3_2b_axmodel/ --vit_model vit_axmodel/internvl3_2b_vit_slim.axmodel -i examples/red-panda.mp4 -q "Please describe this video in detail."
[INFO] Available providers: ['AxEngineExecutionProvider']
Input frame count: 8
preprocess image done!
[INFO] Chip type: ChipType.MC50
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Engine version: 2.11.0a
vit_output.shape is (1, 256, 1536), vit feature extract done!
Init InferenceSession: 100%|██████████| 28/28 [00:30<00:00, 1.07s/it]
model load done!
prefill token_len: 2159
slice_indexs is [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16]
slice prefill done 0
slice prefill done 1
slice prefill done 2
slice prefill done 3
slice prefill done 4
slice prefill done 5
slice prefill done 6
slice prefill done 7
slice prefill done 8
slice prefill done 9
slice prefill done 10
slice prefill done 11
slice prefill done 12
slice prefill done 13
slice prefill done 14
slice prefill done 15
slice prefill done 16
Decode: 88%|█████████ | 2240/2559 [00:11<00:02, 133.83it/s]hit eos!
Decode: 90%|█████████ | 2303/2559 [00:21<00:02, 108.19it/s]
The video features two red pandas in an outdoor enclosure with green grass and a wooden structure. One panda is perched on a branch, while the other stands on the ground. The standing panda is holding a bamboo stick with its paws, attempting to eat it. The environment appears to be a zoo or wildlife sanctuary. The lighting is natural daylight. The pandas have distinctive reddish-brown fur with black faces and white markings around their eyes. The bamboo sticks are brown and appear to be part of the enclosure's enrichment. The panda on the ground seems to be trying to reach the bamboo, while the one on the branch seems to be observing or waiting for its turn. There's no visible human interaction in these frames.
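The video path samples 8 frames (the log above reports an input frame count of 8) and feeds each one through the same ViT encoder; at roughly 256 vision tokens per frame plus the prompt, that accounts for the 2159-token prefill. A minimal frame-sampling sketch with OpenCV (`infer_video.py` may use a different decoder or sampling strategy):

```python
# Minimal sketch of uniform frame sampling from a video with OpenCV.
# Assumption: infer_video.py samples frames roughly like this; the actual
# decoder and sampling strategy may differ.
import cv2

def sample_frames(path: str, num_frames: int = 8):
    cap = cv2.VideoCapture(path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    step = (total - 1) / max(num_frames - 1, 1)
    frames = []
    for i in range(num_frames):
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(i * step))
        ok, frame = cap.read()
        if ok:
            frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    cap.release()
    return frames

frames = sample_frames("examples/red-panda.mp4", num_frames=8)
```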
Model tree for AXERA-TECH/InternVL3-2B
- Base model: OpenGVLab/InternVL3-2B-Pretrained
- Finetuned: OpenGVLab/InternVL3-2B-Instruct
- Finetuned: OpenGVLab/InternVL3-2B