---
license: mit
language:
- en
- zh
base_model:
- deepseek-ai/Janus-Pro-1B
pipeline_tag: visual-question-answering
tags:
- DeepSeek
- Janus-Pro-1B
---

# Janus-Pro-1B-Int8

This version of Janus-Pro-1B has been converted to run on the Axera NPU using **w8a16** quantization (8-bit weights, 16-bit activations).

This model has been optimized with the following LoRA:

Compatible with Pulsar2 version: 3.4

## Conversion tool links

If you are interested in model conversion, you can export the axmodel yourself, starting from the original repo: https://huggingface.co/deepseek-ai/Janus-Pro-1B

- [GitHub: Janus-Pro-1B.axera](https://github.com/AXERA-TECH/Janus-Pro-1B.axera)
- [Pulsar2 documentation: How to convert an LLM from Hugging Face to axmodel](https://pulsar2-docs.readthedocs.io/en/latest/appendix/build_llm.html)

## Supported platforms

- AX650
- [M4N-Dock(爱芯派Pro)](https://wiki.sipeed.com/hardware/zh/maixIV/m4ndock/m4ndock.html)

| Chip | Image encoder (384×384) | TTFT | Decode speed (w8a16) |
|--|--|--|--|
| AX650 | 142.682 ms | 4560.214 ms | 11.43 tokens/sec |

## How to use

Download all files from this repository to the device.

**Using an AX650 board**

```bash
root@ax650 ~/yongqiang/push_hugging_face/Janus-Pro-1B # tree -L 1
.
├── assets
├── config.json
├── embeds
├── img_gen_onnx
├── imgs
├── infer_axmodel_gen.py
├── infer_axmodel_und.py
├── janus_pro_1b_axmodel
├── janus_pro_1b_tokenizer
├── README.md
└── vit_axmodel

8 directories, 3 files
```

#### Install Janus

```bash
$ git clone https://github.com/deepseek-ai/Janus
$ cd Janus
$ pip3 install -e .
```

#### Inference on an AX650 host, such as M4N-Dock(爱芯派Pro) or the AX650N DEMO board

**Multimodal Understanding**

Input text:

```
Please describe the picture.
```

Input image:

![input image](imgs/image.png)

Log output:

```bash
root@ax650 ~/yongqiang/push_hugging_face/Janus-Pro-1B # python3 infer_axmodel_und.py --tokenizer_dir janus_pro_1b_tokenizer --axmodel_path janus_pro_1b_axmodel --vit_axmodel_path vit_axmodel/janus_warp_vit.axmodel -i ./imgs/image.png
[INFO] Available providers: ['AxEngineExecutionProvider']
[INFO] Chip type: ChipType.MC50
[INFO] VNPU type: VNPUType.DISABLED
[INFO] Engine version: 2.11.0a
vit_output.shape is (1, 576, 2048), vit feature extract done!
Init InferenceSession: 100%|██████████████████████████████████████████████████████████| 24/24 [00:04<00:00, 4.94it/s]
model load done!
prefill done!
Decoder: 62%|█████████████████████████████████████████▍ | 634/1024 [00:00<00:00, 2505.28it/s]
Decoder: 72%|█████████████████████████████████████████████████▉ | 741/1024 [00:19<00:10, 27.69it/s]
hit eos!
Decoder: 74%|███████████████████████████████████████████████████▎ | 762/1024 [00:23<00:08, 31.84it/s]
Janus Answers: The image depicts three astronauts standing in a lush, green forest. They are wearing traditional white space suits with various patches and equipment attached. The suits have a reflective visor on their helmets, and they appear to be in a relaxed pose, with one astronaut raising his arms and the others standing or crouching. The forest is dense with tall trees and dense foliage, creating a serene and somewhat mysterious atmosphere.
```

**Text-to-Image Generation**

Input text:

```
"A close-up high-contrast photo of Sydney Opera House sitting next to Eiffel tower, under a blue night sky of roiling energy, exploding yellow stars, and radiating swirls of blue."
```

Log output:

```bash
root@ax650 ~/yongqiang/push_hugging_face/Janus-Pro-1B # python3 infer_axmodel_gen.py --tokenizer_dir janus_pro_1b_tokenizer/ --axmodel_path janus_pro_1b_axmodel/
[INFO] Available providers: ['AxEngineExecutionProvider']
Init InferenceSession: 0%| | 0/24 [00:00:269 - model load done!
2025-04-14 15:55:33.104 | DEBUG | __main__:generate:158 - prefill completed!
ImageToken: 18%|████████████ | 104/575 [00:39<02:58, 2.64it/s]
ImageToken: 45%|██████████████████████████████▍ | 261/575 [01:39<01:58, 2.65it/s]
ImageToken: 73%|████████████████████████████████████████████████▊ | 419/575 [02:39<00:58, 2.66it/s]
ImageToken: 100%|███████████████████████████████████████████████████████████████████| 575/575 [03:38<00:00, 2.63it/s]
```

Output image:

![output image](assets/gen_out_img.jpg)
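The generation log can be cross-checked with a little arithmetic. The sketch below assumes a 384×384 output image and a 16× downsampling image tokenizer. Both values are assumptions here, not read from `infer_axmodel_gen.py`. Together they give a 24×24 grid of 576 image tokens, which also matches the 576 visual tokens in `vit_output.shape` from the understanding log. It further assumes that prefill produces the first token, leaving the 575 decode steps counted by the `ImageToken` bar. The helper names `image_token_count` and `decode_seconds` are hypothetical.

```python
"""Back-of-the-envelope timing for Janus-Pro-1B text-to-image on AX650.

Assumptions (not taken from the inference scripts): the generated image is
384x384 and the image tokenizer downsamples by 16, so one image is a 24x24
grid of tokens; prefill yields the first token and the rest are decoded
one per step.
"""

IMG_SIZE = 384    # generated image side length in pixels (assumed)
PATCH_SIZE = 16   # downsampling factor of the image tokenizer (assumed)


def image_token_count(img_size: int = IMG_SIZE, patch_size: int = PATCH_SIZE) -> int:
    """Number of discrete image tokens for one square image (hypothetical helper)."""
    side = img_size // patch_size
    return side * side


def decode_seconds(decode_steps: int, steps_per_sec: float) -> float:
    """Wall-clock time of the token-by-token decode loop (hypothetical helper)."""
    return decode_steps / steps_per_sec


if __name__ == "__main__":
    total = image_token_count()          # 24 * 24 = 576 tokens
    steps = total - 1                    # 575 steps, as counted by the ImageToken bar
    secs = decode_seconds(steps, 2.63)   # average rate shown when the bar reaches 100%
    print(f"{total} tokens, {steps} decode steps, ~{secs:.0f} s")
```

Running it prints `576 tokens, 575 decode steps, ~219 s`. This is close to the 03:38 (218 s) elapsed time shown by the final `ImageToken` bar, so image decoding, not prefill, dominates end-to-end generation time on this board.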