Files changed (3)
  1. README.md +10 -101
  2. app.py +82 -0
  3. requirements.txt +6 -0
README.md CHANGED
@@ -1,105 +1,14 @@
- ---
- license: mit
- tags:
- - image-generation
- - HiDream.ai
- language:
- - en
- pipeline_tag: text-to-image
- library_name: diffusers
- ---
-
- ![HiDream-I1 Demo](demo.jpg)
-
- `HiDream-I1` is a new open-source image generative foundation model with 17B parameters that achieves state-of-the-art image generation quality within seconds.
-
- <span style="color: #FF5733; font-weight: bold">For more features and to experience the full capabilities of our product, please visit [https://vivago.ai/](https://vivago.ai/).</span>
-
- ## Key Features
- - ✨ **Superior Image Quality** - Produces exceptional results across multiple styles including photorealistic, cartoon, artistic, and more. Achieves state-of-the-art HPS v2.1 score, which aligns with human preferences.
- - 🎯 **Best-in-Class Prompt Following** - Achieves industry-leading scores on GenEval and DPG benchmarks, outperforming all other open-source models.
- - 🔓 **Open Source** - Released under the MIT license to foster scientific advancement and enable creative innovation.
- - 💼 **Commercial-Friendly** - Generated images can be freely used for personal projects, scientific research, and commercial applications.
-
- ## Quick Start
- Please make sure you have installed [Flash Attention](https://github.com/Dao-AILab/flash-attention). We recommend CUDA version 12.4 for the manual installation.
- ```
- pip install -r requirements.txt
- ```
- Clone the GitHub repo:
- ```
- git clone https://github.com/HiDream-ai/HiDream-I1
- ```
-
- Then you can run the inference scripts to generate images:
-
- ```python
- # For full model inference
- python ./inference.py --model_type full
-
- # For distilled dev model inference
- python ./inference.py --model_type dev
-
- # For distilled fast model inference
- python ./inference.py --model_type fast
- ```
- > **Note:** The inference script will automatically download `meta-llama/Meta-Llama-3.1-8B-Instruct` model files. If you encounter network issues, you can download these files ahead of time and place them in the appropriate cache directory to avoid download failures during inference.
-
- ## Gradio Demo
-
- We also provide a Gradio demo for interactive image generation. You can run the demo with:
-
- ```python
- python gradio_demo.py
- ```
-
- ## Evaluation Metrics
-
- ### DPG-Bench
- | Model | Overall | Global | Entity | Attribute | Relation | Other |
- |-----------------|-----------|-----------|-----------|-----------|-----------|-----------|
- | PixArt-alpha | 71.11 | 74.97 | 79.32 | 78.60 | 82.57 | 76.96 |
- | SDXL | 74.65 | 83.27 | 82.43 | 80.91 | 86.76 | 80.41 |
- | DALL-E 3 | 83.50 | 90.97 | 89.61 | 88.39 | 90.58 | 89.83 |
- | Flux.1-dev | 83.79 | 85.80 | 86.79 | 89.98 | 90.04 | 89.90 |
- | SD3-Medium | 84.08 | 87.90 | 91.01 | 88.83 | 80.70 | 88.68 |
- | Janus-Pro-7B | 84.19 | 86.90 | 88.90 | 89.40 | 89.32 | 89.48 |
- | CogView4-6B | 85.13 | 83.85 | 90.35 | 91.17 | 91.14 | 87.29 |
- | **HiDream-I1** | **85.89** | 76.44 | 90.22 | 89.48 | 93.74 | 91.83 |
-
- ### GenEval
-
- | Model | Overall | Single Obj. | Two Obj. | Counting | Colors | Position | Color attribution |
- |-----------------|----------|-------------|----------|----------|----------|----------|-------------------|
- | SDXL | 0.55 | 0.98 | 0.74 | 0.39 | 0.85 | 0.15 | 0.23 |
- | PixArt-alpha | 0.48 | 0.98 | 0.50 | 0.44 | 0.80 | 0.08 | 0.07 |
- | Flux.1-dev | 0.66 | 0.98 | 0.79 | 0.73 | 0.77 | 0.22 | 0.45 |
- | DALL-E 3 | 0.67 | 0.96 | 0.87 | 0.47 | 0.83 | 0.43 | 0.45 |
- | CogView4-6B | 0.73 | 0.99 | 0.86 | 0.66 | 0.79 | 0.48 | 0.58 |
- | SD3-Medium | 0.74 | 0.99 | 0.94 | 0.72 | 0.89 | 0.33 | 0.60 |
- | Janus-Pro-7B | 0.80 | 0.99 | 0.89 | 0.59 | 0.90 | 0.79 | 0.66 |
- | **HiDream-I1** | **0.83** | 1.00 | 0.98 | 0.79 | 0.91 | 0.60 | 0.72 |
-
- ### HPSv2.1 benchmark
-
- | Model | Averaged | Animation | Concept-art | Painting | Photo |
- |-------------------------|----------------|------------|---------------|--------------|------------|
- | Stable Diffusion v2.0 | 26.38 | 27.09 | 26.02 | 25.68 | 26.73 |
- | Midjourney V6 | 30.29 | 32.02 | 30.29 | 29.74 | 29.10 |
- | SDXL | 30.64 | 32.84 | 31.36 | 30.86 | 27.48 |
- | Dall-E3 | 31.44 | 32.39 | 31.09 | 31.18 | 31.09 |
- | SD3 | 31.53 | 32.60 | 31.82 | 32.06 | 29.62 |
- | Midjourney V5 | 32.33 | 34.05 | 32.47 | 32.24 | 30.56 |
- | CogView4-6B | 32.31 | 33.23 | 32.60 | 32.89 | 30.52 |
- | Flux.1-dev | 32.47 | 33.87 | 32.27 | 32.62 | 31.11 |
- | stable cascade | 32.95 | 34.58 | 33.13 | 33.29 | 30.78 |
- | **HiDream-I1** | **33.82** | 35.05 | 33.74 | 33.88 | 32.61 |
-
- ## License Agreement
- The Transformer models in this repository are licensed under the MIT License. The VAE is from `FLUX.1 [schnell]`, and the text encoders from `google/t5-v1_1-xxl` and `meta-llama/Meta-Llama-3.1-8B-Instruct`. Please follow the license terms specified for these components. You own all content you create with this model. You can use your generated content freely, but you must comply with this license agreement. You are responsible for how you use the models. Do not create illegal content, harmful material, personal information that could harm others, false information, or content targeting vulnerable groups.
-
- ## Acknowledgements
- - The VAE component is from `FLUX.1 [schnell]`, licensed under Apache 2.0.
- - The text encoders are from `google/t5-v1_1-xxl` (licensed under Apache 2.0) and `meta-llama/Meta-Llama-3.1-8B-Instruct` (licensed under the Llama 3.1 Community License Agreement).
+ # Student Classroom Behavior Recognition and Intelligent Feedback System Based on YOLOv8 and DeepSeek
+
+ This is a prototype system built with YOLOv8 + Streamlit. Its features include:
+ - Student behavior recognition (video-based)
+ - Attention heatmap visualization
+ - Language-model-based behavior explanation (simulated)
+ - Export of teaching suggestions and feedback
+
+ ## Usage
+
+ 1. Create a new Space and choose `Streamlit` as the SDK
+ 2. Unzip classroom_behavior_system.zip and upload all of its files to your Hugging Face Space
+ 3. Wait for the deployment to finish
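Step 2 can also be scripted. Below is a minimal upload sketch using `huggingface_hub`; the repo id and folder name are placeholders, and it assumes you are already authenticated (e.g. via `huggingface-cli login`):

```python
# Minimal scripted upload of the project folder to a Streamlit Space.
# "your-username/classroom-behavior" and the folder path are placeholders.
from huggingface_hub import HfApi

api = HfApi()
api.upload_folder(
    folder_path="classroom_behavior_system",     # unzipped project directory
    repo_id="your-username/classroom-behavior",  # placeholder Space id
    repo_type="space",
)
```

To test locally before deploying, run `pip install -r requirements.txt` and then `streamlit run app.py`.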
app.py ADDED
@@ -0,0 +1,82 @@
+ import tempfile
+
+ import cv2
+ import matplotlib.pyplot as plt
+ import numpy as np
+ import seaborn as sns
+ import streamlit as st
+ from ultralytics import YOLO
+
+ @st.cache_resource
+ def load_yolo_model():
+     # Cache the detector across Streamlit reruns so the weights load only once.
+     return YOLO("yolov8n.pt")
+
+ def analyze_with_deepseek(text):
+     # Simulated LLM analysis: the prompt shows what would be sent to DeepSeek,
+     # but this demo returns a canned response instead of calling an API.
+     prompt = f"Analyze the following student behavior and provide teaching suggestions: {text}"  # noqa: F841
+     return ("Analysis: the students may be actively participating in a group discussion. "
+             "Suggestion: encourage teamwork to strengthen learning initiative.")
+
+ def process_video(uploaded_file, model):
+     # Write the upload to a named temporary file so OpenCV can open it by path.
+     tfile = tempfile.NamedTemporaryFile(delete=False, suffix=".mp4")
+     tfile.write(uploaded_file.getvalue())  # getvalue() returns the full buffer
+     tfile.close()
+     cap = cv2.VideoCapture(tfile.name)
+
+     frames = []
+     heatmap = np.zeros((480, 640))
+     behavior_summary = []
+
+     while cap.isOpened():
+         ret, frame = cap.read()
+         if not ret:
+             break
+
+         frame = cv2.resize(frame, (640, 480))
+         results = model(frame)
+         boxes = results[0].boxes.xyxy.cpu().numpy() if results else []
+
+         for box in boxes:
+             x1, y1, x2, y2 = map(int, box[:4])
+             cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
+             # Accumulate box centers into the heatmap, clamped to its bounds.
+             cx = min(max(int((x1 + x2) / 2), 0), 639)
+             cy = min(max(int((y1 + y2) / 2), 0), 479)
+             heatmap[cy, cx] += 1
+
+         frames.append(frame)  # note: every frame is kept in memory
+         if len(boxes) > 0:
+             behavior_summary.append("Student behavior detected")
+
+     cap.release()
+     return frames, heatmap, behavior_summary
+
+ def display_heatmap(heatmap):
+     fig, ax = plt.subplots()
+     sns.heatmap(heatmap, cmap="YlOrRd", ax=ax)
+     st.pyplot(fig)
+
+ st.title("🎓 Automatic Student Classroom Behavior Recognition and Intelligent Feedback System")
+ st.markdown("---")
+
+ model = load_yolo_model()
+ uploaded_file = st.file_uploader("Please upload a classroom video (mp4 format)", type=["mp4"])
+
+ if uploaded_file:
+     st.video(uploaded_file)
+     with st.spinner("Analyzing video..."):
+         frames, heatmap, behavior_summary = process_video(uploaded_file, model)
+
+     st.success("Analysis complete!")
+     st.subheader("📌 Attention Heatmap")
+     display_heatmap(heatmap)
+
+     st.subheader("📊 Behavior Analysis")
+     for idx, summary in enumerate(behavior_summary[:3]):
+         text_analysis = analyze_with_deepseek(summary)
+         st.info(f"Frame {idx + 1}: {text_analysis}")
+
+     st.subheader("📄 Teaching Improvement Suggestions (examples)")
+     st.markdown("- Increase the frequency of interactive questioning")
+     st.markdown("- Encourage group discussion and collaboration")
+     st.markdown("- Adjust the teaching pace to hold students' attention")
+
+     st.download_button("📥 Export teaching analysis report", data="Sample report content...", file_name="teaching_report.txt")
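The behavior explanation above is simulated: `analyze_with_deepseek` builds a prompt but returns a canned response. A live integration might look roughly like the sketch below; the OpenAI-compatible endpoint, the `deepseek-chat` model name, and the `DEEPSEEK_API_KEY` environment variable are assumptions to verify against DeepSeek's documentation, and the `openai` package would need to be added to requirements.txt.

```python
# Hedged sketch of a live DeepSeek call; the endpoint, model name, and
# environment variable are assumptions, not confirmed by this repo.
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],  # assumed env var
    base_url="https://api.deepseek.com",     # assumed OpenAI-compatible endpoint
)

def analyze_with_deepseek(text):
    prompt = f"Analyze the following student behavior and provide teaching suggestions: {text}"
    response = client.chat.completions.create(
        model="deepseek-chat",               # assumed model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```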
requirements.txt ADDED
@@ -0,0 +1,6 @@
+ streamlit
+ ultralytics
+ opencv-python-headless
+ matplotlib
+ seaborn
+ Pillow