Files changed (3)
  1. README.md +10 -101
  2. app.py +82 -0
  3. requirements.txt +6 -0
README.md CHANGED
@@ -1,105 +1,14 @@
- ---
- license: mit
- tags:
- - image-generation
- - HiDream.ai
- language:
- - en
- pipeline_tag: text-to-image
- library_name: diffusers
- ---
-
- ![HiDream-I1 Demo](demo.jpg)
-
- `HiDream-I1` is a new open-source image generative foundation model with 17B parameters that achieves state-of-the-art image generation quality within seconds.
-
- <span style="color: #FF5733; font-weight: bold">For more features and to experience the full capabilities of our product, please visit [https://vivago.ai/](https://vivago.ai/).</span>
-
- ## Key Features
- - ✨ **Superior Image Quality** - Produces exceptional results across multiple styles including photorealistic, cartoon, artistic, and more. Achieves state-of-the-art HPS v2.1 score, which aligns with human preferences.
- - 🎯 **Best-in-Class Prompt Following** - Achieves industry-leading scores on GenEval and DPG benchmarks, outperforming all other open-source models.
- - 🔓 **Open Source** - Released under the MIT license to foster scientific advancement and enable creative innovation.
- - 💼 **Commercial-Friendly** - Generated images can be freely used for personal projects, scientific research, and commercial applications.
-
- ## Quick Start
- Please make sure you have installed [Flash Attention](https://github.com/Dao-AILab/flash-attention). We recommend CUDA version 12.4 for the manual installation.
- ```
- pip install -r requirements.txt
- ```
- Clone the GitHub repo:
- ```
- git clone https://github.com/HiDream-ai/HiDream-I1
- ```
-
- Then you can run the inference scripts to generate images:
-
- ```python
- # For full model inference
- python ./inference.py --model_type full
-
- # For distilled dev model inference
- python ./inference.py --model_type dev
-
- # For distilled fast model inference
- python ./inference.py --model_type fast
- ```
- > **Note:** The inference script will automatically download `meta-llama/Meta-Llama-3.1-8B-Instruct` model files. If you encounter network issues, you can download these files ahead of time and place them in the appropriate cache directory to avoid download failures during inference.
-
- ## Gradio Demo
-
- We also provide a Gradio demo for interactive image generation. You can run the demo with:
-
- ```python
- python gradio_demo.py
- ```
-
- ## Evaluation Metrics
-
- ### DPG-Bench
- | Model | Overall | Global | Entity | Attribute | Relation | Other |
- |-----------------|-----------|-----------|-----------|-----------|-----------|-----------|
- | PixArt-alpha | 71.11 | 74.97 | 79.32 | 78.60 | 82.57 | 76.96 |
- | SDXL | 74.65 | 83.27 | 82.43 | 80.91 | 86.76 | 80.41 |
- | DALL-E 3 | 83.50 | 90.97 | 89.61 | 88.39 | 90.58 | 89.83 |
- | Flux.1-dev | 83.79 | 85.80 | 86.79 | 89.98 | 90.04 | 89.90 |
- | SD3-Medium | 84.08 | 87.90 | 91.01 | 88.83 | 80.70 | 88.68 |
- | Janus-Pro-7B | 84.19 | 86.90 | 88.90 | 89.40 | 89.32 | 89.48 |
- | CogView4-6B | 85.13 | 83.85 | 90.35 | 91.17 | 91.14 | 87.29 |
- | **HiDream-I1** | **85.89** | 76.44 | 90.22 | 89.48 | 93.74 | 91.83 |
-
- ### GenEval
-
- | Model | Overall | Single Obj. | Two Obj. | Counting | Colors | Position | Color attribution |
- |-----------------|----------|-------------|----------|----------|----------|----------|-------------------|
- | SDXL | 0.55 | 0.98 | 0.74 | 0.39 | 0.85 | 0.15 | 0.23 |
- | PixArt-alpha | 0.48 | 0.98 | 0.50 | 0.44 | 0.80 | 0.08 | 0.07 |
- | Flux.1-dev | 0.66 | 0.98 | 0.79 | 0.73 | 0.77 | 0.22 | 0.45 |
- | DALL-E 3 | 0.67 | 0.96 | 0.87 | 0.47 | 0.83 | 0.43 | 0.45 |
- | CogView4-6B | 0.73 | 0.99 | 0.86 | 0.66 | 0.79 | 0.48 | 0.58 |
- | SD3-Medium | 0.74 | 0.99 | 0.94 | 0.72 | 0.89 | 0.33 | 0.60 |
- | Janus-Pro-7B | 0.80 | 0.99 | 0.89 | 0.59 | 0.90 | 0.79 | 0.66 |
- | **HiDream-I1** | **0.83** | 1.00 | 0.98 | 0.79 | 0.91 | 0.60 | 0.72 |
-
- ### HPSv2.1 benchmark
-
- | Model | Averaged | Animation | Concept-art | Painting | Photo |
- |-------------------------|----------------|------------|---------------|--------------|------------|
- | Stable Diffusion v2.0 | 26.38 | 27.09 | 26.02 | 25.68 | 26.73 |
- | Midjourney V6 | 30.29 | 32.02 | 30.29 | 29.74 | 29.10 |
- | SDXL | 30.64 | 32.84 | 31.36 | 30.86 | 27.48 |
- | Dall-E3 | 31.44 | 32.39 | 31.09 | 31.18 | 31.09 |
- | SD3 | 31.53 | 32.60 | 31.82 | 32.06 | 29.62 |
- | Midjourney V5 | 32.33 | 34.05 | 32.47 | 32.24 | 30.56 |
- | CogView4-6B | 32.31 | 33.23 | 32.60 | 32.89 | 30.52 |
- | Flux.1-dev | 32.47 | 33.87 | 32.27 | 32.62 | 31.11 |
- | stable cascade | 32.95 | 34.58 | 33.13 | 33.29 | 30.78 |
- | **HiDream-I1** | **33.82** | 35.05 | 33.74 | 33.88 | 32.61 |
-
- ## License Agreement
- The Transformer models in this repository are licensed under the MIT License. The VAE is from `FLUX.1 [schnell]`, and the text encoders from `google/t5-v1_1-xxl` and `meta-llama/Meta-Llama-3.1-8B-Instruct`. Please follow the license terms specified for these components. You own all content you create with this model. You can use your generated content freely, but you must comply with this license agreement. You are responsible for how you use the models. Do not create illegal content, harmful material, personal information that could harm others, false information, or content targeting vulnerable groups.
-
- ## Acknowledgements
- - The VAE component is from `FLUX.1 [schnell]`, licensed under Apache 2.0.
- - The text encoders are from `google/t5-v1_1-xxl` (licensed under Apache 2.0) and `meta-llama/Meta-Llama-3.1-8B-Instruct` (licensed under the Llama 3.1 Community License Agreement).
+ # Student Classroom Behavior Recognition and Intelligent Feedback System Based on YOLOv8 and DeepSeek
+
+ This is a prototype system built with YOLOv8 + Streamlit. Its features include:
+ - Student behavior recognition (video-based)
+ - Attention heatmap visualization
+ - Language-model-based behavior explanation (simulated)
+ - Export of teaching suggestions and feedback
+
+ ## Usage
+
+ 1. Create a new Space and choose `Streamlit` as the SDK
+ 2. Unzip classroom_behavior_system.zip and upload all of its files to your Hugging Face Space
+ 3. Wait for the deployment to finish
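Step 2 can also be scripted. Below is a minimal upload sketch using `huggingface_hub`; the repo id and folder name are placeholders, and it assumes you are already authenticated (e.g. via `huggingface-cli login`):

```python
# Minimal scripted upload of the project folder to a Streamlit Space.
# "your-username/classroom-behavior" and the folder path are placeholders.
from huggingface_hub import HfApi

api = HfApi()
api.upload_folder(
    folder_path="classroom_behavior_system",     # unzipped project directory
    repo_id="your-username/classroom-behavior",  # placeholder Space id
    repo_type="space",
)
```

To test locally before deploying, run `pip install -r requirements.txt` and then `streamlit run app.py`.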
app.py ADDED
@@ -0,0 +1,82 @@
+ import tempfile
+
+ import cv2
+ import matplotlib.pyplot as plt
+ import numpy as np
+ import seaborn as sns
+ import streamlit as st
+ from ultralytics import YOLO
+
+ @st.cache_resource
+ def load_yolo_model():
+     # Cache the detector across Streamlit reruns so the weights load only once.
+     return YOLO("yolov8n.pt")
+
+ def analyze_with_deepseek(text):
+     # Simulated LLM analysis: the prompt shows what would be sent to DeepSeek,
+     # but this demo returns a canned response instead of calling an API.
+     prompt = f"Analyze the following student behavior and provide teaching suggestions: {text}"  # noqa: F841
+     return ("Analysis: the students may be actively participating in a group discussion. "
+             "Suggestion: encourage teamwork to strengthen learning initiative.")
+
+ def process_video(uploaded_file, model):
+     # Write the upload to a named temporary file so OpenCV can open it by path.
+     tfile = tempfile.NamedTemporaryFile(delete=False, suffix=".mp4")
+     tfile.write(uploaded_file.getvalue())  # getvalue() returns the full buffer
+     tfile.close()
+     cap = cv2.VideoCapture(tfile.name)
+
+     frames = []
+     heatmap = np.zeros((480, 640))
+     behavior_summary = []
+
+     while cap.isOpened():
+         ret, frame = cap.read()
+         if not ret:
+             break
+
+         frame = cv2.resize(frame, (640, 480))
+         results = model(frame)
+         boxes = results[0].boxes.xyxy.cpu().numpy() if results else []
+
+         for box in boxes:
+             x1, y1, x2, y2 = map(int, box[:4])
+             cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
+             # Accumulate box centers into the heatmap, clamped to its bounds.
+             cx = min(max(int((x1 + x2) / 2), 0), 639)
+             cy = min(max(int((y1 + y2) / 2), 0), 479)
+             heatmap[cy, cx] += 1
+
+         frames.append(frame)  # note: every frame is kept in memory
+         if len(boxes) > 0:
+             behavior_summary.append("Student behavior detected")
+
+     cap.release()
+     return frames, heatmap, behavior_summary
+
+ def display_heatmap(heatmap):
+     fig, ax = plt.subplots()
+     sns.heatmap(heatmap, cmap="YlOrRd", ax=ax)
+     st.pyplot(fig)
+
+ st.title("🎓 Automatic Student Classroom Behavior Recognition and Intelligent Feedback System")
+ st.markdown("---")
+
+ model = load_yolo_model()
+ uploaded_file = st.file_uploader("Please upload a classroom video (mp4 format)", type=["mp4"])
+
+ if uploaded_file:
+     st.video(uploaded_file)
+     with st.spinner("Analyzing video..."):
+         frames, heatmap, behavior_summary = process_video(uploaded_file, model)
+
+     st.success("Analysis complete!")
+     st.subheader("📌 Attention Heatmap")
+     display_heatmap(heatmap)
+
+     st.subheader("📊 Behavior Analysis")
+     for idx, summary in enumerate(behavior_summary[:3]):
+         text_analysis = analyze_with_deepseek(summary)
+         st.info(f"Frame {idx + 1}: {text_analysis}")
+
+     st.subheader("📄 Teaching Improvement Suggestions (examples)")
+     st.markdown("- Increase the frequency of interactive questioning")
+     st.markdown("- Encourage group discussion and collaboration")
+     st.markdown("- Adjust the teaching pace to hold students' attention")
+
+     st.download_button("📥 Export teaching analysis report", data="Sample report content...", file_name="teaching_report.txt")
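The behavior explanation above is simulated: `analyze_with_deepseek` builds a prompt but returns a canned response. A live integration might look roughly like the sketch below; the OpenAI-compatible endpoint, the `deepseek-chat` model name, and the `DEEPSEEK_API_KEY` environment variable are assumptions to verify against DeepSeek's documentation, and the `openai` package would need to be added to requirements.txt.

```python
# Hedged sketch of a live DeepSeek call; the endpoint, model name, and
# environment variable are assumptions, not confirmed by this repo.
import os

from openai import OpenAI

client = OpenAI(
    api_key=os.environ["DEEPSEEK_API_KEY"],  # assumed env var
    base_url="https://api.deepseek.com",     # assumed OpenAI-compatible endpoint
)

def analyze_with_deepseek(text):
    prompt = f"Analyze the following student behavior and provide teaching suggestions: {text}"
    response = client.chat.completions.create(
        model="deepseek-chat",               # assumed model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```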
requirements.txt ADDED
@@ -0,0 +1,6 @@
+ streamlit
+ ultralytics
+ opencv-python-headless
+ matplotlib
+ seaborn
+ Pillow