kishizaki-sci committed on
Commit
e167b9d
·
verified ·
1 Parent(s): 3b87c55

Upload llama4_inference.ipynb

Browse files
Files changed (1) hide show
  1. llama4_inference.ipynb +287 -0
llama4_inference.ipynb ADDED
@@ -0,0 +1,287 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "cells": [
3
+ {
4
+ "cell_type": "markdown",
5
+ "id": "3008bf1f-df10-484b-9662-96cda681b93b",
6
+ "metadata": {},
7
+ "source": [
8
+ "```bash\n",
9
+ "git clone https://github.com/kIshizaki-sci/AutoAWQ.git\n",
10
+ "pip install -U transformers\n",
11
+ "pip install -e ./AutoAWQ\n",
12
+ "```"
13
+ ]
14
+ },
15
+ {
16
+ "cell_type": "code",
17
+ "execution_count": 1,
18
+ "id": "a388d190-a611-46b0-aa8c-dcf5c97b0c1b",
19
+ "metadata": {},
20
+ "outputs": [],
21
+ "source": [
22
+ "from awq import AutoAWQForCausalLM\n",
23
+ "import torch\n",
24
+ "import transformers\n",
25
+ "from transformers import AutoProcessor"
26
+ ]
27
+ },
28
+ {
29
+ "cell_type": "code",
30
+ "execution_count": 2,
31
+ "id": "de7f888d-fe64-47f1-a106-75a7b911354f",
32
+ "metadata": {},
33
+ "outputs": [
34
+ {
35
+ "name": "stdout",
36
+ "output_type": "stream",
37
+ "text": [
38
+ "torch version : 2.4.1+cu124\n",
39
+ "transformers version : 4.51.3\n"
40
+ ]
41
+ }
42
+ ],
43
+ "source": [
44
+ "print('torch version : ', torch.__version__)\n",
45
+ "print('transformers version : ', transformers.__version__)"
46
+ ]
47
+ },
48
+ {
49
+ "cell_type": "code",
50
+ "execution_count": 3,
51
+ "id": "61623ebd-d8a4-45b6-aff3-3c16a78ecfa5",
52
+ "metadata": {},
53
+ "outputs": [],
54
+ "source": [
55
+ "quant_path = '/workspace/hf_cache/Llama-4-Scout-17B-16E-Instruct-AWQ'"
56
+ ]
57
+ },
58
+ {
59
+ "cell_type": "code",
60
+ "execution_count": 4,
61
+ "id": "c13b72e4-f6cd-4642-a110-040844127541",
62
+ "metadata": {},
63
+ "outputs": [
64
+ {
65
+ "name": "stderr",
66
+ "output_type": "stream",
67
+ "text": [
68
+ "/workspace/llama4-awq/AutoAWQ/awq/models/llama4.py:313: UserWarning: Multimodal input has not been implemented in Llama4AWQForConditionalGeneration yet.\n",
69
+ " warnings.warn(\"Multimodal input has not been implemented in Llama4AWQForConditionalGeneration yet.\", UserWarning)\n",
70
+ "You have set `use_cache` to `False`, but cache_implementation is set to hybrid. cache_implementation will have no effect.\n"
71
+ ]
72
+ },
73
+ {
74
+ "data": {
75
+ "application/vnd.jupyter.widget-view+json": {
76
+ "model_id": "15b41a3c3e154516b93b4f2b90e976fb",
77
+ "version_major": 2,
78
+ "version_minor": 0
79
+ },
80
+ "text/plain": [
81
+ "Replacing MoE Block...: 0%| | 0/48 [00:00<?, ?it/s]"
82
+ ]
83
+ },
84
+ "metadata": {},
85
+ "output_type": "display_data"
86
+ },
87
+ {
88
+ "data": {
89
+ "application/vnd.jupyter.widget-view+json": {
90
+ "model_id": "38fe24c9ae5549a7a7c674292b0e4f95",
91
+ "version_major": 2,
92
+ "version_minor": 0
93
+ },
94
+ "text/plain": [
95
+ "Replacing layers...: 0%| | 0/48 [00:00<?, ?it/s]"
96
+ ]
97
+ },
98
+ "metadata": {},
99
+ "output_type": "display_data"
100
+ },
101
+ {
102
+ "name": "stderr",
103
+ "output_type": "stream",
104
+ "text": [
105
+ "/workspace/llama4-awq/AutoAWQ/awq/models/base.py:540: UserWarning: Skipping fusing modules because AWQ extension is not installed.No module named 'awq_ext'\n",
106
+ " warnings.warn(\"Skipping fusing modules because AWQ extension is not installed.\" + msg)\n"
107
+ ]
108
+ },
109
+ {
110
+ "name": "stdout",
111
+ "output_type": "stream",
112
+ "text": [
113
+ "CPU times: user 1h 36min 24s, sys: 24min 59s, total: 2h 1min 23s\n",
114
+ "Wall time: 30min 59s\n"
115
+ ]
116
+ }
117
+ ],
118
+ "source": [
119
+ "%%time\n",
120
+ "model = AutoAWQForCausalLM.from_quantized(quant_path, torch_dtype=torch.float16, use_cache=True, device_map='auto')\n",
121
+ "processor = AutoProcessor.from_pretrained(quant_path)"
122
+ ]
123
+ },
124
+ {
125
+ "cell_type": "code",
126
+ "execution_count": 5,
127
+ "id": "665ae24c-cb73-489d-bb03-35d760460070",
128
+ "metadata": {},
129
+ "outputs": [
130
+ {
131
+ "name": "stdout",
132
+ "output_type": "stream",
133
+ "text": [
134
+ "Sat Apr 19 17:03:15 2025 \n",
135
+ "+-----------------------------------------------------------------------------------------+\n",
136
+ "| NVIDIA-SMI 565.57.01 Driver Version: 565.57.01 CUDA Version: 12.7 |\n",
137
+ "|-----------------------------------------+------------------------+----------------------+\n",
138
+ "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |\n",
139
+ "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n",
140
+ "| | | MIG M. |\n",
141
+ "|=========================================+========================+======================|\n",
142
+ "| 0 NVIDIA A100-SXM4-80GB On | 00000000:B7:00.0 Off | 0 |\n",
143
+ "| N/A 25C P0 75W / 400W | 60415MiB / 81920MiB | 0% Default |\n",
144
+ "| | | Disabled |\n",
145
+ "+-----------------------------------------+------------------------+----------------------+\n",
146
+ " \n",
147
+ "+-----------------------------------------------------------------------------------------+\n",
148
+ "| Processes: |\n",
149
+ "| GPU GI CI PID Type Process name GPU Memory |\n",
150
+ "| ID ID Usage |\n",
151
+ "|=========================================================================================|\n",
152
+ "+-----------------------------------------------------------------------------------------+\n"
153
+ ]
154
+ }
155
+ ],
156
+ "source": [
157
+ "!nvidia-smi"
158
+ ]
159
+ },
160
+ {
161
+ "cell_type": "code",
162
+ "execution_count": 6,
163
+ "id": "39c8154a-08c4-485e-b136-955d1b4fbec9",
164
+ "metadata": {},
165
+ "outputs": [],
166
+ "source": [
167
+ "messages = [\n",
168
+ " {\n",
169
+ " \"role\": \"user\",\n",
170
+ " \"content\": [\n",
171
+ " {\"type\": \"text\", \"text\": \"What does means the torsion free in the general relativit?\"},\n",
172
+ " ]\n",
173
+ " },\n",
174
+ "]"
175
+ ]
176
+ },
177
+ {
178
+ "cell_type": "code",
179
+ "execution_count": 7,
180
+ "id": "44c4ae97-96cf-47f7-a8e4-16d2c34bdc1b",
181
+ "metadata": {},
182
+ "outputs": [],
183
+ "source": [
184
+ "inputs = processor.apply_chat_template(\n",
185
+ " messages,\n",
186
+ " add_generation_prompt=True,\n",
187
+ " tokenize=True,\n",
188
+ " return_dict=True,\n",
189
+ " return_tensors=\"pt\",\n",
190
+ ").to(model.model.device)"
191
+ ]
192
+ },
193
+ {
194
+ "cell_type": "code",
195
+ "execution_count": 8,
196
+ "id": "c6dddd7b-5382-45e8-8dd1-6ca719200f64",
197
+ "metadata": {},
198
+ "outputs": [
199
+ {
200
+ "name": "stdout",
201
+ "output_type": "stream",
202
+ "text": [
203
+ "CPU times: user 5min 36s, sys: 5min 9s, total: 10min 45s\n",
204
+ "Wall time: 10min 45s\n"
205
+ ]
206
+ }
207
+ ],
208
+ "source": [
209
+ "%%time\n",
210
+ "outputs = model.generate(\n",
211
+ " **inputs,\n",
212
+ " max_new_tokens=2048,\n",
213
+ ")"
214
+ ]
215
+ },
216
+ {
217
+ "cell_type": "code",
218
+ "execution_count": 9,
219
+ "id": "ab1d4fe0-e315-40ee-b3e0-10db1e6b2023",
220
+ "metadata": {},
221
+ "outputs": [
222
+ {
223
+ "name": "stdout",
224
+ "output_type": "stream",
225
+ "text": [
226
+ "A question from the realm of differential geometry and general relativity!\n",
227
+ "\n",
228
+ "In general relativity, \"torsion-free\" refers to a property of a connection on a manifold, specifically in the context of Riemannian geometry.\n",
229
+ "\n",
230
+ "**Torsion** is a measure of how much a connection \"twists\" or \"turns\" a vector as it is parallel-transported around a closed loop. In other words, it's a measure of how much the connection deviates from being \"flat\" or \"Euclidean\".\n",
231
+ "\n",
232
+ "A **torsion-free connection**, also known as a **symmetric connection**, is a connection that has zero torsion. This means that when you parallel-transport a vector around a closed loop, it returns to its original orientation, without any twisting or turning.\n",
233
+ "\n",
234
+ "In mathematical terms, a torsion-free connection satisfies the following condition:\n",
235
+ "\n",
236
+ "$$\\Gamma^i_{jk} = \\Gamma^i_{kj}$$\n",
237
+ "\n",
238
+ "where $\\Gamma^i_{jk}$ are the Christoffel symbols of the second kind, which define the connection.\n",
239
+ "\n",
240
+ "In general relativity, the Levi-Civita connection is a fundamental concept, and it is assumed to be torsion-free. This connection is used to define the covariant derivative of tensors, which is essential for describing the curvature of spacetime.\n",
241
+ "\n",
242
+ "The assumption of a torsion-free connection has important implications:\n",
243
+ "\n",
244
+ "1. **Geodesic equation**: The geodesic equation, which describes the shortest path in curved spacetime, is derived from the Levi-Civita connection. A torsion-free connection ensures that geodesics are symmetric, meaning that they have no \"twist\" or \"turn\".\n",
245
+ "2. **Riemannian geometry**: The Levi-Civita connection is a fundamental ingredient in Riemannian geometry, which is the mathematical framework for describing curved spacetime in general relativity.\n",
246
+ "3. **Einstein's field equations**: The Einstein field equations, which relate the curvature of spacetime to the distribution of mass and energy, rely on the Levi-Civita connection.\n",
247
+ "\n",
248
+ "In summary, a torsion-free connection in general relativity means that the connection used to describe the curvature of spacetime has zero torsion, which is a fundamental assumption in Riemannian geometry and leads to the Levi-Civita connection. This assumption is crucial for the mathematical formulation of general relativity, including the geodesic equation and Einstein's field equations.<|eot|>\n"
249
+ ]
250
+ }
251
+ ],
252
+ "source": [
253
+ "response = processor.batch_decode(outputs[:, inputs[\"input_ids\"].shape[-1]:])[0]\n",
254
+ "print(response)"
255
+ ]
256
+ },
257
+ {
258
+ "cell_type": "code",
259
+ "execution_count": null,
260
+ "id": "5f3f1be7-6960-474e-b19d-afd2efa56174",
261
+ "metadata": {},
262
+ "outputs": [],
263
+ "source": []
264
+ }
265
+ ],
266
+ "metadata": {
267
+ "kernelspec": {
268
+ "display_name": "Python 3 (ipykernel)",
269
+ "language": "python",
270
+ "name": "python3"
271
+ },
272
+ "language_info": {
273
+ "codemirror_mode": {
274
+ "name": "ipython",
275
+ "version": 3
276
+ },
277
+ "file_extension": ".py",
278
+ "mimetype": "text/x-python",
279
+ "name": "python",
280
+ "nbconvert_exporter": "python",
281
+ "pygments_lexer": "ipython3",
282
+ "version": "3.11.11"
283
+ }
284
+ },
285
+ "nbformat": 4,
286
+ "nbformat_minor": 5
287
+ }