kishizaki-sci/Llama-4-Scout-17B-16E-Instruct-AWQ

Safetensors · English · llama4 · 4-bit precision · awq

kishizaki-sci committed 10 days ago · Commit 6263f7d (verified) · 1 parent: ddba123

Upload llama4_inference.ipynb
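The source change visible in this diff is the `quant_path` cell: the local cache directory `/workspace/hf_cache/Llama-4-Scout-17B-16E-Instruct-AWQ` is replaced by the Hub repo id `kishizaki-sci/Llama-4-Scout-17B-16E-Instruct-AWQ`, and the new outputs show `Fetching 25 files` (config, 13 safetensors shards, tokenizer files, and even `README.md` and `llama4_inference.ipynb`) before the MoE blocks are replaced. The sketch below illustrates the kind of dispatch this relies on, assuming the loader treats an existing directory as a local checkpoint and anything else as an `owner/name` repo id to download. `resolve_model_source` is a hypothetical helper for illustration, not part of AutoAWQ.

```python
import os
from typing import Tuple


def resolve_model_source(quant_path: str) -> Tuple[str, str]:
    """Hypothetical helper: classify quant_path as a local dir or a Hub repo id.

    Assumption (not taken from AutoAWQ's code): an existing directory is loaded
    from disk; otherwise a string of the form 'owner/name' is treated as a
    Hugging Face Hub repo id that would be downloaded before loading.
    """
    # A local directory always wins, even if it looks like 'owner/name'.
    if os.path.isdir(quant_path):
        return ("local", quant_path)
    # Hub repo ids have exactly two non-empty path components.
    parts = quant_path.split("/")
    if len(parts) == 2 and all(parts):
        return ("hub", quant_path)
    raise ValueError(f"not a local directory or 'owner/name' repo id: {quant_path!r}")


if __name__ == "__main__":
    new_path = "kishizaki-sci/Llama-4-Scout-17B-16E-Instruct-AWQ"
    print(resolve_model_source(new_path))
```

Under this assumption, the old value only works on a machine that actually has `/workspace/hf_cache/Llama-4-Scout-17B-16E-Instruct-AWQ`; elsewhere it raises, which is presumably why a repo id is the more portable default for a published notebook.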

Files changed (1)

llama4_inference.ipynb +385 -22

llama4_inference.ipynb CHANGED

@@ -52,7 +52,7 @@
    "metadata": {},
    "outputs": [],
    "source": [
-    "quant_path = '/workspace/hf_cache/Llama-4-Scout-17B-16E-Instruct-AWQ'"
    ]
   },
   {
@@ -61,24 +61,65 @@
    "id": "c13b72e4-f6cd-4642-a110-040844127541",
    "metadata": {},
    "outputs": [
     {
      "name": "stderr",
      "output_type": "stream",
      "text": [
-      "/workspace/llama4-awq/AutoAWQ/awq/models/llama4.py:313: UserWarning: Multimodal input has not been implemented in Llama4AWQForConditionalGeneration yet.\n",
-      "  warnings.warn(\"Multimodal input has not been implemented in Llama4AWQForConditionalGeneration yet.\", UserWarning)\n",
-      "You have set `use_cache` to `False`, but cache_implementation is set to hybrid. cache_implementation will have no effect.\n"
      ]
     },
     {
      "data": {
       "application/vnd.jupyter.widget-view+json": {
-       "model_id": "15b41a3c3e154516b93b4f2b90e976fb",
        "version_major": 2,
        "version_minor": 0
       },
       "text/plain": [
-       "Replacing MoE Block...:   0%|          | 0/48 [00:00<?, ?it/s]"
       ]
      },
      "metadata": {},
@@ -87,12 +128,306 @@
     {
      "data": {
       "application/vnd.jupyter.widget-view+json": {
-       "model_id": "38fe24c9ae5549a7a7c674292b0e4f95",
        "version_major": 2,
        "version_minor": 0
       },
       "text/plain": [
-       "Replacing layers...:   0%|          | 0/48 [00:00<?, ?it/s]"
       ]
      },
      "metadata": {},
@@ -102,7 +437,29 @@
      "name": "stderr",
      "output_type": "stream",
      "text": [
-      "/workspace/llama4-awq/AutoAWQ/awq/models/base.py:540: UserWarning: Skipping fusing modules because AWQ extension is not installed.No module named 'awq_ext'\n",
       "  warnings.warn(\"Skipping fusing modules because AWQ extension is not installed.\" + msg)\n"
      ]
     },
@@ -110,8 +467,8 @@
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "CPU times: user 1h 36min 24s, sys: 24min 59s, total: 2h 1min 23s\n",
-      "Wall time: 30min 59s\n"
      ]
     }
    ],
@@ -131,7 +488,7 @@
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "Sat Apr 19 17:03:15 2025       \n",
       "+-----------------------------------------------------------------------------------------+\n",
       "| NVIDIA-SMI 565.57.01              Driver Version: 565.57.01      CUDA Version: 12.7     |\n",
       "|-----------------------------------------+------------------------+----------------------+\n",
@@ -139,8 +496,8 @@
       "| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |\n",
       "|                                         |                        |               MIG M. |\n",
       "|=========================================+========================+======================|\n",
-      "|   0  NVIDIA A100-SXM4-80GB          On  |   00000000:B7:00.0 Off |                    0 |\n",
-      "| N/A   25C    P0             75W /  400W |   60415MiB /  81920MiB |      0%      Default |\n",
       "|                                         |                        |             Disabled |\n",
       "+-----------------------------------------+------------------------+----------------------+\n",
       "                                                                                         \n",
@@ -200,8 +557,8 @@
      "name": "stdout",
      "output_type": "stream",
      "text": [
-      "CPU times: user 5min 36s, sys: 5min 9s, total: 10min 45s\n",
-      "Wall time: 10min 45s\n"
      ]
     }
    ],
@@ -237,15 +594,21 @@
       "\n",
       "where $\\Gamma^i_{jk}$ are the Christoffel symbols of the second kind, which define the connection.\n",
       "\n",
-      "In general relativity, the Levi-Civita connection is a fundamental concept, and it is assumed to be torsion-free. This connection is used to define the covariant derivative of tensors, which is essential for describing the curvature of spacetime.\n",
       "\n",
-      "The assumption of a torsion-free connection has important implications:\n",
       "\n",
-      "1. **Geodesic equation**: The geodesic equation, which describes the shortest path in curved spacetime, is derived from the Levi-Civita connection. A torsion-free connection ensures that geodesics are symmetric, meaning that they have no \"twist\" or \"turn\".\n",
-      "2. **Riemannian geometry**: The Levi-Civita connection is a fundamental ingredient in Riemannian geometry, which is the mathematical framework for describing curved spacetime in general relativity.\n",
-      "3. **Einstein's field equations**: The Einstein field equations, which relate the curvature of spacetime to the distribution of mass and energy, rely on the Levi-Civita connection.\n",
       "\n",
-      "In summary, a torsion-free connection in general relativity means that the connection used to describe the curvature of spacetime has zero torsion, which is a fundamental assumption in Riemannian geometry and leads to the Levi-Civita connection. This assumption is crucial for the mathematical formulation of general relativity, including the geodesic equation and Einstein's field equations.<|eot|>\n"
      ]
     }
    ],

    "metadata": {},
    "outputs": [],
    "source": [
+    "quant_path = 'kishizaki-sci/Llama-4-Scout-17B-16E-Instruct-AWQ'"
    ]
   },
   {
    "id": "c13b72e4-f6cd-4642-a110-040844127541",
    "metadata": {},
    "outputs": [
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "4f6c7b795fb24940b752dded2f51dea0",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "config.json:   0%|          | 0.00/3.55k [00:00<?, ?B/s]"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
     {
      "name": "stderr",
      "output_type": "stream",
      "text": [
+      "/workspace/AutoAWQ/awq/models/llama4.py:312: UserWarning: Multimodal input has not been implemented in Llama4AWQForConditionalGeneration yet.\n",
+      "  warnings.warn(\"Multimodal input has not been implemented in Llama4AWQForConditionalGeneration yet.\", UserWarning)\n"
      ]
     },
     {
      "data": {
       "application/vnd.jupyter.widget-view+json": {
+       "model_id": "02743a30540244bbb4c05585becc4cc8",
        "version_major": 2,
        "version_minor": 0
       },
       "text/plain": [
+       "Fetching 25 files:   0%|          | 0/25 [00:00<?, ?it/s]"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "3200dd87c1df4ad2a0d1ce0a300f094c",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "llama4_inference.ipynb:   0%|          | 0.00/10.5k [00:00<?, ?B/s]"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "b132c3a11c024af78dd8fd4d30b579cc",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "generation_config.json:   0%|          | 0.00/239 [00:00<?, ?B/s]"
       ]
      },
      "metadata": {},
     {
      "data": {
       "application/vnd.jupyter.widget-view+json": {
+       "model_id": "1f505cb27bc04c92b48ca9e27202cb4b",
        "version_major": 2,
        "version_minor": 0
       },
       "text/plain": [
+       ".gitattributes:   0%|          | 0.00/1.57k [00:00<?, ?B/s]"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "43b1b86a0c304e56b1870d08598b374f",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "model-00002-of-00013.safetensors:   0%|          | 0.00/4.98G [00:00<?, ?B/s]"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "7babbb9a6b154f509c88a56b4fee3d0e",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "README.md:   0%|          | 0.00/138 [00:00<?, ?B/s]"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "405cd2691a844317b6570eb80c622d24",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "model-00001-of-00013.safetensors:   0%|          | 0.00/5.00G [00:00<?, ?B/s]"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "e979f0b6efbe40baa318dccb57bffe24",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "model-00004-of-00013.safetensors:   0%|          | 0.00/5.00G [00:00<?, ?B/s]"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "efccd71f8aa2492bad03ebadbbef596d",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "model-00005-of-00013.safetensors:   0%|          | 0.00/5.00G [00:00<?, ?B/s]"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "a3028a64430744beb181ace86753bee6",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "chat_template.json:   0%|          | 0.00/5.18k [00:00<?, ?B/s]"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "448ff54d8a6642e685474d3bc49de09b",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "model-00003-of-00013.safetensors:   0%|          | 0.00/5.00G [00:00<?, ?B/s]"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "63cb05c948ef476c98df712735133fa7",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "model-00006-of-00013.safetensors:   0%|          | 0.00/4.98G [00:00<?, ?B/s]"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "1afb22e38b7c46c0ab496bd67c949402",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "model-00007-of-00013.safetensors:   0%|          | 0.00/5.00G [00:00<?, ?B/s]"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "ded5ae0f710a4124b00cb0b061383962",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "model-00008-of-00013.safetensors:   0%|          | 0.00/5.00G [00:00<?, ?B/s]"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "0db9d375464c415ebb33fc0b268a9950",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "model-00009-of-00013.safetensors:   0%|          | 0.00/5.00G [00:00<?, ?B/s]"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "e49ff7276a7b493cb5ab7dc222e9d3ec",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "model-00010-of-00013.safetensors:   0%|          | 0.00/5.00G [00:00<?, ?B/s]"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "cbb0557242c546708bfb5ba4c7789499",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "model-00011-of-00013.safetensors:   0%|          | 0.00/4.98G [00:00<?, ?B/s]"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "2c603da8fcc34e0584963099695e7142",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "model-00012-of-00013.safetensors:   0%|          | 0.00/5.00G [00:00<?, ?B/s]"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "645fee17e47d44e1b7fb51865a00fc73",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "model-00013-of-00013.safetensors:   0%|          | 0.00/2.79G [00:00<?, ?B/s]"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "a022c5cb1f174ac5bd9fad91cda9352a",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "model.safetensors.index.json:   0%|          | 0.00/1.13M [00:00<?, ?B/s]"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "f68405f7857d4db1b40b264f25175f43",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "preprocessor_config.json:   0%|          | 0.00/636 [00:00<?, ?B/s]"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "81bc3a84390f4c388f1012044d4ce6ec",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "processor_config.json:   0%|          | 0.00/128 [00:00<?, ?B/s]"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "842746981e6e478aae858d3daf844460",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "special_tokens_map.json:   0%|          | 0.00/448 [00:00<?, ?B/s]"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "067b4e51f41840cdb7305972f1aa0dc6",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "tokenizer.json:   0%|          | 0.00/27.9M [00:00<?, ?B/s]"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "5ceb2f619f9f4513bc307c5846fae2e5",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "tokenizer_config.json:   0%|          | 0.00/237k [00:00<?, ?B/s]"
       ]
      },
      "metadata": {},
      "name": "stderr",
      "output_type": "stream",
      "text": [
+      "You have set `use_cache` to `False`, but cache_implementation is set to hybrid. cache_implementation will have no effect.\n"
+     ]
+    },
+    {
+     "data": {
+      "application/vnd.jupyter.widget-view+json": {
+       "model_id": "db4fddb7315c464ea87c8ad507f9a651",
+       "version_major": 2,
+       "version_minor": 0
+      },
+      "text/plain": [
+       "Replacing MoE Block...:   0%|          | 0/48 [00:00<?, ?it/s]"
+      ]
+     },
+     "metadata": {},
+     "output_type": "display_data"
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "Replacing layers...: 100%|██████████| 48/48 [00:07<00:00,  6.18it/s]\n",
+      "/workspace/AutoAWQ/awq/models/base.py:539: UserWarning: Skipping fusing modules because AWQ extension is not installed.No module named 'awq_ext'\n",
       "  warnings.warn(\"Skipping fusing modules because AWQ extension is not installed.\" + msg)\n"
      ]
     },
      "name": "stdout",
      "output_type": "stream",
      "text": [
+      "CPU times: user 1h 44min 26s, sys: 26min 26s, total: 2h 10min 53s\n",
+      "Wall time: 28min 55s\n"
      ]
     }
    ],
      "name": "stdout",
      "output_type": "stream",
      "text": [
+      "Sat Apr 19 22:53:42 2025       \n",
       "+-----------------------------------------------------------------------------------------+\n",
       "| NVIDIA-SMI 565.57.01              Driver Version: 565.57.01      CUDA Version: 12.7     |\n",
       "|-----------------------------------------+------------------------+----------------------+\n",
       "| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |\n",
       "|                                         |                        |               MIG M. |\n",
       "|=========================================+========================+======================|\n",
+      "|   0  NVIDIA A100-SXM4-80GB          On  |   00000000:87:00.0 Off |                    0 |\n",
+      "| N/A   30C    P0             71W /  400W |   60415MiB /  81920MiB |      0%      Default |\n",
       "|                                         |                        |             Disabled |\n",
       "+-----------------------------------------+------------------------+----------------------+\n",
       "                                                                                         \n",
      "name": "stdout",
      "output_type": "stream",
      "text": [
+      "CPU times: user 4min 28s, sys: 3min 39s, total: 8min 7s\n",
+      "Wall time: 8min 8s\n"
      ]
     }
    ],
       "\n",
       "where $\\Gamma^i_{jk}$ are the Christoffel symbols of the second kind, which define the connection.\n",
       "\n",
+      "In general relativity, the Levi-Civita connection is a fundamental example of a torsion-free connection. It's the unique connection that is:\n",
+      "\n",
+      "1. **Metric-compatible**: it preserves the metric tensor under parallel transport.\n",
+      "2. **Torsion-free**: it has zero torsion.\n",
+      "\n",
+      "The Levi-Civita connection is used to define the covariant derivative, which is essential for describing the curvature of spacetime.\n",
+      "\n",
+      "The assumption of a torsion-free connection is crucial in general relativity, as it allows us to:\n",
       "\n",
+      "1. **Define a unique covariant derivative**: which is necessary for formulating the Einstein field equations.\n",
+      "2. **Ensure geodesic equation**: which describes the shortest path in curved spacetime, is well-defined.\n",
       "\n",
+      "However, it's worth noting that there are some alternative theories, such as Einstein-Cartan theory, which consider torsion as a fundamental aspect of spacetime geometry.\n",
       "\n",
+      "I hope this explanation helps! Do you have any follow-up questions?<|eot|>\n"
      ]
     }
    ],
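The new download log lists 13 weight shards, and both the old and new `nvidia-smi` outputs report `60415MiB / 81920MiB` after loading, so moving to the Hub repo id did not change the memory footprint. Summing the shard sizes shown in the progress bars gives a rough on-disk size to compare with that figure. A small sketch: the sizes are copied from the log, and treating the progress-bar `G` as 10^9 bytes (while `nvidia-smi` reports MiB, i.e. 2^20 bytes) is an assumption about the units.

```python
# Shard sizes as printed in the download progress bars (assumed unit: 1 G = 1e9 bytes).
SHARD_SIZES_G = {
    1: 5.00, 2: 4.98, 3: 5.00, 4: 5.00, 5: 5.00, 6: 4.98, 7: 5.00,
    8: 5.00, 9: 5.00, 10: 5.00, 11: 4.98, 12: 5.00, 13: 2.79,
}

MIB = 1024 ** 2  # nvidia-smi reports memory in MiB


def total_shard_bytes(sizes_g, unit=1e9):
    """Sum the per-shard sizes and convert to bytes."""
    return sum(sizes_g.values()) * unit


def mib_to_bytes(mib):
    """Convert an nvidia-smi MiB figure to bytes."""
    return mib * MIB


if __name__ == "__main__":
    disk = total_shard_bytes(SHARD_SIZES_G)
    used = mib_to_bytes(60415)
    total = mib_to_bytes(81920)
    print(f"shards on disk : {disk / 1e9:.2f} GB")
    print(f"GPU memory used: {used / 1e9:.2f} GB ({used / total:.1%} of {total / 1e9:.1f} GB)")
```

Under these assumptions the shards come to about 62.7 GB and the reported GPU usage to about 63.3 GB, roughly 74% of the A100's 80 GB. The small gap is consistent with runtime overhead such as the CUDA context and buffers, though the logs alone do not show the breakdown.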