PeterSchneider committed
Commit d5945cb · verified · 1 Parent(s): c574840

Update README.md

Files changed (1): README.md (+7 −1)
README.md CHANGED
@@ -24,7 +24,10 @@ Large Language Models, including CodeLlama-13B-QML, are not designed to be deplo
 
 ## How to run CodeLlama-13B-QML in cloud deployment:
 
-The configuration depends on your chosen cloud technology.
+The configuration depends on your chosen cloud technology.
+
+Running CodeLlama-13B-QML in the cloud requires working with Docker and vLLM for optimal performance. Make sure all required dependencies are installed (the transformers, accelerate, and peft modules). Use bfloat16 precision. The setup combines the base model from Hugging Face (which requires an access token) with the adapter weights from this repository. vLLM enables efficient inference with an OpenAI-compatible API endpoint, making integration straightforward; it serves as a highly optimized backend that implements request batching and queuing. The deployment container should be run on an instance with a GPU accelerator.
+The configuration has been thoroughly tested on Ubuntu 22.04 LTS with NVIDIA drivers and A100 80GB GPUs, demonstrating stable and efficient performance.
 
 ## How to run CodeLlama-13B-QML in ollama:
 
@@ -76,3 +79,6 @@ If there is no suffix, please use:
 
 ## Model Version:
 v1.0
+
+## Attribution:
+CodeLlama-13B is a model of the Llama 2 family. Llama 2 is licensed under the LLAMA 2 Community License, Copyright (c) Meta Platforms, Inc. All Rights Reserved.
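
The diff's cloud-deployment paragraph mentions that vLLM exposes an OpenAI-compatible API endpoint. A minimal sketch of what client-side integration could look like, assuming a local vLLM server on port 8000 and a served model name of `CodeLlama-13B-QML` (both are illustrative assumptions, not confirmed by this commit):

```python
import json
import urllib.request

# Hypothetical endpoint exposed by a vLLM server; adjust host/port
# and model name to match your actual deployment.
VLLM_URL = "http://localhost:8000/v1/completions"


def build_completion_request(prompt: str,
                             model: str = "CodeLlama-13B-QML",
                             max_tokens: int = 256) -> urllib.request.Request:
    """Builds an OpenAI-style /v1/completions POST request for a vLLM server."""
    payload = {
        "model": model,          # served model name (assumed; check your vLLM config)
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": 0.2,
    }
    return urllib.request.Request(
        VLLM_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_completion_request("// QML: a red rectangle\n")
# Once the deployment container described above is running, sending the
# request would return the completion JSON:
#   with urllib.request.urlopen(req) as resp:
#       print(json.load(resp)["choices"][0]["text"])
```

Because the endpoint follows the OpenAI completions schema, the official `openai` Python client (pointed at the server's base URL) could be used instead of raw `urllib`.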