Update README.md

README.md (changed)
@@ -30,36 +30,25 @@ By accessing this model, you are agreeing to the Llama 2 terms and conditions of

CodeLlama-7B-QML requires significant computing resources to achieve inference (response) times suitable for automatic code completion. Therefore, it should be used with a GPU accelerator.

Large Language Models, including CodeLlama-7B-QML, are not designed to be deployed in isolation but instead should be deployed as part of an overall AI system with additional safety guardrails as required. Developers are expected to deploy system safeguards when building AI systems.

Removed:

## How to run CodeLlama-7B-QML in cloud deployment:

The configuration depends on the chosen cloud technology.

Running CodeLlama-7B-QML in the cloud requires working with Docker and vLLM for optimal performance. Make sure all required dependencies are installed (the transformers, accelerate, and peft modules), and use bfloat16 precision. The setup combines the base model from Hugging Face (which requires an access token) with the adapter weights from this repository. vLLM serves as a highly optimized backend that implements request batching and queuing, and it exposes an OpenAI-compatible API endpoint, which makes integration straightforward. The Docker container should be run on an instance with a GPU accelerator. The configuration has been thoroughly tested on Ubuntu 22.04 LTS with an NVIDIA driver and A100 80GB GPUs, demonstrating stable and efficient performance.
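
As an illustration of that setup, a serving command might look like the sketch below. The base model ID (codellama/CodeLlama-7b-hf), the local adapter directory, and the served adapter name are assumptions made for this example; substitute your own paths and Hugging Face token.

```
# Illustrative sketch: serve the Hugging Face base model together with the QML
# adapter weights behind vLLM's OpenAI-compatible API, using the vLLM Docker image.
docker run --gpus all -p 8000:8000 \
  -e HUGGING_FACE_HUB_TOKEN=<your_hf_token> \
  -v "$(pwd)/codellama-7b-qml:/adapters/codellama-7b-qml" \
  vllm/vllm-openai:latest \
  --model codellama/CodeLlama-7b-hf \
  --dtype bfloat16 \
  --enable-lora \
  --lora-modules codellama-7b-qml=/adapters/codellama-7b-qml
```

Completion requests can then be sent to the OpenAI-compatible endpoint (http://localhost:8000/v1/completions by default), passing the adapter name as the model.
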
## How to run CodeLlama-7B-QML in ollama:

The model can be downloaded either from Hugging Face or from Ollama. If you download it from Hugging Face, follow all of the instruction steps; if you download it from Ollama, only steps 1 and 5 are needed.

#### 1. Install ollama
https://ollama.com/download
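
On Linux, for example, ollama can be installed with the script published on that download page; macOS and Windows users can use the graphical installer instead.

```
# downloads and runs the official ollama install script (Linux)
curl -fsSL https://ollama.com/install.sh | sh
```
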
#### 4. Build the model in ollama
```
ollama create theqtcompany/codellama-7b-qml -f Modelfile
```
The model's name must be exactly as above if you want to use the model in Qt Creator.
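
To confirm that the model was registered under that exact name, you can list the locally available models:

```
ollama list
# the output should include an entry for theqtcompany/codellama-7b-qml
```
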
#### 5. Run the model
```
ollama run theqtcompany/codellama-7b-qml
```
You can start writing prompts in the terminal or send curl requests now.

Here is a curl request example:
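
For instance, a request against ollama's default local endpoint could look like the sketch below; the QML prompt and the generation options are illustrative placeholders rather than the exact format expected by the model.

```
curl -X POST http://localhost:11434/api/generate -d '{
  "model": "theqtcompany/codellama-7b-qml",
  "prompt": "import QtQuick\n\nRectangle {\n    width: 200\n    ",
  "stream": false,
  "options": {
    "temperature": 0.2,
    "num_predict": 128
  }
}'
```
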
Added:

## How to run CodeLlama-7B-QML:

1. Install ollama. The installer for your platform is available at:
```
https://ollama.com/download
```
2. Clone this model repository. Click the three dots in the top right corner and choose 'Clone repository'.
3. Open the terminal and go to the cloned repository.
4. Build the model in ollama by executing the following command in the terminal:
```
ollama create theqtcompany/codellama-7b-qml -f Modelfile
```
The model's name must be exactly as above if you want to use the model in Qt Creator.

5. Run the model by entering the following command in the terminal:
```
ollama run theqtcompany/codellama-7b-qml
```

You can start writing prompts in the terminal or send curl requests now.

Here is a curl request example: