danielsteinigen committed
Commit 455b505 · verified · 1 Parent(s): 2fcfcda

add sample for usage with vLLM to Readme

Files changed (1)
  1. README.md +40 -0
README.md CHANGED
@@ -125,6 +125,46 @@ print(prediction_text)
 
 This example demonstrates how to load the model and tokenizer, prepare input, generate text, and print the result.
 
+ ### Usage with vLLM Server
+ Starting the vLLM Server:
+ ``` shell
+ vllm serve openGPT-X/Teuken-7B-instruct-commercial-v0.4 --trust-remote-code
+ ```
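+ Once the server is up, it should report the model through the OpenAI-compatible model list; a minimal probe, assuming the default host and port:
+ ``` python
+ from openai import OpenAI
+
+ client = OpenAI(api_key="EMPTY", base_url="http://localhost:8000/v1")
+ print([m.id for m in client.models.list().data])  # should include the Teuken model id
+ ```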
+ Use the Chat API with vLLM and pass the language of the Chat-Template as `extra_body`:
+ ``` python
+ from openai import OpenAI
+
+ client = OpenAI(
+     api_key="EMPTY",
+     base_url="http://localhost:8000/v1",
+ )
+ completion = client.chat.completions.create(
+     model="openGPT-X/Teuken-7B-instruct-commercial-v0.4",
+     messages=[{"role": "User", "content": "Hallo"}],
+     extra_body={"chat_template": "DE"},  # language of the chat template
+ )
+ print(f"Assistant: {completion.choices[0].message.content}")
+ ```
+ The default language of the Chat-Template can also be set when starting the vLLM Server. To do so, create a new file named `lang` with the content `DE`, then start the vLLM Server as follows:
+ ``` shell
+ vllm serve openGPT-X/Teuken-7B-instruct-commercial-v0.4 --trust-remote-code --chat-template lang
+ ```
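+ With the default set this way, requests no longer need the `extra_body` argument; a minimal sketch mirroring the client call above:
+ ``` python
+ completion = client.chat.completions.create(
+     model="openGPT-X/Teuken-7B-instruct-commercial-v0.4",
+     messages=[{"role": "User", "content": "Hallo"}],  # the `lang` file's DE template is applied by default
+ )
+ print(f"Assistant: {completion.choices[0].message.content}")
+ ```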
+
+ ### Usage with vLLM Offline Batched Inference
+ ``` python
+ from vllm import LLM, SamplingParams
+
+ sampling_params = SamplingParams(temperature=0.01, max_tokens=1024, stop=["</s>"])
+ llm = LLM(model="openGPT-X/Teuken-7B-instruct-commercial-v0.4", trust_remote_code=True, dtype="bfloat16")
+ outputs = llm.chat(
+     messages=[{"role": "User", "content": "Hallo"}],
+     sampling_params=sampling_params,
+     chat_template="DE",  # language of the chat template
+ )
+ print(f"Prompt: {outputs[0].prompt}")
+ print(f"Assistant: {outputs[0].outputs[0].text}")
+ ```
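+ Because inference is batched, several conversations can be generated in one call; a hedged sketch, assuming a vLLM version whose `LLM.chat` accepts a list of conversations:
+ ``` python
+ conversations = [
+     [{"role": "User", "content": "Hallo"}],
+     [{"role": "User", "content": "Wie geht es dir?"}],  # "How are you?"
+ ]
+ outputs = llm.chat(conversations, sampling_params=sampling_params, chat_template="DE")
+ for output in outputs:
+     print(f"Assistant: {output.outputs[0].text}")
+ ```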
+
 
 ## Training Details
 
 ### Pre-Training Data