---
license: apache-2.0
language:
- en
base_model:
- PygmalionAI/Eleusis-12B
pipeline_tag: text-generation
---

This is an ONNX-optimized version of [Eleusis-12B](https://huggingface.co/PygmalionAI/Eleusis-12B).
For more comprehensive information about the model's capabilities, please visit the original model's repo.

## Inference
### Requirements
If you're on a CPU-only machine:

```sh
pip install onnxruntime
```

If you have an NVIDIA GPU available:

```sh
pip uninstall onnxruntime -y
pip install onnxruntime-gpu
```

Make sure you have installed the [CUDA Toolkit](https://developer.nvidia.com/cuda-12-4-0-download-archive) and [cuDNN](https://developer.nvidia.com/cudnn).
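
You can confirm that the GPU build is picked up by listing the execution providers onnxruntime can see:

```python
import onnxruntime as ort

# 'CUDAExecutionProvider' should appear in this list if onnxruntime-gpu,
# the CUDA Toolkit, and cuDNN are all installed correctly.
print(ort.get_available_providers())
```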

Save the following as `onnx_inference.py`:

```python
import onnxruntime as ort
from transformers import AutoTokenizer
import numpy as np
import argparse

def generate_text(prompt, num_tokens, model_path, tokenizer_path):
    tokenizer = AutoTokenizer.from_pretrained(tokenizer_path)
    # Prefer the CUDA provider when present; fall back to CPU otherwise
    providers = ['CUDAExecutionProvider', 'CPUExecutionProvider']
    session = ort.InferenceSession(model_path, providers=providers)

    input_ids = tokenizer(prompt, return_tensors="np").input_ids

    for _ in range(num_tokens):
        # Create the attention mask and position ids for the current sequence
        attention_mask = np.ones_like(input_ids)
        position_ids = np.arange(input_ids.shape[1])[None, :]

        outputs = session.run(
            output_names=['logits'],
            input_feed={
                'input_ids': input_ids,
                'attention_mask': attention_mask,
                'position_ids': position_ids
            }
        )

        # Greedy decoding: take the highest-scoring token at the last position
        next_token = np.argmax(outputs[0][0, -1, :])

        input_ids = np.concatenate([input_ids, [[next_token]]], axis=1)

    return tokenizer.decode(input_ids[0], skip_special_tokens=True)

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description='Generate text using an ONNX model')
    parser.add_argument('prompt', type=str, help='Input prompt for generation')
    parser.add_argument('num_tokens', type=int, help='Number of tokens to generate')
    parser.add_argument('--model_path', type=str, default='model.onnx',
                        help='Path to ONNX model file')
    parser.add_argument('--tokenizer_path', type=str, default='tokenizer',
                        help='Path to tokenizer directory')

    args = parser.parse_args()

    result = generate_text(args.prompt, args.num_tokens, args.model_path, args.tokenizer_path)
    print(result)
```

Then run it:

```sh
python onnx_inference.py "Once upon a time" 512 --model_path /path/to/model.onnx --tokenizer_path /path/to/model/dir
```

This is an example script and is not properly optimized: it re-runs the full sequence through the model for every generated token (no KV cache) and uses greedy decoding.
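
As one small improvement, the greedy `argmax` step could be swapped for temperature sampling. A minimal sketch in plain NumPy (the `sample_next_token` helper below is hypothetical, not part of this repo):

```python
import numpy as np

def sample_next_token(logits, temperature=0.8, seed=None):
    # Hypothetical helper: sample from the softmax distribution instead of
    # taking the greedy argmax. Lower temperatures approach greedy decoding.
    rng = np.random.default_rng(seed)
    scaled = logits.astype(np.float64) / temperature
    probs = np.exp(scaled - scaled.max())  # subtract max for numerical stability
    probs /= probs.sum()
    return int(rng.choice(probs.shape[0], p=probs))
```

In `generate_text`, `next_token = np.argmax(outputs[0][0, -1, :])` would then become `next_token = sample_next_token(outputs[0][0, -1, :])`.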