The input format is a flattened version of the conversational history:

    q_n [SEP] a_{n-1} [SEP] q_{n-1} [SEP] ... [SEP] a_0 [SEP] q_0

Below is an example script for encoding a conversation:

```python
from transformers import AutoTokenizer, AutoModelForMaskedLM
import torch
import torch.nn.functional as F

model = AutoModelForMaskedLM.from_pretrained("slupart/splade-disco-human-mistral")
tokenizer = AutoTokenizer.from_pretrained("slupart/splade-disco-human-mistral")
model.eval()

conv = [
    ("what's the weather like today?", "it's sunny."),
    ("should I wear sunscreen?", "yes, UV index is high."),
    ("do I need sunglasses?", "definitely."),
    ("where can I buy sunglasses?", "try the optician nearby."),
    ("how much do they cost?", None),
]

# Flatten the history: current question first, then earlier turns in reverse order
parts = [conv[-1][0]] + [x for q, a in reversed(conv[:-1]) for x in (a, q) if x]
text = " [SEP] ".join(parts)

inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

# Sparse representation: ReLU over the logits, then max-pooling over the sequence dimension
sparse = F.relu(logits).max(1).values.squeeze(0)

# Print the non-zero vocabulary terms, highest weight first
scores = [(tokenizer.convert_ids_to_tokens([i.item()])[0], sparse[i].item())
          for i in torch.nonzero(sparse).squeeze(1)]
for token, score in sorted(scores, key=lambda x: -x[1]):
    print(f"Token: {token:15} | Score: {score:.4f}")
```
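The flattening step can be isolated for clarity. The sketch below wraps the one-liner from the script in a hypothetical `flatten_conversation` helper (the name is ours, not part of the model's API) so it can be inspected without loading the model:

```python
def flatten_conversation(conv):
    """Flatten a conversation into the model's input format.

    conv is a list of (question, answer) pairs, oldest turn first;
    the final turn's answer may be None (the question to encode).
    Returns: q_n [SEP] a_{n-1} [SEP] q_{n-1} [SEP] ... [SEP] a_0 [SEP] q_0
    """
    parts = [conv[-1][0]] + [x for q, a in reversed(conv[:-1]) for x in (a, q) if x]
    return " [SEP] ".join(parts)

conv = [
    ("what's the weather like today?", "it's sunny."),
    ("should I wear sunscreen?", None),
]
print(flatten_conversation(conv))
# should I wear sunscreen? [SEP] it's sunny. [SEP] what's the weather like today?
```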

## Citation
If you use our checkpoint, please cite our work: