slupart/splade-disco-human-mistral

conversational-search

multi-turn retrieval

query-expansion

document-expansion

passage-retrieval

knowledge-distillation

slupart committed 22 days ago

Commit bc9953b (verified) · 1 Parent(s): 8c3b8da

Update README.md

Files changed (1)

README.md +34 -0

@@ -27,6 +27,40 @@ Training is performed via **distillation from multiple teachers: human and [Mist
 The input format is a flattened version of the conversational history.
 q_n [SEP] a_{n-1} [SEP] q_{n-1} [SEP] ... [SEP] a_0 [SEP] q_0
+Below is an example script for encoding a conversation:
+```python
+from transformers import AutoTokenizer, AutoModelForMaskedLM
+import torch.nn.functional as F
+import torch
+model = AutoModelForMaskedLM.from_pretrained("slupart/splade-disco-human-mistral")
+tokenizer = AutoTokenizer.from_pretrained("slupart/splade-disco-human-mistral")
+model.eval()
+conv = [
+    ("what's the weather like today?", "it's sunny."),
+    ("should I wear sunscreen?", "yes, UV index is high."),
+    ("do I need sunglasses?", "definitely."),
+    ("where can I buy sunglasses?", "try the optician nearby."),
+    ("how much do they cost?", None)
+]
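+# Flatten the history: current question first, then earlier turns from newest to oldest (answer before question).
+parts = [conv[-1][0]] + [x for q, a in reversed(conv[:-1]) for x in (a, q) if x]
+text = " [SEP] ".join(parts)
+inputs = tokenizer(text, return_tensors="pt")
+with torch.no_grad():
+    logits = model(**inputs).logits
+# Max-pool the ReLU-activated logits over the sequence dimension: one weight per vocabulary term.
+sparse = F.relu(logits).max(1).values.squeeze(0)
+# Keep only the non-zero terms and map their vocabulary ids back to tokens.
+scores = [(tokenizer.convert_ids_to_tokens([i.item()])[0], sparse[i].item())
+          for i in torch.nonzero(sparse).squeeze(1)]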
+for token, score in sorted(scores, key=lambda x: -x[1]):
+    print(f"Token: {token:15} | Score: {score:.4f}")
+```
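The flattening step can also be checked on its own, without loading the model. The sketch below is a minimal illustration, assuming turns are given oldest first as `(question, answer)` pairs and that the literal string ` [SEP] ` is the intended separator, as in the format line above. The helper name `flatten_conversation` is hypothetical and not part of the released checkpoint or of `transformers`.

```python
from typing import List, Optional, Tuple


def flatten_conversation(turns: List[Tuple[str, Optional[str]]]) -> str:
    """Build 'q_n [SEP] a_{n-1} [SEP] q_{n-1} [SEP] ... [SEP] a_0 [SEP] q_0'.

    `turns` is ordered oldest first; the last turn holds the current
    question, and its answer is ignored (it may be None).
    Hypothetical helper, written only to illustrate the input format.
    """
    if not turns:
        raise ValueError("at least one turn (the current question) is required")

    # The current question always comes first.
    parts = [turns[-1][0]]

    # Walk the earlier turns from newest to oldest, answer before question,
    # skipping any empty or missing entries.
    for question, answer in reversed(turns[:-1]):
        for piece in (answer, question):
            if piece:
                parts.append(piece)

    return " [SEP] ".join(parts)


example = [
    ("what's the weather like today?", "it's sunny."),
    ("should I wear sunscreen?", "yes, UV index is high."),
    ("how much do they cost?", None),
]
flattened = flatten_conversation(example)
print(flattened)
```

The returned string can be passed to the tokenizer in place of `text` in the script above.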
 ## Citation
 If you use our checkpoint, please cite our work: