# GLiREL: Generalist and Lightweight model for Zero-Shot Relation Extraction
GLiREL is a Relation Extraction model that, given the entities within a text, can classify relation types it has not seen during training. It builds upon the excellent work done by Urchade Zaratiana, Nadi Tomeh, Pierre Holat and Thierry Charnois on the [GLiNER](https://github.com/urchade/GLiNER) library, which enables efficient zero-shot Named Entity Recognition.
<p align="center">
<a href="https://pypi.org/project/glirel/" target="_blank">
<img alt="Python" src="https://img.shields.io/badge/python-3670A0?style=for-the-badge&logo=python&logoColor=ffdd54" />
<img alt="Version" src="https://img.shields.io/pypi/v/glirel?style=for-the-badge&color=3670A0">
</a>
</p>
<p align="center">
<a href="https://arxiv.org/abs/2501.03172">📄 GLiREL Paper</a>
<span> • </span>
<a href="https://arxiv.org/abs/2311.08526">📄 GLiNER Paper</a>
<span> • </span>
<a href="https://huggingface.co/spaces/jackboyla/GLiREL">🤗 Demo</a>
<span> • </span>
<a href="https://huggingface.co/collections/jackboyla/glirel-6766b213a4c1fa8c4e982322">🤗 Available models</a>
</p>

---

## Installation
```bash
pip install glirel
```
## Usage
Once you've installed the GLiREL library, you can import the `GLiREL` class, load a pretrained model with `GLiREL.from_pretrained`, and predict relations between the given entities with `predict_relations`.
```python
from glirel import GLiREL
import spacy

model = GLiREL.from_pretrained("jackboyla/glirel_beta")
nlp = spacy.load('en_core_web_sm')

text = 'Derren Nesbitt had a history of being cast in "Doctor Who", having played villainous warlord Tegana in the 1964 First Doctor serial "Marco Polo".'
doc = nlp(text)
tokens = [token.text for token in doc]

labels = ['country of origin', 'licensed to broadcast to', 'father', 'followed by', 'characters']

# Entities as [start, end, type, text]; the end token index is inclusive
ner = [[26, 27, 'PERSON', 'Marco Polo'], [22, 23, 'Q2989412', 'First Doctor']]  # 'type' is not used -- it can be any string!

relations = model.predict_relations(tokens, labels, threshold=0.0, ner=ner, top_k=1)

print('Number of relations:', len(relations))

# Print each relation dict, highest score first
sorted_data_desc = sorted(relations, key=lambda x: x['score'], reverse=True)
print("\nDescending Order by Score:")
for item in sorted_data_desc:
    print(item)
```
### Expected Output
```
Number of relations: 2

Descending Order by Score:
{'head_pos': [26, 28], 'tail_pos': [22, 24], 'head_text': ['Marco', 'Polo'], 'tail_text': ['First', 'Doctor'], 'label': 'characters', 'score': 0.9923334121704102}
{'head_pos': [22, 24], 'tail_pos': [26, 28], 'head_text': ['First', 'Doctor'], 'tail_text': ['Marco', 'Polo'], 'label': 'characters', 'score': 0.9915636777877808}
```
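In the expected output above, the returned `head_pos`/`tail_pos` values appear to be end-exclusive (`[26, 28]` for "Marco Polo"), even though the `ner` input used the inclusive `[26, 27]`. The sketch below assumes that reading of the output and slices the token list with those positions to recover each mention. `mention_tokens` is a hypothetical helper, not part of GLiREL, and the token list is an assumed spaCy tokenization that is consistent with the `ner` indices above.

```python
from typing import Dict, List, Sequence


def mention_tokens(tokens: Sequence[str], pos: Sequence[int]) -> List[str]:
    """Return the tokens covered by an output span.

    Assumes `pos` is [start, end) with an exclusive end, as the expected
    output suggests (input `ner` spans, by contrast, are inclusive).
    """
    start, end = pos
    return list(tokens[start:end])


# Assumed tokenization of the example sentence (matches the `ner` indices)
tokens = ['Derren', 'Nesbitt', 'had', 'a', 'history', 'of', 'being', 'cast', 'in',
          '"', 'Doctor', 'Who', '"', ',', 'having', 'played', 'villainous',
          'warlord', 'Tegana', 'in', 'the', '1964', 'First', 'Doctor', 'serial',
          '"', 'Marco', 'Polo', '"', '.']

# One relation copied from the expected output
relation: Dict = {'head_pos': [26, 28], 'tail_pos': [22, 24],
                  'head_text': ['Marco', 'Polo'], 'tail_text': ['First', 'Doctor'],
                  'label': 'characters', 'score': 0.9923334121704102}

print(mention_tokens(tokens, relation['head_pos']))  # ['Marco', 'Polo']
print(mention_tokens(tokens, relation['tail_pos']))  # ['First', 'Doctor']
```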
## Constrain labels
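In practice, you usually want to restrict which entity types can appear as the head and/or tail of each relation. GLiREL supports this through a `glirel_labels` dictionary that maps each relation label to its `allowed_head` and `allowed_tail` entity types: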
```python
labels = {"glirel_labels": {
    'co-founder': {"allowed_head": ["PERSON"], "allowed_tail": ["ORG"]},
    'no relation': {},  # head and tail can be any entity type
    'country of origin': {"allowed_head": ["PERSON", "ORG"], "allowed_tail": ["LOC", "GPE"]},
    'parent': {"allowed_head": ["PERSON"], "allowed_tail": ["PERSON"]},
    'located in or next to body of water': {"allowed_head": ["LOC", "GPE", "FAC"], "allowed_tail": ["LOC", "GPE"]},
    'spouse': {"allowed_head": ["PERSON"], "allowed_tail": ["PERSON"]},
    'child': {"allowed_head": ["PERSON"], "allowed_tail": ["PERSON"]},
    'founder': {"allowed_head": ["PERSON"], "allowed_tail": ["ORG"]},
    'founded on date': {"allowed_head": ["ORG"], "allowed_tail": ["DATE"]},
    'headquartered in': {"allowed_head": ["ORG"], "allowed_tail": ["LOC", "GPE", "FAC"]},
    'acquired by': {"allowed_head": ["ORG"], "allowed_tail": ["ORG", "PERSON"]},
    'subsidiary of': {"allowed_head": ["ORG"], "allowed_tail": ["ORG", "PERSON"]},
    }
}
```
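To make the meaning of `allowed_head`/`allowed_tail` concrete, the following sketch enumerates which ordered entity pairs each label could apply to. It is a hypothetical illustration, not GLiREL's internal filtering code, and it assumes that a missing key (or an empty dict, as for `'no relation'`) means any entity type is allowed. Entities use the `[start, end, type, text]` layout from the usage example; the entities themselves are made up.

```python
from itertools import permutations
from typing import Dict, List, Sequence, Tuple


def allowed_pairs(label_cfg: Dict[str, Dict[str, List[str]]],
                  ner: List[Sequence]) -> Dict[str, List[Tuple[str, str]]]:
    """Map each relation label to the (head_text, tail_text) pairs it permits.

    Assumption: a missing "allowed_head"/"allowed_tail" key means any type.
    """
    result: Dict[str, List[Tuple[str, str]]] = {}
    for label, cfg in label_cfg.items():
        heads = cfg.get("allowed_head")  # None -> unconstrained
        tails = cfg.get("allowed_tail")
        result[label] = [
            (head[3], tail[3])
            for head, tail in permutations(ner, 2)  # ordered pairs, no self-pairs
            if (heads is None or head[2] in heads)
            and (tails is None or tail[2] in tails)
        ]
    return result


# Hypothetical entities with spaCy-style types
ner = [[0, 1, "ORG", "Apple Inc."], [5, 6, "PERSON", "Steve Jobs"],
       [20, 20, "GPE", "Cupertino"]]
cfg = {"founder": {"allowed_head": ["PERSON"], "allowed_tail": ["ORG"]},
       "headquartered in": {"allowed_head": ["ORG"], "allowed_tail": ["LOC", "GPE", "FAC"]}}

for label, pairs in allowed_pairs(cfg, ner).items():
    print(label, pairs)
# founder [('Steve Jobs', 'Apple Inc.')]
# headquartered in [('Apple Inc.', 'Cupertino')]
```

Enumerating pairs this way also shows why constraints help: without them, every label would be scored against all six ordered pairs of these three entities.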
## Usage with spaCy
You can also add GLiREL to a regular spaCy NLP pipeline. Here's an example using an English pipeline:
```python
import spacy
import glirel  # makes the "glirel" component available to spaCy

# Load a spaCy pipeline that includes an NER component ("ner"),
# since GLiREL is added after it and works on the entities it finds
nlp = spacy.load('en_core_web_sm')

# Add the GLiREL component to the pipeline
nlp.add_pipe("glirel", after="ner")

# Now you can use the pipeline with the GLiREL component
text = "Apple Inc. was founded by Steve Jobs, Steve Wozniak, and Ronald Wayne in April 1976. The company is headquartered in Cupertino, California."

labels = {"glirel_labels": {
    'co-founder': {"allowed_head": ["PERSON"], "allowed_tail": ["ORG"]},
    'country of origin': {"allowed_head": ["PERSON", "ORG"], "allowed_tail": ["LOC", "GPE"]},
    'licensed to broadcast to': {"allowed_head": ["ORG"]},
    'no relation': {},
    'parent': {"allowed_head": ["PERSON"], "allowed_tail": ["PERSON"]},
    'followed by': {"allowed_head": ["PERSON", "ORG"], "allowed_tail": ["PERSON", "ORG"]},
    'located in or next to body of water': {"allowed_head": ["LOC", "GPE", "FAC"], "allowed_tail": ["LOC", "GPE"]},
    'spouse': {"allowed_head": ["PERSON"], "allowed_tail": ["PERSON"]},
    'child': {"allowed_head": ["PERSON"], "allowed_tail": ["PERSON"]},
    'founder': {"allowed_head": ["PERSON"], "allowed_tail": ["ORG"]},
    'headquartered in': {"allowed_head": ["ORG"], "allowed_tail": ["LOC", "GPE", "FAC"]},
    'acquired by': {"allowed_head": ["ORG"], "allowed_tail": ["ORG", "PERSON"]},
    'subsidiary of': {"allowed_head": ["ORG"], "allowed_tail": ["ORG", "PERSON"]},
    }
}

# Pass the labels at inference time as the context of a (text, context) tuple
docs = list(nlp.pipe([(text, labels)], as_tuples=True))
# With as_tuples=True, nlp.pipe yields (doc, context) pairs
relations = docs[0][0]._.relations

print('Number of relations:', len(relations))

sorted_data_desc = sorted(relations, key=lambda x: x['score'], reverse=True)
print("\nDescending Order by Score:")
for item in sorted_data_desc:
    print(f"{item['head_text']} --> {item['label']} --> {item['tail_text']} | score: {item['score']}")
```
### Expected Output
```
Number of relations: 5

Descending Order by Score:
['Apple', 'Inc.'] --> headquartered in --> ['California'] | score: 0.9854260683059692
['Apple', 'Inc.'] --> headquartered in --> ['Cupertino'] | score: 0.9569844603538513
['Steve', 'Wozniak'] --> co-founder --> ['Apple', 'Inc.'] | score: 0.09025496244430542
['Steve', 'Jobs'] --> co-founder --> ['Apple', 'Inc.'] | score: 0.08805803954601288
['Ronald', 'Wayne'] --> co-founder --> ['Apple', 'Inc.'] | score: 0.07996643334627151
```
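In the output above, the `headquartered in` triples score above 0.95 while the `co-founder` triples score below 0.1. A simple post-processing step is to drop relations under a score cutoff and deduplicate the rest into `(head, label, tail)` string triples. The sketch below assumes only the `head_text`, `label`, `tail_text` and `score` keys printed above; `to_triples` is a hypothetical helper and the 0.5 cutoff is an arbitrary example value, not a recommended setting.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def to_triples(relations: List[Dict], min_score: float = 0.5) -> List[Triple]:
    """Keep relations scoring at least `min_score` as (head, label, tail) strings.

    Token lists are joined with single spaces (fine for these mentions, but
    not a faithful detokenizer). Results are ordered by descending score and
    duplicate triples are dropped.
    """
    triples: List[Triple] = []
    seen = set()
    for rel in sorted(relations, key=lambda r: r["score"], reverse=True):
        if rel["score"] < min_score:
            break  # sorted descending, so all remaining scores are lower
        triple = (" ".join(rel["head_text"]), rel["label"], " ".join(rel["tail_text"]))
        if triple not in seen:
            seen.add(triple)
            triples.append(triple)
    return triples


# Three relations taken from the expected output above
relations = [
    {"head_text": ["Apple", "Inc."], "label": "headquartered in", "tail_text": ["California"], "score": 0.9854260683059692},
    {"head_text": ["Apple", "Inc."], "label": "headquartered in", "tail_text": ["Cupertino"], "score": 0.9569844603538513},
    {"head_text": ["Steve", "Wozniak"], "label": "co-founder", "tail_text": ["Apple", "Inc."], "score": 0.09025496244430542},
]
print(to_triples(relations))
# [('Apple Inc.', 'headquartered in', 'California'), ('Apple Inc.', 'headquartered in', 'Cupertino')]
```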
## Example training data
Note that entity indices are inclusive, i.e. `"Binsey"` is `[7, 7]`. This differs from spaCy, where the end index is exclusive (in this case spaCy would set the indices to `[7, 8]`).
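If you build an inclusive `ner` list yourself from spaCy entities, the end index has to be shifted down by one. A minimal sketch, assuming spaCy-style `(start, end_exclusive, label, text)` tuples such as those you would get from `(ent.start, ent.end, ent.label_, ent.text)` for each `ent` in `doc.ents`. `to_glirel_ner` is a hypothetical helper name, and the entity labels in the example are chosen arbitrarily.

```python
from typing import Iterable, List, Tuple, Union

SpacySpan = Tuple[int, int, str, str]  # (start, end_exclusive, label, text)


def to_glirel_ner(spans: Iterable[SpacySpan]) -> List[List[Union[int, str]]]:
    """Convert end-exclusive token spans to inclusive [start, end, type, text]."""
    ner: List[List[Union[int, str]]] = []
    for start, end, label, text in spans:
        if end <= start:
            raise ValueError(f"empty span: {(start, end, text)}")
        ner.append([start, end - 1, label, text])  # inclusive end index
    return ner


# spaCy would report "Binsey" as tokens [7, 8) and "River Thames" as [11, 13)
print(to_glirel_ner([(7, 8, "GPE", "Binsey"), (11, 13, "LOC", "River Thames")]))
# [[7, 7, 'GPE', 'Binsey'], [11, 12, 'LOC', 'River Thames']]
```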
Each line of the JSONL training file holds one example. The two examples below are pretty-printed for readability; in the actual file, each object must sit on a single line. The `type` fields are not used, so they can be any string.
```json
{
  "ner": [
    [7, 7, "Q4914513", "Binsey"],
    [11, 12, "Q19686", "River Thames"]
  ],
  "relations": [
    {
      "head": {"mention": "Binsey", "position": [7, 7], "type": "LOC"},
      "tail": {"mention": "River Thames", "position": [11, 12], "type": "Q19686"},
      "relation_text": "located in or next to body of water"
    }
  ],
  "tokenized_text": ["The", "race", "took", "place", "between", "Godstow", "and", "Binsey", "along", "the", "Upper", "River", "Thames", "."]
}
{
  "ner": [
    [9, 10, "Q4386693", "Legislative Assembly"],
    [1, 3, "Q1848835", "Parliament of Victoria"]
  ],
  "relations": [
    {
      "head": {"mention": "Legislative Assembly", "position": [9, 10], "type": "Q4386693"},
      "tail": {"mention": "Parliament of Victoria", "position": [1, 3], "type": "Q1848835"},
      "relation_text": "part of"
    }
  ],
  "tokenized_text": ["The", "Parliament", "of", "Victoria", "consists", "of", "the", "lower", "house", "Legislative", "Assembly", ",", "the", "upper", "house", "Legislative", "Council", "and", "the", "Queen", "of", "Australia", "."]
}
```
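Because off-by-one span errors are easy to introduce, it can help to check each example before training. The sketch below is not part of GLiREL: `check_example` is a hypothetical helper that verifies every inclusive `ner` span and relation `position` reproduces its mention, and the example is then serialized with `json.dumps` as a single JSONL line. Joining tokens with spaces is an assumption that works for these examples but not for mentions whose tokens are not separated by whitespace.

```python
import json
from typing import Dict, List


def check_example(example: Dict) -> List[str]:
    """Return mismatches between inclusive spans and their mention strings."""
    tokens = example["tokenized_text"]
    spans = [(s, e, text) for s, e, _type, text in example["ner"]]
    for rel in example["relations"]:
        for side in ("head", "tail"):
            s, e = rel[side]["position"]
            spans.append((s, e, rel[side]["mention"]))
    problems = []
    for s, e, text in spans:
        covered = " ".join(tokens[s:e + 1])  # +1 because the end index is inclusive
        if covered != text:
            problems.append(f"[{s}, {e}] covers {covered!r}, expected {text!r}")
    return problems


example = {
    "ner": [[7, 7, "Q4914513", "Binsey"], [11, 12, "Q19686", "River Thames"]],
    "relations": [{
        "head": {"mention": "Binsey", "position": [7, 7], "type": "LOC"},
        "tail": {"mention": "River Thames", "position": [11, 12], "type": "Q19686"},
        "relation_text": "located in or next to body of water",
    }],
    "tokenized_text": ["The", "race", "took", "place", "between", "Godstow", "and",
                       "Binsey", "along", "the", "Upper", "River", "Thames", "."],
}

print(check_example(example))  # []
line = json.dumps(example)  # one example per line in the JSONL file
```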
## License
[GLiREL](https://github.com/jackboyla/GLiREL) by [Jack Boylan](https://github.com/jackboyla) is licensed under [CC BY-NC-SA 4.0](https://creativecommons.org/licenses/by-nc-sa/4.0/?ref=chooser-v1).
<a href="https://creativecommons.org/licenses/by-nc-sa/4.0/?ref=chooser-v1" target="_blank" rel="license noopener noreferrer">
<img src="https://mirrors.creativecommons.org/presskit/icons/cc.svg?ref=chooser-v1" alt="CC Logo" style="height: 20px; margin-right: 5px; vertical-align: text-bottom;">
<img src="https://mirrors.creativecommons.org/presskit/icons/by.svg?ref=chooser-v1" alt="BY Logo" style="height: 20px; margin-right: 5px; vertical-align: text-bottom;">
<img src="https://mirrors.creativecommons.org/presskit/icons/nc.svg?ref=chooser-v1" alt="NC Logo" style="height: 20px; margin-right: 5px; vertical-align: text-bottom;">
<img src="https://mirrors.creativecommons.org/presskit/icons/sa.svg?ref=chooser-v1" alt="SA Logo" style="height: 20px; margin-right: 5px; vertical-align: text-bottom;">
</a>
## Citation
If you use code or ideas from this project, please cite:
```bibtex
@misc{boylan2025glirelgeneralistmodel,
    title={GLiREL -- Generalist Model for Zero-Shot Relation Extraction},
    author={Jack Boylan and Chris Hokamp and Demian Gholipour Ghalandari},
    year={2025},
    eprint={2501.03172},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    url={https://arxiv.org/abs/2501.03172},
}
```