nur-dev committed 86eb4dc (verified) · 1 Parent(s): 301ed00

Update README.md

Files changed (1): README.md (+72 −3)
---
license: afl-3.0
datasets:
- issai/kazqad
language:
- kk
library_name: transformers
pipeline_tag: question-answering
---

# RoBERTa-Large-KazQAD for Question Answering

## Model Description
nur-dev/roberta-large-kazqad is a fine-tuned version of RoBERTa-Kaz-Large, optimized for extractive question answering (QA) on the Kazakh Open-Domain Question Answering Dataset (KazQAD). Given a question and a context passage in Kazakh, the model predicts the answer span within the context.

## Usage
The model can be used with the Hugging Face Transformers library:
```python
from transformers import AutoModelForQuestionAnswering, AutoTokenizer
import torch

# Load the fine-tuned model and tokenizer
repo_id = "nur-dev/roberta-large-kazqad"
model = AutoModelForQuestionAnswering.from_pretrained(repo_id)
tokenizer = AutoTokenizer.from_pretrained(repo_id)

# Define the context and question.
# Context (English gloss): "Almaty is Kazakhstan's largest metropolis, a city
# on the green slopes of the majestic Tien Shan range, in the foothills of the
# Ile Alatau, in the south-east of the Republic of Kazakhstan, at the centre
# of the Eurasian continent. The city is also called the 'garden city'."
context = """
Алматы Қазақстанның ең ірі мегаполисі. Алматы – асқақ Тянь-Шань тауы жотасының көкжасыл бауырайынан,
Іле Алатауының бөктерінде, Қазақстан Республикасының оңтүстік-шығысында, Еуразия құрлығының орталығында орналасқан қала.
Бұл қаланы «қала-бақ» деп те атайды.
"""
# Question (English gloss): "In which part of Kazakhstan is Almaty located?"
question = "Алматы қаласы Қазақстанның қай бөлігінде орналасқан?"

# Tokenize the question/context pair
inputs = tokenizer(question, context, return_tensors="pt")
input_ids = inputs["input_ids"]
attention_mask = inputs["attention_mask"]

# Perform inference
with torch.no_grad():
    outputs = model(input_ids=input_ids, attention_mask=attention_mask)
start_logits = outputs.start_logits
end_logits = outputs.end_logits

# Find the most likely start and end positions of the answer span
start_index = torch.argmax(start_logits)
end_index = torch.argmax(end_logits)

# Decode the answer span from the input ids
answer = tokenizer.decode(
    input_ids[0][start_index : end_index + 1], skip_special_tokens=True
)

print(f"Question: {question}")
print(f"Answer: {answer}")
```
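
For quick experiments, the same checkpoint can also be loaded through the Transformers `pipeline` helper, which handles tokenization and answer-span decoding internally. A minimal sketch, reusing `question` and `context` from above:
```python
from transformers import pipeline

# Build an extractive QA pipeline from the fine-tuned checkpoint
qa = pipeline("question-answering", model="nur-dev/roberta-large-kazqad")

result = qa(question=question, context=context)
print(result["answer"], result["score"])
```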

## Limitations and Biases
- **Language specificity:** the model is fine-tuned for Kazakh and may not perform well on other languages.
- **Context length:** the model is fine-tuned for input lengths of up to 512 tokens, so very long contexts must be truncated or split into overlapping windows (see the sketch after this list).
- **Biases:** like other large pre-trained language models, nur-dev/roberta-large-kazqad may exhibit biases present in its training data. Users should critically evaluate its outputs, especially in sensitive applications.
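
The 512-token limit can be worked around by sliding a window over the context. This is not described in the model card itself; the following is a hedged sketch using the tokenizer's standard `stride`/`return_overflowing_tokens` options, and the helper name `answer_long_context` is hypothetical:
```python
import torch
from transformers import AutoModelForQuestionAnswering, AutoTokenizer

repo_id = "nur-dev/roberta-large-kazqad"
model = AutoModelForQuestionAnswering.from_pretrained(repo_id)
tokenizer = AutoTokenizer.from_pretrained(repo_id)

def answer_long_context(question: str, context: str) -> str:
    # Split the context into overlapping 512-token windows; only the
    # context (the second sequence) is truncated, never the question.
    inputs = tokenizer(
        question,
        context,
        max_length=512,
        stride=128,
        truncation="only_second",
        return_overflowing_tokens=True,
        padding=True,
        return_tensors="pt",
    )
    with torch.no_grad():
        outputs = model(
            input_ids=inputs["input_ids"],
            attention_mask=inputs["attention_mask"],
        )
    # Keep the best-scoring answer span across all windows.
    best_score, best_answer = float("-inf"), ""
    for i in range(inputs["input_ids"].size(0)):
        start = torch.argmax(outputs.start_logits[i])
        end = torch.argmax(outputs.end_logits[i])
        if end < start:
            continue  # skip degenerate spans
        score = (outputs.start_logits[i][start] + outputs.end_logits[i][end]).item()
        if score > best_score:
            best_score = score
            best_answer = tokenizer.decode(
                inputs["input_ids"][i][start : end + 1],
                skip_special_tokens=True,
            )
    return best_answer
```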

## Model Authors

**Name:** Kadyrbek Nurgali
- **Email:** [email protected]
- **LinkedIn:** [Kadyrbek Nurgali](https://www.linkedin.com/in/nurgali-kadyrbek-504260231/)