AshenR/AshenBERTo

@@ -3,35 +3,53 @@ license: unknown
 language:
 - si
 metrics:
-- perplexity
 library_name: transformers
 ---
-### Overview
-This is a slightly smaller model trained on half of the [Fasttext](https://fasttext.cc/docs/en/crawl-vectors.html) dataset. Since Sinhala is classified as a low-resource language, there is a significant scarcity of pre-trained models available for it. This lack of resources creates a noticeable gap in the language's representation within the field of natural language processing (NLP). As a result, developing new models tailored for Sinhala presents a valuable opportunity. This model can act as a foundational tool to enable further advancements in downstream tasks such as sentiment analysis, machine translation, named entity recognition, or question answering.
-## Model Specification
-The model chosen for training is [Roberta](https://arxiv.org/abs/1907.11692) with the following specifications:
- 1. vocab_size=52000
- 2. max_position_embeddings=514
- 3. num_attention_heads=12
- 4. num_hidden_layers=6
- 5. type_vocab_size=1
-Perplexity Value - 3.5
-## How to Use
-You can use this model directly with a pipeline for masked language modeling:
-```py
 from transformers import AutoModelForMaskedLM, AutoTokenizer, pipeline
 model = AutoModelForMaskedLM.from_pretrained("ashen/AshenBERTo")
 tokenizer = AutoTokenizer.from_pretrained("ashen/AshenBERTo")
 fill_mask = pipeline('fill-mask', model=model, tokenizer=tokenizer)
 fill_mask("මම ගෙදර <mask>.")
 ```

 language:
 - si
 metrics:
+  - name :
+    value: 64.59
 library_name: transformers
+tags:
+- AshenBerto
+- Sinhala
+- Roberta
 ---
+### 🌟 Overview
+This is a slightly smaller model trained on half of the [FastText](https://fasttext.cc/docs/en/crawl-vectors.html) dataset. Since Sinhala is a low-resource language, there’s a noticeable lack of pre-trained models available for it. 😕 This gap makes it harder to represent the language properly in the world of NLP.
+But hey, that’s where this model comes in! 🚀 It opens up exciting opportunities to improve tasks like sentiment analysis, machine translation, named entity recognition, or even question answering—tailored just for Sinhala. 🇱🇰✨
+---
+### 🛠 Model Specs
+Here’s what powers this model (we went with [RoBERTa](https://arxiv.org/abs/1907.11692)):
+1️⃣ **vocab_size** = 52,000
+2️⃣ **max_position_embeddings** = 514
+3️⃣ **num_attention_heads** = 12
+4️⃣ **num_hidden_layers** = 6
+5️⃣ **type_vocab_size** = 1
+🎯 **Perplexity Value**: 3.5
+---
+### 🚀 How to Use
+You can jump right in and use this model for masked language modeling! 🧩
+```python
 from transformers import AutoModelForMaskedLM, AutoTokenizer, pipeline
+# Load the model and tokenizer (AutoModelWithLMHead is deprecated in favor of AutoModelForMaskedLM)
 model = AutoModelForMaskedLM.from_pretrained("ashen/AshenBERTo")
 tokenizer = AutoTokenizer.from_pretrained("ashen/AshenBERTo")
+# Create a fill-mask pipeline
 fill_mask = pipeline('fill-mask', model=model, tokenizer=tokenizer)
+# Try it out with a Sinhala sentence! 🇱🇰
 fill_mask("මම ගෙදර <mask>.")
 ```
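
The card reports a perplexity of 3.5 but does not say how it was measured. As a rough sketch, assuming the usual convention of exponentiating the mean cross-entropy loss (in nats) over the evaluated tokens, the relationship looks like this. The function name `perplexity_from_nll` and the sample losses are hypothetical and only there for illustration; they are not outputs of this model.

```python
import math


def perplexity_from_nll(token_nlls):
    """Return exp(mean negative log-likelihood), with losses given in nats."""
    if not token_nlls:
        raise ValueError("need at least one per-token loss")
    return math.exp(sum(token_nlls) / len(token_nlls))


# Hypothetical per-token losses (illustrative only, not from AshenBERTo).
example_nlls = [1.0, 1.5, 1.25]
ppl = perplexity_from_nll(example_nlls)

# Under the same assumption, a perplexity of 3.5 implies a mean loss
# of ln(3.5), which is roughly 1.25 nats per evaluated token.
reported_ppl = 3.5
implied_loss = math.log(reported_ppl)

print(f"example perplexity: {ppl:.3f}")
print(f"mean loss implied by perplexity 3.5: {implied_loss:.3f} nats")
```

For a masked language model the evaluated tokens are normally only the masked positions, so a figure computed this way is not directly comparable to the perplexity of a left-to-right language model.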