agentlans/flan-t5-small-capitalizer


agentlans committed on Jan 19

Commit 3512914 (verified) · 1 Parent(s): 1276a04

Update README.md

Files changed (1)

README.md +28 -18


@@ -1,30 +1,30 @@
 ---
 license: apache-2.0
 ---
 # flan-t5-small-capitalizer
-This model is a fine-tuned version of [google/flan-t5-small](https://huggingface.co/google/flan-t5-small)
-on the [agentlans/c4-en-lowercased](https://huggingface.co/datasets/agentlans/c4-en-lowercased) dataset.
-It restores proper noun and sentence capitalization to lowercased text.
-It builds upon the capabilities of the FLAN-T5 small model, which is known for its strong performance across various natural language processing tasks.
-## Intended uses & limitations
-This model is intended for
 - Capitalizing lowercased text
-- Restoring proper noun and sentence capitalization
 - Text normalization
-Limitations:
-- English language only
-- Focused on modern prose found on the Internet
-- May not capitalize titles correctly
-- Not guaranteed to use a capitalization style consistently
-- May have trouble with special terms and abbreviations that need to be capitalized
-- Maximum input and output 1024 tokens
-Usage example:
 ```python
 import torch
@@ -41,6 +41,16 @@ print(output[0]["generated_text"])
 # Expected output: Buzzfeed's 360-degree look at the aftermath of California's Valley Fire has been viewed more than 6 million times. Plenty of viewers have been asking how we made it.
 ```
 ## Training and evaluation data
 The model was trained on a subset of the C4 dataset's English configuration.
@@ -93,4 +103,4 @@ Additional training arguments included bf16 precision, automatic batch size find
 - Transformers 4.43.3
 - PyTorch 2.3.0+cu121
 - Datasets 3.2.0
-- Tokenizers 0.19.1

 ---
 license: apache-2.0
+datasets:
+- agentlans/c4-en-lowercased
+language:
+- en
+base_model:
+- google/flan-t5-small
+tags:
+- capitalization
 ---
 # flan-t5-small-capitalizer
+A specialized fine-tuned version of [google/flan-t5-small](https://huggingface.co/google/flan-t5-small)
+trained on the [agentlans/c4-en-lowercased](https://huggingface.co/datasets/agentlans/c4-en-lowercased) dataset to restore proper capitalization.
+## Key Features
+- Restores proper noun and sentence capitalization
+- Builds on FLAN-T5 small's robust NLP capabilities
+- Designed for text normalization tasks
+## Intended Uses
 - Capitalizing lowercased text
+- Sentence and proper noun capitalization
 - Text normalization
+## Usage Example
 ```python
 import torch
 # Expected output: Buzzfeed's 360-degree look at the aftermath of California's Valley Fire has been viewed more than 6 million times. Plenty of viewers have been asking how we made it.
 ```
+## Limitations
+- **Language**: English only
+- **Text Type**: Primarily modern prose found on the Internet
+- **Capitalization Issues**:
+  - May not capitalize titles correctly
+  - Inconsistent capitalization style across texts
+  - Difficulty with special terms and abbreviations requiring capitalization
+- **Input/Output Constraint**: Maximum length of 1024 tokens for both input and output
 ## Training and evaluation data
 The model was trained on a subset of the C4 dataset's English configuration.
 - Transformers 4.43.3
 - PyTorch 2.3.0+cu121
 - Datasets 3.2.0
+- Tokenizers 0.19.1
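
The Limitations section caps input and output at 1024 tokens. One way to work within that cap is to split long text into smaller pieces before capitalizing each piece separately. The following is only a minimal sketch of that idea and is not part of the model card: `split_into_chunks` and its `count_tokens` parameter are hypothetical names, and the default counter simply counts whitespace-separated words, which only approximates the model's tokenizer.

```python
from typing import Callable, List


def split_into_chunks(
    text: str,
    max_tokens: int = 1024,
    count_tokens: Callable[[str], int] = lambda s: len(s.split()),
) -> List[str]:
    """Greedily pack whitespace-separated words into chunks.

    Each chunk has count_tokens(chunk) <= max_tokens, except when a single
    word exceeds the limit on its own; that word is kept as its own chunk.
    Original whitespace (including newlines) is normalized to single spaces.
    """
    chunks: List[str] = []
    current: List[str] = []
    for word in text.split():
        candidate = " ".join(current + [word])
        if current and count_tokens(candidate) > max_tokens:
            # Adding this word would exceed the budget: close the chunk.
            chunks.append(" ".join(current))
            current = [word]
        else:
            current.append(word)
    if current:
        chunks.append(" ".join(current))
    return chunks


if __name__ == "__main__":
    # Tiny budget for illustration; the default word counter is an assumption.
    print(split_into_chunks("a b c d e", max_tokens=2))
```

To budget against the real limit, `count_tokens` could be replaced with a function that measures length using the model's own tokenizer, and `max_tokens` set somewhat below 1024 to leave headroom. Splitting on word boundaries can cut a sentence in half, which may affect sentence-initial capitalization at chunk edges; splitting on sentence boundaries would likely be a better fit where that matters.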