agentlans committed
Commit 3512914 · verified · 1 Parent(s): 1276a04

Update README.md

Files changed (1)
  1. README.md +28 -18
README.md CHANGED
@@ -1,30 +1,30 @@
  ---
  license: apache-2.0
  ---
  # flan-t5-small-capitalizer

- This model is a fine-tuned version of [google/flan-t5-small](https://huggingface.co/google/flan-t5-small)
- on the [agentlans/c4-en-lowercased](https://huggingface.co/datasets/agentlans/c4-en-lowercased) dataset.

- It restores proper noun and sentence capitalization to lowercased text.
- It builds upon the capabilities of the FLAN-T5 small model, which is known for its strong performance across various natural language processing tasks.

- ## Intended uses & limitations
-
- This model is intended for
  - Capitalizing lowercased text
- - Restoring proper noun and sentence capitalization
  - Text normalization

- Limitations:
- - English language only
- - Focused on modern prose found on the Internet
- - May not capitalize titles correctly
- - Not guaranteed to use a capitalization style consistently
- - May have trouble with special terms and abbreviations that need to be capitalized
- - Maximum input and output 1024 tokens
-
- Usage example:

  ```python
  import torch
@@ -41,6 +41,16 @@ print(output[0]["generated_text"])
  # Expected output: Buzzfeed's 360-degree look at the aftermath of California's Valley Fire has been viewed more than 6 million times. Plenty of viewers have been asking how we made it.
  ```

  ## Training and evaluation data

  The model was trained on a subset of the C4 dataset's English configuration.
@@ -93,4 +103,4 @@ Additional training arguments included bf16 precision, automatic batch size find
  - Transformers 4.43.3
  - PyTorch 2.3.0+cu121
  - Datasets 3.2.0
- - Tokenizers 0.19.1

@@ -1,30 +1,30 @@
  ---
  license: apache-2.0
+ datasets:
+ - agentlans/c4-en-lowercased
+ language:
+ - en
+ base_model:
+ - google/flan-t5-small
+ tags:
+ - capitalization
  ---
  # flan-t5-small-capitalizer

+ A specialized fine-tuned version of [google/flan-t5-small](https://huggingface.co/google/flan-t5-small)
+ trained on the [agentlans/c4-en-lowercased](https://huggingface.co/datasets/agentlans/c4-en-lowercased) dataset to restore proper capitalization.

+ ## Key Features
+ - Restores proper noun and sentence capitalization
+ - Builds on FLAN-T5 small's robust NLP capabilities
+ - Designed for text normalization tasks

+ ## Intended Uses
  - Capitalizing lowercased text
+ - Sentence and proper noun capitalization
  - Text normalization

+ ## Usage Example

  ```python
  import torch
@@ -41,6 +41,16 @@ print(output[0]["generated_text"])
  # Expected output: Buzzfeed's 360-degree look at the aftermath of California's Valley Fire has been viewed more than 6 million times. Plenty of viewers have been asking how we made it.
  ```

+ ## Limitations
+
+ - **Language**: English only
+ - **Text Type**: Primarily modern prose found on the Internet
+ - **Capitalization Issues**:
+ - May not capitalize titles correctly
+ - Inconsistent capitalization style across texts
+ - Difficulty with special terms and abbreviations requiring capitalization
+ - **Input/Output Constraint**: Maximum length of 1024 tokens for both input and output
+
  ## Training and evaluation data

  The model was trained on a subset of the C4 dataset's English configuration.
@@ -93,4 +103,4 @@ Additional training arguments included bf16 precision, automatic batch size find
  - Transformers 4.43.3
  - PyTorch 2.3.0+cu121
  - Datasets 3.2.0
+ - Tokenizers 0.19.1
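
The 1024-token limit listed under Limitations suggests that longer documents need to be split before they are passed to the model. The following is a minimal sketch of one way to do that and is not part of the model card: `chunk_text`, `count_words`, and the `count_tokens` parameter are hypothetical names, sentence splitting relies on a simple regex, and the default counter is a crude whitespace word count standing in for the model's real tokenizer.

```python
import re
from typing import Callable, List


def count_words(text: str) -> int:
    """Crude stand-in for a tokenizer: counts whitespace-separated words."""
    return len(text.split())


def chunk_text(
    text: str,
    max_tokens: int = 1024,
    count_tokens: Callable[[str], int] = count_words,
) -> List[str]:
    """Greedily pack sentences into chunks of at most max_tokens each.

    A single sentence longer than max_tokens is kept as its own chunk
    rather than being cut mid-sentence.
    """
    # Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: List[str] = []
    current: List[str] = []
    current_len = 0
    for sentence in sentences:
        n = count_tokens(sentence)
        # Start a new chunk when adding this sentence would exceed the budget.
        if current and current_len + n > max_tokens:
            chunks.append(" ".join(current))
            current, current_len = [], 0
        current.append(sentence)
        current_len += n
    if current:
        chunks.append(" ".join(current))
    return chunks


if __name__ == "__main__":
    sample = "this is the first sentence. here is a second one. and a third."
    for chunk in chunk_text(sample, max_tokens=8):
        print(chunk)
```

Each chunk could then be passed to the capitalization pipeline from the usage example and the outputs joined with spaces. With a real tokenizer, per-sentence counts may not add up exactly to the count for the joined chunk, so it is probably wise to set `max_tokens` somewhat below 1024.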