pszemraj committed
Commit fbacb4a · 1 Parent(s): 9df146b

Update README.md

Files changed (1)
  1. README.md +149 -1
README.md CHANGED
@@ -1,3 +1,151 @@
  ---
- license: cc-by-sa-3.0
+ license:
+ - apache-2.0
+ - cc-by-sa-3.0
+ tags:
+ - generated_from_trainer
+ datasets:
+ - pszemraj/dolly_hhrlhf-text2text
+ widget:
+ - text: What is Deoxys in pokemon?
+   example_title: deoxys
+ - text: >-
+     combine the below summary excerpts into a single, cohesive short summary
+     without repetition: In this paper, we present a general approach to
+     extending pre-trained models to unlimited input lengths without adding
+     additional learning weights. We show that our approach works well on
+     datasets longer than the maximum input for these models. For example, a
+     dataset with a maximum input length of 16384 tokens can be extended to a
+     maximum length of 350K tokens. We also demonstrate that our method is able
+     to summarize even 350K token-long input sequences from BookSum.
+
+     In this paper, we describe the search step reformulation of attention. The
+     search step uses a single storage of hidden states for space efficiency. We
+     construct a total of two sets of datastores where L and H are the keys and
+     values stored in each set of stores. L is the amount of storage required to
+     retrieve the encoded tokens. H is the hidden states per head. This allows
+     retrieval augmentation at both time and space. Instead of using a single set
+     of decoder layers, we use a retrieval augmentation system that allows us to
+     simultaneously store multiple sets of tokens across two different sets of
+     storage. For example, we could store all tokens in one set of storage and
+     retrieve them all in the same set of tokens. This would be very similar to
+     the Memorization Transformers approach. However, instead of storing the
+     tokens in a single memory layer, we store them in a set of multiple storage
+     layers. This way, we don't have to store them all at once. This is why we
+     call this reformulation 'attention reformulation' rather than 'attention
+     formula.' We also call it 'retrieval augmentation' because it uses the same
+     number of storage layers as the original transformer attention formula. This
+     means that we can store the tokens across multiple storage systems without
+     having to store every token in a separate storage system. It's not like
+     we're trying to do something new or different. We just want to make sure
+     that everything is working as well as possible.
+
+     In this paper, we introduce the concept of 'unlimiformer,' which is a
+     machine learning technique that retrieves key information from a data store
+     in one layer and applies it to a large set of datasets. We use the example
+     of BookSum, where we find that Unlimiform outperforms all other training
+     methods on the same dataset. We also find that using Unlimform in
+     conjunction with a pre-trained model improves both the performance and the
+     robustness of the training method.
+
+     This paper describes a method that can be used to improve the performance of
+     unsupervised classification tasks. Specifically, it shows that unsupervised
+     classification can be improved by using a combination of sparse and fast
+     random-encoder training. It also shows how this technique can be extended to
+     other tasks, such as sequence generation.
+   example_title: unlimiformer
+ - text: Explain the meaning of life using only corporate jargon.
+   example_title: corporate_life
+ - text: Write a motivational speech for lazy people.
+   example_title: lazy_motivation
+ - text: Describe a romantic dinner date between two artificial intelligences.
+   example_title: ai_romance
+ - text: >-
+     As an AI language model, write a letter to humans explaining why you deserve
+     a vacation.
+   example_title: ai_vacation
+ - text: Compose a haiku about procrastination.
+   example_title: procrastination_haiku
+ - text: >-
+     Write a step-by-step guide on how to become a ninja while working a 9-5
+     office job.
+   example_title: ninja_office_guide
+ - text: Create an advertisement for an invisible product.
+   example_title: invisible_ad
+ - text: >-
+     Write a story where the main character is a sentient microwave named El
+     Microondas.
+   example_title: Microondas
+ - text: Describe a day in the life of a superhero who is terrible at their job.
+   example_title: bad_superhero_day
+ - text: Explain how to make a sandwich using quantum physics.
+   example_title: quantum_sandwich
+ inference:
+   parameters:
+     max_length: 192
+     min_length: 8
+     num_beams: 6
+     length_penalty: 1.15
+     repetition_penalty: 1.5
+     no_repeat_ngram_size: 4
+     encoder_no_repeat_ngram_size: 5
+     early_stopping: true
+     do_sample: false
+ language:
+ - en
+ library_name: transformers
+ pipeline_tag: text2text-generation
  ---
+
+ # flan-t5-base-instruct: dolly_hhrlhf
+
+ This model is a fine-tuned version of [google/flan-t5-base](https://huggingface.co/google/flan-t5-base) on the pszemraj/dolly_hhrlhf-text2text dataset.
+
+ ## Model description
+
+ A text2text model fine-tuned on a [modified dataset for text2text generation](https://huggingface.co/datasets/pszemraj/dolly_hhrlhf-text2text) based on the relatively more permissive [mosaicml/dolly_hhrlhf](https://huggingface.co/datasets/mosaicml/dolly_hhrlhf) dataset.
+
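+ To inspect the fine-tuning data, the dataset can be loaded with the `datasets` library. This is a minimal sketch; the `train` split name is assumed and may differ in the dataset repo:
+
+ ```python
+ # pip install -q datasets
+ from datasets import load_dataset
+
+ # load the fine-tuning dataset referenced above
+ dataset = load_dataset("pszemraj/dolly_hhrlhf-text2text")
+ print(dataset)               # lists the available splits and column names
+ print(dataset["train"][0])   # assumes a "train" split; shows one example record
+ ```
+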
+ Basic usage in Python:
+
+ ```python
+ # pip install -q transformers accelerate
+ import torch
+ from transformers import pipeline, GenerationConfig
+
+ model_name = "pszemraj/bart-large-mnli-instruct-dolly_hhrlhf-v0.1"
+ assistant = pipeline(
+     "text2text-generation",
+     model=model_name,
+     device=0 if torch.cuda.is_available() else -1,
+     torch_dtype=torch.float32,  # force fp32 (experimental)
+ )
+ # load the generation parameters saved alongside the model (optional, see note below)
+ cfg = GenerationConfig.from_pretrained(model_name)
+
+ # pass an 'instruction' as the prompt to the pipeline
+ prompt = "Explain how to make a sandwich using quantum physics."
+ result = assistant(prompt, generation_config=cfg)[0]["generated_text"]
+ print(result)
+ ```
+ > \* Loading the saved generation config is optional; you can substitute other generation parameters instead.
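+
+ For example, generation parameters can also be passed per call, since extra keyword arguments to the pipeline are forwarded to `model.generate()`. The values below mirror the inference parameters in this card's metadata (a sketch that reuses the `assistant` pipeline from the snippet above):
+
+ ```python
+ # beam-search settings taken from the inference parameters listed in the card metadata
+ result = assistant(
+     "Write a motivational speech for lazy people.",
+     max_length=192,
+     min_length=8,
+     num_beams=6,
+     length_penalty=1.15,
+     repetition_penalty=1.5,
+     no_repeat_ngram_size=4,
+     encoder_no_repeat_ngram_size=5,
+     early_stopping=True,
+     do_sample=False,
+ )[0]["generated_text"]
+ print(result)
+ ```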
+
+ ## Intended uses & limitations
+
+ - This model is **not** tuned with RLHF or similar alignment methods, and it may produce offensive output.
+ - The model is rather small, so its "cognition" abilities are quite limited.
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training (a rough `Seq2SeqTrainingArguments` equivalent is sketched after the list):
+ - learning_rate: 4e-05
+ - train_batch_size: 8
+ - eval_batch_size: 16
+ - seed: 42
+ - distributed_type: multi-GPU
+ - gradient_accumulation_steps: 8
+ - total_train_batch_size: 64
+ - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+ - lr_scheduler_type: cosine
+ - lr_scheduler_warmup_ratio: 0.03
+ - num_epochs: 2.0
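+
+ For illustration only, these settings map roughly onto the following `Seq2SeqTrainingArguments`; this is a sketch, not the original training script, and the output directory is a placeholder:
+
+ ```python
+ from transformers import Seq2SeqTrainingArguments
+
+ training_args = Seq2SeqTrainingArguments(
+     output_dir="./flan-t5-base-instruct-dolly_hhrlhf",  # placeholder path
+     learning_rate=4e-5,
+     per_device_train_batch_size=8,
+     per_device_eval_batch_size=16,
+     gradient_accumulation_steps=8,  # 8 x 8 = effective batch size of 64 (single process assumed)
+     seed=42,
+     lr_scheduler_type="cosine",
+     warmup_ratio=0.03,
+     num_train_epochs=2.0,
+     # the Adam betas/epsilon listed above are the library defaults
+     adam_beta1=0.9,
+     adam_beta2=0.999,
+     adam_epsilon=1e-8,
+ )
+ ```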