RoBERTa-base trained with a linearly increasing alpha for alpha-entmax attention, annealed from 1.0 (softmax) to 2.0 (sparsemax).

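For reference, here is a minimal sketch of what such a schedule can look like, using the `entmax` package (`entmax_bisect` is that package's API; the schedule helper itself is illustrative, not the repository's actual training code):

```python
import torch
from entmax import entmax_bisect  # pip install entmax

def alpha_at_step(step: int, total_steps: int, eps: float = 1e-4) -> float:
    """Linear schedule from ~1.0 (softmax) to 2.0 (sparsemax).

    entmax_bisect requires alpha > 1, so the schedule starts at 1 + eps.
    """
    return 1.0 + max(min(step / total_steps, 1.0), eps)

# Attention weights at the halfway point of training (alpha = 1.5)
scores = torch.randn(2, 8, 16, 16)  # (batch, heads, queries, keys)
probs = entmax_bisect(scores, alpha=alpha_at_step(500, 1000), dim=-1)
```
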
To load the pretrained model and tokenizer:

```python
from transformers import AutoTokenizer

from sparse_roberta import get_custom_model

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained('roberta-base')

# Load the model with sparsemax attention (alpha = 2.0)
model = get_custom_model(
    'mtreviso/sparsemax-roberta',
    initial_alpha=2.0,
    use_triton_entmax=False,
    from_scratch=False,
)
```
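Once loaded, the model should behave like a standard Hugging Face encoder. A quick sanity check (this assumes `get_custom_model` returns a `RobertaModel`-style module with the usual forward signature):

```python
import torch

inputs = tokenizer("Sparse attention puts exact zeros on irrelevant tokens.", return_tensors='pt')
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch_size, seq_len, 768) for roberta-base
```
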
To run GLUE tasks, you can use the `run_glue.py` script. For example, to fine-tune and evaluate on RTE:

```bash
python run_glue.py \
  --model_name_or_path mtreviso/sparsemax-roberta \
  --config_name roberta-base \
  --tokenizer_name roberta-base \
  --task_name rte \
  --output_dir output-rte \
  --do_train \
  --do_eval \
  --max_seq_length 512 \
  --per_device_train_batch_size 32 \
  --learning_rate 3e-5 \
  --num_train_epochs 3 \
  --save_steps 1000 \
  --logging_steps 100 \
  --save_total_limit 1 \
  --overwrite_output_dir
```
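After fine-tuning, the checkpoint saved in `output-rte` can presumably be loaded back through the same helper, assuming `get_custom_model` accepts a local path like standard Hugging Face loaders:

```python
# Hypothetical: load the fine-tuned checkpoint from the local output directory
model = get_custom_model(
    'output-rte',           # local path instead of a Hub model id
    initial_alpha=2.0,      # alpha stays at 2.0 (sparsemax) after annealing
    use_triton_entmax=False,
    from_scratch=False,
)
```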