---
license: cc-by-nc-4.0
tags:
- moe
- frankenmoe
- merge
- mergekit
- lazymergekit
base_model:
- mlabonne/NeuralDaredevil-7B
- BioMistral/BioMistral-7B
- mistralai/Mathstral-7B-v0.1
- FPHam/Writing_Partner_Mistral_7B
library_name: transformers
pipeline_tag: text-generation
---

# EduMixtral-4x7B

<img src="https://cdn-uploads.huggingface.co/production/uploads/65ba68a15d2ef0a4b2c892b4/1hvgYltQRmbkzHMSXvGYh.jpeg" width=400>

EduMixtral-4x7B is an experimental model that combines several education-focused language models and is intended for downstream research on human/AI student-teacher applications. It is designed to cover general knowledge, medicine, math, and writing assistance.

## 🤏 Models Merged

EduMixtral-4x7B is a Mixture of Experts (MoE) built from the following models using [Mergekit](https://github.com/arcee-ai/mergekit):

* [mlabonne/NeuralDaredevil-7B](https://huggingface.co/mlabonne/NeuralDaredevil-7B) <- Base Model
* [BioMistral/BioMistral-7B](https://huggingface.co/BioMistral/BioMistral-7B)
* [mistralai/Mathstral-7B-v0.1](https://huggingface.co/mistralai/Mathstral-7B-v0.1)
* [FPHam/Writing_Partner_Mistral_7B](https://huggingface.co/FPHam/Writing_Partner_Mistral_7B)
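
The merged checkpoint is packaged as a 4-expert, Mixtral-style MoE. As a quick sanity check you can inspect the merged config without downloading the full weights; a minimal sketch (the exact `model_type` and expert counts noted in the comments are assumptions about how mergekit packaged this merge, not values verified in this card):

```python
from transformers import AutoConfig

# Fetch only the model config, not the weights
config = AutoConfig.from_pretrained("AdamLucek/EduMixtral-4x7B")

print(config.model_type)           # assumed: "mixtral"
print(config.num_local_experts)    # total experts per MoE layer (assumed: 4)
print(config.num_experts_per_tok)  # experts routed per token at inference
```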

## 🧩 Configuration

```yaml
base_model: mlabonne/NeuralDaredevil-7B
gate_mode: hidden
experts:
  - source_model: mlabonne/NeuralDaredevil-7B
    positive_prompts:
      - "hello"
      - "help"
      - "question"
      - "explain"
      - "information"
  - source_model: BioMistral/BioMistral-7B
    positive_prompts:
      - "medical"
      - "health"
      - "biomedical"
      - "clinical"
      - "anatomy"
  - source_model: mistralai/Mathstral-7B-v0.1
    positive_prompts:
      - "math"
      - "calculation"
      - "equation"
      - "geometry"
      - "algebra"
  - source_model: FPHam/Writing_Partner_Mistral_7B
    positive_prompts:
      - "writing"
      - "creative process"
      - "story structure"
      - "character development"
      - "plot"
```
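
In mergekit's `hidden` gate mode, each expert's router weights are initialized from hidden-state representations of its positive prompts, so the prompt lists above roughly describe which kinds of requests get routed to which expert. The recipe is plain YAML and can be passed to mergekit's `mergekit-moe` entry point to reproduce or adapt the merge; a small sketch for inspecting it locally (the filename `edumixtral.yaml` is hypothetical):

```python
import yaml

# Load the MoE merge recipe (hypothetical local filename)
with open("edumixtral.yaml", "r", encoding="utf-8") as fp:
    recipe = yaml.safe_load(fp)

print("base model:", recipe["base_model"])
print("gate mode: ", recipe["gate_mode"])

# List the positive prompts that seed each expert's gate
for expert in recipe["experts"]:
    print(f"- {expert['source_model']}: {', '.join(expert['positive_prompts'])}")
```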

## 💻 Usage

It is recommended to load the model in 8-bit or 4-bit quantization to keep memory usage manageable. The example below uses 8-bit; a 4-bit variant is sketched after the sample output.
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

# Load the tokenizer and model in 8-bit quantization
tokenizer = AutoTokenizer.from_pretrained("AdamLucek/EduMixtral-4x7B")
model = AutoModelForCausalLM.from_pretrained(
    "AdamLucek/EduMixtral-4x7B",
    device_map="cuda",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True)
)

# Prepare the input text
input_text = "Math problem: Xiaoli reads a 240-page story book. She reads (1/8) of the whole book on the first day and (1/5) of the whole book on the second day. How many pages did she read in total in two days?"
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

# Generate the output with specified parameters
outputs = model.generate(
    **input_ids,
    max_new_tokens=256,
    num_return_sequences=1
)

# Decode and print the generated text
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

**Output:**

> Solution:
>
> To find the total number of pages Xiaoli read in two days, we need to add the number of pages she read on the first day and the second day.
>
> On the first day, Xiaoli read 1/8 of the book. Since the book has 240 pages, the number of pages she read on the first day is:
>
> \[ \frac{1}{8} \times 240 = 30 \text{ pages} \]
>
> On the second day, Xiaoli read 1/5 of the book. The number of pages she read on the second day is:
>
> \[ \frac{1}{5} \times 240 = 48 \text{ pages} \]
>
> To find the total number of pages she read in two days, we add the pages she read on the first day and the second day:
>
> \[ 30 \text{ pages} + 48 \text{ pages} = 78 \text{ pages} \]
>
> Therefore, Xiaoli read a total of 78 pages in two days.
>
> Final answer: Xiaoli read 78 pages in total
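
For the 4-bit option mentioned above, a minimal loading sketch follows; the specific NF4/bfloat16 settings are common bitsandbytes defaults and an assumption, not settings validated for this model:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization with bfloat16 compute (assumed settings, not tuned for this model)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained("AdamLucek/EduMixtral-4x7B")
model = AutoModelForCausalLM.from_pretrained(
    "AdamLucek/EduMixtral-4x7B",
    device_map="auto",
    quantization_config=bnb_config,
)
```

Generation then works exactly as in the 8-bit example above.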