---
license: cc-by-nc-4.0
tags:
- moe
- frankenmoe
- merge
- mergekit
- lazymergekit
base_model:
- mlabonne/NeuralDaredevil-7B
- BioMistral/BioMistral-7B
- mistralai/Mathstral-7B-v0.1
- FPHam/Writing_Partner_Mistral_7B
library_name: transformers
pipeline_tag: text-generation
---

# EduMixtral-4x7B

<img src="https://cdn-uploads.huggingface.co/production/uploads/65ba68a15d2ef0a4b2c892b4/1hvgYltQRmbkzHMSXvGYh.jpeg" width=400>

EduMixtral-4x7B is an experimental model that combines several education-focused language models and is intended for downstream research on human/AI student-teacher applications. It is designed to cover general knowledge, medicine, math, and writing assistance.

## 🤏 Models Merged

EduMixtral-4x7B is a Mixture of Experts (MoE) built from the following models using [Mergekit](https://github.com/arcee-ai/mergekit):

* [mlabonne/NeuralDaredevil-7B](https://huggingface.co/mlabonne/NeuralDaredevil-7B) <- Base Model
* [BioMistral/BioMistral-7B](https://huggingface.co/BioMistral/BioMistral-7B)
* [mistralai/Mathstral-7B-v0.1](https://huggingface.co/mistralai/Mathstral-7B-v0.1)
* [FPHam/Writing_Partner_Mistral_7B](https://huggingface.co/FPHam/Writing_Partner_Mistral_7B)
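
The merged checkpoint is packaged as a 4-expert, Mixtral-style MoE. As a quick sanity check you can inspect the merged config without downloading the full weights; a minimal sketch (the exact `model_type` and expert counts noted in the comments are assumptions about how mergekit packaged this merge, not values verified in this card):

```python
from transformers import AutoConfig

# Fetch only the model config, not the weights
config = AutoConfig.from_pretrained("AdamLucek/EduMixtral-4x7B")

print(config.model_type)           # assumed: "mixtral"
print(config.num_local_experts)    # total experts per MoE layer (assumed: 4)
print(config.num_experts_per_tok)  # experts routed per token at inference
```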

## 🧩 Configuration

```yaml
base_model: mlabonne/NeuralDaredevil-7B
gate_mode: hidden
experts:
  - source_model: mlabonne/NeuralDaredevil-7B
    positive_prompts:
      - "hello"
      - "help"
      - "question"
      - "explain"
      - "information"
  - source_model: BioMistral/BioMistral-7B
    positive_prompts:
      - "medical"
      - "health"
      - "biomedical"
      - "clinical"
      - "anatomy"
  - source_model: mistralai/Mathstral-7B-v0.1
    positive_prompts:
      - "math"
      - "calculation"
      - "equation"
      - "geometry"
      - "algebra"
  - source_model: FPHam/Writing_Partner_Mistral_7B
    positive_prompts:
      - "writing"
      - "creative process"
      - "story structure"
      - "character development"
      - "plot"
```
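
In mergekit's `hidden` gate mode, each expert's router weights are initialized from hidden-state representations of its positive prompts, so the prompt lists above roughly describe which kinds of requests get routed to which expert. The recipe is plain YAML and can be passed to mergekit's `mergekit-moe` entry point to reproduce or adapt the merge; a small sketch for inspecting it locally (the filename `edumixtral.yaml` is hypothetical):

```python
import yaml

# Load the MoE merge recipe (hypothetical local filename)
with open("edumixtral.yaml", "r", encoding="utf-8") as fp:
    recipe = yaml.safe_load(fp)

print("base model:", recipe["base_model"])
print("gate mode: ", recipe["gate_mode"])

# List the positive prompts that seed each expert's gate
for expert in recipe["experts"]:
    print(f"- {expert['source_model']}: {', '.join(expert['positive_prompts'])}")
```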

## 💻 Usage

It is recommended to load the model in 8-bit or 4-bit quantization to keep memory usage manageable. The example below uses 8-bit; a 4-bit variant is sketched after the sample output.
```python
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

# Load the tokenizer and model in 8-bit quantization
tokenizer = AutoTokenizer.from_pretrained("AdamLucek/EduMixtral-4x7B")
model = AutoModelForCausalLM.from_pretrained(
    "AdamLucek/EduMixtral-4x7B",
    device_map="cuda",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True)
)

# Prepare the input text
input_text = "Math problem: Xiaoli reads a 240-page story book. She reads (1/8) of the whole book on the first day and (1/5) of the whole book on the second day. How many pages did she read in total in two days?"
input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")

# Generate the output with specified parameters
outputs = model.generate(
    **input_ids,
    max_new_tokens=256,
    num_return_sequences=1
)

# Decode and print the generated text
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

**Output:**

> Solution:
>
> To find the total number of pages Xiaoli read in two days, we need to add the number of pages she read on the first day and the second day.
>
> On the first day, Xiaoli read 1/8 of the book. Since the book has 240 pages, the number of pages she read on the first day is:
>
> \[ \frac{1}{8} \times 240 = 30 \text{ pages} \]
>
> On the second day, Xiaoli read 1/5 of the book. The number of pages she read on the second day is:
>
> \[ \frac{1}{5} \times 240 = 48 \text{ pages} \]
>
> To find the total number of pages she read in two days, we add the pages she read on the first day and the second day:
>
> \[ 30 \text{ pages} + 48 \text{ pages} = 78 \text{ pages} \]
>
> Therefore, Xiaoli read a total of 78 pages in two days.
>
> Final answer: Xiaoli read 78 pages in total
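
For the 4-bit option mentioned above, a minimal loading sketch follows; the specific NF4/bfloat16 settings are common bitsandbytes defaults and an assumption, not settings validated for this model:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization with bfloat16 compute (assumed settings, not tuned for this model)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained("AdamLucek/EduMixtral-4x7B")
model = AutoModelForCausalLM.from_pretrained(
    "AdamLucek/EduMixtral-4x7B",
    device_map="auto",
    quantization_config=bnb_config,
)
```

Generation then works exactly as in the 8-bit example above.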