---
library_name: transformers
license: mit
datasets:
- SciPhi/textbooks-are-all-you-need-lite
- nampdn-ai/tiny-textbooks
- nampdn-ai/tiny-strange-textbooks
- nampdn-ai/tiny-codes
- nampdn-ai/tiny-math-textbooks
- nampdn-ai/tiny-webtext
- nampdn-ai/tiny-orca-textbooks
- nampdn-ai/tiny-lessons
- roneneldan/TinyStories
- ajibawa-2023/Children-Stories-Collection
- ajibawa-2023/General-Stories-Collection
- kerinin/hackernews-stories
- lucadiliello/wikipedia_512_pretraining
- Salesforce/wikitext
- ChristophSchuhmann/basic-math-problems-with-step-by-step-solutions
- iamtarun/python_code_instructions_18k_alpaca
- prithivMLmods/Step-Instruction-Gx
- LinhDuong/chatdoctor-200k
- MBZUAI/LaMini-instruction
- qwedsacf/grade-school-math-instructions
- TigerResearch/tigerbot-stackexchange-qa-en-0.5m
language:
- en
---

# amusktweewt/tiny-model-500M-chat-v2

This model is a general-purpose transformer-based language model designed for tasks such as text generation, story writing, and conversational interaction. It draws on multiple curated datasets to strengthen its storytelling, coding, and question-answering capabilities.

This project is intended for academic research and educational purposes only. It is designed for experimentation, learning, and the development of language-based AI systems.

## Model Details

### Model Description

The model was developed with a focus on balancing performance and computational efficiency. It employs **flash attention** and other optimizations to improve memory efficiency and speed.

- **Developed by:** amusktweewt
- **Model type:** LlamaForCausalLM
- **Architectural details:**
  - 12 layers
  - 16 attention heads
  - Hidden size: 1536
  - Flash Attention 2 enabled
  - Dynamic RoPE scaling
- **License:** MIT
- **Language(s) (NLP):** English

## Uses

### Direct Use

This model is intended for text generation, code completion, chat-based applications, and story writing.

### Out-of-Scope Use

- Tasks requiring high factual accuracy
- Mathematics or reasoning-heavy tasks
- Applications involving sensitive content without human review

## Training Details

### Training Data

The model was trained on a diverse collection of datasets, including:

- Textbooks and academic content
- Creative and children's stories
- Coding instruction datasets
- Wiki-based texts and general stories
- Mathematics problems with step-by-step solutions

### Training Procedure

#### Preprocessing

- Custom BPE tokenizer with a vocabulary size of 32,768
- Dynamic RoPE scaling applied for better long-context handling

#### Hyperparameters

- **Batch size:** 12 (per device)
- **Gradient accumulation:** 2 steps
- **Learning rate:** 1e-5
- **Weight decay:** 0.002
- **Warmup ratio:** 10%
- **Precision:** FP16 (mixed precision)

#### Training Setup

- **Hardware:** NVIDIA 4090 GPU
- **Training time:** 216 hours
- **Dataset size:** 69 GB of text

## Evaluation

### Testing Data, Factors & Metrics

The model was evaluated on subsets of the training data, focusing on language coherence, relevance, and fluency.

#### Metrics

- **Loss:** evaluated on token-level prediction accuracy.
- **Perplexity:** 2.506 (a measurement sketch appears in the appendix at the end of this card).

### Results

The model generates coherent and, in most cases, contextually appropriate outputs across multiple domains.

## Risks and Limitations

### Known Issues

- The model may produce outputs reflecting biases present in the training data.

### Recommendations

Users should apply human review when using the model in critical or sensitive applications.
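## Verifying the Architecture (Sketch)

The architecture figures listed under Model Details can be read directly from the checkpoint's configuration. The snippet below is a minimal, illustrative sketch: it assumes the repository ships a standard `LlamaConfig` (consistent with the `LlamaForCausalLM` model type above), and the optional FP16 / Flash Attention 2 load path assumes a CUDA GPU with the `flash-attn` package installed.

```python
import torch
from transformers import AutoConfig, AutoModelForCausalLM

model_name = "amusktweewt/tiny-model-500M-chat-v2"

# Read the architecture straight from the checkpoint's config
# (standard LlamaConfig fields are assumed here).
config = AutoConfig.from_pretrained(model_name)
print(f"layers={config.num_hidden_layers}, "
      f"heads={config.num_attention_heads}, "
      f"hidden={config.hidden_size}")

# Optional: load the weights in FP16 with Flash Attention 2.
# Requires a CUDA GPU and the `flash-attn` package; drop the
# `attn_implementation` argument to use the default attention instead.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    attn_implementation="flash_attention_2",
).to("cuda")
```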
## How to Get Started with the Model

```python
from transformers import pipeline, set_seed

model_name = "amusktweewt/tiny-model-500M-chat-v2"

chatbot = pipeline(
    "text-generation",
    model=model_name,
    device=0
)

set_seed(42)

print("Chatbot is ready! Type 'exit' to end the conversation.")

while True:
    user_input = input("You: ").strip()
    if user_input.lower() == "exit":
        print("Exiting chat. Goodbye!")
        break

    # Build the conversation in the format expected by the model's chat template.
    messages = [
        {"role": "user", "content": user_input},
        {"role": "assistant", "content": ""}
    ]
    prompt = chatbot.tokenizer.apply_chat_template(messages, tokenize=False)

    # Generate text using the formatted prompt.
    response = chatbot(
        prompt,
        do_sample=True,
        max_new_tokens=512,
        top_k=50,
        temperature=0.1,
        num_return_sequences=1,
        repetition_penalty=1.1,
        pad_token_id=chatbot.tokenizer.eos_token_id,
        min_new_tokens=0
    )

    # Strip the prompt so only the newly generated reply is printed.
    full_text = response[0]["generated_text"]
    bot_response = full_text[len(prompt):].strip()
    print(f"Bot: {bot_response}")
```

## Technical Specifications

### Model Architecture and Objective

The model follows a **transformer-based architecture** optimized for causal language modeling.

- Attention heads: 16
- Hidden size: 1536
- Flash attention and memory-efficient attention enabled

### Compute Infrastructure

#### Hardware

- Single GPU (NVIDIA 4090)

#### Software

- Python 3.8+
- Hugging Face Transformers 4.48.0
- PyTorch 2.4

## Environmental Impact

- **Training hours:** 216 hours
- **Hardware:** NVIDIA 4090
- **Carbon emitted:** 9.07 kg CO2 eq

## Model Card Authors

amusktweewt

## Model Card Contact

For questions or feedback, contact amusktweewt.
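## Appendix: Measuring Perplexity (Sketch)

The perplexity figure reported under Evaluation is a token-level metric. The snippet below is only an illustrative sketch of how such a value can be computed from the standard causal-LM loss; the sample sentence is arbitrary and is not expected to reproduce the reported 2.506.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "amusktweewt/tiny-model-500M-chat-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

# Arbitrary sample text; replace with a proper evaluation set.
text = "Once upon a time, a small language model learned to tell stories."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # The model shifts the labels internally, so the returned loss is the
    # mean negative log-likelihood per predicted token.
    loss = model(**inputs, labels=inputs["input_ids"]).loss

# Perplexity is the exponential of the mean per-token loss.
print(f"Perplexity: {torch.exp(loss).item():.3f}")
```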