---
library_name: transformers
license: mit
datasets:
- SciPhi/textbooks-are-all-you-need-lite
- nampdn-ai/tiny-textbooks
- nampdn-ai/tiny-strange-textbooks
- nampdn-ai/tiny-codes
- nampdn-ai/tiny-math-textbooks
- nampdn-ai/tiny-webtext
- nampdn-ai/tiny-orca-textbooks
- nampdn-ai/tiny-lessons
- roneneldan/TinyStories
- ajibawa-2023/Children-Stories-Collection
- ajibawa-2023/General-Stories-Collection
- kerinin/hackernews-stories
- lucadiliello/wikipedia_512_pretraining
- Salesforce/wikitext
- ChristophSchuhmann/basic-math-problems-with-step-by-step-solutions
- iamtarun/python_code_instructions_18k_alpaca
- prithivMLmods/Step-Instruction-Gx
- LinhDuong/chatdoctor-200k
- MBZUAI/LaMini-instruction
- qwedsacf/grade-school-math-instructions
- TigerResearch/tigerbot-stackexchange-qa-en-0.5m
language:
- en
---

# amusktweewt/tiny-model-500M-chat-v2

This model is a general-purpose transformer-based language model designed for tasks such as text generation, story writing, and conversational interaction. It draws on multiple curated datasets to strengthen its storytelling, coding, and question-answering capabilities.

This project is intended for academic research and educational purposes only. It is designed for experimentation, learning, and the development of language-based AI systems.

## Model Details

### Model Description

The model was developed with a focus on balancing performance and computational efficiency. It employs **flash attention** and other optimizations to improve memory efficiency and speed.

- **Developed by:** amusktweewt
- **Model type:** LlamaForCausalLM
- **Architectural details:**
  - 12 layers
  - 16 attention heads
  - Hidden size: 1536
  - Flash Attention 2 enabled
  - Dynamic RoPE scaling
- **License:** MIT
- **Language(s) (NLP):** English

## Uses

### Direct Use

This model is intended for text generation, code completion, chat-based applications, and story writing.

### Out-of-Scope Use

- Tasks requiring high factual accuracy
- Mathematics or reasoning-heavy tasks
- Applications involving sensitive content without human review

## Training Details

### Training Data

The model was trained on a diverse collection of datasets, including:

- Textbooks and academic content
- Creative and children's stories
- Coding instruction datasets
- Wiki-based texts and general stories
- Mathematics problems with step-by-step solutions

### Training Procedure

#### Preprocessing

- Custom BPE tokenizer with a vocabulary size of 32,768
- Dynamic RoPE scaling applied for better long-context handling

#### Hyperparameters

- **Batch size:** 12 (per device)
- **Gradient accumulation:** 2 steps
- **Learning rate:** 1e-5
- **Weight decay:** 0.002
- **Warmup ratio:** 10%
- **Precision:** FP16 (mixed precision)

#### Training Setup

- **Hardware:** NVIDIA 4090 GPU
- **Training time:** 216 hours
- **Dataset size:** 69 GB of text

## Evaluation

### Testing Data, Factors & Metrics

The model was evaluated on subsets of the training data, focusing on language coherence, relevance, and fluency.

#### Metrics

- **Loss:** evaluated on token-level prediction accuracy.
- **Perplexity:** 2.506 (a measurement sketch appears in the appendix at the end of this card).

### Results

The model generates coherent and, in most cases, contextually appropriate outputs across multiple domains.

## Risks and Limitations

### Known Issues

- The model may produce outputs reflecting biases present in the training data.

### Recommendations

Users should apply human review when using the model in critical or sensitive applications.
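## Verifying the Architecture (Sketch)

The architecture figures listed under Model Details can be read directly from the checkpoint's configuration. The snippet below is a minimal, illustrative sketch: it assumes the repository ships a standard `LlamaConfig` (consistent with the `LlamaForCausalLM` model type above), and the optional FP16 / Flash Attention 2 load path assumes a CUDA GPU with the `flash-attn` package installed.

```python
import torch
from transformers import AutoConfig, AutoModelForCausalLM

model_name = "amusktweewt/tiny-model-500M-chat-v2"

# Read the architecture straight from the checkpoint's config
# (standard LlamaConfig fields are assumed here).
config = AutoConfig.from_pretrained(model_name)
print(f"layers={config.num_hidden_layers}, "
      f"heads={config.num_attention_heads}, "
      f"hidden={config.hidden_size}")

# Optional: load the weights in FP16 with Flash Attention 2.
# Requires a CUDA GPU and the `flash-attn` package; drop the
# `attn_implementation` argument to use the default attention instead.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    attn_implementation="flash_attention_2",
).to("cuda")
```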
## How to Get Started with the Model

```python
from transformers import pipeline, set_seed

model_name = "amusktweewt/tiny-model-500M-chat-v2"

chatbot = pipeline(
    "text-generation",
    model=model_name,
    device=0
)

set_seed(42)

print("Chatbot is ready! Type 'exit' to end the conversation.")

while True:
    user_input = input("You: ").strip()
    if user_input.lower() == "exit":
        print("Exiting chat. Goodbye!")
        break

    # Build the conversation in the format expected by the model's chat template.
    messages = [
        {"role": "user", "content": user_input},
        {"role": "assistant", "content": ""}
    ]
    prompt = chatbot.tokenizer.apply_chat_template(messages, tokenize=False)

    # Generate text using the formatted prompt.
    response = chatbot(
        prompt,
        do_sample=True,
        max_new_tokens=512,
        top_k=50,
        temperature=0.1,
        num_return_sequences=1,
        repetition_penalty=1.1,
        pad_token_id=chatbot.tokenizer.eos_token_id,
        min_new_tokens=0
    )

    # Strip the prompt so only the newly generated reply is printed.
    full_text = response[0]["generated_text"]
    bot_response = full_text[len(prompt):].strip()
    print(f"Bot: {bot_response}")
```

## Technical Specifications

### Model Architecture and Objective

The model follows a **transformer-based architecture** optimized for causal language modeling.

- Attention heads: 16
- Hidden size: 1536
- Flash attention and memory-efficient attention enabled

### Compute Infrastructure

#### Hardware

- Single GPU (NVIDIA 4090)

#### Software

- Python 3.8+
- Hugging Face Transformers 4.48.0
- PyTorch 2.4

## Environmental Impact

- **Training hours:** 216 hours
- **Hardware:** NVIDIA 4090
- **Carbon emitted:** 9.07 kg CO2 eq

## Model Card Authors

amusktweewt

## Model Card Contact

For questions or feedback, contact amusktweewt.
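## Appendix: Measuring Perplexity (Sketch)

The perplexity figure reported under Evaluation is a token-level metric. The snippet below is only an illustrative sketch of how such a value can be computed from the standard causal-LM loss; the sample sentence is arbitrary and is not expected to reproduce the reported 2.506.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "amusktweewt/tiny-model-500M-chat-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

# Arbitrary sample text; replace with a proper evaluation set.
text = "Once upon a time, a small language model learned to tell stories."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # The model shifts the labels internally, so the returned loss is the
    # mean negative log-likelihood per predicted token.
    loss = model(**inputs, labels=inputs["input_ids"]).loss

# Perplexity is the exponential of the mean per-token loss.
print(f"Perplexity: {torch.exp(loss).item():.3f}")
```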