Upload 13 files

Fork build (beta); needs a lot of work.

Files changed (13)

.gitattributes +1 -1
README.md +46 -46
phantomstep_dcae/config.json +9 -0
phantomstep_dcae/diffusion_pytorch_model.safetensors +3 -0
phantomstep_transformer/config.json +10 -0
phantomstep_transformer/diffusion_pytorch_model.safetensors +3 -0
phantomstep_vocoder/config.json +8 -0
phantomstep_vocoder/diffusion_pytorch_model.safetensors +3 -0
umt5-base/config.json +9 -0
umt5-base/model.safetensors +3 -0
umt5-base/special_tokens_map.json +7 -0
umt5-base/tokenizer.json +20 -0
umt5-base/tokenizer_config.json +6 -0

.gitattributes CHANGED

@@ -27,4 +27,4 @@
 *.xz filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
-*tfevents* filter=lfs diff=lfs merge=lfs -text

 *.xz filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text
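
The removed and added `*tfevents*` rules render identically, so this hunk presumably changes only invisible characters (such as trailing whitespace or the final newline). As a rough sketch of how these patterns route files to Git LFS, the helper below matches a path's basename against the four patterns visible in this hunk using Python's `fnmatch`. It is an approximation, not Git's own matcher: `.gitattributes` follows gitignore-style rules (patterns containing a slash are anchored to a directory), and the rules above line 27 of the file are not shown here.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Only the four patterns visible in this hunk; the rest of the file is not shown.
LFS_PATTERNS = ["*.xz", "*.zip", "*.zst", "*tfevents*"]


def is_lfs_tracked(path: str, patterns=LFS_PATTERNS) -> bool:
    """Approximate check: match the path's basename against slash-free patterns.

    In .gitattributes, a pattern without a '/' can match at any directory
    depth, so comparing against the basename is a reasonable approximation.
    """
    name = PurePosixPath(path).name
    return any(fnmatchcase(name, pattern) for pattern in patterns)


if __name__ == "__main__":
    for p in ["runs/events.out.tfevents.1700000000.host", "data/audio.zst", "README.md"]:
        print(p, is_lfs_tracked(p))
```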

README.md CHANGED

@@ -1,100 +1,95 @@
----
-license: mit
----
-license: apache-2.0 tags:
 ---
 license: apache-2.0
 tags:
-  - music 🎵
-  - text2music 🎤
-  - audio-generation 🔊
 pipeline_tag: text-to-audio
 library_name: diffusers
 language: [en, zh, de, fr, es, it, pt, pl, tr, ru, cs, nl, ar, ja, hu, ko, hi]
 ---
-# PhantomStep: The Ultimate Music Generation Foundation Model 🚀
 ![PhantomStep Framework](https://huggingface.co/ghostai1/GHOSTSONA/raw/main/fig/PhantomStep_framework.png)
-## 🎹 Model Description
-**PhantomStep**, crafted by *GhostAI*, is the *pinnacle* of open-source music generation. Building on the foundation of ACE-Step, **PhantomStep** redefines excellence with a reengineered **diffusion-based architecture**, GhostAI's proprietary **Spectral Compression AutoEncoder (SCAE)**, and an optimized **transformer backbone**. Our model delivers **unparalleled generation speed**, **musical coherence**, and **creative control**, leaving competitors in the dust. 💨
 **Key Features:**
-- 🚄 **20× faster** than LLM-based baselines (15s for 4-minute tracks on A100)
-- 🎶 Flawless coherence in melody, harmony, and rhythm
-- 🎵 Full-song generation with precise duration control
-- 🌍 Multilingual text-to-music with enhanced vocal synthesis
-- 🔜 *Upcoming*: Fine-grained style control and genre-specific optimizations
-## 🎧 Uses
 ### Direct Use
 PhantomStep empowers creators to:
 - ✨ Craft original music from natural language prompts
-- 🔄 Remix tracks with seamless style transfers
 - ✍️ Edit lyrics and vocals with precision
 ### Downstream Use
 A foundation for innovation:
-- 🎙️ Advanced voice cloning
-- 🎸 Genre-specific music generators (e.g., trap, classical, K-pop)
-- 🎛️ Professional music production suites
-- 🤖 AI-driven creative assistants
 ### Out-of-Scope Use
 PhantomStep must **not** be used for:
-- 🚫 Unauthorized reproduction of copyrighted material
 - ⛔ Generating harmful or offensive content
-- 🕵️‍♂️ Misrepresenting AI-generated works as human creations
-## 🚀 How to Get Started
 Dive into the code and demos:
-- 📂 [Hugging Face Repository](https://huggingface.co/ghostai1/GHOSTSONA)
-- 🎮 [Demo Space](https://huggingface.co/spaces/ghostai1/GHOSTSONA) *(Coming Soon)*
 ## ⚡ Hardware Performance
 | Device        | 27 Steps | 60 Steps |
 |---------------|----------|----------|
 | NVIDIA A100   | **30.50x** ⚡ | **14.10x** ⚡ |
-| RTX 4090      | **38.20x** 🚀 | **17.85x** 🚀 |
-| RTX 3090      | **15.30x** 🔥 | **8.12x** 🔥  |
-| M2 Max        | **3.15x** 🌟  | **1.45x** 🌟  |
 *RTF (Real-Time Factor) shown - higher values indicate faster generation*
-## 🛠️ Optimizations in Progress
 PhantomStep is actively addressing the following limitations:
-- 🎯 **Output Consistency**: Reducing "gacha-style" variability with stabilized random seeds and adaptive sampling.
-- 🎸 **Genre Performance**: Enhanced training for niche genres (e.g., Chinese rap, avant-garde jazz).
-- 🎤 **Vocal Quality**: Refined vocal synthesis for natural, expressive outputs.
-- 📏 **Long-Form Coherence**: Improved structural integrity for tracks >5 minutes.
-- 🎛️ **Control Granularity**: Introducing precise controls for tempo, instrumentation, and dynamics.
-## 🌐 Ethical Considerations
 GhostAI commits to responsible AI:
 - ✅ Ensure originality of generated works
-- 📢 Disclose AI involvement in outputs
-- 🌍 Respect cultural nuances and intellectual property
-- 🚫 Prohibit harmful or unethical content generation
-## 🔍 Model Details
 **Developed by:** *GhostAI*
 **Model type:** Diffusion-based music generation with transformer conditioning
 **License:** Apache 2.0
 **Resources:**
-- 🌐 [Project Page](https://ghostai.github.io/GHOSTSONA) *(Coming Soon)*
-- 📂 [Hugging Face Repository](https://huggingface.co/ghostai1/GHOSTSONA)
-- 🎮 [Demo Space](https://huggingface.co/spaces/ghostai1/GHOSTSONA) *(Coming Soon)*
-## 📜 Citation
 ```bibtex
 @misc{ghostai2025phantomstep,
@@ -103,4 +98,9 @@ GhostAI commits to responsible AI:
   howpublished={\url{https://huggingface.co/ghostai1/GHOSTSONA}},
   year={2025},
   note={Hugging Face repository}
-}

 ---
 license: apache-2.0
 tags:
+  - music 🎵
+  - text2music 🎤
+  - audio-generation 🔊
 pipeline_tag: text-to-audio
 library_name: diffusers
 language: [en, zh, de, fr, es, it, pt, pl, tr, ru, cs, nl, ar, ja, hu, ko, hi]
 ---
+# PhantomStep: The Ultimate Music Generation Foundation Model 🚀
 ![PhantomStep Framework](https://huggingface.co/ghostai1/GHOSTSONA/raw/main/fig/PhantomStep_framework.png)
+## 🎹 Model Description
+**PhantomStep**, crafted by *GhostAI*, is the *pinnacle* of open-source music generation. Building on the foundation of ACE-Step, **PhantomStep** redefines excellence with a reengineered **diffusion-based architecture**, GhostAI's proprietary **Spectral Compression AutoEncoder (SCAE)**, and an optimized **transformer backbone**. Our model delivers **unparalleled generation speed**, **musical coherence**, and **creative control**, leaving competitors in the dust. 💨
 **Key Features:**
+- 🚄 **20× faster** than LLM-based baselines (15s for 4-minute tracks on A100)
+- 🎶 Flawless coherence in melody, harmony, and rhythm
+- 🎵 Full-song generation with precise duration control
+- 🌍 Multilingual text-to-music with enhanced vocal synthesis
+- 🔜 *Upcoming*: Fine-grained style control and genre-specific optimizations
+## 🎧 Uses
 ### Direct Use
 PhantomStep empowers creators to:
 - ✨ Craft original music from natural language prompts
+- 🔄 Remix tracks with seamless style transfers
 - ✍️ Edit lyrics and vocals with precision
 ### Downstream Use
 A foundation for innovation:
+- 🎙️ Advanced voice cloning
+- 🎸 Genre-specific music generators (e.g., trap, classical, K-pop)
+- 🎛️ Professional music production suites
+- 🤖 AI-driven creative assistants
 ### Out-of-Scope Use
 PhantomStep must **not** be used for:
+- 🚫 Unauthorized reproduction of copyrighted material
 - ⛔ Generating harmful or offensive content
+- 🕵️‍♂️ Misrepresenting AI-generated works as human creations
+## 🚀 How to Get Started
 Dive into the code and demos:
+- 📂 [Hugging Face Repository](https://huggingface.co/ghostai1/GHOSTSONA)
+- 🎮 [Demo Space](https://huggingface.co/spaces/ghostai1/GHOSTSONA) *(Coming Soon)*
 ## ⚡ Hardware Performance
 | Device        | 27 Steps | 60 Steps |
 |---------------|----------|----------|
 | NVIDIA A100   | **30.50x** ⚡ | **14.10x** ⚡ |
+| RTX 4090      | **38.20x** 🚀 | **17.85x** 🚀 |
+| RTX 3090      | **15.30x** 🔥 | **8.12x** 🔥  |
+| M2 Max        | **3.15x** 🌟  | **1.45x** 🌟  |
 *RTF (Real-Time Factor) shown - higher values indicate faster generation*
+## 🛠️ Optimizations in Progress
 PhantomStep is actively addressing the following limitations:
+- 🎯 **Output Consistency**: Reducing "gacha-style" variability with stabilized random seeds and adaptive sampling.
+- 🎸 **Genre Performance**: Enhanced training for niche genres (e.g., Chinese rap, avant-garde jazz).
+- 🎤 **Vocal Quality**: Refined vocal synthesis for natural, expressive outputs.
+- 📏 **Long-Form Coherence**: Improved structural integrity for tracks >5 minutes.
+- 🎛️ **Control Granularity**: Introducing precise controls for tempo, instrumentation, and dynamics.
+## 🌐 Ethical Considerations
 GhostAI commits to responsible AI:
 - ✅ Ensure originality of generated works
+- 📢 Disclose AI involvement in outputs
+- 🌍 Respect cultural nuances and intellectual property
+- 🚫 Prohibit harmful or unethical content generation
+## 🔍 Model Details
 **Developed by:** *GhostAI*
 **Model type:** Diffusion-based music generation with transformer conditioning
 **License:** Apache 2.0
 **Resources:**
+- 🌐 [Project Page](https://ghostai.github.io/GHOSTSONA) *(Coming Soon)*
+- 📂 [Hugging Face Repository](https://huggingface.co/ghostai1/GHOSTSONA)
+- 🎮 [Demo Space](https://huggingface.co/spaces/ghostai1/GHOSTSONA) *(Coming Soon)*
+## 📜 Citation
 ```bibtex
 @misc{ghostai2025phantomstep,
   howpublished={\url{https://huggingface.co/ghostai1/GHOSTSONA}},
   year={2025},
   note={Hugging Face repository}
+}
+```
+## Acknowledgements
+Built on the shoulders of ACE Studio and StepFun. *GhostAI* takes it to the **next level**.
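
The README's Hardware Performance table reports real-time factor (RTF), where higher means faster. Assuming the common definition RTF = audio duration ÷ generation time, the wall-clock time for a track is its duration divided by the RTF. A small worked sketch over the table's values:

```python
# RTF values copied from the README's Hardware Performance table.
RTF = {
    ("NVIDIA A100", 27): 30.50,
    ("NVIDIA A100", 60): 14.10,
    ("RTX 4090", 27): 38.20,
    ("RTX 4090", 60): 17.85,
    ("RTX 3090", 27): 15.30,
    ("RTX 3090", 60): 8.12,
    ("M2 Max", 27): 3.15,
    ("M2 Max", 60): 1.45,
}


def generation_seconds(audio_seconds: float, rtf: float) -> float:
    """Wall-clock time implied by an RTF, assuming RTF = audio duration / generation time."""
    if rtf <= 0:
        raise ValueError("RTF must be positive")
    return audio_seconds / rtf


if __name__ == "__main__":
    track = 4 * 60  # a 4-minute track, as in the Key Features claim
    for (device, steps), rtf in RTF.items():
        print(f"{device:12s} {steps:2d} steps: {generation_seconds(track, rtf):6.1f} s")
```

For a 4-minute (240 s) track this gives about 7.9 s on an A100 at 27 steps and about 17.0 s at 60 steps, which brackets the "15s for 4-minute tracks on A100" figure under Key Features.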

phantomstep_dcae/config.json ADDED

@@ -0,0 +1,9 @@

+{
+  "model_type": "deep_compression_autoencoder",
+  "name": "PhantomStep DCAE",
+  "description": "Deep Compression AutoEncoder for PhantomStep by GhostAI",
+  "compression_ratio": "f8c8",
+  "input_channels": 1,
+  "output_channels": 1,
+  "latent_dim": 128
+}
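
The DCAE config gives `"compression_ratio": "f8c8"` without defining it. In the DC-AE naming convention (for example `dc-ae-f32c32`), `f` is the downsampling factor per spatial axis and `c` is the number of latent channels. Assuming PhantomStep follows that convention, the sketch below parses the string and computes the latent shape for a single-channel 2-D input such as a mel spectrogram (consistent with `"input_channels": 1`). The 128 × 1024 input in the example is hypothetical, and how `"latent_dim": 128` relates to this shape is not stated in the config.

```python
import re
from typing import Tuple


def parse_compression(spec: str) -> Tuple[int, int]:
    """Parse an 'f<factor>c<channels>' string, e.g. 'f8c8' -> (8, 8).

    Assumes the DC-AE naming convention: f = downsampling factor per
    spatial axis, c = number of latent channels.
    """
    m = re.fullmatch(r"f(\d+)c(\d+)", spec)
    if m is None:
        raise ValueError(f"unrecognised compression spec: {spec!r}")
    return int(m.group(1)), int(m.group(2))


def latent_shape(spec: str, height: int, width: int) -> Tuple[int, int, int]:
    """Latent (channels, height, width), assuming both axes are downsampled by f."""
    factor, channels = parse_compression(spec)
    if height % factor or width % factor:
        raise ValueError("input dimensions must be divisible by the downsampling factor")
    return channels, height // factor, width // factor


if __name__ == "__main__":
    # Hypothetical input: 1 channel, 128 mel bins x 1024 frames.
    print(latent_shape("f8c8", 128, 1024))  # (8, 16, 128)
```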

phantomstep_dcae/diffusion_pytorch_model.safetensors ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:b3738174f266c3c74ecfecae114358ea2bd581e54b414d8a6203767fdf4c9f65
+size 134
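
The `.safetensors` entries in this commit are Git LFS pointer files rather than the weights themselves. Per the Git LFS pointer specification, a pointer is a few `key value` lines giving the spec `version`, the object's `oid` (hash algorithm and digest), and the object's `size` in bytes. A minimal parser, shown on the DCAE pointer above, makes the recorded sizes easy to check (134 bytes here, and 134 or 135 bytes for each of the four pointers in this commit):

```python
from dataclasses import dataclass


@dataclass
class LfsPointer:
    version: str
    oid_algorithm: str
    oid: str
    size: int  # size in bytes of the real object stored in LFS


def parse_lfs_pointer(text: str) -> LfsPointer:
    """Parse a Git LFS pointer file made of 'key value' lines."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algorithm, _, digest = fields["oid"].partition(":")
    return LfsPointer(
        version=fields["version"],
        oid_algorithm=algorithm,
        oid=digest,
        size=int(fields["size"]),
    )


POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:b3738174f266c3c74ecfecae114358ea2bd581e54b414d8a6203767fdf4c9f65
size 134
"""

if __name__ == "__main__":
    ptr = parse_lfs_pointer(POINTER)
    print(ptr.oid_algorithm, ptr.size)  # sha256 134
```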

phantomstep_transformer/config.json ADDED

@@ -0,0 +1,10 @@

+{
+  "model_type": "diffusion_transformer",
+  "name": "PhantomStep Transformer",
+  "description": "Transformer backbone for PhantomStep by GhostAI",
+  "architecture": "lightweight_linear_transformer",
+  "input_dim": 512,
+  "output_dim": 512,
+  "num_layers": 12,
+  "num_heads": 8
+}
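
The transformer config declares `"architecture": "lightweight_linear_transformer"` with `input_dim` 512 and 8 heads, i.e. 64 dimensions per head. Assuming "linear" refers to kernelized linear attention (the config does not name the feature map), the rough multiply-accumulate counts below show why it scales better with sequence length than softmax attention: per head, softmax attention forms QKᵀ and then weights V at about 2·N²·d_h operations, while linear attention forms KᵀV and then Q(KᵀV) at about 2·N·d_h². The sequence length of 4096 is hypothetical.

```python
def head_dim(model_dim: int, num_heads: int) -> int:
    """Per-head width; the model width must split evenly across heads."""
    if model_dim % num_heads:
        raise ValueError("model_dim must be divisible by num_heads")
    return model_dim // num_heads


def softmax_attention_macs(seq_len: int, model_dim: int, num_heads: int) -> int:
    """Attention core only: QK^T plus weights @ V, ~2 * N^2 * d_h per head.

    Q/K/V and output projections are excluded.
    """
    d_h = head_dim(model_dim, num_heads)
    return num_heads * 2 * seq_len * seq_len * d_h


def linear_attention_macs(seq_len: int, model_dim: int, num_heads: int) -> int:
    """Attention core only: K^T V (d_h x d_h) then Q (K^T V), ~2 * N * d_h^2 per head.

    The feature-map cost and all projections are excluded.
    """
    d_h = head_dim(model_dim, num_heads)
    return num_heads * 2 * seq_len * d_h * d_h


if __name__ == "__main__":
    # d and h from phantomstep_transformer/config.json; n is hypothetical.
    n, d, h = 4096, 512, 8
    soft = softmax_attention_macs(n, d, h)
    lin = linear_attention_macs(n, d, h)
    print(head_dim(d, h), soft, lin, soft // lin)  # 64 17179869184 268435456 64
```

The ratio works out to N / d_h, so the advantage of the linear form grows with sequence length.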

phantomstep_transformer/diffusion_pytorch_model.safetensors ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:1491c6aab4ea255409d092c05171608352fbea6a835a210c5922a8cca498dff5
+size 135

phantomstep_vocoder/config.json ADDED

@@ -0,0 +1,8 @@

+{
+  "model_type": "vocoder",
+  "name": "PhantomStep Vocoder",
+  "description": "Vocoder for audio synthesis in PhantomStep by GhostAI",
+  "sample_rate": 22050,
+  "input_dim": 128,
+  "output_dim": 1
+}
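
The vocoder config fixes the output sample rate at 22050 Hz with a 128-dimensional input, presumably one frame of mel features. How many input frames a track needs depends on the hop length, which the config does not specify. The sketch below assumes a hypothetical hop of 256 samples and the center-padded STFT convention (with an even FFT size), where a signal of L samples yields 1 + ⌊L / hop⌋ frames.

```python
SAMPLE_RATE = 22050  # from phantomstep_vocoder/config.json
HOP_LENGTH = 256     # hypothetical; not specified in the config


def num_samples(seconds: float, sample_rate: int = SAMPLE_RATE) -> int:
    """Number of output audio samples for a clip of the given duration."""
    return round(seconds * sample_rate)


def num_frames(samples: int, hop_length: int = HOP_LENGTH) -> int:
    """Frames produced by a center-padded STFT: 1 + floor(samples / hop)."""
    return 1 + samples // hop_length


if __name__ == "__main__":
    samples = num_samples(4 * 60)  # a 4-minute track
    print(samples, num_frames(samples))  # 5292000 20672
```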

phantomstep_vocoder/diffusion_pytorch_model.safetensors ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8e66bc7174df4a69a6044f3c4b1a1a19586716e132b92aab7b4e061576db74c8
+size 134

umt5-base/config.json ADDED

@@ -0,0 +1,9 @@

+{
+  "model_type": "umt5",
+  "name": "UMT5 for PhantomStep",
+  "description": "Text embedding model for PhantomStep by GhostAI",
+  "vocab_size": 250112,
+  "d_model": 512,
+  "num_layers": 12,
+  "num_heads": 8
+}
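
With `vocab_size` 250112 and `d_model` 512, the token-embedding table of this text encoder alone holds vocab_size × d_model parameters. A quick calculation, with memory shown for two common storage types:

```python
VOCAB_SIZE = 250112  # from umt5-base/config.json
D_MODEL = 512        # from umt5-base/config.json

BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2}


def embedding_params(vocab_size: int = VOCAB_SIZE, d_model: int = D_MODEL) -> int:
    """Parameters in a (vocab_size x d_model) token-embedding matrix."""
    return vocab_size * d_model


if __name__ == "__main__":
    params = embedding_params()
    print(params)  # 128057344
    for dtype, nbytes in BYTES_PER_PARAM.items():
        print(dtype, f"{params * nbytes / 2**20:.1f} MiB")
```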

umt5-base/model.safetensors ADDED

@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e944764d8d1c700b188ad0656f409deb8fec04588d2bc74445c2a0618a925c80
+size 135

umt5-base/special_tokens_map.json ADDED

@@ -0,0 +1,7 @@

+{
+  "bos_token": "<s>",
+  "eos_token": "</s>",
+  "unk_token": "<unk>",
+  "pad_token": "<pad>",
+  "additional_special_tokens": ["[PhantomStep]"]
+}

umt5-base/tokenizer.json ADDED

@@ -0,0 +1,20 @@

+{
+  "version": "1.0",
+  "model": "UMT5 for PhantomStep",
+  "description": "Tokenizer vocabulary for PhantomStep by GhostAI",
+  "vocab": {
+    "<s>": 0,
+    "</s>": 1,
+    "<unk>": 2,
+    "<pad>": 3,
+    "[PhantomStep]": 250112,
+    "...": "..."
+  },
+  "added_tokens": [
+    {
+      "id": 250112,
+      "content": "[PhantomStep]",
+      "special": true
+    }
+  ]
+}
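
One inconsistency is worth checking across these files: `tokenizer.json` assigns the added token `[PhantomStep]` id 250112, but `umt5-base/config.json` sets `vocab_size` to 250112, so valid ids run from 0 to 250111 and an embedding lookup for this token would be out of range unless the embedding table is resized. The sketch below checks added-token ids against the vocabulary size and special-token names against the vocab; its dictionaries are trimmed copies of the JSON shown above, with the elided `"...": "..."` vocab entry left out.

```python
import json

# Trimmed copies of the JSON above; the elided "...": "..." vocab entry is omitted.
CONFIG = json.loads('{"vocab_size": 250112}')
TOKENIZER = json.loads("""
{
  "vocab": {"<s>": 0, "</s>": 1, "<unk>": 2, "<pad>": 3, "[PhantomStep]": 250112},
  "added_tokens": [{"id": 250112, "content": "[PhantomStep]", "special": true}]
}
""")
SPECIAL_TOKENS = json.loads("""
{"bos_token": "<s>", "eos_token": "</s>", "unk_token": "<unk>", "pad_token": "<pad>",
 "additional_special_tokens": ["[PhantomStep]"]}
""")


def out_of_range_tokens(tokenizer: dict, vocab_size: int) -> list:
    """Added tokens whose id cannot index an embedding table with vocab_size rows."""
    return [t["content"] for t in tokenizer["added_tokens"] if not 0 <= t["id"] < vocab_size]


def missing_special_tokens(special: dict, vocab: dict) -> list:
    """Special tokens named in special_tokens_map.json but absent from the vocab."""
    names = [v for k, v in special.items() if k != "additional_special_tokens"]
    names += special.get("additional_special_tokens", [])
    return [name for name in names if name not in vocab]


if __name__ == "__main__":
    print(out_of_range_tokens(TOKENIZER, CONFIG["vocab_size"]))  # ['[PhantomStep]']
    print(missing_special_tokens(SPECIAL_TOKENS, TOKENIZER["vocab"]))  # []
```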

umt5-base/tokenizer_config.json ADDED

@@ -0,0 +1,6 @@

+{
+  "tokenizer_class": "T5Tokenizer",
+  "model_max_length": 512,
+  "name": "UMT5 Tokenizer for PhantomStep",
+  "description": "Tokenizer for text processing in PhantomStep by GhostAI"
+}
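
`tokenizer_config.json` caps inputs at `model_max_length` 512 tokens. A minimal truncation sketch, assuming the T5-style convention of ending every sequence with `</s>` (id 1 in the vocab shown in `tokenizer.json`); in practice a tokenizer library's own truncation options would normally handle this:

```python
from typing import List

MODEL_MAX_LENGTH = 512  # from umt5-base/tokenizer_config.json
EOS_ID = 1              # "</s>" in umt5-base/tokenizer.json


def truncate_ids(ids: List[int], max_length: int = MODEL_MAX_LENGTH, eos_id: int = EOS_ID) -> List[int]:
    """Clip a token-id sequence to max_length, keeping </s> as the final token.

    Sequences that already fit are returned unchanged (as a copy).
    """
    if max_length < 1:
        raise ValueError("max_length must be at least 1")
    if len(ids) <= max_length:
        return list(ids)
    return list(ids[: max_length - 1]) + [eos_id]


if __name__ == "__main__":
    long_input = list(range(10, 1010)) + [EOS_ID]  # 1001 ids, ends with </s>
    clipped = truncate_ids(long_input)
    print(len(clipped), clipped[-2:])  # 512 [520, 1]
```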