ghostai1 committed · Commit 6fae2ec · verified · 1 parent: 339555f

Upload 13 files

Fork build (beta); needs a lot of work

.gitattributes CHANGED
@@ -27,4 +27,4 @@
  *.xz filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
- *tfevents* filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,100 +1,95 @@
- ---
- license: mit
- ---
- license: apache-2.0 tags:
-
  ---
  license: apache-2.0
  tags:
- - music 🎵
- - text2music 🎤
- - audio-generation 🔊
+ - music
+ - text2music
+ - audio-generation
  pipeline_tag: text-to-audio
  library_name: diffusers
  language: [en, zh, de, fr, es, it, pt, pl, tr, ru, cs, nl, ar, ja, hu, ko, hi]
  ---

- # PhantomStep: The Ultimate Music Generation Foundation Model 🚀
+ # PhantomStep: The Ultimate Music Generation Foundation Model

  ![PhantomStep Framework](https://huggingface.co/ghostai1/GHOSTSONA/raw/main/fig/PhantomStep_framework.png)

- ## 🎹 Model Description
+ ## Model Description

- **PhantomStep**, crafted by *GhostAI*, is the *pinnacle* of open-source music generation. Building on the foundation of ACE-Step, **PhantomStep** redefines excellence with a reengineered **diffusion-based architecture**, GhostAI's proprietary **Spectral Compression AutoEncoder (SCAE)**, and an optimized **transformer backbone**. Our model delivers **unparalleled generation speed**, **musical coherence**, and **creative control**, leaving competitors in the dust. 💨
+ **PhantomStep**, crafted by *GhostAI*, is the *pinnacle* of open-source music generation. Building on the foundation of ACE-Step, **PhantomStep** redefines excellence with a reengineered **diffusion-based architecture**, GhostAI's proprietary **Spectral Compression AutoEncoder (SCAE)**, and an optimized **transformer backbone**. Our model delivers **unparalleled generation speed**, **musical coherence**, and **creative control**, leaving competitors in the dust.

  **Key Features:**
- - 🚄 **20× faster** than LLM-based baselines (15s for 4-minute tracks on A100)
- - 🎶 Flawless coherence in melody, harmony, and rhythm
- - 🎵 Full-song generation with precise duration control
- - 🌍 Multilingual text-to-music with enhanced vocal synthesis
- - 🔜 *Upcoming*: Fine-grained style control and genre-specific optimizations
+ - **20× faster** than LLM-based baselines (15s for 4-minute tracks on A100)
+ - Flawless coherence in melody, harmony, and rhythm
+ - Full-song generation with precise duration control
+ - Multilingual text-to-music with enhanced vocal synthesis
+ - *Upcoming*: Fine-grained style control and genre-specific optimizations

- ## 🎧 Uses
+ ## Uses

  ### Direct Use
  PhantomStep empowers creators to:
  - ✨ Craft original music from natural language prompts
- - 🔄 Remix tracks with seamless style transfers
+ - Remix tracks with seamless style transfers
  - ✍️ Edit lyrics and vocals with precision

  ### Downstream Use
  A foundation for innovation:
- - 🎙️ Advanced voice cloning
- - 🎸 Genre-specific music generators (e.g., trap, classical, K-pop)
- - 🎛️ Professional music production suites
- - 🤖 AI-driven creative assistants
+ - Advanced voice cloning
+ - Genre-specific music generators (e.g., trap, classical, K-pop)
+ - Professional music production suites
+ - AI-driven creative assistants

  ### Out-of-Scope Use
  PhantomStep must **not** be used for:
- - 🚫 Unauthorized reproduction of copyrighted material
+ - Unauthorized reproduction of copyrighted material
  - ⛔ Generating harmful or offensive content
- - 🕵️‍♂️ Misrepresenting AI-generated works as human creations
+ - Misrepresenting AI-generated works as human creations

- ## 🚀 How to Get Started
+ ## How to Get Started

  Dive into the code and demos:
- - 📂 [Hugging Face Repository](https://huggingface.co/ghostai1/GHOSTSONA)
- - 🎮 [Demo Space](https://huggingface.co/spaces/ghostai1/GHOSTSONA) *(Coming Soon)*
+ - [Hugging Face Repository](https://huggingface.co/ghostai1/GHOSTSONA)
+ - [Demo Space](https://huggingface.co/spaces/ghostai1/GHOSTSONA) *(Coming Soon)*

  ## ⚡ Hardware Performance

  | Device | 27 Steps | 60 Steps |
  |---------------|----------|----------|
  | NVIDIA A100 | **30.50x** ⚡ | **14.10x** ⚡ |
- | RTX 4090 | **38.20x** 🚀 | **17.85x** 🚀 |
- | RTX 3090 | **15.30x** 🔥 | **8.12x** 🔥 |
- | M2 Max | **3.15x** 🌟 | **1.45x** 🌟 |
+ | RTX 4090 | **38.20x** | **17.85x** |
+ | RTX 3090 | **15.30x** | **8.12x** |
+ | M2 Max | **3.15x** | **1.45x** |

  *RTF (Real-Time Factor) shown - higher values indicate faster generation*

- ## 🛠️ Optimizations in Progress
+ ## Optimizations in Progress

  PhantomStep is actively addressing the following limitations:
- - 🎯 **Output Consistency**: Reducing "gacha-style" variability with stabilized random seeds and adaptive sampling.
- - 🎸 **Genre Performance**: Enhanced training for niche genres (e.g., Chinese rap, avant-garde jazz).
- - 🎤 **Vocal Quality**: Refined vocal synthesis for natural, expressive outputs.
- - 📏 **Long-Form Coherence**: Improved structural integrity for tracks >5 minutes.
- - 🎛️ **Control Granularity**: Introducing precise controls for tempo, instrumentation, and dynamics.
+ - **Output Consistency**: Reducing "gacha-style" variability with stabilized random seeds and adaptive sampling.
+ - **Genre Performance**: Enhanced training for niche genres (e.g., Chinese rap, avant-garde jazz).
+ - **Vocal Quality**: Refined vocal synthesis for natural, expressive outputs.
+ - **Long-Form Coherence**: Improved structural integrity for tracks >5 minutes.
+ - **Control Granularity**: Introducing precise controls for tempo, instrumentation, and dynamics.

- ## 🌐 Ethical Considerations
+ ## Ethical Considerations

  GhostAI commits to responsible AI:
  - ✅ Ensure originality of generated works
- - 📢 Disclose AI involvement in outputs
- - 🌍 Respect cultural nuances and intellectual property
- - 🚫 Prohibit harmful or unethical content generation
+ - Disclose AI involvement in outputs
+ - Respect cultural nuances and intellectual property
+ - Prohibit harmful or unethical content generation

- ## 🔍 Model Details
+ ## Model Details

  **Developed by:** *GhostAI*
  **Model type:** Diffusion-based music generation with transformer conditioning
  **License:** Apache 2.0
  **Resources:**
- - 🌐 [Project Page](https://ghostai.github.io/GHOSTSONA) *(Coming Soon)*
- - 📂 [Hugging Face Repository](https://huggingface.co/ghostai1/GHOSTSONA)
- - 🎮 [Demo Space](https://huggingface.co/spaces/ghostai1/GHOSTSONA) *(Coming Soon)*
+ - [Project Page](https://ghostai.github.io/GHOSTSONA) *(Coming Soon)*
+ - [Hugging Face Repository](https://huggingface.co/ghostai1/GHOSTSONA)
+ - [Demo Space](https://huggingface.co/spaces/ghostai1/GHOSTSONA) *(Coming Soon)*

- ## 📜 Citation
+ ## Citation

  ```bibtex
  @misc{ghostai2025phantomstep,
@@ -103,4 +98,9 @@ GhostAI commits to responsible AI:
  howpublished={\url{https://huggingface.co/ghostai1/GHOSTSONA}},
  year={2025},
  note={Hugging Face repository}
- }
+ }
+ ```
+
+ ## Acknowledgements
+
+ Built on the shoulders of ACE Studio and StepFun. *GhostAI* takes it to the **next level**.
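
Review note on the Hardware Performance table in the README diff: the RTF figures can be converted into rough wall-clock estimates. The sketch below assumes RTF means seconds of audio produced per second of compute, which matches the table's "higher values indicate faster generation" note but is not defined more precisely in the README. The function name `generation_seconds` is illustrative only.

```python
def generation_seconds(audio_seconds: float, rtf: float) -> float:
    """Estimated wall-clock time, assuming RTF = audio duration / compute time."""
    if rtf <= 0:
        raise ValueError("rtf must be positive")
    return audio_seconds / rtf


# RTF values copied from the README table, keyed by diffusion step count.
rtf_table = {
    "NVIDIA A100": {27: 30.50, 60: 14.10},
    "RTX 4090": {27: 38.20, 60: 17.85},
    "RTX 3090": {27: 15.30, 60: 8.12},
    "M2 Max": {27: 3.15, 60: 1.45},
}

four_minutes = 4 * 60  # seconds of audio

# Estimated seconds to generate a 4-minute track per device and step count.
estimates = {
    device: {
        steps: round(generation_seconds(four_minutes, rtf), 2)
        for steps, rtf in by_steps.items()
    }
    for device, by_steps in rtf_table.items()
}

for device, by_steps in estimates.items():
    print(device, by_steps)
```

Under this reading, the "15s for 4-minute tracks on A100" figure in Key Features corresponds to an RTF of 240 / 15 = 16, which falls between the two A100 columns of the table.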
phantomstep_dcae/config.json ADDED
@@ -0,0 +1,9 @@
+ {
+   "model_type": "deep_compression_autoencoder",
+   "name": "PhantomStep DCAE",
+   "description": "Deep Compression AutoEncoder for PhantomStep by GhostAI",
+   "compression_ratio": "f8c8",
+   "input_channels": 1,
+   "output_channels": 1,
+   "latent_dim": 128
+ }
phantomstep_dcae/diffusion_pytorch_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:b3738174f266c3c74ecfecae114358ea2bd581e54b414d8a6203767fdf4c9f65
+ size 134
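
Review note on the `.safetensors` entries: each one is a Git LFS pointer file, not the weights themselves. A pointer holds three `key value` lines: the spec version, the object id (hash algorithm and digest), and the size in bytes of the real object. The sketch below parses the pointer shown above; `parse_lfs_pointer` is a hypothetical helper, not part of any Git LFS tooling.

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # byte size of the tracked object
    }


# Pointer content copied from phantomstep_dcae/diffusion_pytorch_model.safetensors.
pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:b3738174f266c3c74ecfecae114358ea2bd581e54b414d8a6203767fdf4c9f65
size 134
"""

info = parse_lfs_pointer(pointer)
print(info["hash_algo"], info["size"])
```

The `size` field here says the tracked object is 134 bytes, which is far smaller than model weight files usually are, so it may be worth confirming what these files actually contain.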
phantomstep_transformer/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "model_type": "diffusion_transformer",
+   "name": "PhantomStep Transformer",
+   "description": "Transformer backbone for PhantomStep by GhostAI",
+   "architecture": "lightweight_linear_transformer",
+   "input_dim": 512,
+   "output_dim": 512,
+   "num_layers": 12,
+   "num_heads": 8
+ }
phantomstep_transformer/diffusion_pytorch_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:1491c6aab4ea255409d092c05171608352fbea6a835a210c5922a8cca498dff5
+ size 135
phantomstep_vocoder/config.json ADDED
@@ -0,0 +1,8 @@
+ {
+   "model_type": "vocoder",
+   "name": "PhantomStep Vocoder",
+   "description": "Vocoder for audio synthesis in PhantomStep by GhostAI",
+   "sample_rate": 22050,
+   "input_dim": 128,
+   "output_dim": 1
+ }
phantomstep_vocoder/diffusion_pytorch_model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:8e66bc7174df4a69a6044f3c4b1a1a19586716e132b92aab7b4e061576db74c8
+ size 134
umt5-base/config.json ADDED
@@ -0,0 +1,9 @@
+ {
+   "model_type": "umt5",
+   "name": "UMT5 for PhantomStep",
+   "description": "Text embedding model for PhantomStep by GhostAI",
+   "vocab_size": 250112,
+   "d_model": 512,
+   "num_layers": 12,
+   "num_heads": 8
+ }
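
Review note on the four component configs: where one component feeds another, the declared dimensions should agree. The sketch below checks this under an assumed wiring that the commit does not state: the text encoder output feeds the transformer input, and the DCAE latent feeds the vocoder input. The values are copied from the config files in this commit; `check_dims` is a hypothetical helper.

```python
# Dimension fields copied from the four config.json files in this commit.
configs = {
    "dcae": {"latent_dim": 128},
    "transformer": {"input_dim": 512, "output_dim": 512, "num_heads": 8},
    "vocoder": {"sample_rate": 22050, "input_dim": 128, "output_dim": 1},
    "umt5": {"vocab_size": 250112, "d_model": 512, "num_heads": 8},
}


def check_dims(cfg: dict) -> list:
    """Return human-readable mismatches under the ASSUMED component wiring."""
    problems = []
    # Assumption: text encoder hidden size must match the transformer input width.
    if cfg["umt5"]["d_model"] != cfg["transformer"]["input_dim"]:
        problems.append("umt5.d_model != transformer.input_dim")
    # Assumption: the vocoder consumes the DCAE latent directly.
    if cfg["dcae"]["latent_dim"] != cfg["vocoder"]["input_dim"]:
        problems.append("dcae.latent_dim != vocoder.input_dim")
    # Multi-head attention needs the model width to divide evenly across heads.
    widths = {
        "transformer": cfg["transformer"]["input_dim"],
        "umt5": cfg["umt5"]["d_model"],
    }
    for name, width in widths.items():
        if width % cfg[name]["num_heads"] != 0:
            problems.append(f"{name} width is not divisible by num_heads")
    return problems


problems = check_dims(configs)
print(problems or "dimensions are consistent under the assumed wiring")
```

With the values in this commit the check passes: 512 matches 512, 128 matches 128, and 512 / 8 gives 64 dimensions per head.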
umt5-base/model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:e944764d8d1c700b188ad0656f409deb8fec04588d2bc74445c2a0618a925c80
+ size 135
umt5-base/special_tokens_map.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "bos_token": "<s>",
+   "eos_token": "</s>",
+   "unk_token": "<unk>",
+   "pad_token": "<pad>",
+   "additional_special_tokens": ["[PhantomStep]"]
+ }
umt5-base/tokenizer.json ADDED
@@ -0,0 +1,20 @@
+ {
+   "version": "1.0",
+   "model": "UMT5 for PhantomStep",
+   "description": "Tokenizer vocabulary for PhantomStep by GhostAI",
+   "vocab": {
+     "<s>": 0,
+     "</s>": 1,
+     "<unk>": 2,
+     "<pad>": 3,
+     "[PhantomStep]": 250112,
+     "...": "..."
+   },
+   "added_tokens": [
+     {
+       "id": 250112,
+       "content": "[PhantomStep]",
+       "special": true
+     }
+   ]
+ }
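
Review note on the added token: `tokenizer.json` assigns id 250112 to `[PhantomStep]`, while `umt5-base/config.json` declares `vocab_size` 250112. Assuming the usual convention of zero-based ids and an embedding table with exactly `vocab_size` rows, valid ids run from 0 to 250111, so this id would fall one past the end. The sketch below flags such tokens; `out_of_range_tokens` is a hypothetical helper.

```python
def out_of_range_tokens(added_tokens: list, vocab_size: int) -> list:
    """Return the contents of added tokens whose id is outside [0, vocab_size)."""
    return [
        token["content"]
        for token in added_tokens
        if not 0 <= token["id"] < vocab_size
    ]


# Values copied from umt5-base/tokenizer.json and umt5-base/config.json.
added_tokens = [{"id": 250112, "content": "[PhantomStep]", "special": True}]
vocab_size = 250112

offenders = out_of_range_tokens(added_tokens, vocab_size)
print(offenders)
```

If the convention holds, either `vocab_size` would need to grow to at least 250113 or the token would need an id inside the existing range; in the `transformers` library, `resize_token_embeddings` is the usual way to grow the embedding table after adding tokens.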
umt5-base/tokenizer_config.json ADDED
@@ -0,0 +1,6 @@
+ {
+   "tokenizer_class": "T5Tokenizer",
+   "model_max_length": 512,
+   "name": "UMT5 Tokenizer for PhantomStep",
+   "description": "Tokenizer for text processing in PhantomStep by GhostAI"
+ }