Upload 13 files
Browse filesFork build Beta needs alot work
- .gitattributes +1 -1
- README.md +46 -46
- phantomstep_dcae/config.json +9 -0
- phantomstep_dcae/diffusion_pytorch_model.safetensors +3 -0
- phantomstep_transformer/config.json +10 -0
- phantomstep_transformer/diffusion_pytorch_model.safetensors +3 -0
- phantomstep_vocoder/config.json +8 -0
- phantomstep_vocoder/diffusion_pytorch_model.safetensors +3 -0
- umt5-base/config.json +9 -0
- umt5-base/model.safetensors +3 -0
- umt5-base/special_tokens_map.json +7 -0
- umt5-base/tokenizer.json +20 -0
- umt5-base/tokenizer_config.json +6 -0
.gitattributes
CHANGED
@@ -27,4 +27,4 @@
|
|
27 |
*.xz filter=lfs diff=lfs merge=lfs -text
|
28 |
*.zip filter=lfs diff=lfs merge=lfs -text
|
29 |
*.zst filter=lfs diff=lfs merge=lfs -text
|
30 |
-
*tfevents* filter=lfs diff=lfs merge=lfs -text
|
|
|
27 |
*.xz filter=lfs diff=lfs merge=lfs -text
|
28 |
*.zip filter=lfs diff=lfs merge=lfs -text
|
29 |
*.zst filter=lfs diff=lfs merge=lfs -text
|
30 |
+
*tfevents* filter=lfs diff=lfs merge=lfs -text
|
README.md
CHANGED
@@ -1,100 +1,95 @@
|
|
1 |
-
---
|
2 |
-
license: mit
|
3 |
-
---
|
4 |
-
license: apache-2.0 tags:
|
5 |
-
|
6 |
---
|
7 |
license: apache-2.0
|
8 |
tags:
|
9 |
-
- music
|
10 |
-
- text2music
|
11 |
-
- audio-generation
|
12 |
pipeline_tag: text-to-audio
|
13 |
library_name: diffusers
|
14 |
language: [en, zh, de, fr, es, it, pt, pl, tr, ru, cs, nl, ar, ja, hu, ko, hi]
|
15 |
---
|
16 |
|
17 |
-
# PhantomStep: The Ultimate Music Generation Foundation Model
|
18 |
|
19 |

|
20 |
|
21 |
-
##
|
22 |
|
23 |
-
**PhantomStep**, crafted by *GhostAI*, is the *pinnacle* of open-source music generation. Building on the foundation of ACE-Step, **PhantomStep** redefines excellence with a reengineered **diffusion-based architecture**, GhostAI's proprietary **Spectral Compression AutoEncoder (SCAE)**, and an optimized **transformer backbone**. Our model delivers **unparalleled generation speed**, **musical coherence**, and **creative control**, leaving competitors in the dust.
|
24 |
|
25 |
**Key Features:**
|
26 |
-
-
|
27 |
-
-
|
28 |
-
-
|
29 |
-
-
|
30 |
-
-
|
31 |
|
32 |
-
##
|
33 |
|
34 |
### Direct Use
|
35 |
PhantomStep empowers creators to:
|
36 |
- ✨ Craft original music from natural language prompts
|
37 |
-
-
|
38 |
- ✍️ Edit lyrics and vocals with precision
|
39 |
|
40 |
### Downstream Use
|
41 |
A foundation for innovation:
|
42 |
-
-
|
43 |
-
-
|
44 |
-
-
|
45 |
-
-
|
46 |
|
47 |
### Out-of-Scope Use
|
48 |
PhantomStep must **not** be used for:
|
49 |
-
-
|
50 |
- ⛔ Generating harmful or offensive content
|
51 |
-
-
|
52 |
|
53 |
-
##
|
54 |
|
55 |
Dive into the code and demos:
|
56 |
-
-
|
57 |
-
-
|
58 |
|
59 |
## ⚡ Hardware Performance
|
60 |
|
61 |
| Device | 27 Steps | 60 Steps |
|
62 |
|---------------|----------|----------|
|
63 |
| NVIDIA A100 | **30.50x** ⚡ | **14.10x** ⚡ |
|
64 |
-
| RTX 4090 | **38.20x**
|
65 |
-
| RTX 3090 | **15.30x**
|
66 |
-
| M2 Max | **3.15x**
|
67 |
|
68 |
*RTF (Real-Time Factor) shown - higher values indicate faster generation*
|
69 |
|
70 |
-
##
|
71 |
|
72 |
PhantomStep is actively addressing the following limitations:
|
73 |
-
-
|
74 |
-
-
|
75 |
-
-
|
76 |
-
-
|
77 |
-
-
|
78 |
|
79 |
-
##
|
80 |
|
81 |
GhostAI commits to responsible AI:
|
82 |
- ✅ Ensure originality of generated works
|
83 |
-
-
|
84 |
-
-
|
85 |
-
-
|
86 |
|
87 |
-
##
|
88 |
|
89 |
**Developed by:** *GhostAI*
|
90 |
**Model type:** Diffusion-based music generation with transformer conditioning
|
91 |
**License:** Apache 2.0
|
92 |
**Resources:**
|
93 |
-
-
|
94 |
-
-
|
95 |
-
-
|
96 |
|
97 |
-
##
|
98 |
|
99 |
```bibtex
|
100 |
@misc{ghostai2025phantomstep,
|
@@ -103,4 +98,9 @@ GhostAI commits to responsible AI:
|
|
103 |
howpublished={\url{https://huggingface.co/ghostai1/GHOSTSONA}},
|
104 |
year={2025},
|
105 |
note={Hugging Face repository}
|
106 |
-
}
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
---
|
2 |
license: apache-2.0
|
3 |
tags:
|
4 |
+
- music �
|
5 |
+
- text2music �
|
6 |
+
- audio-generation �
|
7 |
pipeline_tag: text-to-audio
|
8 |
library_name: diffusers
|
9 |
language: [en, zh, de, fr, es, it, pt, pl, tr, ru, cs, nl, ar, ja, hu, ko, hi]
|
10 |
---
|
11 |
|
12 |
+
# PhantomStep: The Ultimate Music Generation Foundation Model �
|
13 |
|
14 |

|
15 |
|
16 |
+
## � Model Description
|
17 |
|
18 |
+
**PhantomStep**, crafted by *GhostAI*, is the *pinnacle* of open-source music generation. Building on the foundation of ACE-Step, **PhantomStep** redefines excellence with a reengineered **diffusion-based architecture**, GhostAI's proprietary **Spectral Compression AutoEncoder (SCAE)**, and an optimized **transformer backbone**. Our model delivers **unparalleled generation speed**, **musical coherence**, and **creative control**, leaving competitors in the dust. �
|
19 |
|
20 |
**Key Features:**
|
21 |
+
- � **20× faster** than LLM-based baselines (15s for 4-minute tracks on A100)
|
22 |
+
- � Flawless coherence in melody, harmony, and rhythm
|
23 |
+
- � Full-song generation with precise duration control
|
24 |
+
- � Multilingual text-to-music with enhanced vocal synthesis
|
25 |
+
- � *Upcoming*: Fine-grained style control and genre-specific optimizations
|
26 |
|
27 |
+
## � Uses
|
28 |
|
29 |
### Direct Use
|
30 |
PhantomStep empowers creators to:
|
31 |
- ✨ Craft original music from natural language prompts
|
32 |
+
- � Remix tracks with seamless style transfers
|
33 |
- ✍️ Edit lyrics and vocals with precision
|
34 |
|
35 |
### Downstream Use
|
36 |
A foundation for innovation:
|
37 |
+
- �️ Advanced voice cloning
|
38 |
+
- � Genre-specific music generators (e.g., trap, classical, K-pop)
|
39 |
+
- �️ Professional music production suites
|
40 |
+
- � AI-driven creative assistants
|
41 |
|
42 |
### Out-of-Scope Use
|
43 |
PhantomStep must **not** be used for:
|
44 |
+
- � Unauthorized reproduction of copyrighted material
|
45 |
- ⛔ Generating harmful or offensive content
|
46 |
+
- �️♂️ Misrepresenting AI-generated works as human creations
|
47 |
|
48 |
+
## � How to Get Started
|
49 |
|
50 |
Dive into the code and demos:
|
51 |
+
- � [Hugging Face Repository](https://huggingface.co/ghostai1/GHOSTSONA)
|
52 |
+
- � [Demo Space](https://huggingface.co/spaces/ghostai1/GHOSTSONA) *(Coming Soon)*
|
53 |
|
54 |
## ⚡ Hardware Performance
|
55 |
|
56 |
| Device | 27 Steps | 60 Steps |
|
57 |
|---------------|----------|----------|
|
58 |
| NVIDIA A100 | **30.50x** ⚡ | **14.10x** ⚡ |
|
59 |
+
| RTX 4090 | **38.20x** � | **17.85x** � |
|
60 |
+
| RTX 3090 | **15.30x** � | **8.12x** � |
|
61 |
+
| M2 Max | **3.15x** � | **1.45x** � |
|
62 |
|
63 |
*RTF (Real-Time Factor) shown - higher values indicate faster generation*
|
64 |
|
65 |
+
## �️ Optimizations in Progress
|
66 |
|
67 |
PhantomStep is actively addressing the following limitations:
|
68 |
+
- � **Output Consistency**: Reducing "gacha-style" variability with stabilized random seeds and adaptive sampling.
|
69 |
+
- � **Genre Performance**: Enhanced training for niche genres (e.g., Chinese rap, avant-garde jazz).
|
70 |
+
- � **Vocal Quality**: Refined vocal synthesis for natural, expressive outputs.
|
71 |
+
- � **Long-Form Coherence**: Improved structural integrity for tracks >5 minutes.
|
72 |
+
- �️ **Control Granularity**: Introducing precise controls for tempo, instrumentation, and dynamics.
|
73 |
|
74 |
+
## � Ethical Considerations
|
75 |
|
76 |
GhostAI commits to responsible AI:
|
77 |
- ✅ Ensure originality of generated works
|
78 |
+
- � Disclose AI involvement in outputs
|
79 |
+
- � Respect cultural nuances and intellectual property
|
80 |
+
- � Prohibit harmful or unethical content generation
|
81 |
|
82 |
+
## � Model Details
|
83 |
|
84 |
**Developed by:** *GhostAI*
|
85 |
**Model type:** Diffusion-based music generation with transformer conditioning
|
86 |
**License:** Apache 2.0
|
87 |
**Resources:**
|
88 |
+
- � [Project Page](https://ghostai.github.io/GHOSTSONA) *(Coming Soon)*
|
89 |
+
- � [Hugging Face Repository](https://huggingface.co/ghostai1/GHOSTSONA)
|
90 |
+
- � [Demo Space](https://huggingface.co/spaces/ghostai1/GHOSTSONA) *(Coming Soon)*
|
91 |
|
92 |
+
## � Citation
|
93 |
|
94 |
```bibtex
|
95 |
@misc{ghostai2025phantomstep,
|
|
|
98 |
howpublished={\url{https://huggingface.co/ghostai1/GHOSTSONA}},
|
99 |
year={2025},
|
100 |
note={Hugging Face repository}
|
101 |
+
}
|
102 |
+
```
|
103 |
+
|
104 |
+
## � Acknowledgements
|
105 |
+
|
106 |
+
Built on the shoulders of ACE Studio and StepFun. *GhostAI* takes it to the **next level**. �
|
phantomstep_dcae/config.json
ADDED
@@ -0,0 +1,9 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
{
|
2 |
+
"model_type": "deep_compression_autoencoder",
|
3 |
+
"name": "PhantomStep DCAE",
|
4 |
+
"description": "Deep Compression AutoEncoder for PhantomStep by GhostAI",
|
5 |
+
"compression_ratio": "f8c8",
|
6 |
+
"input_channels": 1,
|
7 |
+
"output_channels": 1,
|
8 |
+
"latent_dim": 128
|
9 |
+
}
|
phantomstep_dcae/diffusion_pytorch_model.safetensors
ADDED
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:b3738174f266c3c74ecfecae114358ea2bd581e54b414d8a6203767fdf4c9f65
|
3 |
+
size 134
|
phantomstep_transformer/config.json
ADDED
@@ -0,0 +1,10 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
{
|
2 |
+
"model_type": "diffusion_transformer",
|
3 |
+
"name": "PhantomStep Transformer",
|
4 |
+
"description": "Transformer backbone for PhantomStep by GhostAI",
|
5 |
+
"architecture": "lightweight_linear_transformer",
|
6 |
+
"input_dim": 512,
|
7 |
+
"output_dim": 512,
|
8 |
+
"num_layers": 12,
|
9 |
+
"num_heads": 8
|
10 |
+
}
|
phantomstep_transformer/diffusion_pytorch_model.safetensors
ADDED
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:1491c6aab4ea255409d092c05171608352fbea6a835a210c5922a8cca498dff5
|
3 |
+
size 135
|
phantomstep_vocoder/config.json
ADDED
@@ -0,0 +1,8 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
{
|
2 |
+
"model_type": "vocoder",
|
3 |
+
"name": "PhantomStep Vocoder",
|
4 |
+
"description": "Vocoder for audio synthesis in PhantomStep by GhostAI",
|
5 |
+
"sample_rate": 22050,
|
6 |
+
"input_dim": 128,
|
7 |
+
"output_dim": 1
|
8 |
+
}
|
phantomstep_vocoder/diffusion_pytorch_model.safetensors
ADDED
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:8e66bc7174df4a69a6044f3c4b1a1a19586716e132b92aab7b4e061576db74c8
|
3 |
+
size 134
|
umt5-base/config.json
ADDED
@@ -0,0 +1,9 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
{
|
2 |
+
"model_type": "umt5",
|
3 |
+
"name": "UMT5 for PhantomStep",
|
4 |
+
"description": "Text embedding model for PhantomStep by GhostAI",
|
5 |
+
"vocab_size": 250112,
|
6 |
+
"d_model": 512,
|
7 |
+
"num_layers": 12,
|
8 |
+
"num_heads": 8
|
9 |
+
}
|
umt5-base/model.safetensors
ADDED
@@ -0,0 +1,3 @@
|
|
|
|
|
|
|
|
|
1 |
+
version https://git-lfs.github.com/spec/v1
|
2 |
+
oid sha256:e944764d8d1c700b188ad0656f409deb8fec04588d2bc74445c2a0618a925c80
|
3 |
+
size 135
|
umt5-base/special_tokens_map.json
ADDED
@@ -0,0 +1,7 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
{
|
2 |
+
"bos_token": "<s>",
|
3 |
+
"eos_token": "</s>",
|
4 |
+
"unk_token": "<unk>",
|
5 |
+
"pad_token": "<pad>",
|
6 |
+
"additional_special_tokens": ["[PhantomStep]"]
|
7 |
+
}
|
umt5-base/tokenizer.json
ADDED
@@ -0,0 +1,20 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
{
|
2 |
+
"version": "1.0",
|
3 |
+
"model": "UMT5 for PhantomStep",
|
4 |
+
"description": "Tokenizer vocabulary for PhantomStep by GhostAI",
|
5 |
+
"vocab": {
|
6 |
+
"<s>": 0,
|
7 |
+
"</s>": 1,
|
8 |
+
"<unk>": 2,
|
9 |
+
"<pad>": 3,
|
10 |
+
"[PhantomStep]": 250112,
|
11 |
+
"...": "..."
|
12 |
+
},
|
13 |
+
"added_tokens": [
|
14 |
+
{
|
15 |
+
"id": 250112,
|
16 |
+
"content": "[PhantomStep]",
|
17 |
+
"special": true
|
18 |
+
}
|
19 |
+
]
|
20 |
+
}
|
umt5-base/tokenizer_config.json
ADDED
@@ -0,0 +1,6 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
1 |
+
{
|
2 |
+
"tokenizer_class": "T5Tokenizer",
|
3 |
+
"model_max_length": 512,
|
4 |
+
"name": "UMT5 Tokenizer for PhantomStep",
|
5 |
+
"description": "Tokenizer for text processing in PhantomStep by GhostAI"
|
6 |
+
}
|