★ Welcome! Two computational musicology projects! ★ ZUN's 379 original tracks analyzed! ★ 89.5% doujin circle classification! ★
Project 1: ZUN Original Soundtrack Analysis
♦ What Is This? ♦

Computational analysis of ZUN's 379 original compositions across 19 Touhou games (TH01-TH19). Extracted 110+ audio features per track to empirically measure compositional evolution, game atmospheres, and stage vs boss theme differences.

379 ZUN Tracks · 19 Games Analyzed · 110+ Audio Features
♦ Era Evolution (20 Years of ZUN) ♦
Era | Games | Tempo | Character
PC-98 | TH01-05 | ~150 BPM | Bright, dense FM synthesis
Early Windows | TH06-09 | ~150 BPM | Classic sound, MIDI origins
Mid Windows | TH10-14 | ~140 BPM | Maturing, darker
Late Windows | TH15+ | ~130 BPM | Modern, melancholic

Key insight: ZUN's music has gotten slower and moodier over 20 years.

♦ Stage vs Boss Themes ♦
Feature | Stage | Boss | Interpretation
Tempo | 138 BPM | 125 BPM | Stage drives forward
Spectral Centroid | 2503 Hz | 2705 Hz | Boss is brighter/piercing
Onset Rate | 3.55/s | 2.84/s | Stage is busier

Key insight: Boss themes emphasize weight over speed.
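
To make the comparison concrete, here is a minimal librosa sketch of how the three features in the table above can be measured for a single track. The function name and parameter choices are illustrative assumptions, not the project's actual extraction code.

```python
import librosa
import numpy as np

def stage_vs_boss_features(path: str, sr: int = 22050) -> dict:
    """Compute the three features compared above for one track.
    Illustrative sketch -- the project's real pipeline may differ."""
    y, sr = librosa.load(path, sr=sr, mono=True)

    # Tempo estimate (BPM) from the onset-strength envelope
    onset_env = librosa.onset.onset_strength(y=y, sr=sr)
    tempo, _ = librosa.beat.beat_track(onset_envelope=onset_env, sr=sr)

    # Spectral centroid (Hz): higher values read as "brighter"/more piercing
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr).mean()

    # Onset rate (events per second): a rough proxy for rhythmic density
    onsets = librosa.onset.onset_detect(onset_envelope=onset_env, sr=sr, units="time")
    onset_rate = len(onsets) / (len(y) / sr)

    return {"tempo_bpm": float(tempo),
            "spectral_centroid_hz": float(centroid),
            "onset_rate_per_s": float(onset_rate)}
```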

♦ Interactive Demo ♦
Demo | Description | Link
Track Explorer | UMAP visualization of all 379 ZUN tracks, colored by era/game | Open →
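
For reference, a layout like the Track Explorer's can be produced with umap-learn in a few lines. The sketch below assumes hypothetical precomputed inputs (zun_track_features.npy, zun_track_eras.npy); the demo's actual preprocessing and UMAP settings may differ.

```python
import numpy as np
import matplotlib.pyplot as plt
import umap                                   # pip install umap-learn
from sklearn.preprocessing import StandardScaler

# Hypothetical precomputed inputs: one feature vector and one era label per track.
features = np.load("zun_track_features.npy")                    # shape (379, n_features), placeholder
era_labels = np.load("zun_track_eras.npy", allow_pickle=True)   # e.g. "PC-98", "Early Windows", ...

# Standardize, then project to 2-D
X = StandardScaler().fit_transform(features)
embedding = umap.UMAP(n_neighbors=15, min_dist=0.1, random_state=42).fit_transform(X)

# One scatter series per era so the legend maps colors to eras
for era in np.unique(era_labels):
    mask = era_labels == era
    plt.scatter(embedding[mask, 0], embedding[mask, 1], s=12, label=str(era))
plt.legend()
plt.title("ZUN tracks, UMAP projection")
plt.show()
```
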
✧・゚: *✧・゚:* *:・゚✧*:・゚✧
Project 2: Doujin Circle Style Classifier
♦ What Is This? ♦

Machine learning classifier that identifies which doujin circle (fan arrangement group) created a Touhou arrangement based on audio features. Trained on 954 tracks from 5 major circles. These are fan-made arrangements, not ZUN's original compositions.

89.5% Classification Accuracy · 5 Doujin Circles · 954 Arrangement Tracks
♦ Target Circles ♦
Circle | Style | Tracks | Accuracy
UNDEAD CORPORATION | Death metal | 63 | 95%
暁Records | Rock, vocal | 281 | 80%
Liz Triangle | Acoustic, folk | 84 | 75%
IOSYS | Electronic, denpa | 324 | 70%
SOUND HOLIC | Eurobeat, trance | 202 | 60%
♦ Embeddings Experiment ♦

Handcrafted features vs pretrained neural embeddings:

Method | Accuracy | Dims | Time/Sample
Handcrafted | 76.0% | 431 | 2.28 s
CLAP (pretrained) | 57.0% | 512 | 0.14 s
MERT (music-specific) | 52.0% | 768 | 5.43 s

Key insight: Domain-specific feature engineering beats transfer learning for niche music classification.
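
As a rough illustration of the evaluation behind these numbers, the sketch below trains a scikit-learn classifier on precomputed handcrafted vectors. The file names, the random-forest choice, and the 80/20 split are assumptions for demonstration; the repo's actual model and protocol may differ.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

# Hypothetical precomputed inputs: 431-dim handcrafted vectors, one per arrangement.
X = np.load("arrangement_features.npy")                       # shape (954, 431), placeholder
y = np.load("arrangement_circles.npy", allow_pickle=True)     # e.g. "IOSYS", "SOUND HOLIC", ...

# Stratified split preserves the per-circle class balance in both sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

clf = RandomForestClassifier(n_estimators=500, random_state=42)
clf.fit(X_train, y_train)

pred = clf.predict(X_test)
print(f"accuracy: {accuracy_score(y_test, pred):.3f}")
print(classification_report(y_test, pred))
```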

♦ What Are "Handcrafted Features"? ♦

Instead of using neural network embeddings, we extract 431 interpretable audio measurements using signal processing (librosa):

Feature Type | What It Measures | Dims
Mel Spectrogram | Energy across 128 frequency bands (mean, std per band) | 256
MFCCs | Timbral texture: 20 coefficients + deltas (rate of change) | 60
Chroma | Pitch-class distribution (C, C#, D ... B); harmonic content | 12
Spectral Contrast | Peak vs valley energy in 7 frequency bands | 7
Spectral Stats | Centroid (brightness), bandwidth, rolloff, flatness | 16
Tempo | BPM estimate | 1

Why this works better: UNDEAD CORPORATION's death metal shows a distinctively low spectral centroid and high spectral contrast, while IOSYS's denpa pairs fast tempo with bright timbre. These patterns are directly measurable, whereas the pretrained models were never trained on Touhou arrangements.
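
Below is a hedged sketch of how these feature groups can be computed with librosa and stacked into one vector per track. The grouping follows the table above, but the exact statistics and dimensionality in the repo may differ (this sketch does not reproduce all 431 dimensions).

```python
import librosa
import numpy as np

def handcrafted_features(path: str, sr: int = 22050) -> np.ndarray:
    """Stack the interpretable feature groups from the table into one vector per track.
    Illustrative sketch -- the repo's exact statistics/dimensions may differ."""
    y, sr = librosa.load(path, sr=sr, mono=True)

    # Mel spectrogram: per-band mean and std of log energy (128 + 128 dims)
    mel = librosa.power_to_db(librosa.feature.melspectrogram(y=y, sr=sr, n_mels=128))
    mel_stats = np.concatenate([mel.mean(axis=1), mel.std(axis=1)])

    # MFCCs plus first/second deltas: timbral texture and its rate of change (60 dims)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)
    mfcc_stats = np.concatenate([mfcc.mean(axis=1),
                                 librosa.feature.delta(mfcc).mean(axis=1),
                                 librosa.feature.delta(mfcc, order=2).mean(axis=1)])

    # Chroma: pitch-class (C, C#, ..., B) energy distribution (12 dims)
    chroma = librosa.feature.chroma_cqt(y=y, sr=sr).mean(axis=1)

    # Spectral contrast: peak-vs-valley energy across frequency bands (7 dims)
    contrast = librosa.feature.spectral_contrast(y=y, sr=sr).mean(axis=1)

    # Spectral stats: mean and std of centroid, bandwidth, rolloff, flatness
    # (8 dims here; the repo's 16 presumably uses more statistics per feature)
    spectral = np.concatenate([
        np.array([f.mean(), f.std()]) for f in (
            librosa.feature.spectral_centroid(y=y, sr=sr),
            librosa.feature.spectral_bandwidth(y=y, sr=sr),
            librosa.feature.spectral_rolloff(y=y, sr=sr),
            librosa.feature.spectral_flatness(y=y))])

    # Tempo estimate (1 dim)
    tempo, _ = librosa.beat.beat_track(y=y, sr=sr)

    return np.concatenate([mel_stats, mfcc_stats, chroma, contrast, spectral,
                           np.atleast_1d(float(tempo))])
```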

♦ Interactive Demo ♦
Demo | Description | Link
Circle Classifier | Upload a Touhou arrangement → predict which circle made it | Open →
✧・゚: *✧・゚:* *:・゚✧*:・゚✧
Bonus: Diffusion Model Experiments
♦ Learning Journey ♦

As a learning exercise, I implemented DDPM (Denoising Diffusion Probabilistic Models) from scratch to understand generative modeling. Trained on mel spectrograms from the doujin circle dataset. This is educational/experimental work, not production-ready generation.

500 Epochs Trained · 2,832 Mel Spectrograms · 5.5h Training Time (M2)
♦ Implementation Details ♦
Component | Implementation
Noise Schedule | Linear and cosine β schedules (1000 timesteps)
Architecture | U-Net with skip connections, GroupNorm, sinusoidal time embeddings
Sampling | DDPM (1000 steps) and DDIM (50 steps, deterministic)
Conditioning | Class-conditioned with classifier-free guidance (CFG scale 3.0)
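
For concreteness, here are standard formulations of the two β schedules named above; the script's exact constants may differ.

```python
import torch

def linear_beta_schedule(T: int = 1000, beta_1: float = 1e-4, beta_T: float = 0.02) -> torch.Tensor:
    """Linear schedule from the original DDPM paper: beta rises linearly with t."""
    return torch.linspace(beta_1, beta_T, T)

def cosine_beta_schedule(T: int = 1000, s: float = 0.008) -> torch.Tensor:
    """Cosine schedule (Nichol & Dhariwal 2021), defined via the cumulative alpha-bar curve."""
    steps = torch.arange(T + 1, dtype=torch.float64)
    alpha_bar = torch.cos(((steps / T) + s) / (1 + s) * torch.pi / 2) ** 2
    alpha_bar = alpha_bar / alpha_bar[0]
    betas = 1 - alpha_bar[1:] / alpha_bar[:-1]
    return torch.clip(betas, 0.0, 0.999).float()

betas = cosine_beta_schedule(1000)
alphas = 1.0 - betas
alpha_bar = torch.cumprod(alphas, dim=0)   # cumulative product used by the forward process below
```
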
♦ Forward Process Visualization ♦
[Figure: diffusion forward process, adding noise over timesteps]

Forward process: Clean spectrogram → progressively noisier → pure noise (t=1000)
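
The closed form behind this is q(x_t | x_0) = N(√ᾱ_t · x_0, (1 − ᾱ_t) · I), which is what lets you noise a clean spectrogram to any timestep in one step. A minimal sketch, assuming alpha_bar from the schedule code above:

```python
import torch

def q_sample(x0: torch.Tensor, t: torch.Tensor, alpha_bar: torch.Tensor):
    """Sample x_t ~ q(x_t | x_0) in one jump:
    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps,  eps ~ N(0, I)."""
    eps = torch.randn_like(x0)
    ab = alpha_bar[t].view(-1, 1, 1, 1)             # broadcast over (B, C, H, W) spectrogram batches
    x_t = ab.sqrt() * x0 + (1.0 - ab).sqrt() * eps
    return x_t, eps                                  # eps is the regression target for the U-Net
```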

♦ Generated Samples (Epoch 500) ♦
[Figure: generated mel spectrograms after 500 epochs]

Class-conditioned generation: Each row is a different doujin circle

♦ Training Loss ♦
[Figure: training loss curve over 500 epochs]

MSE loss on predicted noise. Converged around epoch 300.
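
A sketch of the corresponding training step: pick a random timestep per sample, noise the batch with q_sample from above, and regress the injected noise with MSE. The model signature (x_t, t) is an assumption about the U-Net interface.

```python
import torch
import torch.nn.functional as F

def training_step(model, x0, alpha_bar, optimizer, T: int = 1000):
    """One DDPM step: noise each sample to a random timestep, then regress the
    injected noise with the simple MSE objective plotted above."""
    t = torch.randint(0, T, (x0.shape[0],), device=x0.device)
    x_t, eps = q_sample(x0, t, alpha_bar)          # closed-form forward process (see above)
    eps_pred = model(x_t, t)                       # U-Net conditioned on the timestep embedding
    loss = F.mse_loss(eps_pred, eps)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```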

♦ What I Learned ♦
  • Forward process math: q(x_t | x_0) lets you jump to any timestep directly
  • Reparameterization: Predicting noise ε instead of x_0 stabilizes training
  • CFG tradeoff: Higher guidance = more class-coherent but less diverse (CFG and the DDIM update are both sketched after this list)
  • DDIM acceleration: Deterministic sampling enables 20x fewer steps
  • Spectrograms are hard: High-frequency details need more capacity than toy datasets
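
Two of the pieces above, sketched under the assumption that the U-Net takes (x_t, t, class_label) with None meaning unconditional; names and signatures are illustrative, not the script's actual API.

```python
import torch

@torch.no_grad()
def cfg_eps(model, x_t, t, class_label, guidance_scale: float = 3.0):
    """Classifier-free guidance: blend conditional and unconditional noise predictions.
    A higher scale pushes samples toward the class at the cost of diversity."""
    eps_cond = model(x_t, t, class_label)     # conditioned on the circle label
    eps_uncond = model(x_t, t, None)          # label dropped -> unconditional branch
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

@torch.no_grad()
def ddim_step(x_t, eps, t, t_prev, alpha_bar):
    """Deterministic DDIM update (eta = 0): predict x_0 from eps, then move to t_prev.
    Skipping timesteps this way is what allows ~20x fewer sampling steps."""
    ab_t, ab_prev = alpha_bar[t], alpha_bar[t_prev]
    x0_pred = (x_t - (1.0 - ab_t).sqrt() * eps) / ab_t.sqrt()
    return ab_prev.sqrt() * x0_pred + (1.0 - ab_prev).sqrt() * eps
```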

Code available in scripts/experiment_diffusion_simple.py