Semantic World Generation
Project Overview
Semantic World Generation is my thesis project for my bachelor's degree in Creative Computing, which focuses on game development. It explores how large language models (LLMs) and embeddings can steer procedural content generation. A two-stage pipeline pre-generates a world in Python from a small set of human-provided semantic labels (e.g., “coastal dunes,” “alpine valley,” “marsh”), then imports the resulting reproducible package into Unreal Engine for real-time exploration and user studies. Terrain is shaped using fractal Brownian motion (fBM) and OpenSimplex noise, while scatter algorithms distribute props according to biome-specific parameters, all of which were defined by LLMs.
Scope & Role: Solo R&D and implementation. Technologies: Python, Trellis, Unreal Engine C++, GLB/glTF, OpenAI embeddings.
Pipeline at a Glance
- Semantic Input → Grid: User types 3 labels; the system expands them into a 3×3 chunk grid with neighbors chosen via embedding similarity.
- Chunk Profiles: Each chunk gets numeric terrain parameters (height scale, moisture, roughness), water level, and style cues.
- Terrain & Props: Deterministic fBM/OpenSimplex heightmaps and scatter algorithms produce prop placements with seed control.
- Asset Synthesis: 2D references → Trellis → textured .glb meshes; multiple variants to avoid repetition.
- World Package: Manifest + per-chunk JSON, .npy heightmaps, .glb meshes → Unreal loads synchronously and places everything.
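The world package in the last step could be written along these lines. This is a minimal sketch, not the project's actual schema: the file names (`manifest.json`, `chunk_<x>_<y>.json`), the field names, and the `write_package` helper are all assumptions made for illustration. Heightmaps are only referenced by name, since writing `.npy` files would require NumPy.

```python
import json
from pathlib import Path


def write_package(root, seed, labels, chunks):
    """Write a manifest plus one JSON file per chunk (hypothetical layout).

    `chunks` maps (x, y) grid coordinates to a plain dict of profile values.
    """
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    entries = []
    for (x, y), profile in sorted(chunks.items()):
        name = f"chunk_{x}_{y}"
        # sort_keys keeps the output byte-identical for identical inputs
        (root / f"{name}.json").write_text(
            json.dumps(profile, sort_keys=True, indent=2)
        )
        entries.append({
            "x": x,
            "y": y,
            "profile": f"{name}.json",
            "heightmap": f"{name}.npy",  # assumed naming; not written here
        })
    manifest = {"seed": seed, "labels": labels, "chunks": entries}
    path = root / "manifest.json"
    path.write_text(json.dumps(manifest, sort_keys=True, indent=2))
    return path
```

Because the importer only reads files, anything that can produce this layout could feed Unreal, which is what keeps the two stages decoupled.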
Skills Demonstrated
| Skill | Description |
|---|---|
| Systems Architecture | Designed a two-stage, file-based pipeline with reproducible seeds and strict interfaces between Python and Unreal. |
| Procedural Generation | Implemented fBM/OpenSimplex terrain, parametric chunk profiles, and moisture-aware prop scatter. |
| LLM & Embeddings | Used curated wordsets + transformer similarity to expand and validate semantic labels for coherent chunk adjacency. |
| 3D Asset Flow | Automated 2D→3D synthesis via Trellis into GLB, with caching and variant selection. |
| Unreal Engine C++ | Procedural mesh build from .npy heightmaps, glTFRuntime ingestion, deterministic prop spawning, and in-game metadata UI. |
Core Features
- Label-Driven Worlds: High-level text labels steer style and structure without hand-authored levels.
- Deterministic Artifacts: Seeds and manifests ensure the same inputs produce the same world.
- Variant-Rich Props: Multiple mesh variations per placement to reduce visual monotony.
- Water & Biome Logic: Per-chunk waterlines and biome parameters shape terrain and prop densities.
- Playable Import: One-click load builds terrain, places meshes, and attaches inspection widgets.
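To illustrate how seeds, waterlines, and biome parameters can combine into deterministic prop placement, here is a small sketch. It uses a jittered grid with a moisture-scaled keep probability; that choice, the `scatter_props` name, and the `0.25 + 0.75 * moisture` weighting are assumptions for illustration, not the scatter algorithm used in the project.

```python
import random


def scatter_props(seed, chunk_xy, size, base_density, moisture, heights, water_level):
    """Jittered-grid scatter: one candidate per cell, kept with a
    moisture-scaled probability and rejected if it sits below the waterline."""
    # A string seed is hashed deterministically by random.Random,
    # so each chunk gets its own repeatable stream.
    rng = random.Random(f"{seed}:{chunk_xy[0]}:{chunk_xy[1]}")
    keep_p = min(1.0, base_density * (0.25 + 0.75 * moisture))
    placements = []
    for r in range(size):
        for c in range(size):
            # Always draw all three numbers so the stream stays aligned
            # regardless of which candidates are rejected.
            jx, jy, roll = rng.random(), rng.random(), rng.random()
            if roll >= keep_p or heights[r][c] < water_level:
                continue
            placements.append((c + jx, r + jy))
    return placements


flat = [[1.0] * 8 for _ in range(8)]
wet = scatter_props(7, (0, 0), 8, 0.8, 0.9, flat, water_level=0.5)
dry = scatter_props(7, (0, 0), 8, 0.8, 0.1, flat, water_level=0.5)
```

With the same seed, the drier chunk keeps a subset of the wetter chunk's placements, so changing one parameter thins the scatter without reshuffling it.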
Technical Concepts and Tools
Semantic Expansion
Labels are embedded and matched against a curated vocabulary; similarity between transformer embeddings selects coherent neighbors and filters out semantically distant options.
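The matching step can be sketched as a cosine-similarity ranking with a cutoff. The three-dimensional vectors below are toy stand-ins for real embeddings, and the vocabulary terms, `pick_neighbors` helper, and `min_sim` threshold are assumptions for illustration; the project's actual vocabulary and thresholds are not shown here.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def pick_neighbors(label_vec, vocabulary, k=2, min_sim=0.5):
    """Return up to k vocabulary terms most similar to label_vec,
    dropping any whose similarity falls below min_sim."""
    scored = [(cosine(label_vec, vec), term) for term, vec in vocabulary.items()]
    kept = [(s, t) for s, t in scored if s >= min_sim]
    kept.sort(key=lambda st: (-st[0], st[1]))  # best first, ties by name
    return [t for _, t in kept[:k]]


# Toy vectors standing in for real embeddings (assumption).
VOCAB = {
    "salt marsh": [0.9, 0.1, 0.0],
    "tidal flat": [0.8, 0.3, 0.1],
    "alpine scree": [0.0, 0.2, 0.9],
}
marsh = [1.0, 0.2, 0.0]
print(pick_neighbors(marsh, VOCAB))  # ['salt marsh', 'tidal flat']
```

The threshold is what keeps a term like "alpine scree" from ever landing next to a marsh chunk, even when `k` would otherwise allow it.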
Chunk Profiles & Terrain
Numeric profiles (e.g., height_scale, roughness, moisture) parameterize noise generators and scatter.
Profiles serialize to JSON for auditability and reproducibility.
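As a rough illustration of a profile driving a noise generator, the sketch below sums octaves of noise (fBM) and scales the result by `height_scale`. To stay dependency-free it swaps in a simple hash-based value noise where the project uses OpenSimplex. The `ChunkProfile` field set, the use of `roughness` as the fBM gain, and the grid size are assumptions, not the project's actual definitions.

```python
import json
import math
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ChunkProfile:
    height_scale: float
    roughness: float    # assumed here to act as fBM gain per octave
    moisture: float
    water_level: float


def _lattice(ix, iy, seed):
    """Deterministic pseudo-random value in [0, 1) for an integer lattice point."""
    h = (ix * 374761393 + iy * 668265263 + seed * 2147483647) & 0xFFFFFFFF
    h = ((h ^ (h >> 13)) * 1274126177) & 0xFFFFFFFF
    return (h ^ (h >> 16)) / 2**32


def value_noise(x, y, seed):
    """Smoothly interpolated lattice noise; a stand-in for OpenSimplex."""
    ix, iy = math.floor(x), math.floor(y)
    fx, fy = x - ix, y - iy
    sx, sy = fx * fx * (3 - 2 * fx), fy * fy * (3 - 2 * fy)  # smoothstep
    top = _lattice(ix, iy, seed) * (1 - sx) + _lattice(ix + 1, iy, seed) * sx
    bot = _lattice(ix, iy + 1, seed) * (1 - sx) + _lattice(ix + 1, iy + 1, seed) * sx
    return top * (1 - sy) + bot * sy


def fbm(x, y, seed, octaves=4, lacunarity=2.0, gain=0.5):
    """Sum octaves with rising frequency and falling amplitude, normalised."""
    total, amp, freq, norm = 0.0, 1.0, 1.0, 0.0
    for o in range(octaves):
        total += amp * value_noise(x * freq, y * freq, seed + o)
        norm += amp
        amp *= gain
        freq *= lacunarity
    return total / norm


def heightmap(profile, seed, size=16, cell=0.15):
    """Row-major grid of heights in [0, height_scale]."""
    return [[profile.height_scale * fbm(c * cell, r * cell, seed, gain=profile.roughness)
             for c in range(size)] for r in range(size)]


profile = ChunkProfile(height_scale=40.0, roughness=0.5, moisture=0.7, water_level=6.0)
hm = heightmap(profile, seed=42)
profile_json = json.dumps(asdict(profile), sort_keys=True)  # auditable record
```

Keeping the profile as a frozen dataclass means the JSON on disk and the values fed to the generator cannot drift apart within a run.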
Trellis Mesh Synthesis
Trellis converts 2D reference images into textured .glb assets. The importer caches GLBs and assigns variants per placement.
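Caching and per-placement variant assignment might be sketched as follows. The hashing scheme, the `pick_variant` and `MeshCache` names, and the `reed_<n>.glb` file naming are hypothetical. The actual importer lives in Unreal C++; Python is used here only to show the idea.

```python
import hashlib


def pick_variant(seed, prop_type, placement_index, variant_count):
    """Map a placement to a variant index that is stable across runs.

    SHA-256 is used instead of the built-in hash(), which is salted
    per process for strings and so would not be reproducible.
    """
    key = f"{seed}:{prop_type}:{placement_index}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % variant_count


class MeshCache:
    """Load each GLB path once; `loader` stands in for the real import call."""

    def __init__(self, loader):
        self._loader = loader
        self._meshes = {}
        self.loads = 0

    def get(self, path):
        if path not in self._meshes:
            self._meshes[path] = self._loader(path)
            self.loads += 1
        return self._meshes[path]


cache = MeshCache(loader=lambda p: f"<mesh {p}>")  # dummy loader
variants = [pick_variant(42, "reed", i, 3) for i in range(6)]
meshes = [cache.get(f"reed_{v}.glb") for v in variants]
```

Six placements trigger at most three loads here, since repeated variants are served from the cache.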
Unreal Integration
ProceduralMeshComponent builds terrain from .npy heightmaps; glTFRuntime loads GLBs; a GeneratedAssetInfoComponent stores metadata for on-screen inspection and user studies.
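A procedural mesh needs a vertex list and a triangle index list, so building terrain from a heightmap comes down to grid index arithmetic. The sketch below shows that arithmetic in Python for consistency with the other examples; the project does this in Unreal C++. The `grid_mesh` name and `cell_size` value are assumptions, and the winding order may need flipping to match the engine's convention.

```python
def grid_mesh(heights, cell_size=100.0):
    """Return (vertices, triangles) for a row-major heightmap grid.

    Each grid cell becomes one quad split into two triangles.
    """
    rows, cols = len(heights), len(heights[0])
    vertices = [(c * cell_size, r * cell_size, heights[r][c])
                for r in range(rows) for c in range(cols)]
    triangles = []
    for r in range(rows - 1):
        for c in range(cols - 1):
            i = r * cols + c  # index of this quad's first corner
            triangles += [i, i + cols, i + 1,              # first triangle
                          i + 1, i + cols, i + cols + 1]   # second triangle
    return vertices, triangles


verts, tris = grid_mesh([[0.0, 1.0], [2.0, 3.0]])
print(tris)  # [0, 2, 1, 1, 2, 3]
```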
Learning Takeaways
Pairing semantics with deterministic generation yields worlds that feel intentional yet remain reproducible. Curated vocabularies and similarity filtering were crucial to avoid incoherent chunk borders, and variant meshes substantially improved perceived richness. The file-based bridge kept Python and Unreal decoupled, making iteration fast.
Evaluation & Results
In a small user study, participants reported high believability and immersion, with slightly lower coherence at biome seams. Usability scored strongly, and feedback highlighted the value of visible metadata for understanding generator intent. Future work includes smarter seam blending and adaptive scatter near borders.
Tools Used
- Python: NumPy, OpenSimplex; packaging of manifests, profiles, and heightmaps.
- Trellis: 2D → 3D mesh synthesis pipeline.
- Unreal Engine (C++): Procedural terrain, metadata overlays, glTFRuntime ingestion.
- Embeddings / LLMs: Semantic expansion and validation.