In This Chapter

  1. Two kinds of edges
  2. Curated relationships
  3. Generated relationships
  4. IGDB integration
  5. Bounded loading

Two Kinds of Edges

The graph has two fundamentally different kinds of connections. Curated relationships are hand-authored. Someone researched the connection, decided it was meaningful, and added it to the dataset. These are high-signal, low-noise edges: influence lineage, spiritual successors, design antithesis, studio exodus.

Generated relationships are inferred from metadata. If two games share a genre, platform, theme, keyword, or game mode, the system creates an edge automatically. Each of these edges carries less signal on its own, but in aggregate they reveal structural patterns that would take years to curate by hand.

Curated

Hand-authored, high confidence. Tells a story.

INFLUENCED_BY, SPIRITUAL_SUCCESSOR, RIVALRY, DESIGN_ANTITHESIS, DEV_TEAM_EXODUS, MOD_ORIGIN, SAME_CREATOR, SHARED_MECHANIC, SHARED_AESTHETIC_OR_TONE, SAME_STUDIO, SAME_PUBLISHER, SAME_SERIES, REMAKE_OR_SPINOFF, SHARED_UNIVERSE, SHARED_ENGINE_OR_TECH

Generated

Inferred from IGDB metadata. Reveals structure at scale.

SAME_GENRE, SAME_PLATFORM, SHARED_THEME, SHARED_KEYWORD, SHARED_MODE, SHARED_PERSPECTIVE, SHARED_CAMERA_MODE, SHARED_MONETIZATION_MODEL, SHARED_PLAYSTYLE, SHARED_LIVE_SERVICE, SHARED_SESSION_LENGTH, SHARED_CONTROL_SCHEME, STUDIO_LINEAGE
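One way to make the split explicit in code is to model the two families as separate unions. The sketch below is an illustration only: the type names come from the lists above, while the GraphEdge shape, its field names, and the isCurated helper are assumptions, not the project's actual schema.

```typescript
// Relationship type names are copied from the two lists above.
const CURATED_TYPES = [
  "INFLUENCED_BY", "SPIRITUAL_SUCCESSOR", "RIVALRY", "DESIGN_ANTITHESIS",
  "DEV_TEAM_EXODUS", "MOD_ORIGIN", "SAME_CREATOR", "SHARED_MECHANIC",
  "SHARED_AESTHETIC_OR_TONE", "SAME_STUDIO", "SAME_PUBLISHER", "SAME_SERIES",
  "REMAKE_OR_SPINOFF", "SHARED_UNIVERSE", "SHARED_ENGINE_OR_TECH",
] as const;

const GENERATED_TYPES = [
  "SAME_GENRE", "SAME_PLATFORM", "SHARED_THEME", "SHARED_KEYWORD",
  "SHARED_MODE", "SHARED_PERSPECTIVE", "SHARED_CAMERA_MODE",
  "SHARED_MONETIZATION_MODEL", "SHARED_PLAYSTYLE", "SHARED_LIVE_SERVICE",
  "SHARED_SESSION_LENGTH", "SHARED_CONTROL_SCHEME", "STUDIO_LINEAGE",
] as const;

type CuratedType = (typeof CURATED_TYPES)[number];
type GeneratedType = (typeof GENERATED_TYPES)[number];
type RelationshipType = CuratedType | GeneratedType;

// Hypothetical edge record; the field names are illustrative.
interface GraphEdge {
  sourceId: number;
  targetId: number;
  type: RelationshipType;
}

// Type guard: true when the edge type belongs to the hand-authored family.
function isCurated(type: RelationshipType): type is CuratedType {
  return (CURATED_TYPES as readonly string[]).some((t) => t === type);
}
```

With this shape, a renderer could always draw curated edges and apply density limits only to generated ones.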

Curated Relationships

The most valuable edges in the graph are the ones that tell stories. INFLUENCED_BY is the backbone: Doom influenced Quake. System Shock influenced Deus Ex. Ultima Underworld influenced nearly everything in the immersive sim lineage. These connections create the through-lines that make the graph meaningful, not just pretty.

But influence is only one dimension. DEV_TEAM_EXODUS captures when a group leaves one studio and carries its design sensibility to the next. Several key members of the GoldenEye team left Rare and founded Free Radical Design; their first game there, TimeSplitters, shows the lineage clearly. DESIGN_ANTITHESIS captures reactive design: games that exist specifically because their creators disagreed with how another game solved a problem.

MOD_ORIGIN traces the lineage from mod to standalone product. Counter-Strike started as a Half-Life mod. DOTA started as a Warcraft III mod. These origin stories are crucial to understanding how the medium evolves, because they represent community-driven evolution rather than studio-driven iteration.

The curation challenge. Influence is subjective. Two knowledgeable people can disagree about whether Game A influenced Game B. The approach here is to prioritize documented connections (developer interviews, postmortems, credits) over editorial opinion. When a connection is debatable, the flag system lets users raise it for review.

Relationship Explorer

Toggle relationship types on and off. With just INFLUENCED_BY, you see a sparse web of meaningful connections. Turn on SAME_STUDIO and Valve's games snap together. Turn on SAME_GENRE and the whole structure densifies. Each layer reveals a different dimension of how games connect.

Generated Relationships

Generated edges solve a different problem: coverage. Hand-curating every meaningful connection between thousands of games would take decades. But IGDB already knows that two games share a genre, platform, engine, or theme. The buildGeneratedRelationships module reads this metadata and creates edges on the fly.
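The idea can be sketched with an inverted index: group games by each metadata value, then connect every pair inside a group. This is a minimal illustration, not the actual buildGeneratedRelationships module; the GameMeta shape, the function names, and the restriction to two edge types are all assumptions made for the example.

```typescript
// Hypothetical game shape: only the fields this sketch needs.
interface GameMeta {
  id: number;
  genres: string[];
  platforms: string[];
}

interface GeneratedEdge {
  sourceId: number;
  targetId: number;
  type: "SAME_GENRE" | "SAME_PLATFORM";
  sharedValue: string;
}

function pairsForAttribute(
  games: GameMeta[],
  type: GeneratedEdge["type"],
  pick: (g: GameMeta) => string[],
): GeneratedEdge[] {
  // Inverted index: attribute value -> ids of the games that carry it.
  const byValue = new Map<string, number[]>();
  for (const game of games) {
    for (const value of pick(game)) {
      const ids = byValue.get(value) ?? [];
      ids.push(game.id);
      byValue.set(value, ids);
    }
  }
  // One edge per pair of games per shared value.
  const edges: GeneratedEdge[] = [];
  byValue.forEach((ids, value) => {
    for (let i = 0; i < ids.length; i++) {
      for (let j = i + 1; j < ids.length; j++) {
        edges.push({ sourceId: ids[i], targetId: ids[j], type, sharedValue: value });
      }
    }
  });
  return edges;
}

function sketchGeneratedEdges(games: GameMeta[]): GeneratedEdge[] {
  return [
    ...pairsForAttribute(games, "SAME_GENRE", (g) => g.genres),
    ...pairsForAttribute(games, "SAME_PLATFORM", (g) => g.platforms),
  ];
}
```

A group of n games that share one value yields n(n-1)/2 edges, so a genre with 2,000 games already produces about two million pairs.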

The tricky part is density. If you connect every game that shares a genre, the graph explodes. SAME_GENRE alone would create millions of edges. The system uses relationship LOD (level of detail) to manage this. Dense relationship types are sampled rather than shown in full. When you zoom into a cluster, more connections appear. When you zoom out, only the strongest remain. The effect is like changing the magnification on a telescope.
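A rough sketch of zoom-dependent density control follows. It uses a strength-ranked cutoff rather than random sampling, which is only one possible reading of the behavior described above. The strength field, the DENSE_TYPES list, the 0 to 1 zoom scale, and the minFraction parameter are all hypothetical.

```typescript
interface WeightedEdge {
  sourceId: number;
  targetId: number;
  type: string;
  strength: number; // hypothetical score, e.g. number of shared attributes
}

// Hypothetical list of relationship types treated as dense.
const DENSE_TYPES = ["SAME_GENRE", "SAME_PLATFORM"];

// zoom: 0 = fully zoomed out, 1 = fully zoomed in (an assumed scale).
function visibleEdges(
  edges: WeightedEdge[],
  zoom: number,
  minFraction = 0.05,
): WeightedEdge[] {
  const z = Math.min(1, Math.max(0, zoom));
  // Fraction of dense edges to keep grows linearly with zoom.
  const fraction = minFraction + (1 - minFraction) * z;

  // Sparse types are always shown in full.
  const sparse = edges.filter((e) => DENSE_TYPES.indexOf(e.type) === -1);
  // filter() returns a copy, so sorting does not mutate the input.
  const dense = edges
    .filter((e) => DENSE_TYPES.indexOf(e.type) !== -1)
    .sort((a, b) => b.strength - a.strength);

  const keep = Math.ceil(dense.length * fraction);
  return sparse.concat(dense.slice(0, keep));
}
```

Because the cutoff is rank-based, the edges visible at a low zoom level are always a subset of those visible at a higher one, so connections appear and disappear in a stable order.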

Generated types: 13
Density control: LOD
Metadata source: IGDB

IGDB Integration

IGDB (Internet Game Database) provides the metadata foundation: game titles, release dates, genres, platforms, studios, publishers, engines, themes, game modes, cover art, and screenshots. The ingestion pipeline pulls this data through scripted batch jobs, not manual entry.

The pipeline uses bounded, year-by-year ingestion. For each year, it pulls the top games by sales rank plus a seeded random sample. This keeps the dataset representative without requiring a full mirror of IGDB's catalog. The seed makes the random sample deterministic across runs, so the same script run against the same source data always produces the same dataset.
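The per-year selection can be sketched as a top-N cut followed by a seeded shuffle of the remainder. Everything here is an assumption made for illustration: the CandidateGame shape, the rule that a lower salesRank is better, the choice of a small mulberry32-style generator, and deriving each year's seed as seed + year.

```typescript
// Small mulberry32-style PRNG: same seed, same sequence of numbers in [0, 1).
function mulberry32(seed: number): () => number {
  let a = seed >>> 0;
  return () => {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical candidate shape; a lower salesRank is assumed to be better.
interface CandidateGame {
  id: number;
  year: number;
  salesRank: number;
}

function selectForYear(
  candidates: CandidateGame[],
  year: number,
  topN: number,
  sampleN: number,
  seed: number,
): CandidateGame[] {
  // Sort with an id tie-break so the result does not depend on input order.
  const inYear = candidates
    .filter((g) => g.year === year)
    .sort((a, b) => a.salesRank - b.salesRank || a.id - b.id);

  const top = inYear.slice(0, topN);
  const rest = inYear.slice(topN);

  // Seeded Fisher-Yates shuffle, then take the first sampleN entries.
  const rand = mulberry32(seed + year);
  for (let i = rest.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    [rest[i], rest[j]] = [rest[j], rest[i]];
  }
  return top.concat(rest.slice(0, sampleN));
}
```

Sorting before shuffling matters: a seeded shuffle is only reproducible if it starts from the same order every time.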

Deduplication turned out to be the hardest technical problem in the entire project. IGDB sometimes has multiple entries for the same game (different platforms, regional releases, remasters with separate IDs). A game might appear three times with slightly different titles, different cover art, and relationships pointing to different copies. Without dedup, the graph fills with phantom nodes and broken connections.

The audit and merge scripts detect duplicates by igdb_id and consolidate them, preserving relationships and metadata from both records. All merge operations run in dry-run mode by default and log every affected row. Getting this right took more iteration than any rendering or physics feature. Bad data is invisible until someone searches for a game they know and finds it listed twice, or not at all.
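The merge step can be sketched as two phases: plan which rows collapse into which, then apply the plan only when dry-run is switched off. This is a simplified illustration rather than the project's scripts. The row shapes, the function names, and the rule of keeping the lowest row id are assumptions, and field-level metadata merging is left out for brevity.

```typescript
// Hypothetical row shapes for the local tables.
interface GameRow { rowId: number; igdbId: number; title: string; }
interface RelRow { sourceRowId: number; targetRowId: number; type: string; }
interface MergePlan { keepRowId: number; dropRowIds: number[]; }

// Group rows by igdb id; any group with more than one row is a duplicate set.
function planMerges(games: GameRow[]): MergePlan[] {
  const byIgdb = new Map<number, number[]>();
  for (const g of games) {
    const ids = byIgdb.get(g.igdbId) ?? [];
    ids.push(g.rowId);
    byIgdb.set(g.igdbId, ids);
  }
  const plans: MergePlan[] = [];
  byIgdb.forEach((rowIds) => {
    if (rowIds.length < 2) return;
    const sorted = rowIds.slice().sort((a, b) => a - b);
    plans.push({ keepRowId: sorted[0], dropRowIds: sorted.slice(1) });
  });
  return plans;
}

function applyMerges(
  games: GameRow[],
  rels: RelRow[],
  plans: MergePlan[],
  dryRun = true, // dry-run by default: log only, change nothing
): { games: GameRow[]; rels: RelRow[]; log: string[] } {
  const redirect = new Map<number, number>();
  const log: string[] = [];
  for (const plan of plans) {
    for (const dropped of plan.dropRowIds) {
      redirect.set(dropped, plan.keepRowId);
      log.push(`merge game row ${dropped} -> ${plan.keepRowId}`);
    }
  }
  for (const r of rels) {
    if (redirect.has(r.sourceRowId) || redirect.has(r.targetRowId)) {
      log.push(`repoint ${r.type} ${r.sourceRowId} -> ${r.targetRowId}`);
    }
  }
  if (dryRun) return { games, rels, log };

  const resolve = (id: number): number => redirect.get(id) ?? id;
  const seen = new Set<string>();
  const mergedRels: RelRow[] = [];
  for (const r of rels) {
    const next = { ...r, sourceRowId: resolve(r.sourceRowId), targetRowId: resolve(r.targetRowId) };
    const key = `${next.sourceRowId}|${next.targetRowId}|${next.type}`;
    // Drop self-loops and exact duplicates created by the merge.
    if (next.sourceRowId === next.targetRowId || seen.has(key)) continue;
    seen.add(key);
    mergedRels.push(next);
  }
  return { games: games.filter((g) => !redirect.has(g.rowId)), rels: mergedRels, log };
}
```

Reviewing the log from a dry run before applying the same plan for real is what makes a destructive merge safe to iterate on.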

Bounded Loading

The dataset is too large to load all at once in the browser. The loading strategy uses Supabase RPCs to fetch a bounded slice: a configurable number of top-selling games per year, plus a random sample per year, for whatever time range the user is viewing. Relationships are fetched only for the loaded game set.

This means the graph you see is always a representative sample, not the full catalog. Zooming into a decade loads more games from that era. The timeline scrubber controls the visible range. The experience feels complete even though you're looking at a carefully chosen subset.

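Why RPCs instead of plain table queries? Supabase RPCs let you push complex query logic (bounded per-year selection with seeded random) into the database. The auto-generated REST endpoints for tables handle filters, ordering, and a single overall limit, but they cannot express a separate limit per year, so the client would have to fetch far more rows than it needs and filter them locally, which defeats the purpose of bounded loading. The RPC get_games_limited_per_year does the heavy lifting in Postgres.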
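From the browser, such a call might look like the sketch below. The rpc(name, args) method and its { data, error } result follow the Supabase JavaScript client. The parameter names (p_start_year and so on), the SliceRequest shape, and the helper names are hypothetical, since the real signature of get_games_limited_per_year is not given here. The RpcCapable interface is a stand-in so the example compiles without the Supabase package; a real client may need a thin adapter or a cast to fit it.

```typescript
// Minimal structural type for the single client method used here.
interface RpcCapable {
  rpc(
    fn: string,
    args: Record<string, unknown>,
  ): PromiseLike<{ data: unknown; error: { message: string } | null }>;
}

// Hypothetical description of the slice the user is viewing.
interface SliceRequest {
  startYear: number;
  endYear: number;
  topPerYear: number;
  randomPerYear: number;
  seed: number;
}

// Hypothetical parameter names; the real function may use different ones.
function buildSliceArgs(req: SliceRequest): Record<string, unknown> {
  if (req.startYear > req.endYear) {
    throw new Error("startYear must not exceed endYear");
  }
  return {
    p_start_year: req.startYear,
    p_end_year: req.endYear,
    p_top_per_year: req.topPerYear,
    p_random_per_year: req.randomPerYear,
    p_seed: req.seed,
  };
}

async function loadSlice(client: RpcCapable, req: SliceRequest): Promise<unknown[]> {
  const { data, error } = await client.rpc(
    "get_games_limited_per_year",
    buildSliceArgs(req),
  );
  if (error) throw new Error(`RPC failed: ${error.message}`);
  // The function is expected to return a set of rows; fall back to empty.
  return Array.isArray(data) ? data : [];
}
```

Only the bounded slice crosses the network; the per-year ranking and sampling stay inside Postgres.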