Architecture

Rubydex indexes Ruby codebases in two distinct stages: Discovery and Resolution. Understanding this separation is crucial for working with the codebase.

Core Concepts: Definition vs Declaration

A Definition represents a single source-level construct found at a specific location in the code. It captures exactly what the parser sees without making assumptions about runtime behavior.

A Declaration represents the global semantic concept of a name, combining all definitions that contribute to the same fully qualified name. Declarations are produced during resolution.

Consider this example:

# foo.rb
module Foo
  class Bar; end
end

# other_foos.rb
class Foo::Bar; end
class Foo::Bar; end

Definitions (4 total - what the indexer discovers):

  1. Module definition for Foo in foo.rb

  2. Class definition for Bar (nested inside Foo) in foo.rb

  3. Class definition for Foo::Bar in other_foos.rb

  4. Class definition for Foo::Bar in other_foos.rb

Declarations (2 total - what resolution produces):

  1. Foo - A module, composed of definition 1, that has a constant Bar under its namespace

  2. Foo::Bar - A class, composed of definitions 2, 3, and 4
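The grouping above can be sketched in a few lines of Ruby. This is a minimal illustration, not Rubydex's implementation (which is written in Rust): the Definition and Declaration structs and the build_declarations helper are hypothetical names, and the fully qualified names are supplied by hand here, whereas Rubydex computes them during resolution.

```ruby
# Hypothetical types for illustration only; not Rubydex's actual data model.
Definition = Struct.new(:kind, :fully_qualified_name, :file)
Declaration = Struct.new(:name, :definitions)

# Group definitions that contribute to the same fully qualified name.
def build_declarations(definitions)
  definitions
    .group_by(&:fully_qualified_name)
    .map { |name, defs| Declaration.new(name, defs) }
end

# The four definitions from the example, with names already resolved by hand.
definitions = [
  Definition.new(:module, "Foo", "foo.rb"),
  Definition.new(:class, "Foo::Bar", "foo.rb"),
  Definition.new(:class, "Foo::Bar", "other_foos.rb"),
  Definition.new(:class, "Foo::Bar", "other_foos.rb"),
]

declarations = build_declarations(definitions)
declarations.each do |declaration|
  puts "#{declaration.name}: #{declaration.definitions.size} definition(s)"
end
```

Four definitions collapse into two declarations because the grouping key is the fully qualified name, not the source location.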

Two-Stage Indexing Pipeline

Stage 1: Discovery

Discovery walks the AST and extracts definitions from source code. It captures only what is explicitly written, making no assumptions about runtime behavior.

What Discovery does:

What Discovery does NOT do:

Why No Assumptions During Discovery?

Consider this example:

module Bar; end

class Foo
  class Bar::Baz; end
end

Without resolving constant references, it may appear that Bar::Baz is created under Foo. It is not: Bar resolves to the top-level Bar, so the class is Bar::Baz, not Foo::Bar::Baz.

Discovery cannot know this without first resolving Bar. This is why fully qualified names and semantic membership are computed during Resolution, not Discovery.
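Ruby's own runtime behavior confirms this. The snippet below is plain Ruby with no Rubydex API involved; it repeats the example and checks where the Baz constant actually ends up.

```ruby
module Bar; end

class Foo
  # Bar is not defined on Foo, so the lookup falls through to the top-level Bar.
  class Bar::Baz; end
end

puts Bar::Baz.name                    # prints "Bar::Baz"
puts Bar.const_defined?(:Baz, false)  # prints "true": Baz lives directly on Bar
puts Foo.const_defined?(:Bar, false)  # prints "false": Foo owns no constant Bar
```

Passing false as the second argument to const_defined? restricts the check to the receiver itself, which is what makes the last line meaningful.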

Stage 2: Resolution

Resolution combines the discovered definitions to build a semantic understanding of the codebase.

What Resolution does:

Graph Structure

Rubydex represents the codebase as a graph, where entities are nodes and relationships are edges. The visualization below shows the conceptual structure (implemented as an adjacency list using IDs).

Graph visualization
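
An ID-keyed adjacency list can be sketched as follows. This is a conceptual illustration in Ruby rather than Rubydex's Rust implementation: the Graph class, its method names, the meaning of the edge, and the use of CRC32 as the ID hash are all assumptions made for the example.

```ruby
require "zlib"

# Hypothetical sketch: nodes are stored by ID, and edges are lists of IDs.
class Graph
  def initialize
    @nodes = {}                                    # id => node payload
    @edges = Hash.new { |hash, id| hash[id] = [] } # id => [target ids]
  end

  # CRC32 stands in for whatever hash the real implementation uses.
  def self.id_for(key)
    Zlib.crc32(key)
  end

  def add_node(key, payload)
    id = Graph.id_for(key)
    @nodes[id] = payload
    id
  end

  def add_edge(from_id, to_id)
    @edges[from_id] << to_id
  end

  # Follow edges by ID and look each target up in the node table.
  def neighbors(id)
    @edges.fetch(id, []).map { |target_id| @nodes.fetch(target_id) }
  end
end

graph = Graph.new
foo = graph.add_node("Foo", { kind: :module, name: "Foo" })
bar = graph.add_node("Foo::Bar", { kind: :class, name: "Foo::Bar" })
graph.add_edge(foo, bar) # Foo has a member named Bar

puts graph.neighbors(foo).map { |node| node[:name] }.inspect
```

Storing edges as IDs rather than object references keeps nodes independent of one another; following a relationship is one lookup in the edge table and one in the node table.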

Key Files

ID Types

Connections between nodes use hashed IDs defined in ids.rs:

MCP Server

The rubydex-mcp crate exposes rubydex’s code intelligence as MCP tools, speaking JSON-RPC over stdio. The server indexes the codebase on startup, then serves tool requests against the resulting immutable graph.

Pagination

Tools that may return many results accept offset and limit parameters and return a total count so that callers can page through them.

Pagination uses a two-pass approach: first collect all entries that pass filtering into a Vec, then apply skip(offset).take(limit). This ensures total accurately reflects the number of results the caller can page through.
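
The two passes can be sketched as follows. This is a Ruby illustration of the approach, not the actual rubydex-mcp code (which is Rust): the paginate helper and the shape of the returned hash are hypothetical, and Ruby's drop and take play the role of skip and take.

```ruby
# Hypothetical helper for illustration; not part of rubydex-mcp.
def paginate(entries, offset:, limit:, &filter)
  filter ||= proc { true } # no filter given: keep every entry

  # Pass 1: collect every entry that passes filtering.
  matches = entries.select(&filter)

  # Pass 2: slice out the requested page.
  page = matches.drop(offset).take(limit)

  { total: matches.size, results: page }
end

names = %w[Foo Foo::Bar Foo::Baz Qux Foo::Quux]
response = paginate(names, offset: 1, limit: 2) { |name| name.start_with?("Foo") }
puts response.inspect
```

Here total is 4 even though only two results are returned, because the count is taken after filtering but before slicing.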

Result Ordering

All collection-returning tools iterate over IdentityHashMap or IdentityHashSet structures. These use a deterministic hasher, so iteration order is fixed for a given map state.

Key Files

FFI Layer

The Rust crate exposes a C-compatible FFI API through rubydex-sys. The C extension in ext/rubydex/ wraps this API for Ruby.

Naming Conventions