The Orbiter Knowledge Graph

A definitive guide to how Orbiter models people, companies, and relationships in a graph database — from raw data ingestion to AI-powered path finding.

FalkorDB (Redis-based Graph) | 2,143 People | 12 Node Types | 20+ Edge Types | March 2026

1 What is the Knowledge Graph?

The knowledge graph is the core intelligence layer of Orbiter. It maps every person, company, school, and organization into a network of typed, weighted relationships — enabling the platform to find hidden connections between people and surface relevant introductions.

Figure: Knowledge Graph Network Visualization. Conceptual view of the Orbiter knowledge graph: nodes represent entities (people, companies, schools), edges represent relationships with varying strengths.

"I use a graph database called FalkorDB, which is Redis-based. The reason I chose it is it's 450 to 500 times faster than Neo4j. It runs on GCP, it uses Cypher as its query language — the same as Neo4j — but it's blazingly fast because it's in-memory."
— Mark Pedersen, Orbiter CTO

The Core Idea

Traditional CRMs store contacts as flat records. Orbiter stores them as nodes in a graph, connected by typed edges that carry meaning: "Alice FOUNDED Company X", "Bob ATTENDED Stanford", "Alice and Bob ATTENDED_TOGETHER at Stanford from 2012-2016."

This structure enables Orbiter's signature features — Leverage Loops (find the shortest, strongest path between two people), Discovery (natural language search that translates to graph queries), and Meeting Prep (automatically surface shared connections before a meeting).

2,143 Total People | ~1,925 Visible (with Bio) | 12 Node Types | 20+ Edge Types

Tech Stack

FalkorDB

Redis-based graph database. Cypher query language. Hosted on GCP. Graph name: live. Extremely fast — in-memory operations.

Xano

Backend platform where all enrichment logic runs. Functions process data, resolve edges, and write to FalkorDB via HTTP API calls (Cypher queries over REST).

OpenAI Embeddings

Model: text-embedding-3-small (1536 dimensions). Stored as name_embedding property on Person and Company nodes for semantic search.

2 Architecture Overview

Data flows through a 4-layer pipeline: external sources feed raw JSON into Xano tables, which get processed into relational records, then resolved into graph edges, and finally written to FalkorDB.

Figure: Data Pipeline Flow Diagram. The enrichment pipeline: from raw external data sources through processing, edge resolution, and into the FalkorDB knowledge graph.

The 4 Layers

Layer 1: Ingest → Layer 2: Process → Layer 3: Resolve Edges → Layer 4: FalkorDB → Layer 5: Embeddings

Layer 1 — Ingest

Raw Data Collection

External data sources (PDL, Enrich Layer, Fundable) deliver JSON blobs stored in master_person and related tables. Fields like pdl_json, enrichLayer_json hold the raw payloads.

Layer 2 — Process

Normalize & Extract

The process-enrich-layer function (16 sections) parses JSON into relational tables: work_history, education, certification, skill, social_link, etc.

Layer 3 — Resolve Edges

Build Relationships

Six specialized functions resolve relational records into graph edges — matching companies, schools, and organizations to existing graph nodes or creating new ones.

Layer 4 — Graph Write

FalkorDB Cypher

send-cypher and send-cypher-with-embeddings execute Cypher queries against FalkorDB via HTTP, creating/updating nodes and edges.

System Architecture Diagram

EXTERNAL DATA SOURCES
  PDL | Enrich Layer | Fundable | LinkedIn
        │
        ▼
XANO BACKEND (Layer 1-2)
  process-enrich-layer (16 sections)
    S1: basic_info         S5: social_links     S9: volunteering
    S2: work_history       S6: certifications   S10: honors
    S3: education          S7: languages        S11: publications
    S4: skills             S8: projects         S12: patents
    S13: courses
    S14: recommendations   S15: websites        S16: avatar
        │
        ▼
  RELATIONAL TABLES (Xano DB)
    master_person · work_history · education · skill
    certification · social_link · volunteering · honor
    project · publication · language · master_company
        │
        ▼
EDGE RESOLUTION (Layer 3)
  resolve-edges-work    resolve-edges-education      resolve-edges-certifications
  resolve-edges-honor   resolve-edges-volunteering   primary-location-nodes-edges
        │
        ▼
FALKORDB (Layer 4)
  Graph: "live" | Host: GCP | Query: Cypher | In-Memory
  Nodes:  Person · Company · School · VC_Firm · Organization
          City · Region · Country · DomainExpertise
          SubDomainExpertise · Funding_Round · Angel
  Edges:  WORKED_AT · ATTENDED · CERTIFIED_BY · FOUNDED ·
          INVESTED_IN · BOARD_MEMBER_OF · ATTENDED_TOGETHER ·
          LOCATED_IN · HAS_EXPERTISE · ...
  Vector: name_embedding (1536-dim, OpenAI text-embedding-3-small)

3 Node Types

Every node in the graph shares the base label :Entity plus a specific type label. There are 12 node types, each with distinct properties. The most important are Person and Company.

"Every node gets an Entity label. That's the base. Then it gets its specific label — Person, Company, School, whatever. The node_uuid is the universal identifier. I use 500 words to describe the company or to describe the person — that's the bio_500, and that's what gets embedded."
— Mark Pedersen
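
The labeling rule above (base :Entity label plus one type label, keyed by node_uuid) can be sketched as a small query builder. This is a minimal illustration, not the actual Xano function: build_upsert, ALLOWED_LABELS, and the use of MERGE with SET n += $props are assumptions made for the example.

```python
# Hypothetical helper: builds a parameterized Cypher upsert for one node.
# The label set mirrors the 12 node types listed in this guide.
ALLOWED_LABELS = {
    "Person", "Company", "School", "VC_Firm", "Angel", "Organization",
    "City", "Region", "Country", "DomainExpertise", "SubDomainExpertise",
    "Funding_Round",
}


def build_upsert(label, node_uuid, props):
    """Return (query, params) that upserts an :Entity:<label> node by node_uuid."""
    # Labels cannot be passed as query parameters, so validate before
    # interpolating the label into the query string.
    if label not in ALLOWED_LABELS:
        raise ValueError(f"unknown node type: {label}")
    query = (
        f"MERGE (n:Entity:{label} {{node_uuid: $node_uuid}}) "
        "SET n += $props "
        "RETURN n.node_uuid"
    )
    return query, {"node_uuid": node_uuid, "props": props}


query, params = build_upsert("Person", 42, {"name": "Alice"})
print(query)
print(params)
```

Keeping property values in parameters rather than in the query string avoids quoting problems in names and bios.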

Primary Node Types

👤

Person

:Entity:Person

The central node type. Represents any individual in the network.

  • node_uuid — unique ID (matches master_person.id)
  • name — full name
  • bio_500 — AI-generated 500-word biography
  • name_embedding — 1536-dim vector (from bio_500)
  • headline — professional headline
  • current_company — name of current employer
  • current_title — current job title
  • linkedin_url — LinkedIn profile URL
  • avatar_url — profile photo
  • visibility — true/false (gate: bio_500)
🏢

Company

:Entity:Company

Business entities — startups, enterprises, employers.

  • node_uuid — unique ID (matches master_company.id)
  • name — company name
  • about_500 — AI-generated 500-word description
  • name_embedding — 1536-dim vector (from about_500)
  • domain — primary web domain
  • industry — industry classification
  • founded_year — year founded
  • staff_count — employee count
  • linkedin_url — company LinkedIn page
  • logo_url — company logo
🎓

School

:Entity:School

Educational institutions — universities, colleges, programs.

  • node_uuid — unique ID
  • name — institution name
  • domain — web domain
  • linkedin_url — LinkedIn page
💰

VC_Firm

:Entity:VC_Firm

Venture capital firms that invest in companies.

  • node_uuid — unique ID
  • name — firm name
  • domain — web domain
👼

Angel

:Entity:Angel

Angel investors (individuals who invest personally).

  • node_uuid — unique ID
  • name — investor name
🏛

Organization

:Entity:Organization

Non-company entities (certifying bodies, nonprofits, boards).

  • node_uuid — unique ID
  • name — org name
  • domain — web domain

Location Nodes (Hierarchy)

Locations form a 3-level hierarchy: City → Region → Country. Each person's primary location creates edges through all three levels.

🏙

City

:Entity:City
  • node_uuid, name
🌎

Region

:Entity:Region
  • node_uuid, name
🌍

Country

:Entity:Country
  • node_uuid, name
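
As a rough sketch of the location hierarchy, the function below lists the edges one person's primary location would produce: three Person edges (one per level) plus the two hierarchy edges. The function name and tuple layout are illustrative assumptions. The weight of 10 comes from the edge reference in section 4, and the hierarchy edges are left unweighted because no weight is listed for them.

```python
LOCATED_IN_WEIGHT = 10  # Person -> City/Region/Country, per the edge reference


def location_edges(person, city, region, country):
    """Return (source, edge_type, target, weight) tuples for one primary location."""
    return [
        # The person is linked to every level of the hierarchy.
        (person, "LOCATED_IN", city, LOCATED_IN_WEIGHT),
        (person, "LOCATED_IN", region, LOCATED_IN_WEIGHT),
        (person, "LOCATED_IN", country, LOCATED_IN_WEIGHT),
        # Geographic hierarchy: City in Region, Region in Country.
        (city, "LOCATED_IN", region, None),
        (region, "LOCATED_IN", country, None),
    ]


for edge in location_edges("Alice", "San Francisco", "California", "United States"):
    print(edge)
```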

Expertise & Funding Nodes

🧠

DomainExpertise

:Entity:DomainExpertise

Top-level expertise categories (e.g., "Artificial Intelligence", "FinTech").

  • node_uuid, name
  • name_embedding — for semantic matching
💡

SubDomainExpertise

:Entity:SubDomainExpertise

Specific skills within a domain (e.g., "NLP" under "AI"). Created dynamically by LLM + vector resolution.

  • node_uuid, name
  • name_embedding — for semantic matching
📈

Funding_Round

:Entity:Funding_Round

Investment rounds (Series A, Seed, etc.) linking investors to companies.

  • node_uuid, name
  • amount — funding amount
  • date — round date
  • round_type — Seed, Series A, etc.

4 Edge Types & Weights

Edges are the soul of the knowledge graph. Every edge has a type (what the relationship is) and a weight (how strong it is). Lower weight = stronger relationship. Weights are critical for path-finding — the algorithm finds paths with the lowest total weight.

Figure: Edge Weight Hierarchy Visualization. Relationship strength hierarchy: lower weight means stronger bond. FOUNDED (5) is the strongest relationship; ATTENDED and TAUGHT_AT (40) are the weakest fixed-weight connections, and derived ATTENDED_TOGETHER edges range up to 70.

"Weight is the inverse of strength. A weight of 5 means you founded a company — that's a very strong bond. A weight of 40 means you just attended the same school. The path finder sums the weights along a path, so it naturally prefers paths through strong relationships."
— Mark Pedersen

Complete Edge Reference

Edge Type | Weight | From → To | Description
FOUNDED | 5 | Person → Company | Person founded the company. Strongest work relationship.
BOARD_MEMBER_OF | 8 | Person → Company | Person serves/served on the board of directors.
WORKED_AT | 5–42 | Person → Company | Employment. Weight varies by seniority and by whether the role is current (see table below).
STUDIED_UNDER | 10 | Person → Person | Derived: student overlapped with instructor at same school.
LOCATED_IN | 10 | Person → City/Region/Country | Person's primary location. Creates edges to all 3 hierarchy levels.
ADVISOR_TO | 12 | Person → Company | Advisory relationship to a company.
INVESTED_IN | 15 | VC_Firm/Angel → Funding_Round | Investment participation in a funding round.
RAISED | 15 | Company → Funding_Round | Company raised a funding round.
MEMBER_OF | 20 | Person → Organization | Membership in professional orgs, boards, associations.
CERTIFIED_BY | 20 | Person → Organization | Professional certification from an issuing body.
VOLUNTEERED_AT | 25 | Person → Organization | Volunteer work at an organization.
HONORED_BY | 25 | Person → Organization | Award or honor from an organization.
ATTENDED_TOGETHER | 35–70 | Person → Person | Derived: two people attended the same school with date overlap. Weight based on overlap duration.
ATTENDED | 40 | Person → School | Person attended an educational institution.
TAUGHT_AT | 40 | Person → School | Person taught/instructed at an institution.
HAS_EXPERTISE | - | Person → SubDomainExpertise | Person has expertise in a subdomain. No fixed weight (path-finding excludes expertise nodes from intermediate hops).
SUBDOMAIN_OF | - | SubDomainExpertise → DomainExpertise | Taxonomy: subdomain belongs to parent domain.
LOCATED_IN | - | City → Region → Country | Geographic hierarchy (City in Region, Region in Country).

Work Edge Seniority System

WORKED_AT edges have dynamic weights based on the person's seniority at the company. The formula is:
weight = seniority_rank + (is_current ? 0 : 20)

This means a current C-Suite role (weight 5) is much stronger than a past individual contributor role (weight 40). The +20 penalty for past roles ensures current relationships rank higher in path-finding.

Seniority Level | seniority_rank
Founder/Owner | 8
C-Suite (CEO, CTO...) | 5
Partner | 8
Vice President | 12
Director | 15
Manager | 18
Individual Contributor | 20
Entry Level / Intern | 22

Example: A current VP role has weight 12. A past VP role has weight 32 (12+20). The seniority detection function pattern-matches title strings against known role patterns.
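
The formula and seniority table above translate directly into a lookup plus a penalty. The sketch below is illustrative only: the dictionary keys are hypothetical snake_case names (the guide itself only names values such as c_suite, vp, and director), and title pattern-matching is out of scope here.

```python
# Ranks copied from the seniority table; key names are assumed for the example.
SENIORITY_RANK = {
    "founder_owner": 8,
    "c_suite": 5,
    "partner": 8,
    "vp": 12,
    "director": 15,
    "manager": 18,
    "individual_contributor": 20,
    "entry_level": 22,
}

PAST_ROLE_PENALTY = 20  # added when is_current is false


def worked_at_weight(seniority, is_current):
    """weight = seniority_rank + (is_current ? 0 : 20)"""
    rank = SENIORITY_RANK[seniority]
    return rank if is_current else rank + PAST_ROLE_PENALTY


print(worked_at_weight("vp", True))    # 12
print(worked_at_weight("vp", False))   # 32
print(worked_at_weight("individual_contributor", False))  # 40
```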

Derived Relationships

Some edges aren't extracted from data directly — they're computed from other edges:

ATTENDED_TOGETHER

Computed

When two people have ATTENDED edges to the same School, and their date ranges overlap, an ATTENDED_TOGETHER edge is created directly between them. Weight ranges from 35 (long overlap, 4+ years) to 70 (minimal overlap, <1 year).
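
A minimal sketch of the overlap check and weight assignment follows. Only the two endpoints are documented (35 for 4+ years, 70 for under 1 year); the linear interpolation between them is an assumption made for illustration, as are the function names and the requirement that both date ranges have an end date.

```python
from datetime import date


def overlap_years(start_a, end_a, start_b, end_b):
    """Length of the shared date range in years (0.0 if the ranges do not overlap)."""
    start = max(start_a, start_b)
    end = min(end_a, end_b)
    if end <= start:
        return 0.0
    return (end - start).days / 365.25


def attended_together_weight(years):
    """Map overlap duration to an edge weight; None means no edge is created."""
    if years <= 0:
        return None
    if years >= 4:
        return 35          # documented: long overlap, 4+ years
    if years < 1:
        return 70          # documented: minimal overlap, under 1 year
    # Assumed: interpolate linearly from 70 (1 year) down to 35 (4 years).
    return round(70 - (years - 1) * (70 - 35) / 3)


years = overlap_years(date(2012, 1, 1), date(2016, 1, 1),
                      date(2012, 1, 1), date(2016, 1, 1))
print(years, attended_together_weight(years))
```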

STUDIED_UNDER

Computed

When a student's ATTENDED edge overlaps with an instructor's TAUGHT_AT edge at the same school, a STUDIED_UNDER edge (weight 10) is created. This is a strong relationship — having a professor in common is a powerful connection.

Edge Properties

Edges carry metadata beyond their type and weight:

Property | Found On | Description
weight | All edges | Numeric strength (lower = stronger)
title | WORKED_AT | Job title at the company
seniority | WORKED_AT | Seniority level (c_suite, vp, director, etc.)
is_current | WORKED_AT, ATTENDED | Boolean: still active?
start_date | WORKED_AT, ATTENDED | When the relationship started
end_date | WORKED_AT, ATTENDED | When it ended (null if current)
degree | ATTENDED | Degree type (BS, MBA, PhD, etc.)
field_of_study | ATTENDED | What they studied
role | VOLUNTEERED_AT, MEMBER_OF | Role/position within the org
certification_name | CERTIFIED_BY | Name of the certification
honor_title | HONORED_BY | Name of the award/honor
overlap_years | ATTENDED_TOGETHER | Duration of school date overlap
amount | INVESTED_IN, RAISED | Investment/round amount

5 The Enrichment Pipeline

Enrichment is how raw data becomes graph knowledge. When a person is added to Orbiter, they go through a multi-stage pipeline that extracts, normalizes, resolves, and graphs their data. The entire pipeline runs 9 stages per person.

The 9 Pipeline Stages

1. process-enrich-layer
2. resolve-edges-education
3. resolve-edges-work
4. resolve-edges-certifications
5. resolve-edges-projects-publications
6. resolve-edges-honor
7. resolve-edges-volunteering
8. complete-person-enrich
9. run-base-company-process

Stage 1: process-enrich-layer (16 Sections)

This is the largest and most complex function. It takes the raw JSON blobs from external data sources and extracts them into normalized relational tables. It has 16 distinct sections, each handling a different data type:

  1. Basic Info: Extracts name, headline, summary, location, industry from PDL and Enrich Layer JSON. Updates master_person fields.
  2. Work History: Parses employment records from JSON. Creates/updates work_history table entries. Deduplicates by company+title.
  3. Education: Parses education records. Creates/updates education table entries. Handles degree types, fields of study, dates.
  4. Skills: Extracts skill names from JSON arrays. Creates skill records linked to the person.
  5. Social Links: Extracts social media URLs (Twitter, GitHub, personal sites). Creates social_link records.
  6. Certifications: Professional certifications with issuing organizations, dates. Creates certification records.
  7. Languages: Spoken languages with proficiency levels. Creates language records.
  8. Projects: Personal and professional projects. Creates project records.
  9. Volunteering: Volunteer experience with organizations and roles. Creates volunteering records.
  10. Honors & Awards: Awards, recognition, honors from organizations. Creates honor records.
  11. Publications: Published articles, papers, books. Creates publication records.
  12. Patents: Patent filings and grants. Creates patent records.
  13. Courses: Courses taken or taught. Creates course records.
  14. Recommendations: Professional recommendations received. Creates recommendation records.
  15. Websites: Personal and professional websites. Creates website records.
  16. Avatar: Profile photo URL extraction and placeholder detection. Updates master_person_avatar.

Stages 2–7: Edge Resolution

After relational data is created, edge resolution functions match records to existing graph nodes and create Cypher queries to build graph edges. Here's how each one works:

resolve-edges-education

Stage 2

For each education record: matches the school name against existing School nodes (domain match or LinkedIn URL match). If no match found, creates a new School node. Creates ATTENDED edge with degree, field of study, dates. Also computes ATTENDED_TOGETHER edges for overlapping students.

resolve-edges-work

Stage 3

For each work record: matches company name against Company nodes. Detects seniority from title pattern matching. Creates WORKED_AT edge with dynamic weight. Special handling for founders (FOUNDED edge) and board members (BOARD_MEMBER_OF).

resolve-edges-certifications

Stage 4

Matches certification issuers to Organization nodes (domain or LinkedIn). Creates CERTIFIED_BY edges. If the issuer is not in the graph, creates a new Organization node.

resolve-edges-projects-publications

Stage 5

Links projects and publications to associated organizations or companies in the graph.

resolve-edges-honor

Stage 6

Matches honor/award issuers to Organization nodes. Creates HONORED_BY edges. Four-section process: (1) cross-reference with education, (2) domain resolution, (3) LinkedIn resolution, (4) Cypher creation.

resolve-edges-volunteering

Stage 7

Matches volunteer organizations to graph nodes. Creates VOLUNTEERED_AT edges with role information.

Stage 8: complete-person-enrich

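The finalization stage, which runs after all edges are resolved. On success it is also the step that sets enrich_history_person.processing back to false, which is why a crash here leaves a record stuck (see Known Issues in Production Pipeline).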

Stage 9: run-base-company-process

Enriches the person's primary company. Creates/updates the Company node, generates about_500 bio, creates company embedding, processes funding rounds, and links investors (VC_Firm, Angel nodes with INVESTED_IN edges).

6 Vector Embeddings & Semantic Search

Every Person and Company node carries a 1536-dimensional vector embedding generated from their bio text. This enables semantic (meaning-based) search across the entire graph.

Figure: Vector Embeddings & Semantic Search Visualization. Vector embeddings transform text into mathematical points in high-dimensional space. Similar people/companies cluster together, enabling "find people like X" searches.

"I embed the bio, not the properties. If you just embed the name and title, you get garbage. But a 500-word bio that captures who this person is, what they've done, their career trajectory — that gives you a rich, meaningful embedding. That's why I call them juicy bios."
— Mark Pedersen

How It Works

Step 1

Generate Bio

AI (Claude) writes a 500-word biography from all the person's extracted data — work history, education, skills, achievements. This is the bio_500.

Step 2

Create Embedding

The bio_500 text is sent to OpenAI's text-embedding-3-small model, which returns a 1536-dimensional float vector.

Step 3

Store on Node

The vector is stored as name_embedding on the FalkorDB node. FalkorDB's vecf32() function stores it as a 32-bit float vector, which enables efficient vector operations in Cypher queries.

Semantic Search via Cypher

The send-cypher-with-embeddings function enables natural language search:

// User searches: "AI startup founders in San Francisco"
// 1. The query text is embedded into a 1536-dim vector ($query_vector)
// 2. Cypher ranks visible people by vector distance to that query:

MATCH (p:Person)
WHERE p.visibility = true
WITH p, vec.euclideanDistance(p.name_embedding, vecf32($query_vector)) AS distance
WHERE distance < 0.45  // distance threshold (lower = more similar)
RETURN p.name, p.headline, p.current_company, distance
ORDER BY distance ASC
LIMIT 20
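
To make the filtering and ordering in that query concrete, here is a small in-memory sketch of the same logic in Python. It uses toy 3-dimensional vectors instead of real 1536-dimensional embeddings, and the function name and record layout are assumptions for illustration. In production the work happens inside FalkorDB, not in application code.

```python
import math


def semantic_search(query_vector, people, threshold=0.45, limit=20):
    """Return (name, distance) for visible people within the distance threshold."""
    hits = []
    for person in people:
        if not person["visibility"]:
            continue  # invisible people never appear in results
        distance = math.dist(person["name_embedding"], query_vector)
        if distance < threshold:
            hits.append((distance, person["name"]))
    hits.sort()  # ascending distance: most similar first
    return [(name, distance) for distance, name in hits[:limit]]


people = [
    {"name": "Alice", "visibility": True, "name_embedding": [0.1, 0.9, 0.0]},
    {"name": "Bob", "visibility": True, "name_embedding": [0.9, 0.1, 0.0]},
    {"name": "Carol", "visibility": False, "name_embedding": [0.1, 0.9, 0.1]},
    {"name": "Dan", "visibility": True, "name_embedding": [0.2, 0.8, 0.0]},
]
print(semantic_search([0.1, 0.9, 0.0], people))
```

Here Bob is dropped for being too far from the query and Carol for being invisible, leaving Alice and Dan in order of distance.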

Key Details

Model: text-embedding-3-small (OpenAI)
Dimensions: 1536
Distance Metric: Euclidean (lower = more similar)
Similarity Threshold: 0.45 (for expertise), varies by use case
Property Name: name_embedding on Person and Company nodes
Source Text (Person): bio_500 (500-word AI-generated biography)
Source Text (Company): about_500 (500-word AI-generated company description)
Qdrant (Separate): Used for user documents/emails; NOT part of the knowledge graph

7 Visibility & "Juicy Bios"

Not every person in the database is visible in the graph. Visibility is gated by a single criterion: the person must have a bio_500. This ensures only properly enriched people appear in search results and path-finding.

"Visibility is binary. If you have a juicy bio, you're visible. If you don't, you're invisible. I don't want half-baked people showing up in search results — if we don't know enough about them to write a 500-word bio, we don't know enough about them to recommend them."
— Mark Pedersen
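
The gate itself is a one-line check. The sketch below is an assumption about how it could be expressed: the function name is hypothetical, and treating a whitespace-only bio as missing is a choice made for the example, not something this guide specifies.

```python
def is_visible(person):
    """A person is visible only if a non-empty bio_500 exists."""
    bio = person.get("bio_500")
    return bool(bio and bio.strip())


print(is_visible({"name": "Alice", "bio_500": "Alice founded Acme Corp..."}))  # True
print(is_visible({"name": "Bob"}))                                             # False
```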

The Visibility Lifecycle

Person Added (visibility: false) → Data Enriched (16 sections) → Edges Resolved (6 functions) → Bio Generated (bio_500 written) → Visible! (visibility: true)

What Makes a "Juicy Bio"

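The bio_500 is not a dry resume summary. It's an AI-generated narrative (up to 500 words) written from the person's extracted data (work history, education, skills, achievements) that captures who the person is, what they've done, and their career trajectory.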

This bio serves double duty: it's displayed to users AND it's the source text for the vector embedding. A richer bio = better search results.

Completeness vs Visibility

Visible (~1,925)

bio_500 present

These people appear in search, path-finding, Discovery, Leverage Loops, Meeting Prep. They have passed through the full enrichment pipeline.

Invisible (~218)

no bio_500

These people exist in the database but are excluded from all user-facing features. Usually means enrichment failed or data was insufficient to generate a bio.

8 Path Finding & Leverage Loops

Path finding is the flagship feature of the knowledge graph. Given two people, Orbiter finds the shortest, strongest paths between them — traversing through shared companies, schools, organizations, and mutual connections.

Figure: Path Finding Visualization. Path finding discovers how two people are connected through intermediate nodes. The algorithm prefers paths with lower total weight (stronger relationships).

How Path Finding Works

Step 1

Variable-Length Match

Cypher query matches paths of 1–4 hops between the source and target person nodes. Each hop traverses one edge.

Step 2

Filter & Exclude

Intermediate nodes of type DomainExpertise, SubDomainExpertise, City, Region, and Country are excluded — only "meaningful" entities (companies, schools, people) can be waypoints.

Step 3

Rank by Weight

Paths are sorted first by hop count (fewer = better), then by total weight (lower = stronger relationships). The top paths are returned.

The Cypher Pattern

// Find paths between person A and person B
MATCH path = (a:Person {node_uuid: $source_id})
              -[*1..4]-
              (b:Person {node_uuid: $target_id})

// Exclude expertise and location nodes from intermediate hops
WHERE ALL(n IN nodes(path)[1..-1]
      WHERE NOT n:DomainExpertise
      AND NOT n:SubDomainExpertise
      AND NOT n:City
      AND NOT n:Region
      AND NOT n:Country)

// Calculate total path weight
WITH path,
     length(path) AS hops,
     reduce(w = 0, r IN relationships(path) | w + r.weight) AS total_weight

RETURN path, hops, total_weight
ORDER BY hops ASC, total_weight ASC
LIMIT 10

Path Examples

2-Hop Path (Shared Company)

Alice —WORKED_AT(5)— Acme Corp —FOUNDED(5)— Bob

Total weight: 10. Alice and Bob are both connected to Acme Corp, so the path crosses two edges.

3-Hop Path (Through a Person)

Alice —ATTENDED_TOGETHER(40)— Charlie —WORKED_AT(12)— TechCo —FOUNDED(5)— Bob

Total weight: 57. Alice went to school with Charlie, who works at Bob's company.
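
The ranking rule (fewer hops first, then lower total weight) can be checked against these two example paths with a few lines of Python. This is a sketch of the ordering only, counting one hop per edge as in Step 1. The data layout and function name are assumptions, and the real ranking happens in the Cypher ORDER BY clause.

```python
def rank_paths(paths, limit=10):
    """Sort paths by hop count, then by summed edge weight; return the top results."""
    scored = [
        (len(path["weights"]), sum(path["weights"]), path["route"])
        for path in paths
    ]
    scored.sort(key=lambda item: (item[0], item[1]))
    return scored[:limit]


paths = [
    {"route": "Alice - Charlie - TechCo - Bob", "weights": [40, 12, 5]},
    {"route": "Alice - Acme Corp - Bob", "weights": [5, 5]},
]

for hops, total_weight, route in rank_paths(paths):
    print(hops, total_weight, route)
```

The Acme Corp path ranks first on both criteria: two edges and a total weight of 10, against three edges and 57.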

Leverage Loops

A Leverage Loop is the user-facing feature built on path-finding. When you want to reach someone in your network, Orbiter shows you the strongest paths and suggests who to talk to first. The "loop" refers to the chain of introductions: You → Connector A → Connector B → Target.

9 Expertise System

The expertise system automatically identifies what people are expert in and organizes those expertise areas into a searchable taxonomy. It uses a two-phase pipeline: LLM identification followed by vector resolution.

Two-Phase Pipeline

Phase 1 — LLM Identification

llm-identify-person-expertise

Claude Sonnet 4 analyzes the person's bio_500, work history, skills, and education to identify their expertise domains and subdomains. Returns structured JSON with domain/subdomain pairs.

Example output: [{"domain": "Artificial Intelligence", "subdomain": "Natural Language Processing"}, {"domain": "FinTech", "subdomain": "Payment Infrastructure"}]

Phase 2 — Vector Resolution

resolve-person-expertise

Each LLM-identified subdomain is embedded and compared against existing SubDomainExpertise nodes by vector distance (threshold: 0.45, using the Euclidean metric listed under Key Details). If a node within the threshold exists, it is reused. If not, a new SubDomainExpertise node is created.

This prevents taxonomy explosion — "NLP", "Natural Language Processing", and "Computational Linguistics" all resolve to the same node.
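
The reuse-or-create decision can be sketched as follows. The example uses toy 2-dimensional vectors and an in-memory list in place of the graph. It assumes Euclidean distance with the 0.45 threshold, and the function and field names are hypothetical.

```python
import math

THRESHOLD = 0.45  # maximum distance at which an existing node is reused


def resolve_subdomain(name, embedding, existing_nodes):
    """Return (node, created). Reuses the nearest node if it is within THRESHOLD."""
    nearest, nearest_distance = None, None
    for node in existing_nodes:
        distance = math.dist(node["name_embedding"], embedding)
        if nearest_distance is None or distance < nearest_distance:
            nearest, nearest_distance = node, distance
    if nearest is not None and nearest_distance < THRESHOLD:
        return nearest, False
    # No close match: add a new node to the taxonomy.
    node = {"name": name, "name_embedding": embedding}
    existing_nodes.append(node)
    return node, True


taxonomy = [{"name": "Natural Language Processing", "name_embedding": [1.0, 0.0]}]
print(resolve_subdomain("NLP", [0.9, 0.1], taxonomy))
print(resolve_subdomain("Payment Infrastructure", [0.0, 1.0], taxonomy))
```

With these toy vectors, "NLP" resolves to the existing node while "Payment Infrastructure" creates a new one.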

The Taxonomy Structure

// Expertise forms a 2-level hierarchy in the graph:

(Person) -[:HAS_EXPERTISE]-> (SubDomainExpertise) -[:SUBDOMAIN_OF]-> (DomainExpertise)

// Example:
(Mark Pedersen) -[:HAS_EXPERTISE]-> (Graph Databases) -[:SUBDOMAIN_OF]-> (Data Engineering)
(Mark Pedersen) -[:HAS_EXPERTISE]-> (Redis)          -[:SUBDOMAIN_OF]-> (Data Engineering)
(Mark Pedersen) -[:HAS_EXPERTISE]-> (Cypher)         -[:SUBDOMAIN_OF]-> (Data Engineering)

Why Vector Resolution Matters

"If I just used exact string matching, I'd end up with 'Machine Learning', 'ML', 'machine learning', 'Machine Learning & AI' as four separate nodes. That's useless. Instead, I embed the expertise name and check if anything within 0.45 distance already exists. If it does, I use that node. This keeps the taxonomy clean."
— Mark Pedersen

10 Discovery & Text-to-Cypher

Discovery is the natural language search interface for the knowledge graph. Users ask questions in plain English, and the system translates them into Cypher queries that run against FalkorDB.

How It Works

User Query ("Find AI founders in NYC") → LLM Translates (to Cypher) → FalkorDB Executes (graph query) → Results Returned (formatted for UI)

The LLM (Claude) receives the full graph schema — all node types, edge types, properties — and generates a valid Cypher query from the natural language input. This is different from semantic search (which uses embeddings) — text-to-Cypher enables structural queries:

Semantic Search

Vector-based

"People similar to Elon Musk" — embeds the query, finds nodes with similar vectors. Best for fuzzy, conceptual searches.

Text-to-Cypher

Structure-based

"Founders who attended Stanford and worked at Google" — generates exact graph traversal query. Best for precise, multi-criteria searches.

11 Multi-Tenancy & Data Layers

The graph has multiple layers of data organization — understanding the header system and the master vs personal graph distinction is key.

"There's the master graph — that's the enriched, public knowledge about everybody. Then there's the personal graph, which is your network, your contacts, your relationships. The master graph powers search and discovery. The personal graph powers your leverage loops and meeting prep."
— Mark Pedersen

The Header System

Every Xano API request that touches the graph uses two headers:

X-Data-Source

Value: live

Determines which data environment to use. Currently only "live" is active. During development, "staging" was used for testing.

X-Branch

Value: v1

API versioning. All current endpoints use v1.
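
A caller might assemble the two headers like this. The helper name and the validation step are assumptions made for illustration; only the header names and the values live, staging, and v1 come from this guide.

```python
KNOWN_DATA_SOURCES = {"live", "staging"}  # "staging" was used during development


def graph_headers(data_source="live", branch="v1"):
    """Build the two headers sent with every Xano request that touches the graph."""
    if data_source not in KNOWN_DATA_SOURCES:
        raise ValueError(f"unknown data source: {data_source}")
    return {"X-Data-Source": data_source, "X-Branch": branch}


print(graph_headers())
```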

Master Graph vs Personal Graph

Master Graph

Shared

The global knowledge graph containing all enriched people, companies, and relationships. Powered by external data sources (PDL, Enrich Layer, Fundable). Used for Discovery and semantic search.

Personal Graph

Per-User

Each user's imported contacts and connections overlaid on top of the master graph. Used for Leverage Loops (find paths through YOUR network) and Meeting Prep (surface YOUR connections to a meeting attendee).

Scale Plan

Current: ~2,143 people enriched. Target: 25,000 contacts + 50,000 enriched from financial/investment data. Mark's plan is to import contacts from all users' LinkedIn exports, plus enrich a large dataset of people from the financial/tech ecosystem using Fundable deal data.

12 Mark's Design Philosophy

The knowledge graph reflects specific architectural opinions from Mark Pedersen. Understanding these helps explain why certain decisions were made.

"Anti-Unstructured Graph"

Mark insists on corroborating evidence before creating edges. No edge is created from a single data source claim — it must be verified or at least cross-referenced. This means fewer false connections but higher trust.

"Bio-Bond Ontology"

Mark's conceptual model: "Bio" = the person info node (identity, properties). "Bond" = the relationship edge (type, strength, decay rate). Every entity is a bio; every connection is a bond.

"Juicy Bios, Not Dry Properties"

Embeddings are generated from rich narrative text, not from concatenated property values. A 500-word story about someone captures nuance that "CEO, AI, Stanford" never could.

"Speed Over Everything"

FalkorDB was chosen over Neo4j specifically for speed (450-500x faster by Mark's benchmarks). In-memory graph with Cypher compatibility. For a real-time product, query latency matters more than feature completeness.

"Source Reliability Hierarchy"

Not all data sources are equal. PDL provides broad coverage, Enrich Layer adds depth, LinkedIn provides first-party data (highest trust), Fundable provides financial/investment data. When sources conflict, the hierarchy determines which wins.

"Identity Resolution Before Graph"

Before any data enters the graph, identity resolution ensures "John Smith at Google" from PDL and "John Smith" from LinkedIn are the same person. The master_person table is the single source of truth for identity.

13 Production Stats & Health

Current production metrics as of March 2026. The QA enrichment pipeline addresses many of the issues identified here.

2,143 Total People | ~1,925 Visible | 778 Stuck Processing | 1,227 Placeholder Avatars

Known Issues in Production Pipeline

778 Stuck Processing Records

People whose enrich_history_person.processing = true but the function crashed mid-execution. These are "zombie" records — they appear to be enriching but are actually stuck forever.

Root cause: No try_catch in complete-person-enrich. When it crashes, processing is never set back to false.

QA fix: Wrapped in try_catch, processing flag always reset.
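
The shape of that fix can be sketched in Python with try/finally, which guarantees the flag is reset whether the work succeeds or crashes. The real function is a Xano function using try_catch; the record layout, status values, and finalize callback here are hypothetical.

```python
def complete_person_enrich(record, finalize):
    """Run the finalization step and always clear the processing flag."""
    record["processing"] = True
    try:
        finalize(record)
        record["status"] = "done"
    except Exception as exc:
        # Record the failure instead of letting it abort the run.
        record["status"] = "failed"
        record["error"] = str(exc)
    finally:
        # Runs on success and on failure, so no record stays stuck.
        record["processing"] = False
    return record


def crashing_step(record):
    raise RuntimeError("bio generation failed")


print(complete_person_enrich({"master_person_id": 1}, crashing_step))
```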

3 Global Scope Bugs

Certifications, honors, and volunteering functions had unscoped database queries — when enriching Person A, they could modify records belonging to Persons B, C, D, etc.

Root cause: Missing WHERE master_person_id = $input.master_person_id clauses.

QA fix: All queries now scoped to the specific person being enriched.

No Error Isolation

The 16-section process-enrich-layer function had no individual error handling. If section 4 (skills) crashed, sections 5–16 never ran.

QA fix: Each section wrapped in individual try_catch. Errors logged to crash_log, execution continues.
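
The same isolation pattern, sketched in Python: each section runs inside its own try/except, failures are appended to a crash log, and the loop moves on to the next section. The section functions and the crash_log entry fields are illustrative assumptions, not the actual Xano schema.

```python
def run_sections(sections, person_id, crash_log):
    """Run every section; log failures and keep going. Returns names that succeeded."""
    completed = []
    for name, section_fn in sections:
        try:
            section_fn(person_id)
            completed.append(name)
        except Exception as exc:
            crash_log.append(
                {"person_id": person_id, "section": name, "error": repr(exc)}
            )
    return completed


def ok(person_id):
    return None


def broken(person_id):
    raise KeyError("skills")


crash_log = []
sections = [("basic_info", ok), ("skills", broken), ("social_links", ok)]
print(run_sections(sections, 7, crash_log))
print(crash_log)
```

In this run the skills section fails, but social_links still executes, which is the behavior the QA fix is after.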

Empty Table Names

3 database calls in run-base-company-process used empty string "" as the table name. These calls silently fail.

QA fix: Correct table names identified and set.

QA Pipeline Results

The QA pipeline was tested across 20 people (180 stages total):

100% Pass Rate | 180/180 Stages Passed | 0 Failures | 20 People Tested

14 Glossary

Quick reference for terms Mark uses when discussing the knowledge graph.

bio_500 / "Juicy Bio"
AI-generated 500-word biography of a person. Used for display AND as the source text for vector embeddings. The single visibility gate.
about_500
Same as bio_500 but for companies. AI-generated 500-word company description.
name_embedding
1536-dimensional vector stored on Person/Company nodes. Generated from bio_500/about_500 using OpenAI text-embedding-3-small.
node_uuid
Universal identifier for graph nodes. For Person nodes, matches master_person.id. For Company nodes, matches master_company.id.
visibility
Boolean flag on Person nodes. True = person appears in search/discovery. False = hidden. Gate: bio_500 must exist.
FalkorDB
Redis-based in-memory graph database. Uses Cypher query language. Hosted on GCP. Graph name: "live".
Cypher
Graph query language (same as Neo4j). Used to create nodes, edges, and query the graph. All graph operations are Cypher queries sent via HTTP.
send-cypher
Xano function that sends a Cypher query to FalkorDB via HTTP API and returns the result.
send-cypher-with-embeddings
Like send-cypher but also handles vector operations — generates embeddings from text and includes them in the Cypher query.
send-cypher-with-paths
Specialized function for path-finding queries. Returns structured path data with node/edge details for each hop.
Weight
Numeric value on edges representing relationship strength. LOWER = STRONGER. Used by path-finding to prefer strong connections.
Edge Resolution
The process of matching relational records (work history, education) to existing graph nodes and creating typed edges between them.
Leverage Loop
User-facing feature: find the shortest, strongest chain of introductions to reach a target person through your network.
Discovery
Natural language search feature. Text-to-Cypher: user asks in English, LLM translates to graph query.
Bio-Bond
Mark's ontology model. "Bio" = entity identity (person, company). "Bond" = relationship with type, strength, and decay.
PDL (People Data Labs)
External data source providing professional profile data (work, education, skills). Broad coverage, lower depth.
Enrich Layer
External data source providing deeper profile enrichment. Stored as enrichLayer_json on master_person.
Fundable
External data source providing financial/investment data — funding rounds, investors, deal flow.
master_person
The single source of truth table in Xano for person identity. All enrichment data links back to a master_person record.
master_company
The single source of truth table for company identity. Contains domain, industry, staff count, and links to graph node.
enrich_history_person
Tracks enrichment pipeline status per person per data source. The "processing" flag prevents duplicate enrichment runs.
Stuck Processing
When enrich_history_person.processing = true but the function has crashed. The person is "stuck" — won't be re-enriched until manually reset.
Qdrant
Separate vector database used for user documents and emails. NOT part of the knowledge graph — don't confuse with FalkorDB vectors.
X-Data-Source / X-Branch
HTTP headers on Xano API calls. X-Data-Source: "live" (which data environment). X-Branch: "v1" (API version).

Orbiter Knowledge Graph — The Definitive Guide

Generated March 30, 2026 | Research from Krisp meeting transcripts, Xano function source code, and FalkorDB live data

Prepared by Robert Boulos with Claude Code