ORBITER

Enrichment QA Pipeline — Full Report

Production vs QA Comparison · Data Flow · FalkorDB Integration

Prepared by Robert Boulos · March 30, 2026

Confidential

Executive Summary
How the Pipeline Works — End-to-End Architecture
Data Model — What Gets Enriched
FalkorDB Graph Integration
Real Person Examples — What the Pipeline Produces
Critical Bugs Found in Production Code
Before vs After — Key Metrics
Function-by-Function Comparison (9 Functions)
Test Results — 20-Person Batch
Implementation Phases
Production Pipeline Health (Current State)
Recommendations

1. Executive Summary

Orbiter’s enrichment pipeline ingests person and company data from 5+ external sources (People Data Labs, LinkedIn/Enrich Layer, Y Combinator, Fundable, Crunchbase), normalizes it into 20+ relational tables in Xano, then projects it into a FalkorDB knowledge graph with nodes, edges, and vector embeddings.

We cloned all 9 core enrichment functions into an isolated qa/ namespace, systematically audited each one, and discovered 6 bugs in the production code — including 3 functions with globally-scoped database queries that corrupt data across people. Our QA clones fix all of these and add error isolation, structured responses, and an AI self-healing loop powered by Claude Opus 4.6.

QA Pass Rate: 100%
Stages Passed: 180/180
Failures: 0
Bugs Fixed: 6

2. How the Pipeline Works — End-to-End Architecture

The enrichment pipeline has 4 layers: Ingest (raw data from external APIs), Process (fan out into relational tables), Resolve (link records to companies), and Graph (project into FalkorDB).

LAYER 1: INGEST (External APIs → JSON blobs)
  People Data Labs       → person_enrich_data.people_data_labs (emails, work, education, skills)
  LinkedIn/Enrich Layer  → person_enrich_data.enrich_layer_data (profile, social, experience)
  Fundable               → person_enrich_data.fundable (investor data, deals, orgs)
  Y Combinator           → company_enrich_data.yc_data (batch, founders, funding)
  Crunchbase             → enrich_history_person (business profiles)

LAYER 2: PROCESS (JSON blobs → relational tables)
  qa/process-enrich-layer (16 sections)
  ├─ Section 1: Name formatting → master_person (first/last/suffix)
  ├─ Section 2: Avatar → master_avatar (real image)
  ├─ Section 3: Biographies → about_person (headline + summary)
  ├─ Section 4: Location → primary_location (geo-coded)
  ├─ Section 5: Skills → skills_join → skills
  ├─ Section 6: Gender → master_person.sex
  ├─ Section 7: LinkedIn followers → master_link + follower count
  ├─ Section 8: Languages → language_join → languages
  ├─ Section 9: Education → education_experience (foreach record)
  ├─ Section 10: Work experience → work_experience (foreach position)
  ├─ Section 11: Certifications → certification (foreach cert)
  ├─ Section 12: Volunteering → volunteering (foreach entry)
  ├─ Section 13: Projects → project (foreach project)
  ├─ Section 14: Publications → publication (foreach pub)
  ├─ Section 15: Honors/Awards → honor (foreach award)
  └─ Section 16: Interests → interest_join → interest

LAYER 3: RESOLVE (relational records → company links)
  qa/resolve-edges-education       Match schools by LinkedIn URL → domain → name
  qa/resolve-edges-work            Match employers by domain → LinkedIn URL
  qa/resolve-edges-certifications  Match issuers by domain → LinkedIn URL
  qa/resolve-edges-honor           Match issuers by domain → LinkedIn URL
  qa/resolve-edges-volunteering    Match organizations by domain → LinkedIn URL
  qa/resolve-edges-projects-pubs   Match associated companies

LAYER 4: GRAPH (relational → FalkorDB via Cypher)
  create-education-edges    → (Person)-[:ATTENDED]->(School)
                            → (Person)-[:ATTENDED_TOGETHER]-(Person) [derived]
                            → (Person)-[:STUDIED_UNDER]->(Person) [derived]
  create-work-edges         → (Person)-[:WORKED_AT]->(Company)
  resolve-edges-honor       → (Person)-[:RECEIVED_HONOR]->(Honor)-[:ISSUED_BY]->(Org)
  complete-person-enrich    → update-person-node (set visibility=true, sync embedding)
  run-base-company-process  → update-company-node (funding, industries, staff)

3. Data Model — What Gets Enriched

The pipeline populates 20+ relational tables per person. Here are the core tables and what they store.

Person Data Tables

Table	Records Per Person	Key Fields	Source
master_person	1	name, avatar, linkedin_url, current_title, bio, bio_500, sex, node_uuid, visibility	All sources merged
work_experience	3-15	title, company_name, company_domain, start/end dates, edge_uuid, master_company_id	PDL, Enrich Layer, Fundable
education_experience	2-8	school_name, field_of_study, degree_name, activities, start/end year, edge_uuid	PDL, Enrich Layer
skills_join	20-70	skill_id → skills.skill (text)	PDL, Enrich Layer
certification	0-10	name, org_name, issue_date, edge_id, master_company_id	Enrich Layer
honor	0-5	title, issuer_name, issued_on_year, node_uuid, edge_id	Enrich Layer
volunteering	0-5	title, company_name, start/end year, master_company_id	Enrich Layer
master_link	5-12	service (linkedin, twitter, github...), link_url, profile (bool)	YC, PDL, Enrich Layer
master_email	1-6	email_address, email_type (work/personal), active_status	LinkedIn, PDL
master_avatar	1-5	url, main (bool), is_placeholder (bool)	Twitter, LinkedIn, Fundable
about_person	2-4	biography (text), data_source_id	PDL headline, Enrich Layer, LLM
interest_join	5-15	interested_in_id → interest.interest	Enrich Layer
language_join	1-5	languages_id → languages.language	Enrich Layer

Data Sources (95 registered, 5 primary)

ID	Source	Type	What It Provides
91	People Data Labs (PDL)	API	Emails, phones, education arrays, work arrays, skills, gender
94	Enrich Layer	Aggregated	LinkedIn profile, skills, experience, education, certifications, volunteering, honors, interests, languages
89	Fundable	API	Investor profiles, funding deals, organizations, total raised
87	Y Combinator	Database	Batch info, founders, company one-liners, social links
8	Crunchbase	API	Business profiles, funding rounds

4. FalkorDB Graph Integration

FalkorDB (the successor to RedisGraph) is a property graph database that uses the Cypher query language. Every enriched person and company becomes a node with vector embeddings, connected by typed, weighted relationships.

Graph Node Types

Label(s)	Created By	Key Properties
:Entity:Person	add-person-node	uuid, name, name_embedding (vecf32), visibility, avatar, bio, roles, skills, interests
:Entity:Person:Angel	update-person-node (if is_angel)	+ investor_type, exits, investments, board_advisory_experience
:Entity:Company	update-company-node	uuid, name, domain, name_embedding, industries, specialties, employee_range, founded
:Entity:Company:School	update-company-node (.edu domain)	Identified by .edu domain or is_school=true flag
:Entity:Company:VC_Firm	update-company-node (is_vc)	+ investment_range, stages, sweet_spot
:Honor	resolve-edges-honor	uuid, title, description, issued_on_year, issuer_name

Graph Relationship Types

Relationship	Direction	Weight	How It’s Created
ATTENDED	Person → School	40	create-education-edges: groups education records by school, collapses into one edge, generates LLM description via Gemini 2.5 Flash
ATTENDED_TOGETHER	Person ↔ Person	35-70	Derived: Automatically created when two people have overlapping attendance dates at the same school. Weight varies: 35 (same major), 50 (same field), 70 (just temporal overlap)
STUDIED_UNDER	Person → Instructor	10	Derived: Created when a student’s attendance overlaps with an instructor’s TAUGHT_AT at the same school
WORKED_AT / WORKS_AT	Person → Company	varies	create-work-edges: links work_experience to companies, sets current_title on master_person
RECEIVED_HONOR	Person → Honor	20	resolve-edges-honor: creates Honor node + edge in single Cypher transaction
ISSUED_BY	Honor → Organization	25	Same transaction as RECEIVED_HONOR, conditional on issuer existing in graph
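
The ATTENDED_TOGETHER rule in the table can be sketched in a few lines of Python. This is a minimal illustration rather than the production logic: the Attendance record, its field names, and the reading of "same major" and "same field" as exact string matches are assumptions. Only the three weights and the overlap condition come from the table above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Attendance:
    """Hypothetical, simplified view of one ATTENDED edge."""
    person_uuid: str
    school_uuid: str
    start_year: int
    end_year: int
    major: Optional[str] = None   # assumed field, e.g. "Computer Science"
    field: Optional[str] = None   # assumed broader area, e.g. "Engineering"


def attended_together_weight(a: Attendance, b: Attendance) -> Optional[int]:
    """Return the ATTENDED_TOGETHER weight, or None when no edge applies.

    Lower weight means a stronger connection, as in the table above.
    """
    if a.person_uuid == b.person_uuid or a.school_uuid != b.school_uuid:
        return None
    # Two closed year ranges overlap when each starts before the other ends.
    if a.start_year > b.end_year or b.start_year > a.end_year:
        return None
    if a.major and a.major == b.major:
        return 35  # same major
    if a.field and a.field == b.field:
        return 50  # same field
    return 70      # temporal overlap only
```

Treating a shared boundary year (one person leaves in 2014, the other arrives in 2014) as an overlap is a design choice in this sketch; the report does not say how the production function handles it.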

Key Architectural Patterns

Relational-first, graph-second: All data lands in Xano tables first. The graph is a projection. This makes the pipeline crash-safe — sections fail independently while the graph stays eventually consistent.
UUID-based bidirectional linking: FalkorDB generates randomUUID() for nodes. This UUID is stored back on the Xano record (node_uuid on master_person, edge_uuid on join tables). Both systems can find each other.
Vector embeddings on every node: Person nodes get a name_embedding computed from bio_500 via OpenAI embeddings; Company nodes get theirs from about_500. This enables semantic similarity search in the graph.
LLM-generated edge descriptions: Education edges include a natural-language description generated by Gemini 2.5 Flash (e.g., “Robert studied Computer Science at MIT from 2010-2014”).
Derived relationships: The graph computes connections like ATTENDED_TOGETHER by analyzing temporal overlap between existing edges — surfacing connections not explicitly stated in source data.
Weight-based ranking: Every edge has a weight property, and lower means stronger. STUDIED_UNDER (10) ranks higher than ATTENDED (40), so graph traversal prioritizes the more meaningful connections.
Visibility gating: Nodes start with visibility: false and become visible only after bio_500 exists. This keeps incomplete data out of user-facing queries.
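
To make the last two patterns concrete, here is a small Python sketch of how a caller might combine weight-based ranking with visibility gating. The edge and node dictionaries, their keys, and the function name are hypothetical; the only facts taken from the report are that lower weights rank first and that nodes with visibility false are hidden from user-facing results.

```python
from typing import Dict, List


def rank_visible_edges(edges: List[Dict], nodes: Dict[str, Dict]) -> List[Dict]:
    """Drop edges that point at non-visible nodes, then sort strongest first.

    Lower weight = stronger connection, so the sort is ascending.
    """
    visible = [
        e for e in edges
        if nodes.get(e["target"], {}).get("visibility", False)
    ]
    return sorted(visible, key=lambda e: e["weight"])


# Illustrative data only.
nodes = {
    "school-1": {"visibility": True},
    "person-9": {"visibility": True},
    "person-7": {"visibility": False},  # no bio_500 yet, so still hidden
}
edges = [
    {"type": "ATTENDED", "target": "school-1", "weight": 40},
    {"type": "ATTENDED_TOGETHER", "target": "person-7", "weight": 35},
    {"type": "STUDIED_UNDER", "target": "person-9", "weight": 10},
]
ranked = rank_visible_edges(edges, nodes)
print([e["type"] for e in ranked])
```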

5. Real Person Examples — What the Pipeline Produces

John Traver

Co-Founder / Creative Technologist at Frame.io · ID: 2

node_uuid: 7bd19581-941c-4428-a6da-d0a566a01a31

Bio (AI-generated, 500 words)

“John Traver is Co-founder and Creative Technologist at Frame.io, blending cinematic workflow with cutting-edge code. From scripting MEL and AE at RIT to shipping the 5-star KataData app in Objective-C, he has mastered Python, Ruby, JS, and now Haskell to scale video collaboration for Oscar-winning teams.”

Work Experience (3 records)

Title	Company	Period	Source
Co-Founder & Creative Technologist	Frame.io	2014 – 2025	Fundable
Chief Scientist	Katabatic Digital	2012 – 2014	Enrich Layer
K/Lab Engineering	Katabatic Digital	2010 – 2012	Enrich Layer

Education (5 records)

School	Degree	End	Source
Rochester Institute of Technology	BS	2014	PDL
Rochester Institute of Technology	BS	2010	PDL
Rochester Institute of Technology	—	2010	PDL
Shenendehowa High School	—	2006	PDL
Rochester Institute of Technology	—	—	PDL

Skills (36 skills)

Social Profiles (9 links)

Emails (4 verified)

Languages

Interests (12)

FalkorDB Graph Edges

Person node 7bd19581...
→ ATTENDED → Rochester Institute of Technology (School node)
→ ATTENDED → Shenendehowa HS
→ WORKED_AT → Frame.io
→ WORKED_AT → Katabatic Digital

Jimmy Wales

Founder & CEO at Wikitribune/WT.Social · ID: 14

node_uuid: 68facb3e-cf3f-4035-97f4-7e72e0c5aae0

Work Experience (5 records)

Title	Company	Period
Owner	Fandom	Current
Board Member	Wikimedia Foundation	2003 – present
Executive Chairman	The People’s Operator	2014 – 2015
Founder	Mighty Capital	2001 – 2007
Founder & CEO	Wikitribune	2017 – present

Education (3 records)

School	Degree	End
University of Alabama	M.S. Finance	1991
Auburn University	B.S. Finance	1989
Randolph High School	—	1983

Skills (21)

Enrichment Note

PDL returned not_found for Jimmy Wales, which we attribute to the profile being high-profile or protected. All data came from Enrich Layer (LinkedIn scrape) instead, and the pipeline completed without errors.

Jon Dahl

Co-Founder, CEO at Mux · ID: 17

node_uuid: 14818071-5a47-4add-a5fe-8e992004a9c0

Work Experience

Title	Company	Period
Co-Founder, CEO	Mux	Current
VP Engineering	Brightcove	2012 – 2015
Co-Founder, CEO	Zencoder	2010 – 2012

Education

School	Degree
Trinity International University	BA
Wheaton College	—

QA Pipeline Result

Jon Dahl was one of two people who failed in the original production batch test (306s timeout on run-base-company-process). After QA fixes — specifically the direct function.run bypass that eliminates the HTTP hop — he now passes 9/9 in 45s.

Charles Njenga

ID: 20 — Example with rich certifications

Certifications (7+ records)

Certification	Issuer
Advanced React and Redux	Udemy
Modern JavaScript	Udemy
Complete React Developer	Udemy
CSS - The Complete Guide	Udemy
Google Africa Developer Scholarship	Google / Andela
JavaScript Algorithms & Data Structures	Udemy
Modern React with Redux	Udemy

Each certification goes through resolve-edges-certifications, which matches the issuer (Udemy, Google) to a master_company record. In production, this function had no input parameter and queried all certifications globally; our QA fix scopes it to the current person only.

6. Critical Bugs Found in Production Code

Deep code review uncovered 6 bugs. Three are severe enough to corrupt data: their globally scoped queries modify records belonging to other people.

Critical — Data Corruption

Bug #1: Certifications — No Input Parameter

Function resolve-edges-certifications (12719) had NO input parameter at all. All 3 db.query certification calls queried the entire table. When enriching Charles Njenga (ID 20), it would also modify Mario Haarmann’s (ID 18) 3 certifications.

Fix: Added master_person_id input. Added WHERE clause to all 3 queries.
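
The difference between the unscoped and scoped queries can be illustrated outside Xano with Python's built-in sqlite3 module. This is a sketch under stated assumptions: the table layout is reduced to the columns that matter, the rows are made up (only person IDs 20 and 18 come from this report), and the function names are hypothetical stand-ins for the db.query calls.

```python
import sqlite3

# Illustrative data only: person IDs 20 and 18 come from the report,
# the rows themselves are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE certification ("
    "id INTEGER PRIMARY KEY, master_person_id INTEGER, "
    "org_name TEXT, master_company_id INTEGER)"
)
conn.executemany(
    "INSERT INTO certification (master_person_id, org_name) VALUES (?, ?)",
    [(20, "Udemy"), (20, "Google / Andela"), (18, "iSAQB")],
)


def unresolved_unscoped(conn):
    """Production behaviour: no person filter, so every person's rows come back."""
    return conn.execute(
        "SELECT id FROM certification WHERE master_company_id IS NULL"
    ).fetchall()


def unresolved_scoped(conn, master_person_id):
    """QA behaviour: the same query restricted to one person."""
    return conn.execute(
        "SELECT id FROM certification "
        "WHERE master_company_id IS NULL AND master_person_id = ?",
        (master_person_id,),
    ).fetchall()


print(len(unresolved_unscoped(conn)))    # rows for persons 20 and 18
print(len(unresolved_scoped(conn, 20)))  # rows for person 20 only
```

Any update driven by the unscoped result set would touch person 18's rows while enriching person 20, which is the corruption path described above.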

Critical — Data Corruption

Bug #2: Honors — 2 of 4 Sections Unscoped

Function resolve-edges-honor (12715): domain resolution and LinkedIn resolution sections queried ALL 71 honor records globally instead of just the current person’s.

Fix: Added master_person_id WHERE clause to sections 2 and 3.

Critical — Data Corruption

Bug #3: Volunteering — 2 of 3 Sections Unscoped

Function resolve-edges-volunteering (12716): LinkedIn resolution and Cypher creation sections queried ALL 105 volunteering records globally.

Fix: Added master_person_id WHERE clause to sections 2 and 3.

High

Bug #4: Company Process — Empty Table Names

Function run-base-company-process (12720) had 3 database calls with "" as the table name in the staff_count section. The function deploys without error but fails silently at runtime.

Fix: Identified correct table name. Replaced all 3 empty strings.

High — Root cause of 764 stuck records

Bug #5: No Error Isolation in Main Processor

process-enrich-layer (12712) has 16 independent sections. A crash in Section 2 (avatar) kills Sections 3-16 (biographies, skills, education, work, certs, etc.). Production has 764 enrich_history records stuck with processing=true.

Fix: Each of the 16 sections is wrapped in its own try_catch. Crashes are logged to crash_log, and processing continues with the next section.
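
The per-section isolation pattern looks roughly like this in Python. It is a sketch of the idea, not a translation of the XanoScript: the section callables, the payload shape, and the in-memory crash_log list are assumptions, while the response keys mirror the structured response described in this report.

```python
import traceback
from typing import Callable, Dict, List, Tuple

# A section is a (name, callable) pair. The callable returns True when it
# wrote data and False when it had nothing to do.
Section = Tuple[str, Callable[[dict], bool]]


def run_sections(sections: List[Section], payload: dict, crash_log: List[dict]) -> Dict:
    """Run every section; a crash in one never prevents the next from running."""
    result = {"sections_run": 0, "sections_ok": 0, "sections_skip": 0, "errors": []}
    for name, fn in sections:
        result["sections_run"] += 1
        try:
            if fn(payload):
                result["sections_ok"] += 1
            else:
                result["sections_skip"] += 1
        except Exception as exc:
            # Record the crash with context, then move on to the next section.
            crash_log.append(
                {"section": name, "error": repr(exc), "trace": traceback.format_exc()}
            )
            result["errors"].append({"section": name, "message": str(exc)})
    return result


def broken_avatar(payload: dict) -> bool:
    raise ValueError("avatar URL is not a string")


crash_log: List[dict] = []
summary = run_sections(
    [
        ("name", lambda p: True),
        ("avatar", broken_avatar),
        ("skills", lambda p: bool(p.get("skills"))),
    ],
    {"skills": []},
    crash_log,
)
print(summary["sections_run"], summary["sections_ok"], len(summary["errors"]))
```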

High

Bug #6: Zombie Processing Records

complete-person-enrich (12713) has no try_catch. A crash leaves processing=true indefinitely, and the queue entry is never cleaned up.

Fix: Wrapped in try_catch. Queue cleanup moved outside the try block, so it always executes.
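
A Python sketch of the same guarantee follows. The history dictionary, the queue set, and the update_graph_node callable are hypothetical stand-ins for the Xano tables and the FalkorDB update; the point is only that cleanup sits after the guarded block and therefore runs on both the success and the failure path.

```python
from typing import Callable, Dict, Set


def complete_person_enrich(
    person_id: int,
    history: Dict[int, dict],
    queue: Set[int],
    update_graph_node: Callable[[int], None],
) -> dict:
    """Finalize enrichment; cleanup runs whether or not the graph update fails."""
    response = {"ok": True, "error": None}
    try:
        update_graph_node(person_id)  # may raise (network, Cypher, etc.)
    except Exception as exc:
        response = {"ok": False, "error": str(exc)}
    # Cleanup sits after the try/except, so it always executes.
    history[person_id]["processing"] = False
    queue.discard(person_id)
    return response


def failing_update(person_id: int) -> None:
    raise RuntimeError("graph unavailable")


history = {17: {"processing": True}}
queue = {17}
outcome = complete_person_enrich(17, history, queue, failing_update)
print(outcome["ok"], history[17]["processing"], queue)
```

In Python a finally clause would express the same intent; the version above mirrors the structure described in the fix.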

7. Before vs After — Key Metrics

Metric	Production (Before)	QA Pipeline (After)
Pass Rate	99.2% (357/360)	100% (180/180)
Failures	3 (timeout on company process)	0
Stuck Processing Records	778 in production	0 (guaranteed cleanup)
Global Query Bugs	3 functions unscoped	All queries scoped to person
Error Isolation	None (1 crash kills all 16 sections)	Per-section try_catch
Response Format	String: "success"	Structured JSON: {processed, resolved, errors, skipped}
Duplicate History	Creates new record every run	Check-and-reuse existing records
AI Self-Fix	None	Claude Opus 4.6 via OpenRouter with 13 XanoScript rules
Company Process Timeout	HTTP hop timeout at 300s (Xano nginx)	Direct function.run (no HTTP hop)

8. Function-by-Function Comparison

All 9 enrichment functions cloned into qa/ namespace with fixes applied.

1. process-enrich-layer

ID: 12712 · 16 sections

Main data processor: fans out Enrich Layer JSON into 16 relational tables (name, avatar, bio, location, skills, gender, LinkedIn, languages, education, work, certs, volunteering, projects, publications, honors, interests).

Bugs: No error isolation (1 crash kills all 16 sections). No duplicate history prevention. Root cause of 764 stuck records.

Fixes: Per-section try_catch (16 sections). Duplicate history check-and-reuse. Null guard on person name before name-format_v2.

New: Structured response: {sections_run, sections_ok, sections_skip, errors[]}. Crash logging per section.

2. complete-person-enrich

ID: 12713

Finalization: marks processing complete, updates person node in FalkorDB (visibility=true, sync embedding), cleans up enrichment queue.

Bugs: No try_catch. A crash leaves a zombie processing=true record and an orphaned queue entry.

Fixes: Wrapped in try_catch. Queue cleanup moved outside try block (always runs).

3. resolve-edges-education

ID: 12714

3-phase resolution: match schools by LinkedIn URL → domain → name. Creates ATTENDED edges in FalkorDB with LLM-generated descriptions. Derives ATTENDED_TOGETHER edges from date overlap.

Bugs: None (already properly scoped).

New: Structured response: {processed, resolved, errors, skipped}.
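
The three-phase match order (LinkedIn URL, then domain, then name) can be sketched as a simple priority loop. The dictionary keys, the normalization rule, and the example company row are assumptions made for illustration; the production function works against Xano tables and may normalize differently.

```python
from typing import Dict, List, Optional


def resolve_school(record: Dict, companies: List[Dict]) -> Optional[Dict]:
    """Match an education record to a company row.

    Keys are tried in priority order: linkedin_url, then domain, then name.
    """
    def norm(value):
        # Case-insensitive comparison that ignores a trailing slash.
        return value.strip().lower().rstrip("/") if value else None

    for key in ("linkedin_url", "domain", "name"):
        wanted = norm(record.get(key))
        if not wanted:
            continue
        for company in companies:
            if norm(company.get(key)) == wanted:
                return {"master_company_id": company["id"], "matched_by": key}
    return None


# Illustrative company row; the URL and domain are placeholders.
companies = [
    {
        "id": 7,
        "linkedin_url": "https://www.linkedin.com/school/example-university/",
        "domain": "example.edu",
        "name": "Example University",
    }
]
match = resolve_school({"name": " example university "}, companies)
print(match)
```

Returning matched_by alongside the ID makes it easy to report how many records were resolved by each phase.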

4. resolve-edges-honor

ID: 12715

Links honors to issuing organizations. Creates Honor nodes + RECEIVED_HONOR and ISSUED_BY edges in FalkorDB using Twig-template Cypher.

Bugs: Sections 2 & 3 query all 71 honors globally (missing person filter).

Fixes: Added master_person_id WHERE clause to both sections. Null guard on node_uuid.
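
For readers unfamiliar with the shape of such a query, here is a hedged Python sketch that builds a single Cypher statement for the Honor node and both edges. It is not the production query: the report says production uses Twig-templated Cypher, whereas this sketch uses query parameters, and the parameter names and the Company label on the issuer are assumptions. The weights (20 and 25) and the null guard on the person UUID come from this report. The statement has not been run against FalkorDB here.

```python
from typing import Dict, Optional, Tuple

# OPTIONAL MATCH plus FOREACH over a one-element list is a common Cypher
# idiom for "create this edge only if the other node exists".
HONOR_QUERY = """
MATCH (p:Person {uuid: $person_uuid})
CREATE (h:Honor {uuid: randomUUID(), title: $title, issuer_name: $issuer_name})
CREATE (p)-[:RECEIVED_HONOR {weight: 20}]->(h)
WITH h
OPTIONAL MATCH (o:Company {uuid: $issuer_uuid})
FOREACH (_ IN CASE WHEN o IS NULL THEN [] ELSE [1] END |
    CREATE (h)-[:ISSUED_BY {weight: 25}]->(o))
RETURN h.uuid AS honor_uuid
"""


def build_honor_query(
    person_uuid: Optional[str], honor: Dict, issuer_uuid: Optional[str] = None
) -> Optional[Tuple[str, Dict]]:
    """Return (query, params), or None when the person has no graph node yet."""
    if not person_uuid:
        return None  # null guard: skip instead of sending a broken query
    params = {
        "person_uuid": person_uuid,
        "title": honor["title"],
        "issuer_name": honor.get("issuer_name") or "",
        # An empty string matches no node, so ISSUED_BY is simply not created.
        "issuer_uuid": issuer_uuid or "",
    }
    return HONOR_QUERY, params


built = build_honor_query("person-uuid-placeholder", {"title": "Example Award"})
print(built[1])
```

The returned honor_uuid is what would be written back to the honor record's node_uuid column to keep the bidirectional link.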

5. resolve-edges-volunteering

ID: 12716

Links volunteering records to organizations via domain/LinkedIn matching.

Bugs: Sections 2 & 3 query all 105 volunteering records globally.

Fixes: Added master_person_id WHERE clause to both sections.

6. resolve-edges-work

ID: 12717

Links work records to companies. Creates WORKED_AT/WORKS_AT edges. Deduplicates similar titles. Sets best current role on master_person via LLM.

Bugs: None (already properly scoped).

New: Structured response: {processed, resolved, errors, skipped}.

7. resolve-edges-projects-publications

ID: 12718

Links projects and publications to associated companies.

Bugs: None.

New: Structured response with counts.

8. resolve-edges-certifications

ID: 12719

Links certifications to issuing organizations (e.g., Udemy, Google, iSAQB).

Bugs: CRITICAL: No input parameter. All 3 db.query calls operate on the entire certifications table for all people.

Fixes: Added master_person_id input. Scoped all 3 WHERE clauses. Updated test-stage caller.

9. run-base-company-process

ID: 12720

Company enrichment: processes PDL + Enrich Layer + YC data, links Fundable organizations, extracts funding/deals, updates company node in FalkorDB.

Bugs: 3 db calls use empty table name (""). Timeout on large deal counts. HTTP hop timeout at 300s.

Fixes: Corrected table names. Direct function.run bypass. Deal count protection (>100 = skip).

9. Test Results — 20-Person Batch

Final verification: 20 people, 9 stages each, 180 total stages. All passed with 0 failures.

People: 20
Stages / Person: 9
Total Passed: 180
Failures: 0
Total Time: 327s

#	Name	Company	Passed	Time	Status
1	Josh Diamond	Frame.io	9	22s	PASS
2	John Traver	Frame.io	9	22s	PASS
3	Emery Wells	Frame.io	9	3s	PASS
4	Molly Alter	—	9	18s	PASS
5	Jason Diamond	The Diamond Bros.	9	17s	PASS
6	Itai Tsiddon	—	9	33s	PASS
7	Amish Jani	FirstMark Capital	9	42s	PASS
8	Jared Leto	—	9	2s	PASS
9	Mark L. Pederson	Orbiter	9	26s	PASS
10	Kevin Spacey	—	9	5s	PASS
11	Thomas Hesse	—	9	13s	PASS
12	Walter Kortschak	—	9	8s	PASS
13	Jimmy Wales	Wikitribune	9	8s	PASS
14	Larry Sanger	—	9	9s	PASS
15	Clark Valberg	—	9	9s	PASS
16	Jon Dahl	Mux	9	45s	PASS
17	Mario Haarmann	—	9	6s	PASS
18	Vijay Nagappan	—	9	6s	PASS
19	Charles Njenga	—	9	8s	PASS
20	Dynamo Mbugua	—	9	21s	PASS

Average: 16.4s/person. Fastest: Jared Leto (2s). Slowest: Jon Dahl (45s).

Previously failing: Larry Sanger (was 438s timeout → now 9s) and Jon Dahl (was 306s timeout → now 45s).

10. Implementation Phases

Phase 1: Critical Bug Fixes (P0)

Scoped certifications queries (added missing input + 3 WHERE clauses)
Scoped honors queries (2 sections) and volunteering queries (2 sections)
Fixed empty table names in company process (3 db calls)
Added try_catch to complete-person-enrich (zombie prevention)
Added per-section try_catch to process-enrich-layer (16 sections)

Phase 2: Resilience (P1)

Null guards: skip records with empty names, null node_uuids
Timeout protection: skip fundable deals >100 count
Duplicate history prevention: check-and-reuse existing enrich_history_person records
Direct function.run for company process (bypasses Xano nginx 300s timeout)
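
The check-and-reuse rule for history records amounts to a get-or-create lookup. The sketch below uses an in-memory list in place of the enrich_history_person table; keying the lookup on master_person_id plus data_source_id is an assumption, since the report does not state which columns identify an existing record.

```python
from typing import Dict, List


def get_or_create_history(
    table: List[Dict], master_person_id: int, data_source_id: int
) -> Dict:
    """Reuse the existing history row for (person, source) or create one."""
    for row in table:
        if (
            row["master_person_id"] == master_person_id
            and row["data_source_id"] == data_source_id
        ):
            row["processing"] = True  # reuse: mark the existing row as in progress
            return row
    row = {
        "id": len(table) + 1,
        "master_person_id": master_person_id,
        "data_source_id": data_source_id,
        "processing": True,
    }
    table.append(row)
    return row


history_table: List[Dict] = []
first = get_or_create_history(history_table, 17, 94)
second = get_or_create_history(history_table, 17, 94)  # second run, same person and source
print(len(history_table), first["id"] == second["id"])
```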

Phase 3: Observability (P1)

All 9 functions return structured JSON: {processed, resolved, errors, skipped}
Company process returns per-section results array
Crash logging to crash_log table with section name and full error context

Phase 4: AI Self-Fix Loop (P2)

Powered by Claude Opus 4.6 via OpenRouter ($220 credit pool on OPEN_ROUTER_RND key)
13 XanoScript constraint rules in system prompt (db.edit inline data, filter gotchas, etc.)
Fetches function source via Xano Meta API + crash_log context + error details
Pre-flight validation function (qa/validate-enrich-data): checks data shapes before processing
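
A pre-flight shape check of the kind qa/validate-enrich-data performs might look like the following. This is a sketch under assumptions: the field names (full_name, experiences, education, certifications) are hypothetical, and the real function may check more or different fields.

```python
from typing import List


def validate_enrich_data(blob) -> List[str]:
    """Return a list of shape problems; an empty list means safe to process."""
    if not isinstance(blob, dict):
        return ["payload is not an object"]
    problems = []
    name = blob.get("full_name")
    if not isinstance(name, str) or not name.strip():
        problems.append("full_name missing or empty")
    for key in ("experiences", "education", "certifications"):
        value = blob.get(key, [])
        if value is None:
            continue  # treat null the same as an absent list
        if not isinstance(value, list):
            problems.append(f"{key} is not a list")
        elif not all(isinstance(item, dict) for item in value):
            problems.append(f"{key} contains non-object items")
    return problems


report = validate_enrich_data({"full_name": " ", "education": "MIT"})
print(report)
```

Returning every problem at once, rather than stopping at the first, gives the self-fix loop fuller context in a single pass.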

11. Production Pipeline Health (Current State)

Current state of the production enrichment system, measured via the enrichment diagnostics MCP.

Total People: 2,143
Stuck Processing: 778
Invisible: 218 (10.2%)
Placeholder Avatars: 1,227

Issue	Count	Severity	Root Cause
Stuck enrich_history records (processing=true)	778	HIGH	Bug #5 & #6: no error isolation, no zombie cleanup
People with visibility=false	218 (10.2%)	HIGH	Stuck processing prevents complete-person-enrich from running
Placeholder avatars marked as main	1,227	MEDIUM	Systemic bug in replace-avatar logic (separate from enrichment)
Company queue backlog	10,159	MEDIUM	New companies queued by edge resolvers faster than processed
People missing enrich_data record	5	LOW	Created before person_enrich_data table existed
Crash log entries	29	INFO	Various runtime errors captured by crash_log table

The QA pipeline’s fixes directly address the top two issues: per-section try_catch eliminates stuck records, and guaranteed cleanup in complete-person-enrich prevents zombies.

12. Recommendations

Port QA fixes to production: The 6 bugs exist in live production functions. The QA clones prove the fixes work at 100% pass rate across 20 people.
Clean up 778 stuck records: Run a one-time cleanup to set processing=false on all stuck enrich_history_person records, allowing re-enrichment.
Run QA batch on full dataset: We tested 20 people (IDs 2-21). The database has 2,143. A broader batch will surface edge cases.
Enable AI self-fix in production: The Opus 4.6 loop can diagnose and fix XanoScript errors automatically, reducing manual debugging.
Address company queue backlog (10,159): Edge resolvers create new master_company records faster than they’re processed. May need batch company enrichment or prioritization.
Fix placeholder avatar systemic bug: 1,227 people have placeholder avatars marked as main — this is a separate bug in the replace-avatar function.
