Orbiter’s enrichment pipeline ingests person and company data from 5+ external sources (People Data Labs, LinkedIn/Enrich Layer, Y Combinator, Fundable, Crunchbase), normalizes it into 20+ relational tables in Xano, then projects it into a FalkorDB knowledge graph with nodes, edges, and vector embeddings.
We cloned all 9 core enrichment functions into an isolated qa/ namespace, systematically audited each one, and discovered 6 bugs in the production code — including 3 functions with globally-scoped database queries that corrupt data across people. Our QA clones fix all of these and add error isolation, structured responses, and an AI self-healing loop powered by Claude Opus 4.6.
The enrichment pipeline has 4 layers: Ingest (raw data from external APIs), Process (fan out into relational tables), Resolve (link records to companies), and Graph (project into FalkorDB).
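The four layers compose cleanly, which can be sketched as a chain of small functions (all function names and payload shapes below are illustrative, not the real Xano implementation):

```python
def ingest(person_id):
    # Layer 1: pull raw payloads from external APIs (stubbed here)
    return {"person_id": person_id, "pdl": {"skills": ["python"]}}

def process(raw):
    # Layer 2: fan the raw payload out into relational rows
    return {"person_id": raw["person_id"],
            "skills_join": [{"skill": s} for s in raw["pdl"]["skills"]]}

def resolve(rows):
    # Layer 3: link records to companies (no-op in this sketch)
    rows["resolved"] = True
    return rows

def graph(rows):
    # Layer 4: project the resolved rows into graph nodes and edges
    return {"nodes": [{"label": "Person", "uuid": rows["person_id"]}],
            "edges": []}

def run_pipeline(person_id):
    # Ingest -> Process -> Resolve -> Graph
    return graph(resolve(process(ingest(person_id))))
```

Each layer only depends on the output of the previous one, which is what makes the per-stage QA testing described later possible.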
The pipeline populates 20+ relational tables per person. Here are the core tables and what they store.
| Table | Records Per Person | Key Fields | Source |
|---|---|---|---|
| master_person | 1 | name, avatar, linkedin_url, current_title, bio, bio_500, sex, node_uuid, visibility | All sources merged |
| work_experience | 3-15 | title, company_name, company_domain, start/end dates, edge_uuid, master_company_id | PDL, Enrich Layer, Fundable |
| education_experience | 2-8 | school_name, field_of_study, degree_name, activities, start/end year, edge_uuid | PDL, Enrich Layer |
| skills_join | 20-70 | skill_id → skills.skill (text) | PDL, Enrich Layer |
| certification | 0-10 | name, org_name, issue_date, edge_id, master_company_id | Enrich Layer |
| honor | 0-5 | title, issuer_name, issued_on_year, node_uuid, edge_id | Enrich Layer |
| volunteering | 0-5 | title, company_name, start/end year, master_company_id | Enrich Layer |
| master_link | 5-12 | service (linkedin, twitter, github...), link_url, profile (bool) | YC, PDL, Enrich Layer |
| master_email | 1-6 | email_address, email_type (work/personal), active_status | LinkedIn, PDL |
| master_avatar | 1-5 | url, main (bool), is_placeholder (bool) | Twitter, LinkedIn, Fundable |
| about_person | 2-4 | biography (text), data_source_id | PDL headline, Enrich Layer, LLM |
| interest_join | 5-15 | interested_in_id → interest.interest | Enrich Layer |
| language_join | 1-5 | languages_id → languages.language | Enrich Layer |
The external sources feeding the pipeline, keyed by the source ID used to tag records:
| ID | Source | Type | What It Provides |
|---|---|---|---|
| 91 | People Data Labs (PDL) | API | Emails, phones, education arrays, work arrays, skills, gender |
| 94 | Enrich Layer | Aggregated | LinkedIn profile, skills, experience, education, certifications, volunteering, honors, interests, languages |
| 89 | Fundable | API | Investor profiles, funding deals, organizations, total raised |
| 87 | Y Combinator | Database | Batch info, founders, company one-liners, social links |
| 8 | Crunchbase | API | Business profiles, funding rounds |
FalkorDB (formerly RedisGraph) is a property graph database that uses the Cypher query language. Every enriched person and company becomes a node with vector embeddings, connected by typed, weighted relationships.
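As a sketch, the node-upsert Cypher for a person might be assembled like this (the query text is illustrative; the real functions run Cypher from inside Xano, and production code should prefer parameterized queries over string interpolation):

```python
def person_node_cypher(uuid, name, embedding):
    # Build a MERGE statement for an :Entity:Person node whose
    # name_embedding is a vecf32 vector literal (FalkorDB syntax).
    # Property names follow the node table below; escaping is omitted.
    vec = ", ".join(f"{x:.4f}" for x in embedding)
    return (
        f"MERGE (p:Entity:Person {{uuid: '{uuid}'}}) "
        f"SET p.name = '{name}', p.name_embedding = vecf32([{vec}])"
    )
```

MERGE keeps the upsert idempotent: re-running enrichment updates the existing node instead of creating a duplicate.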
Node labels created in the graph:
| Label(s) | Created By | Key Properties |
|---|---|---|
| :Entity:Person | add-person-node | uuid, name, name_embedding (vecf32), visibility, avatar, bio, roles, skills, interests |
| :Entity:Person:Angel | update-person-node (if is_angel) | + investor_type, exits, investments, board_advisory_experience |
| :Entity:Company | update-company-node | uuid, name, domain, name_embedding, industries, specialties, employee_range, founded |
| :Entity:Company:School | update-company-node (.edu domain) | Identified by .edu domain or is_school=true flag |
| :Entity:Company:VC_Firm | update-company-node (is_vc) | + investment_range, stages, sweet_spot |
| :Honor | resolve-edges-honor | uuid, title, description, issued_on_year, issuer_name |
Relationship types and their weights:
| Relationship | Direction | Weight | How It’s Created |
|---|---|---|---|
| ATTENDED | Person → School | 40 | create-education-edges: groups education records by school, collapses into one edge, generates LLM description via Gemini 2.5 Flash |
| ATTENDED_TOGETHER | Person ↔ Person | 35-70 | Derived: Automatically created when two people have overlapping attendance dates at the same school. Weight varies: 35 (same major), 50 (same field), 70 (just temporal overlap) |
| STUDIED_UNDER | Person → Instructor | 10 | Derived: Created when a student’s attendance overlaps with an instructor’s TAUGHT_AT at the same school |
| WORKED_AT / WORKS_AT | Person → Company | varies | create-work-edges: links work_experience to companies, sets current_title on master_person |
| RECEIVED_HONOR | Person → Honor | 20 | resolve-edges-honor: creates Honor node + edge in single Cypher transaction |
| ISSUED_BY | Honor → Organization | 25 | Same transaction as RECEIVED_HONOR, conditional on issuer existing in graph |
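The ATTENDED_TOGETHER weight rules can be sketched as a small selection function (the record shape is hypothetical; the real derivation runs in the graph layer):

```python
def attended_together_weight(a, b):
    # Pick the ATTENDED_TOGETHER edge weight for two attendance records
    # at the same school. Lower weight = stronger tie:
    # 35 (same major), 50 (same field), 70 (temporal overlap only).
    if not (a["start"] <= b["end"] and b["start"] <= a["end"]):
        return None  # no temporal overlap: no edge is derived
    if a.get("major") and a.get("major") == b.get("major"):
        return 35
    if a.get("field") and a.get("field") == b.get("field"):
        return 50
    return 70
```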
Four conventions tie the two systems together:
- UUID cross-linking: node UUIDs are minted with randomUUID() and stored back on the Xano record (node_uuid on master_person, edge_uuid on join tables), so both systems can find each other.
- Embeddings: person nodes get name_embedding from bio_500 via OpenAI embeddings; company nodes from about_500. This enables semantic similarity search in the graph.
- Edge weights: every relationship carries a weight property, where lower = stronger. STUDIED_UNDER (10) ranks higher than ATTENDED (40), so graph traversal prioritizes meaningful connections.
- Visibility gating: person nodes start with visibility: false and only become visible after bio_500 exists, preventing incomplete data from appearing in user-facing queries.

An example of a generated bio:

> “John Traver is Co-founder and Creative Technologist at Frame.io, blending cinematic workflow with cutting-edge code. From scripting MEL and AE at RIT to shipping the 5-star KataData app in Objective-C, he has mastered Python, Ruby, JS, and now Haskell to scale video collaboration for Oscar-winning teams.”
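The UUID cross-linking and visibility conventions can be sketched as follows (record and node shapes are illustrative stand-ins for the real Xano and FalkorDB objects):

```python
import uuid

def add_person_node(xano_record):
    # Mint a node UUID and store it back on the Xano record so both
    # systems can find each other. A new node stays invisible until
    # the record has a bio_500.
    node_uuid = str(uuid.uuid4())
    xano_record["node_uuid"] = node_uuid
    return {
        "labels": ["Entity", "Person"],
        "uuid": node_uuid,
        "visibility": bool(xano_record.get("bio_500")),
    }
```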
John Traver’s work experience:
| Title | Company | Period | Source |
|---|---|---|---|
| Co-Founder & Creative Technologist | Frame.io | 2014 – 2025 | Fundable |
| Chief Scientist | Katabatic Digital | 2012 – 2014 | Enrich Layer |
| K/Lab Engineering | Katabatic Digital | 2010 – 2012 | Enrich Layer |
Education:
| School | Degree | End | Source |
|---|---|---|---|
| Rochester Institute of Technology | BS | 2014 | PDL |
| Rochester Institute of Technology | BS | 2010 | PDL |
| Rochester Institute of Technology | — | 2010 | PDL |
| Shenendehowa High School | — | 2006 | PDL |
| Rochester Institute of Technology | — | — | PDL |
Resulting graph for John Traver:
- Person node 7bd19581... → ATTENDED → Rochester Institute of Technology (School node)
- ATTENDED → Shenendehowa High School
- WORKED_AT → Frame.io
- WORKED_AT → Katabatic Digital
Jimmy Wales’s work experience:
| Title | Company | Period |
|---|---|---|
| Owner | Fandom | Current |
| Board Member | Wikimedia Foundation | 2003 – present |
| Executive Chairman | The People’s Operator | 2014 – 2015 |
| Founder | Mighty Capital | 2001 – 2007 |
| Founder & CEO | Wikitribune | 2017 – present |
Education:
| School | Degree | End |
|---|---|---|
| University of Alabama | M.S. Finance | 1991 |
| Auburn University | B.S. Finance | 1989 |
| Randolph High School | — | 1983 |
PDL returned not_found for Jimmy Wales — his profile was too high-profile/protected. All data came from Enrich Layer (LinkedIn scrape) instead. The pipeline handled this gracefully.
Jon Dahl’s work experience:
| Title | Company | Period |
|---|---|---|
| Co-Founder, CEO | Mux | Current |
| VP Engineering | Brightcove | 2012 – 2015 |
| Co-Founder, CEO | Zencoder | 2010 – 2012 |
Education:
| School | Degree |
|---|---|
| Trinity International University | BA |
| Wheaton College | — |
Jon Dahl was one of two people who failed in the original production batch test (306s timeout on run-base-company-process). After QA fixes — specifically the direct function.run bypass that eliminates the HTTP hop — he now passes 9/9 in 45s.
Certifications:
| Certification | Issuer |
|---|---|
| Advanced React and Redux | Udemy |
| Modern JavaScript | Udemy |
| Complete React Developer | Udemy |
| CSS - The Complete Guide | Udemy |
| Google Africa Developer Scholarship | Google / Andela |
| JavaScript Algorithms & Data Structures | Udemy |
| Modern React with Redux | Udemy |
Each certification goes through resolve-edges-certifications which matches the issuer (Udemy, Google) to a master_company record. In production, this function had no input parameter and queried ALL certifications globally — our QA fix scopes it to this person only.
Deep code review uncovered 6 bugs. Three are data-corruption severity: globally scoped queries that modify records belonging to other people.
Function resolve-edges-certifications (12719) had NO input parameter at all. All 3 db.query certification calls queried the entire table. When enriching Charles Njenga (ID 20), it would also modify Mario Haarmann’s (ID 18) 3 certifications.
Fix: Added master_person_id input. Added WHERE clause to all 3 queries.
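The scoping fix amounts to filtering every certification query by the new master_person_id input. Here is a sketch against an in-memory stand-in for the table (the real fix adds a WHERE clause to the three db.query calls in XanoScript):

```python
def fetch_certifications(db, master_person_id):
    # QA fix sketch: scope the certification query to one person.
    # Production queried the whole table; this filter is the fix.
    return [row for row in db["certification"]
            if row["master_person_id"] == master_person_id]
```

With the filter in place, enriching person 20 can no longer touch person 18’s rows.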
Function resolve-edges-honor (12715): domain resolution and LinkedIn resolution sections queried ALL 71 honor records globally instead of just the current person’s.
Fix: Added master_person_id WHERE clause to sections 2 and 3.
Function resolve-edges-volunteering (12716): LinkedIn resolution and Cypher creation sections queried ALL 105 volunteering records globally.
Fix: Added master_person_id WHERE clause to sections 2 and 3.
Function run-base-company-process (12720) had 3 database calls with "" as the table name in the staff_count section. Deploys fine but silently fails at runtime.
Fix: Identified correct table name. Replaced all 3 empty strings.
process-enrich-layer (12712) has 16 independent sections. A crash in Section 3 (avatar) kills Sections 4-16 (skills, education, work, certs, etc.). Production has 764 enrich_history records stuck with processing=true.
Fix: Each of 16 sections wrapped in individual try_catch. Crashes logged to crash_log, continue to next section.
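The per-section isolation pattern can be sketched like this (section names and the crash_log shape are illustrative):

```python
def process_enrich_layer(person_id, sections, crash_log):
    # Error-isolation sketch: each section runs in its own try/except,
    # so one crash no longer kills the remaining sections.
    results = {}
    for name, fn in sections:
        try:
            results[name] = fn(person_id)
        except Exception as exc:
            # Log the crash and continue to the next section
            crash_log.append({"person_id": person_id,
                              "section": name,
                              "error": str(exc)})
            results[name] = None
    return results
```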
complete-person-enrich (12713): No try_catch. Crash leaves processing=true forever and queue entry never cleaned up.
Fix: Wrapped in try_catch. Queue cleanup moved outside try block (always executes).
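The guaranteed-cleanup pattern is the classic try/finally shape; XanoScript’s try_catch plays the same role, and all names here are illustrative:

```python
def complete_person_enrich(person_id, enrich, cleanup_queue):
    # Guaranteed-cleanup sketch: the queue entry is removed in `finally`,
    # so a crash can no longer leave processing=true forever.
    status = {"processing": True}
    try:
        enrich(person_id)
    except Exception as exc:
        status["error"] = str(exc)
    finally:
        status["processing"] = False
        cleanup_queue(person_id)  # always executes, even on crash
    return status
```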
Before/after comparison:
| Metric | Production (Before) | QA Pipeline (After) |
|---|---|---|
| Pass Rate | 99.2% (357/360) | 100% (180/180) |
| Failures | 3 (timeout on company process) | 0 |
| Stuck Processing Records | 778 in production | 0 (guaranteed cleanup) |
| Global Query Bugs | 3 functions unscoped | All queries scoped to person |
| Error Isolation | None (1 crash kills all 16 sections) | Per-section try_catch |
| Response Format | String: "success" | Structured JSON: {processed, resolved, errors, skipped} |
| Duplicate History | Creates new record every run | Check-and-reuse existing records |
| AI Self-Fix | None | Claude Opus 4.6 via OpenRouter with 13 XanoScript rules |
| Company Process Timeout | HTTP hop timeout at 300s (Xano nginx) | Direct function.run (no HTTP hop) |
All 9 enrichment functions cloned into qa/ namespace with fixes applied.
Final verification: 20 people, 9 stages each, 180 total stages. All passed with 0 failures.
| # | Name | Company | Passed | Failed | Time | Status |
|---|---|---|---|---|---|---|
| 1 | Josh Diamond | Frame.io | 9 | 0 | 22s | PASS |
| 2 | John Traver | Frame.io | 9 | 0 | 22s | PASS |
| 3 | Emery Wells | Frame.io | 9 | 0 | 3s | PASS |
| 4 | Molly Alter | — | 9 | 0 | 18s | PASS |
| 5 | Jason Diamond | The Diamond Bros. | 9 | 0 | 17s | PASS |
| 6 | Itai Tsiddon | — | 9 | 0 | 33s | PASS |
| 7 | Amish Jani | FirstMark Capital | 9 | 0 | 42s | PASS |
| 8 | Jared Leto | — | 9 | 0 | 2s | PASS |
| 9 | Mark L. Pederson | Orbiter | 9 | 0 | 26s | PASS |
| 10 | Kevin Spacey | — | 9 | 0 | 5s | PASS |
| 11 | Thomas Hesse | — | 9 | 0 | 13s | PASS |
| 12 | Walter Kortschak | — | 9 | 0 | 8s | PASS |
| 13 | Jimmy Wales | Wikitribune | 9 | 0 | 8s | PASS |
| 14 | Larry Sanger | — | 9 | 0 | 9s | PASS |
| 15 | Clark Valberg | — | 9 | 0 | 9s | PASS |
| 16 | Jon Dahl | Mux | 9 | 0 | 45s | PASS |
| 17 | Mario Haarmann | — | 9 | 0 | 6s | PASS |
| 18 | Vijay Nagappan | — | 9 | 0 | 6s | PASS |
| 19 | Charles Njenga | — | 9 | 0 | 8s | PASS |
| 20 | Dynamo Mbugua | — | 9 | 0 | 21s | PASS |
Average: 16.4s/person. Fastest: Jared Leto (2s). Slowest: Jon Dahl (45s).
Previously failing: Larry Sanger (was 438s timeout → now 9s) and Jon Dahl (was 306s timeout → now 45s).
A new validation function (qa/validate-enrich-data) checks data shapes before processing.

Current state of the production enrichment system, measured via the enrichment diagnostics MCP:
| Issue | Count | Severity | Root Cause |
|---|---|---|---|
| Stuck enrich_history records (processing=true) | 778 | HIGH | Bug #5 & #6: no error isolation, no zombie cleanup |
| People with visibility=false | 218 (10.2%) | HIGH | Stuck processing prevents complete-person-enrich from running |
| Placeholder avatars marked as main | 1,227 | MEDIUM | Systemic bug in replace-avatar logic (separate from enrichment) |
| Company queue backlog | 10,159 | MEDIUM | New companies queued by edge resolvers faster than processed |
| People missing enrich_data record | 5 | LOW | Created before person_enrich_data table existed |
| Crash log entries | 29 | INFO | Various runtime errors captured by crash_log table |
The QA pipeline’s fixes directly address the top two issues: per-section try_catch eliminates stuck records, and guaranteed cleanup in complete-person-enrich prevents zombies.
Recommended remediation: set processing=false on all stuck enrich_history_person records, allowing re-enrichment.
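That remediation can be sketched as a single pass over the stuck rows (an in-memory stand-in; the real fix is a bulk update in Xano):

```python
def release_stuck_records(records):
    # Flip processing=false on stuck enrich_history_person rows
    # so they become eligible for re-enrichment.
    released = 0
    for row in records:
        if row.get("processing"):
            row["processing"] = False
            released += 1
    return released
```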