Pipeline · ComfyUI Prospect Research

End-to-end flow

stage 1 · source.run() → data/raw/{github,hn,reddit,code}.jsonl · 8,761

stage 2 · filter.run() → data/filtered/candidates.jsonl (priority-sorted) · 8,300

stage 3 · enrich.run() → data/enriched/{id}.json (per-candidate) · 2,324

stage 4 · classify.run() → data/classified/{id}.json (Haiku batches of 20) · 2,321

stage 5 · research.run() → research/{id}.md (Sonnet, 1 per candidate) · 185

stage 6 · shortlist.run() → outputs/all_ranked.csv + DM drafts · 185

stage one

Source

8,761

Pull candidates from any place ComfyUI users gather publicly. Each source emits a Candidate record with an id, source-tag, and raw payload.
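As a rough illustration of the record described above, here is a minimal sketch using a standard-library dataclass. The field names `source` and `raw` and the `to_jsonl` helper are assumptions for illustration; the pipeline itself uses Pydantic models whose exact fields are not shown here.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any


@dataclass
class Candidate:
    """Hypothetical shape of one record in data/raw/*.jsonl."""

    id: str       # e.g. "github:ltdrdata"
    source: str   # source tag, e.g. "github_issuer" (field name is an assumption)
    raw: dict[str, Any] = field(default_factory=dict)  # untouched source payload

    def to_jsonl(self) -> str:
        """Serialize the record as a single JSON line."""
        return json.dumps(asdict(self), ensure_ascii=False)


line = Candidate("github:ltdrdata", "github_issuer", {"login": "ltdrdata"}).to_jsonl()
```

Keeping the raw payload alongside the id and source tag means later stages can re-derive fields without hitting the source again.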

github_engagement.py

Stargazers, forkers, issuers, commenters, PR authors across 6 ComfyUI orgs.

comfy-org/ComfyUI comfyanonymous/ComfyUI replicate/cog-comfyui bentoml/comfy-pack

hn_threads.py

Algolia search for ComfyUI-related HN threads, then traverse comment trees.

"comfyui" "comfy ui production" "comfy deploy"

reddit_subs.py

Posts + comments from 5 subreddits where deployment questions live.

r/comfyui r/StableDiffusion r/aivideo

github_code_search.py

GitHub code search for code patterns that signal production deployment intent.

"comfyui stripe" "comfyui clerk" "comfyui fastapi auth"

stage two

Filter

8,300

Drop bots and ghost accounts (dependabot, renovate, any login with a [bot] suffix, and "ghost"). Sort the rest by source priority so that --max keeps the highest-signal candidates first.

Source priority weights

SOURCE_PRIORITY = {
    "github_issuer":           10,  # filed issues = real friction
    "github_code_search":       8,  # ships ComfyUI + auth/payments code
    "github_forker":            6,  # forked = building on top
    "hn_thread":                5,  # commented in production threads
    "reddit:":                  4,
    "github_pr_author":         3,
    "github_stargazer":         1,  # weakest signal
}

stage three

Enrich

2,324

For each github:* candidate, fetch profile + top 5 repos + READMEs + git tree (for Dockerfile detection) + commit email domains + org memberships + linked blog. Reddit/HN candidates pass through with forum activity.

~16 GitHub API calls per candidate

· 1 profile fetch
· 1 orgs fetch
· 1 repos list (shared)
· 5x README fetch
· 5x git tree (replaces 2 HEAD checks per repo)
· 3x commits sample (commit email domains)

Resilience features

· asyncio.Queue worker pool (5 concurrent)
· tmp+rename atomic writes
· Skip if file exists (resumable)
· Rate-limit handler sleeps until reset epoch
· Exception caught per-candidate, never aborts pool
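A worker pool with per-candidate exception handling, as listed above, might be sketched like this. The `run_pool` and `demo_handler` names are hypothetical, and the real enrichment handler (API calls, skip-if-exists check) is replaced by a stub.

```python
import asyncio
from typing import Awaitable, Callable


async def run_pool(
    items: list[str],
    handler: Callable[[str], Awaitable[None]],
    concurrency: int = 5,
) -> list[tuple[str, Exception]]:
    """Process items with a fixed number of workers; collect failures instead of aborting."""
    queue: asyncio.Queue[str] = asyncio.Queue()
    for item in items:
        queue.put_nowait(item)
    failures: list[tuple[str, Exception]] = []

    async def worker() -> None:
        while True:
            try:
                item = queue.get_nowait()
            except asyncio.QueueEmpty:
                return  # nothing left to do
            try:
                await handler(item)
            except Exception as exc:  # one bad candidate must not kill the pool
                failures.append((item, exc))

    await asyncio.gather(*(worker() for _ in range(concurrency)))
    return failures


done: list[str] = []


async def demo_handler(candidate_id: str) -> None:
    """Stand-in for the real enrichment call."""
    if candidate_id == "github:broken":
        raise RuntimeError("simulated API failure")
    done.append(candidate_id)


failures = asyncio.run(run_pool(["github:a", "github:broken", "github:b"], demo_handler))
```

Only `concurrency` coroutines exist at any time, regardless of how many candidates are queued.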

PRECOMPUTED FLAGS

During enrichment we compute cheap regex/heuristic signals so the LLM ranks rather than reasons:

has_dockerfile has_payment_keyword has_auth_keyword has_server_framework has_modified_comfy_fork has_company_email has_known_org_membership forum_production_phrases
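A few of these flags could be computed along the following lines. This is only a sketch: the keyword lists (beyond stripe, clerk, and fastapi, which appear in the code-search queries), the free-mail domain set, and the `compute_flags` signature are assumptions.

```python
import re

# Keyword lists are illustrative guesses, not the pipeline's actual patterns.
PAYMENT_RE = re.compile(r"\b(stripe|paddle|checkout)\b", re.IGNORECASE)
AUTH_RE = re.compile(r"\b(clerk|auth0|oauth|jwt)\b", re.IGNORECASE)
SERVER_RE = re.compile(r"\b(fastapi|flask|express)\b", re.IGNORECASE)
PERSONAL_DOMAINS = {"gmail.com", "outlook.com", "users.noreply.github.com"}


def compute_flags(
    readmes: list[str], tree_paths: list[str], email_domains: list[str]
) -> dict[str, bool]:
    """Derive cheap boolean signals from already-fetched enrichment data."""
    text = "\n".join(readmes)
    return {
        "has_dockerfile": any(
            path.rsplit("/", 1)[-1].lower() == "dockerfile" for path in tree_paths
        ),
        "has_payment_keyword": bool(PAYMENT_RE.search(text)),
        "has_auth_keyword": bool(AUTH_RE.search(text)),
        "has_server_framework": bool(SERVER_RE.search(text)),
        "has_company_email": any(
            domain.lower() not in PERSONAL_DOMAINS for domain in email_domains
        ),
    }


flags = compute_flags(
    readmes=["Deploys ComfyUI behind FastAPI with Stripe billing"],
    tree_paths=["docker/Dockerfile", "main.py"],
    email_domains=["gmail.com", "acme.io"],
)
```

Since every flag is a plain boolean, the classifier prompt can list them as evidence instead of asking the model to rediscover them from raw READMEs.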

stage four · CLAUDE HAIKU

Classify

2,321

Spawn the Claude Code CLI with Haiku 4.5 in batches of 20 candidates. The prompt is sent over stdin (piped automatically once it exceeds 60KB, to stay clear of the Linux ARG_MAX limit). Each batch returns JSON with builder_type, production_score (0-10), activity_score (0-10), final_score, evidence, and recommended_action.
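The batching and stdin hand-off might be sketched as follows. The `batched` and `classify_batch` helpers are hypothetical names; the sketch always pipes the prompt over stdin rather than switching at a size threshold, and it uses only the `--model` and `--print` flags that this page documents.

```python
import subprocess
from itertools import islice
from typing import Iterable, Iterator

BATCH_SIZE = 20


def batched(items: Iterable, size: int = BATCH_SIZE) -> Iterator[list]:
    """Yield successive lists of at most `size` items."""
    iterator = iter(items)
    while chunk := list(islice(iterator, size)):
        yield chunk


def classify_batch(prompt: str) -> str:
    """Send one batch prompt to the CLI over stdin and return its stdout.

    Passing the prompt via `input=` keeps it out of argv, so its size is not
    bounded by the kernel's argument-length limits.
    """
    result = subprocess.run(
        ["claude", "--model", "claude-haiku-4-5", "--print"],
        input=prompt,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


batch_sizes = [len(batch) for batch in batched(range(45))]
```

Calling `classify_batch` requires the `claude` binary on PATH, so the example above only exercises the batching helper.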

CLI invocation

claude --model claude-haiku-4-5 \
  --print \
  <<< "$PROMPT_WITH_BATCH"

Builder types

founder_builder          Building a paid product
in_house_builder         Engineer at a company shipping ComfyUI
commercial_freelancer    Selling ComfyUI work
active_hobbyist          Building, but for fun
shopper                  Browsing, no implementation evidence

SCORE CLAMP (anti-hobbyist-inflation)

type_multiplier = {"founder_builder": 1.5, "in_house_builder": 2.0, "commercial_freelancer": 1.5, "active_hobbyist": 0.15, "shopper": 0.4}
if builder_type == "active_hobbyist": production_score = min(production_score, 4)
if builder_type == "shopper":         production_score = min(production_score, 5)
final_score = ((production_score + activity_score) / 2) * type_multiplier[builder_type] * confidence_multiplier
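The clamp and formula above can be wrapped in a small function, sketched here. Mapping the short multiplier names to the full builder types, and defaulting `confidence_multiplier` to 1.0, are assumptions; the actual confidence multipliers are not given on this page.

```python
TYPE_MULTIPLIER = {
    "founder_builder": 1.5,
    "in_house_builder": 2.0,
    "commercial_freelancer": 1.5,
    "active_hobbyist": 0.15,
    "shopper": 0.4,
}

# Upper bounds on production_score for low-intent builder types.
PRODUCTION_CAP = {"active_hobbyist": 4, "shopper": 5}


def final_score(
    builder_type: str,
    production_score: float,
    activity_score: float,
    confidence_multiplier: float = 1.0,  # assumed default; real values not documented
) -> float:
    """Apply the score clamp, then the averaged-and-weighted formula."""
    cap = PRODUCTION_CAP.get(builder_type)
    if cap is not None:
        production_score = min(production_score, cap)
    average = (production_score + activity_score) / 2
    return average * TYPE_MULTIPLIER[builder_type] * confidence_multiplier


in_house = final_score("in_house_builder", 10, 10)
hobbyist = final_score("active_hobbyist", 9, 8)
```

With a confidence multiplier of 1.0, an in_house_builder scoring 10 and 10 lands at 20.0, which matches the final_score in the Stage 5 frontmatter example. The hobbyist's production score of 9 is clamped to 4 before averaging.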

stage five · CLAUDE SONNET

Research

185

Spawn Claude Code CLI with Sonnet 4.6 + WebFetch tool, one process per candidate. Sonnet writes a per-candidate dossier: TL;DR, evidence, ComfyUI activity, production signals, best outreach angle. May DOWNGRADE or UPGRADE the Haiku classification based on triggers.

DOWNGRADE TRIGGERS

(a) Personal-use signals
(b) Non-AI Patreon / fork w/o modifying
(c) "for fun" / "learning" language
(d) Personal email only · no company
(e) Shopper signals · no implementation
(f) Confirms hobbyist on web

UPGRADE TRIGGERS

(g) Company website with paid product
(h) LinkedIn confirms employer
(i) Public funding / press / launch
(j) Known production org membership

Output frontmatter

---
candidate_id: github:ltdrdata
builder_type: in_house_builder
confidence: high
production_score: 10
activity_score: 10
final_score: 20.0
recommended_action: priority_outreach
contactable_via: [github]
last_updated: 2026-05-05
---
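Reading that frontmatter back for Stage 6 could be sketched with the standard library alone. This is a deliberately simple stand-in: the pipeline parses YAML into Pydantic models, whereas the hypothetical `parse_frontmatter` below only handles flat `key: value` lines and leaves every value as a string.

```python
def parse_frontmatter(markdown: str) -> dict[str, str]:
    """Extract flat key/value pairs from a leading '---' delimited block."""
    lines = markdown.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter
        # Split on the first colon only, so "github:ltdrdata" stays intact.
        key, separator, value = line.partition(":")
        if separator:
            fields[key.strip()] = value.strip()
    return fields


dossier = """---
candidate_id: github:ltdrdata
builder_type: in_house_builder
final_score: 20.0
recommended_action: priority_outreach
---

TL;DR goes here.
"""
fields = parse_frontmatter(dossier)
```

A real YAML parser would also turn `contactable_via: [github]` into a list and `final_score` into a float; here those conversions are left to the caller.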

stage six

Shortlist

185

Aggregate frontmatter into a ranked CSV (all_ranked.csv), top-50 markdown table, and per-candidate DM drafts in outputs/outreach_drafts/.
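The ranking step might look like the sketch below, which sorts parsed frontmatter rows by final_score and writes them as CSV. The column selection and the `write_ranked_csv` name are assumptions; the example writes to an in-memory buffer rather than outputs/all_ranked.csv.

```python
import csv
import io
from typing import TextIO

COLUMNS = ["candidate_id", "builder_type", "final_score", "recommended_action"]


def write_ranked_csv(rows: list[dict], out: TextIO) -> list[dict]:
    """Sort rows by descending final_score and write them as CSV."""
    ranked = sorted(rows, key=lambda row: float(row["final_score"]), reverse=True)
    # extrasaction="ignore" drops frontmatter fields that are not CSV columns.
    writer = csv.DictWriter(out, fieldnames=COLUMNS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(ranked)
    return ranked


rows = [
    {"candidate_id": "github:a", "builder_type": "shopper",
     "final_score": "2.0", "recommended_action": "skip", "confidence": "low"},
    {"candidate_id": "github:b", "builder_type": "in_house_builder",
     "final_score": "20.0", "recommended_action": "priority_outreach", "confidence": "high"},
]
buffer = io.StringIO()
ranked = write_ranked_csv(rows, buffer)
csv_lines = buffer.getvalue().splitlines()
```

The same ranked list can then be sliced to produce the top-50 markdown table.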

DM template generation

Each DM uses a specific hook from the candidate's research:

· Most-recent issue/PR title (live work they're doing)
· Top repo name + tagline
· Org membership context
· RunFlow value prop matched to their pain (auth, billing, scaling, model loading)

Tech

Python 3.12 + asyncio

httpx for HTTP. tenacity for retries. asyncio.Queue worker pools so we never materialize 4000+ coroutines at once.

Pydantic v2

Strict models for every stage. YAML frontmatter parsed back into models for Stage 6 aggregation.

Claude CLI

asyncio.create_subprocess_exec with stdin pipe. Haiku 4.5 for triage, Sonnet 4.6 for research. All under one Claude Max subscription.

GitHub REST API

Rate-limit-aware: detect remaining=0 + reset epoch, sleep until reset (cap 90 min), retry. Saved 685 candidates in our last run.

Atomic writes

tmp + rename for every JSON/markdown output. Pipeline can crash and resume without corruption.
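A tmp + rename write might be sketched as below. The `atomic_write_json` name is hypothetical; the key point is that the temporary file is created in the same directory as the target, so that `os.replace` is a rename within one filesystem.

```python
import json
import os
import tempfile
from pathlib import Path


def atomic_write_json(path: Path, payload: dict) -> None:
    """Write JSON to a temp file beside `path`, then rename it into place."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle)
        os.replace(tmp_name, path)  # readers see the old file or the new one, never a partial
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)  # clean up the partial temp file
        raise


with tempfile.TemporaryDirectory() as workdir:
    target = Path(workdir) / "enriched" / "github_example.json"
    atomic_write_json(target, {"id": "github:example"})
    restored = json.loads(target.read_text(encoding="utf-8"))
    leftovers = [p.name for p in target.parent.iterdir() if p.suffix == ".tmp"]
```

Combined with a skip-if-exists check, a file that exists at its final path can be treated as complete on resume.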

Cost meter

Optional max_usd budget per stage; exceeding it raises CostBudgetExceeded and halts the stage. The subscription covered actual spend, so the meter always read 0.