End-to-end flow
Source
Pull candidates from any place ComfyUI users gather publicly. Each source emits a Candidate record with an id, source-tag, and raw payload.
- GitHub activity: stargazers, forkers, issue authors, commenters, and PR authors across 6 ComfyUI orgs.
- Hacker News: Algolia search for ComfyUI-related HN threads, then a traversal of each comment tree.
- Reddit: posts and comments from 5 subreddits where deployment questions live.
- GitHub code search: queries for code patterns that signal production deployment intent.
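A minimal sketch of the Candidate record described above, assuming a plain dataclass. The field names `source_tag` and `raw` and the adapter `from_github_issue` are illustrative and not taken from the pipeline's code.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Candidate:
    """One person surfaced by a source, before filtering or enrichment."""
    id: str                      # e.g. "github:ltdrdata"
    source_tag: str              # e.g. "github_issuer", "hn_thread"
    raw: Dict[str, Any] = field(default_factory=dict)  # untouched source payload


def from_github_issue(issue: Dict[str, Any]) -> Candidate:
    # Hypothetical adapter: assumes the payload carries user.login,
    # as GitHub REST issue objects do.
    login = issue["user"]["login"]
    return Candidate(id=f"github:{login}", source_tag="github_issuer", raw=issue)
```

Keeping the raw payload on the record means later stages can re-derive signals without re-fetching from the source.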
Filter
Drop bots and ghost accounts (dependabot, renovate, any login with a [bot] suffix, and "ghost"). Sort the remainder by source priority so that --max keeps the highest-signal candidates first.
SOURCE_PRIORITY = {
    "github_issuer": 10,      # filed issues = real friction
    "github_code_search": 8,  # ships ComfyUI + auth/payments code
    "github_forker": 6,       # forked = building on top
    "hn_thread": 5,           # commented in production threads
    "reddit:": 4,
    "github_pr_author": 3,
    "github_stargazer": 1,    # weakest signal
}
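The filter step could look roughly like the sketch below. It assumes candidates are `(login, source_tag)` pairs and that a key ending in a colon, such as "reddit:", is meant as a prefix match. Both are assumptions, and the helper names `is_bot`, `priority`, and `select` are hypothetical.

```python
from typing import List, Optional, Tuple

SOURCE_PRIORITY = {
    "github_issuer": 10,
    "github_code_search": 8,
    "github_forker": 6,
    "hn_thread": 5,
    "reddit:": 4,
    "github_pr_author": 3,
    "github_stargazer": 1,
}

BOT_NAMES = {"ghost", "dependabot", "renovate"}


def is_bot(login: str) -> bool:
    """True for known automation accounts and GitHub's deleted-user placeholder."""
    name = login.lower()
    return name.endswith("[bot]") or name in BOT_NAMES


def priority(source_tag: str) -> int:
    """Exact match first, then prefix match for keys ending in ':' (assumption)."""
    if source_tag in SOURCE_PRIORITY:
        return SOURCE_PRIORITY[source_tag]
    for key, value in SOURCE_PRIORITY.items():
        if key.endswith(":") and source_tag.startswith(key):
            return value
    return 0  # unknown sources sort last


def select(candidates: List[Tuple[str, str]],
           max_n: Optional[int] = None) -> List[Tuple[str, str]]:
    """Drop bots, sort by descending source priority, then truncate to max_n."""
    kept = [c for c in candidates if not is_bot(c[0])]
    kept.sort(key=lambda c: priority(c[1]), reverse=True)
    return kept if max_n is None else kept[:max_n]
```

Because the sort happens before truncation, a small `max_n` is filled from the strongest sources first.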
Enrich
For each github:* candidate, fetch profile + top 5 repos + READMEs + git tree (for Dockerfile detection) + commit email domains + org memberships + linked blog. Reddit/HN candidates pass through with forum activity.
- 1 profile fetch
- 1 orgs fetch
- 1 repos list (shared)
- 5x README fetch
- 5x git tree (replaces 2 HEAD checks per repo)
- 3x commits sample (commit email domains)
- asyncio.Queue worker pool (5 concurrent)
- tmp+rename atomic writes
- Skip if file exists (resumable)
- Rate-limit handler sleeps until reset epoch
- Exception caught per-candidate, never aborts pool
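The concurrency and resumability points above can be sketched as a small worker pool. This is an illustration under assumptions, not the pipeline's code: `run_pool`, the `enrich` callback, and the one-JSON-file-per-candidate layout are hypothetical.

```python
import asyncio
from pathlib import Path
from typing import Awaitable, Callable, Dict, Iterable


async def run_pool(ids: Iterable[str],
                   enrich: Callable[[str], Awaitable[str]],
                   out_dir: Path,
                   workers: int = 5) -> Dict[str, str]:
    """Enrich each id with a fixed number of workers; return failures by id."""
    queue = asyncio.Queue()
    for cid in ids:
        queue.put_nowait(cid)
    failures: Dict[str, str] = {}

    async def worker() -> None:
        while True:
            try:
                cid = queue.get_nowait()
            except asyncio.QueueEmpty:
                return  # nothing left to do
            target = out_dir / f"{cid.replace(':', '_')}.json"
            if target.exists():
                continue  # already enriched in an earlier run
            try:
                payload = await enrich(cid)
                tmp = target.with_name(target.name + ".tmp")
                tmp.write_text(payload)
                tmp.replace(target)  # atomic rename on the same filesystem
            except Exception as exc:  # one bad candidate must not stop the pool
                failures[cid] = repr(exc)

    await asyncio.gather(*(worker() for _ in range(workers)))
    return failures
```

Only `workers` coroutines exist at any time, regardless of how many ids are queued, and rerunning after a crash skips every candidate whose output file is already in place.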
During enrichment we compute cheap regex/heuristic signals so that the LLM ranks rather than reasons.
Classify
Spawn Claude Code CLI with Haiku 4.5 in batches of 20 candidates. The prompt goes over stdin (piped automatically when it exceeds 60 KB, to stay under the Linux ARG_MAX limit). Each batch returns JSON with builder_type, production_score (0-10), activity_score (0-10), final_score, evidence, and recommended_action.
claude --model claude-haiku-4-5 \
  --print \
  <<< "$PROMPT_WITH_BATCH"
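A rough Python sketch of the batching and stdin piping, assuming the standard library's `subprocess.run`. The helpers `batches` and `run_with_stdin` are hypothetical names, and unlike the pipeline described above this sketch always pipes the prompt rather than switching at 60 KB.

```python
import subprocess
from typing import Iterator, List, Sequence

BATCH_SIZE = 20


def batches(items: Sequence, size: int = BATCH_SIZE) -> Iterator[List]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def run_with_stdin(cmd: Sequence[str], prompt: str, timeout: float = 600.0) -> str:
    """Run a CLI with the prompt on stdin, so prompt size is not bound by argv limits."""
    result = subprocess.run(
        list(cmd),
        input=prompt,          # sent over a pipe rather than as an argument
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,            # raise CalledProcessError on non-zero exit
    )
    return result.stdout
```

A call for one batch might then be `run_with_stdin(["claude", "--model", "claude-haiku-4-5", "--print"], prompt)`, with `prompt` built from the 20 candidates in that batch.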
- founder_builder: Building a paid product
- in_house_builder: Engineer at a company shipping ComfyUI
- commercial_freelancer: Selling ComfyUI work
- active_hobbyist: Building, but for fun
- shopper: Browsing, no implementation evidence
# Cap production_score for non-commercial builder types
if builder_type == "active_hobbyist":
    production_score = min(production_score, 4)
if builder_type == "shopper":
    production_score = min(production_score, 5)
type_multiplier = {"founder_builder": 1.5, "in_house_builder": 2.0, "commercial_freelancer": 1.5, "active_hobbyist": 0.15, "shopper": 0.4}
final_score = ((production_score + activity_score) / 2) * type_multiplier[builder_type] * confidence_multiplier
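The caps and the formula above can be combined into one function, sketched here. The type multipliers and caps come from the text. The `CONFIDENCE_MULTIPLIER` values are placeholders: only 1.0 for high confidence is implied by the example dossier, and the others are assumptions made so the sketch runs.

```python
TYPE_MULTIPLIER = {
    "founder_builder": 1.5,
    "in_house_builder": 2.0,
    "commercial_freelancer": 1.5,
    "active_hobbyist": 0.15,
    "shopper": 0.4,
}

# Upper bounds on production_score for non-commercial types
PRODUCTION_CAP = {"active_hobbyist": 4, "shopper": 5}

# Assumption: the source does not list these values
CONFIDENCE_MULTIPLIER = {"high": 1.0, "medium": 0.8, "low": 0.6}


def final_score(builder_type: str,
                production_score: float,
                activity_score: float,
                confidence: str = "high") -> float:
    """Apply the per-type cap, average the two scores, then scale."""
    cap = PRODUCTION_CAP.get(builder_type)
    if cap is not None:
        production_score = min(production_score, cap)
    base = (production_score + activity_score) / 2
    return base * TYPE_MULTIPLIER[builder_type] * CONFIDENCE_MULTIPLIER[confidence]
```

For an in-house builder scoring 10 on both axes with high confidence, this gives (10 + 10) / 2 × 2.0 × 1.0 = 20.0.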
Research
Spawn Claude Code CLI with Sonnet 4.6 + WebFetch tool, one process per candidate. Sonnet writes a per-candidate dossier: TL;DR, evidence, ComfyUI activity, production signals, and best outreach angle. It may downgrade or upgrade the Haiku classification based on the triggers below: (a) through (f) point toward a downgrade, (g) through (j) toward an upgrade.
- (a) Personal-use signals
- (b) Non-AI Patreon / fork w/o modifying
- (c) "for fun" / "learning" language
- (d) Personal email only · no company
- (e) Shopper signals · no implementation
- (f) Confirms hobbyist on web
- (g) Company website with paid product
- (h) LinkedIn confirms employer
- (i) Public funding / press / launch
- (j) Known production org membership
---
candidate_id: github:ltdrdata
builder_type: in_house_builder
confidence: high
production_score: 10
activity_score: 10
final_score: 20.0
recommended_action: priority_outreach
contactable_via: [github]
last_updated: 2026-05-05
---
Shortlist
Aggregate frontmatter into a ranked CSV (all_ranked.csv), top-50 markdown table, and per-candidate DM drafts in outputs/outreach_drafts/.
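The aggregation step might be sketched as follows, assuming each dossier starts with a frontmatter block of simple `key: value` lines. The hand-rolled parser is a stand-in for a real YAML parser, and `parse_frontmatter`, `rank_to_csv`, and the column choice are hypothetical.

```python
import csv
import io
from typing import Dict, List


def parse_frontmatter(text: str) -> Dict[str, str]:
    """Read flat `key: value` pairs between the leading '---' delimiters."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    meta: Dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        # Split on the first colon only, so "github:ltdrdata" stays intact
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta


def rank_to_csv(dossiers: List[str]) -> str:
    """Return CSV text with one row per dossier, highest final_score first."""
    rows = [parse_frontmatter(d) for d in dossiers]
    rows = [r for r in rows if "final_score" in r]
    rows.sort(key=lambda r: float(r["final_score"]), reverse=True)
    fields = ["candidate_id", "builder_type", "final_score", "recommended_action"]
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Dossiers without a `final_score` are skipped rather than ranked at zero, which is a design choice of this sketch rather than documented behavior.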
Each DM uses a specific hook from the candidate's research:
- Most-recent issue/PR title (live work they're doing)
- Top repo name + tagline
- Org membership context
- RunFlow value prop matched to their pain (auth, billing, scaling, model loading)
Tech
httpx for HTTP. tenacity for retries. asyncio.Queue worker pools so we never materialize 4000+ coroutines at once.
Strict models for every stage. YAML frontmatter parsed back into models for Stage 6 aggregation.
subprocess.exec with stdin pipe. Haiku 4.5 for triage, Sonnet 4.6 for research. All under one Claude Max subscription.
Rate-limit-aware: detect remaining=0 + reset epoch, sleep until reset (cap 90 min), retry. Saved 685 candidates in our last run.
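The sleep calculation can be sketched as below, assuming GitHub's `x-ratelimit-remaining` and `x-ratelimit-reset` response headers and assuming the 90-minute cap clamps the sleep rather than aborting. The function name `seconds_until_reset` and the one-second safety margin are illustrative.

```python
import time
from typing import Mapping, Optional

MAX_SLEEP_SECONDS = 90 * 60  # cap from the text: 90 minutes


def seconds_until_reset(headers: Mapping[str, str],
                        now: Optional[float] = None) -> Optional[float]:
    """Return how long to sleep before retrying, or None if not rate limited.

    Expects lower-case header names; httpx's Headers object is
    case-insensitive, a plain dict is not.
    """
    if headers.get("x-ratelimit-remaining") != "0":
        return None
    reset = headers.get("x-ratelimit-reset")
    if reset is None:
        return None
    current = time.time() if now is None else now
    wait = float(reset) - current + 1.0  # small margin past the reset epoch
    return min(max(wait, 0.0), MAX_SLEEP_SECONDS)
```

A caller would `await asyncio.sleep(wait)` when the result is not None and then retry the same request.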
tmp + rename for every JSON/markdown output. Pipeline can crash and resume without corruption.
Optional max_usd budget per stage, with a CostBudgetExceeded halt. The subscription covered actual spend, so the recorded cost was always 0.
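One way a per-stage budget could work is sketched here. Only the names `max_usd` and `CostBudgetExceeded` come from the text. The `StageBudget` class and its `charge` method are hypothetical.

```python
from typing import Optional


class CostBudgetExceeded(RuntimeError):
    """Raised when a stage spends more than its max_usd budget."""


class StageBudget:
    def __init__(self, max_usd: Optional[float] = None) -> None:
        self.max_usd = max_usd   # None means no limit
        self.spent_usd = 0.0

    def charge(self, usd: float) -> None:
        """Record spend and halt the stage once the budget is exceeded."""
        self.spent_usd += usd
        if self.max_usd is not None and self.spent_usd > self.max_usd:
            raise CostBudgetExceeded(
                f"spent ${self.spent_usd:.2f} of ${self.max_usd:.2f} budget"
            )
```

With every call charged at 0, as under a flat subscription, the budget never trips, which matches the behavior described above.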