Why Contact Type and Deal Type Must Come First
The core insight: Third-party data doesn’t solve bad filtering architecture—it multiplies the problem.
This is part of the Capital Advisory Series. I’m focusing on filtering hierarchy because it’s where ambitious AI implementations fail against basic data infrastructure.
The Forgotten Investor Problem
Investment professionals face a specific retrieval challenge: finding investors they’ve spoken to once, four months ago, whose name they don’t remember, but whose conversation notes contain deal-specific exclusions.
The current approach sorts by interaction count. High-frequency contacts surface first. The hard part starts at page two—investors 101 through 200—where memory fades but notes remain.
These notes contain critical intelligence:
- “Not interested in baseball, focused on NBA”
- “Pass on early-stage, only Series C+”
- “Sports deals only if Formula One or Sail GP”
This intelligence lives in loss reasons, comment fields, and scattered interaction logs. Without proper filtering, it becomes noise.
The Filtering Hierarchy
Investment matching requires layered filtering, executed in sequence:
Layer 1: Contact Type and Deal Type
- Contact type: investor, family office, credit fund, client
- Deal type: venture early-stage (seed through Series B), venture late-stage (Series C+), opportunistic
- Investment stage matching
Layer 2: Specific Exclusions
- Sector preferences within broader categories
- Stage boundaries within catchall terms
- Geographic or regulatory constraints
Layer 1 prevents random results. Layer 2 prevents relevant-but-wrong results.
The tension emerges when teams want to add third-party data (Crunchbase, PitchBook, Harmonic) before Layer 1 functions reliably. The impulse makes sense—external data promises completeness. But incomplete proprietary filtering means you can’t validate whether external data improves match quality or introduces garbage.
Why This Order Matters
Bad context is cheap but toxic. API calls to third-party vendors cost pennies. But if your system can’t reliably filter “investors interested in sports” from “investors interested in NBA teams specifically,” adding 10,000 external investor profiles just creates 10,000 new ways to waste time.
Working across capital advisory implementations, I’ve learned the most sophisticated matching algorithms fail when basic taxonomies aren’t enforced at the pipeline entry point.
Consider the sports deal scenario:
- Query: “Find investors for NBA team acquisition”
- Without Layer 1: Returns clients, credit funds, private equity investors, family offices
- With Layer 1, without Layer 2: Returns sports investors including those who explicitly passed on basketball
- With both layers: Returns NBA-focused investors with clean interaction history
Each layer removes an order of magnitude of noise. Layer 1 might cut 10,000 contacts to 1,000. Layer 2 cuts 1,000 to 100 qualified matches.
Implementation Strategy
The correct sequence for investor matching systems:
-
Define contact type taxonomy
- Map existing records to standardized types
- Create validation rules for new entries
- Flag records with missing or conflicting types
-
Standardize deal classification
- Establish catchall terms (opportunistic, venture early/late-stage)
- Map deal stages to categories (seed/A/B = early, C+ = late)
- Add missing stages (pre-seed often absent from legacy systems)
-
Enforce Layer 1 filtering
- Block queries that don’t specify contact type and deal type
- Make these filters required, not optional
- Test with known-good matches before expanding
-
Build Layer 2 exclusions
- Parse loss reasons for sector-specific passes
- Extract preference signals from notes
- Create negative filters (“interested in sports BUT NOT baseball”)
-
Only then consider third-party data
- Use free trials to test incremental value
- Compare match quality against proprietary-only baseline
- Calculate cost per qualified introduction
Most teams want to skip to step 5. The economics don’t work. A two-day free trial of external data doesn’t prove value if your internal filtering generates false positives.
What Changes After
Once Layer 1 filtering works reliably, the system behavior shifts:
- Searches return 100 results instead of 10,000
- Each result has verified contact type and deal alignment
- Loss reasons become machine-readable exclusions
- Third-party data can be evaluated against clean baseline
The investor you spoke to four months ago surfaces because the system knows:
- They’re tagged as “family office” not “client”
- Their deal interest includes “venture late-stage”
- Their notes don’t contain exclusionary loss reasons
- They haven’t been contacted in 120+ days
This is operational intelligence. It only works when basic filters eliminate noise before sophisticated matching begins.
Next Steps
If you’re building investor matching systems:
- Audit contact type coverage: What percentage of records have validated contact types?
- Map deal taxonomy: Can your system translate “venture early-stage” to “seed, Series A, Series B”?
- Test Layer 1: Run a query requiring contact type and deal type. Do results make sense before reading notes?
- Defer third-party data: Don’t evaluate external vendors until proprietary filtering works
The economics of agent-assisted investment processes depend on this foundation. Bad context multiplies through every API call, every prompt, every generated recommendation.
Get contact type and deal type right first. Everything else follows.
If you’re building investor matching systems and wrestling with filtering architecture, I’d be interested in discussing your specific challenges.