Data Waterfall Architecture: How to Build Multi-Provider Enrichment Pipelines
A data waterfall (also called an enrichment waterfall or data cascade) is an architecture pattern where multiple data providers are queried in sequence - each one filling in gaps left by the previous provider - to maximize data coverage and accuracy. Instead of relying on a single enrichment source, the waterfall routes each data request through a prioritized chain of providers, stopping as soon as a valid result is found.
This pattern exists because no single B2B data provider covers the entire market. ZoomInfo might find 60% of the emails in your list, Apollo 55%, and a third provider 50%, but they don't all find the same contacts: each provider has pockets of data the others miss. A waterfall architecture combines their strengths and achieves 75-90% total coverage where any single provider tops out at 40-65%.
Why Single-Provider Enrichment Fails
Here's the reality of B2B data coverage that most vendors won't tell you:
The Coverage Gap
We ran an analysis across 50,000 B2B contacts targeted at mid-market SaaS companies (50-500 employees) in North America. Here's what each provider returned when used alone:
| Provider | Email Found Rate | Email Verified Rate | Phone Found Rate | Title Accuracy |
| --- | --- | --- | --- | --- |
| Provider A (Premium) | 62% | 54% | 38% | 89% |
| Provider B (Mid-tier) | 51% | 44% | 29% | 82% |
| Provider C (Budget) | 43% | 35% | 21% | 76% |
| 3-Provider Waterfall | 83% | 74% | 52% | 91% |
The waterfall didn't just add the numbers together - there's overlap between providers. But the unique contribution of each additional provider was significant:
- Provider A alone: 62% email coverage
- Adding Provider B: +14 points incremental (B found emails for a further 14% of the list that A missed)
- Adding Provider C: +7 points incremental (C found a further 7% that both A and B missed)
- Total: 83% coverage
Those 21 incremental percentage points are roughly 10,500 of the 50,000 contacts: prospects you'd never reach with a single provider.
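To reproduce this kind of breakdown on your own list, record which contacts each provider found and count only the ones no earlier provider had already covered. Here's a minimal Python sketch, assuming each provider's hits are available as a set of contact IDs. The function name and the toy data are illustrative, not taken from the analysis above:

```python
def incremental_coverage(total_contacts, hits_in_order):
    """Return (provider, incremental_pct, cumulative_pct) for each provider.

    hits_in_order: list of (provider_name, set_of_contact_ids_found),
    listed in the order the waterfall would query them.
    """
    covered = set()
    rows = []
    for name, found in hits_in_order:
        new_ids = found - covered          # contacts no earlier provider found
        covered |= new_ids
        rows.append((
            name,
            100 * len(new_ids) / total_contacts,
            100 * len(covered) / total_contacts,
        ))
    return rows


# Toy data: 100 contacts with IDs 0-99, made up to mirror a 62 / +14 / +7 split.
hits = [
    ("A", set(range(0, 62))),     # A finds 62 contacts
    ("B", set(range(25, 76))),    # B finds 51 contacts, 14 of them new (IDs 62-75)
    ("C", set(range(40, 83))),    # C finds 43 contacts, 7 of them new (IDs 76-82)
]
for name, inc, cum in incremental_coverage(100, hits):
    print(f"{name}: +{inc:.0f} pts incremental, {cum:.0f}% cumulative")
```

Note that B's standalone hit rate (51%) says little about its value in the waterfall. What matters is the 14 points it adds after A.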
Why Providers Miss Different Data
Each data provider builds their database differently:
- Web scraping providers excel at public data (job titles from LinkedIn, company info from websites) but miss private contact details
- Intent-based providers are strong on active buyers but weak on dormant segments
- Community-driven providers (like Apollo's contributor network) have better data on startups and tech companies but gaps in traditional industries
- Premium providers (like ZoomInfo) have strong direct dial and organizational data for enterprise companies but thinner coverage for SMBs
- Email verification providers don't find new emails but dramatically improve accuracy of emails found by other providers
Waterfall Architecture Patterns
Pattern 1: Sequential Waterfall (Most Common)
The simplest and most cost-effective pattern. Query providers one at a time, in priority order, stopping when you get a valid result.
```
Input: {name, company, domain}
        |
        v
[Provider A] --> Found valid email? --> YES --> Stop, return result
        |
        NO
        |
        v
[Provider B] --> Found valid email? --> YES --> Stop, return result
        |
        NO
        |
        v
[Provider C] --> Found valid email? --> YES --> Stop, return result
        |
        NO
        |
        v
Return: {enrichment_failed: true}
```
Pros: Lowest cost (you only pay for credits until a match is found). Simple to build and debug.
Cons: Slower (sequential API calls). Provider A's result is accepted without cross-checking.
Best for: High-volume enrichment where cost matters more than perfect accuracy.
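In code, the sequential pattern is a loop with an early return. The sketch below is a minimal Python version under a few assumptions: each provider is wrapped as a function that takes a lead and returns an email or None, and the three stub providers and the validity check are hypothetical placeholders rather than real API clients.

```python
def sequential_waterfall(lead, providers, is_valid):
    """Query providers in priority order and stop at the first valid email.

    providers: list of (name, lookup) pairs; lookup(lead) returns an email or None.
    is_valid: callable(email) -> bool, e.g. a wrapper around a verification service.
    """
    for name, lookup in providers:
        try:
            email = lookup(lead)
        except Exception:
            email = None  # treat a provider error as a miss and move on
        if email and is_valid(email):
            return {"email": email, "source": name, "enrichment_failed": False}
    return {"email": None, "source": None, "enrichment_failed": True}


# Stub providers standing in for real API clients (hypothetical).
calls = []

def provider_a(lead):
    calls.append("A")
    return None  # simulated miss

def provider_b(lead):
    calls.append("B")
    return f"{lead['first_name'].lower()}@{lead['domain']}"

def provider_c(lead):
    calls.append("C")
    return None

lead = {"first_name": "Ada", "company": "Example Inc", "domain": "example.com"}
result = sequential_waterfall(
    lead,
    [("A", provider_a), ("B", provider_b), ("C", provider_c)],
    is_valid=lambda email: "@" in email,  # placeholder validity check
)
print(result["source"], calls)  # Provider C is never called, so never billed
```

Recording the `source` on every result is what later lets you measure each provider's incremental contribution.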
Pattern 2: Parallel Waterfall with Consensus
Query all providers simultaneously and use a consensus mechanism to pick the best result.
```
Input: {name, company, domain}
        |
        v
[Provider A] --|
[Provider B] --|--> Consensus Engine --> Best result
[Provider C] --|
```
The consensus engine applies rules like:
- If 2+ providers return the same email, confidence = high
- If only 1 provider returns an email, run it through verification
- Use the provider with highest historical accuracy for title/seniority
- Merge results (email from A, phone from B, title from C)
Pros: Highest accuracy. Multiple data points for cross-validation. Fastest (parallel calls).
Cons: Most expensive (every provider is called for every record). Higher complexity.
Best for: High-value ABM campaigns where data quality is paramount and volume is lower (under 5,000 contacts).
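Here's a rough Python sketch of the first two consensus rules (agreement between 2+ providers, otherwise verification), using a thread pool for the parallel calls. The provider callables and the `verify` function are hypothetical stand-ins, and the tie-breaking choice (take the first candidate that verifies) is an assumption, not a rule from the list above. Field-level merging and historical accuracy weighting are left out.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def _safe_call(lookup, lead):
    """Run one provider; a failure counts as 'no answer'."""
    try:
        return lookup(lead)
    except Exception:
        return None


def parallel_consensus(lead, providers, verify):
    """Call every provider concurrently, then choose an email by agreement.

    providers: dict of name -> lookup(lead) returning an email or None.
    verify: callable(email) -> bool, used when providers do not agree.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(providers))) as pool:
        futures = {name: pool.submit(_safe_call, fn, lead)
                   for name, fn in providers.items()}
        answers = {name: fut.result() for name, fut in futures.items()}

    # Lowercase so "Ada@x.com" and "ada@x.com" count as the same email.
    candidates = [e.lower() for e in answers.values() if e]
    if candidates:
        email, votes = Counter(candidates).most_common(1)[0]
        if votes >= 2:
            return {"email": email, "confidence": "high", "answers": answers}
        # No two providers agree: accept the first candidate that verifies.
        for email in candidates:
            if verify(email):
                return {"email": email, "confidence": "verified", "answers": answers}
    return {"email": None, "confidence": "none", "answers": answers}


providers = {
    "A": lambda lead: "Ada@example.com",
    "B": lambda lead: "ada@example.com",
    "C": lambda lead: "a.lovelace@example.com",
}
best = parallel_consensus({"name": "Ada Lovelace"}, providers, verify=lambda e: True)
print(best["email"], best["confidence"])
```

Keeping the raw `answers` alongside the chosen email makes disagreements between providers easy to audit later.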
Pattern 3: Tiered Waterfall
Group providers by cost/quality tier. Start with the cheapest tier and escalate to more expensive providers only when cheap options fail.
```
Input: {name, company, domain}
        |
        v
--- Tier 1: Free/cheap sources ---
[Your database cache]
[Free API endpoints]
        |
   Not found?
        |
        v
--- Tier 2: Mid-cost providers ---
[Apollo API]
[Hunter.io]
        |
   Not found?
        |
        v
--- Tier 3: Premium providers ---
[ZoomInfo]
[Cognism]
        |
        v
[Email Verification] --> Final result
```
Pros: Optimizes cost by exhausting cheap sources first. Premium credits are only used when necessary.
Cons: Slower for records that require premium providers. Requires more complex orchestration.
Best for: Teams with budget constraints enriching large lists (10,000+ contacts per month).
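A tiered waterfall can be sketched as two nested loops: tiers in cost order, then providers within each tier. The Python below is illustrative only. The per-call costs are made-up numbers, the lookups are stubs, and it assumes you pay per call (some providers bill only on a match). One design choice differs slightly from the diagram: verification runs on each candidate as it is found, so an email that fails verification falls through to the next provider instead of ending the run.

```python
def tiered_waterfall(lead, tiers, verify):
    """Try each tier in order (cheapest first) and track what was spent.

    tiers: list of (tier_name, [(provider_name, lookup, cost_per_call), ...]).
    verify: callable(email) -> bool.
    """
    spent = 0.0
    for tier_name, providers in tiers:
        for provider_name, lookup, cost in providers:
            spent += cost
            email = lookup(lead)
            if email and verify(email):
                return {"email": email, "tier": tier_name,
                        "source": provider_name, "cost": spent}
    return {"email": None, "tier": None, "source": None, "cost": spent}


cache = {}  # empty local cache, so tier 1 misses in this example

tiers = [
    ("tier1", [("cache", lambda lead: cache.get(lead["domain"]), 0.00)]),
    ("tier2", [("mid_cost_provider", lambda lead: None, 0.02)]),          # hypothetical cost
    ("tier3", [("premium_provider", lambda lead: "ada@example.com", 0.10)]),
]
result = tiered_waterfall({"domain": "example.com"}, tiers, verify=lambda e: True)
print(result["tier"], result["source"], round(result["cost"], 2))
```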
Building a Waterfall in Clay
Clay is purpose-built for waterfall enrichment. Here's how to implement a production-grade pipeline.
Step 1: Set Up Your Input Table
Start with a Clay table containing your raw lead data. Minimum required fields:
- First Name
- Last Name
- Company Name
- Company Domain (if available)
- LinkedIn URL (if available)
The more seed data you provide, the higher your match rates across all providers.
Step 2: Configure the Enrichment Waterfall
In Clay, you build waterfalls by adding enrichment columns in sequence. Each column can be configured to only run if the previous one didn't return a result.
Column 1: Find Work Email (Primary Provider)
- Use Clay's built-in "Find Work Email" enrichment
- This queries Clay's aggregated database (75+ providers) using a waterfall internally
- Expected hit rate: 60-70%
Column 2: Find Work Email (Secondary - Apollo)
- Add an Apollo People Enrichment column
- Set the "Only run if" condition to: Column 1 is empty
- This catches leads that Clay's primary waterfall missed
- Expected incremental hit rate: 10-15%
Column 3: Find Work Email (Tertiary - Hunter.io)
- Add a Hunter.io Email Finder column
- Set the "Only run if" condition to: Column 1 AND Column 2 are empty
- Pattern-based email finding as a last resort
- Expected incremental hit rate: 5-8%
Column 4: Email Verification
- Add a verification column (MillionVerifier, ZeroBounce, or NeverBounce)
- Run this on ALL emails found, regardless of which provider returned them
- Mark emails as: Valid / Risky / Invalid
- Only pass "Valid" emails to your outbound tools
Column 5: Consolidated Email
- Create a formula column that picks the final email:
  - If Column 1 exists and Column 4 = Valid, use Column 1
  - Else if Column 2 exists and Column 4 = Valid, use Column 2
  - Else if Column 3 exists and Column 4 = Valid, use Column 3
  - Else: mark as "No Valid Email Found"
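Because each finder column only runs when the earlier ones are empty, at most one of Columns 1-3 holds a value, so the formula boils down to "take the first non-empty email, then check its verification status". The Python below illustrates that logic only. It is not Clay formula syntax, and the function and argument names are made up for the example.

```python
def consolidated_email(col1, col2, col3, verification_status):
    """Pick the first non-empty email from Columns 1-3 if it verified as Valid."""
    found = col1 or col2 or col3           # first non-empty value, else None
    if found and verification_status == "Valid":
        return found
    return "No Valid Email Found"


print(consolidated_email(None, "ada@example.com", None, "Valid"))
print(consolidated_email("ada@example.com", None, None, "Risky"))
```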
Step 3: Add Firmographic Enrichment
Beyond email, enrich with company and contact data:
- Company data: Employee count, revenue, funding, industry, tech stack
- Contact data: Job title, seniority, department, LinkedIn URL
- Intent signals: Recent hiring (from job postings), technology changes, funding rounds
Step 4: Score and Filter
Add scoring columns that evaluate enriched data against your ICP:
- ICP Fit Score (based on company size, industry, funding, tech stack)
- Contact Relevance Score (based on title, seniority, department)
- Combined Score = weighted average
Filter out records below your threshold before pushing to outbound.
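As a rough illustration of the scoring step, the sketch below combines two 0-100 scores with a weighted average and filters on a threshold. The 60/40 weights, the threshold of 60, and the field names are assumptions chosen for the example; tune them against your own closed-won data.

```python
def combined_score(icp_fit, contact_relevance, w_icp=0.6, w_contact=0.4):
    """Weighted average of two 0-100 scores (weights are illustrative)."""
    return (w_icp * icp_fit + w_contact * contact_relevance) / (w_icp + w_contact)


THRESHOLD = 60  # assumed cut-off

leads = [
    {"email": "vp@example.com", "icp_fit": 80, "contact_relevance": 70},
    {"email": "intern@example.org", "icp_fit": 40, "contact_relevance": 50},
]
qualified = [
    lead for lead in leads
    if combined_score(lead["icp_fit"], lead["contact_relevance"]) >= THRESHOLD
]
print([lead["email"] for lead in qualified])
```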
Step 5: Push to Destinations
Configure Clay's integrations to push enriched, scored, filtered leads to:
- HubSpot - Create/update contacts with all enriched properties
- Instantly/Smartlead - Add to outbound campaigns
- Slack - Notify reps about high-score leads
Cost Optimization Strategies
Enrichment costs can spiral quickly. Here's how to keep them manageable.
1. Cache Everything
Build a local cache (or use Clay's deduplication) so you never pay to enrich the same person twice. At scale, 15-25% of your enrichment requests will be duplicates of previous lookups.
Savings: 15-25% reduction in enrichment costs.
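If you're building the cache yourself, the core idea is a lookup keyed on a normalized identity so that "Ada Lovelace" and " ada lovelace " hit the same entry. Here's a minimal in-memory sketch. The class and field names are hypothetical, and a production version would persist to a database and expire entries on your re-enrichment schedule.

```python
def cache_key(lead):
    """Normalize the fields that identify a person so duplicates collide."""
    return (
        lead["first_name"].strip().lower(),
        lead["last_name"].strip().lower(),
        lead["domain"].strip().lower(),
    )


class CachedEnricher:
    """Wraps a paid lookup and only calls it for people not seen before."""

    def __init__(self, enrich):
        self._enrich = enrich      # the paid lookup, e.g. a whole waterfall
        self._cache = {}
        self.paid_calls = 0

    def lookup(self, lead):
        key = cache_key(lead)
        if key not in self._cache:
            self.paid_calls += 1
            self._cache[key] = self._enrich(lead)   # misses are cached too
        return self._cache[key]


enricher = CachedEnricher(lambda lead: f"{lead['first_name'].strip().lower()}@{lead['domain']}")
first = enricher.lookup({"first_name": "Ada", "last_name": "Lovelace", "domain": "example.com"})
again = enricher.lookup({"first_name": " ada ", "last_name": "LOVELACE", "domain": "Example.com"})
print(enricher.paid_calls)  # the second lookup is served from the cache
```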
2. Pre-Filter Before Enrichment
Don't enrich everyone. Apply cheap filters first:
- Verify the company exists and matches your ICP (using free company data)
- Check if the domain is a personal email domain (gmail.com, yahoo.com) - skip these
- Deduplicate against your existing CRM database
- Remove contacts already in active sequences
Savings: 20-40% reduction by avoiding enrichment on non-ICP records.
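These filters are cheap to run in code before any paid call. The sketch below applies them in order. The domain list is a small illustrative sample, the `is_icp` check is whatever free-data rule you already have, and matching CRM records on name plus domain is an assumption (use whatever key your CRM exposes).

```python
PERSONAL_DOMAINS = {"gmail.com", "yahoo.com", "hotmail.com", "outlook.com"}


def person_key(lead):
    return (
        lead["first_name"].strip().lower(),
        lead["last_name"].strip().lower(),
        lead["domain"].strip().lower(),
    )


def prefilter(leads, is_icp, crm_keys=frozenset(), in_sequence_keys=frozenset()):
    """Return only the leads worth spending enrichment credits on."""
    kept, seen = [], set()
    for lead in leads:
        key = person_key(lead)
        if key[2] in PERSONAL_DOMAINS:
            continue                      # personal email domain, skip
        if not is_icp(lead):
            continue                      # company does not match the ICP
        if key in crm_keys or key in in_sequence_keys or key in seen:
            continue                      # already known, in a sequence, or a duplicate
        seen.add(key)
        kept.append(lead)
    return kept


raw = [
    {"first_name": "Ada", "last_name": "Lovelace", "domain": "example.com", "employees": 200},
    {"first_name": "Ada", "last_name": "Lovelace", "domain": "example.com", "employees": 200},
    {"first_name": "Bob", "last_name": "Smith", "domain": "gmail.com", "employees": 1},
    {"first_name": "Cy", "last_name": "Young", "domain": "tiny.example", "employees": 5},
]
to_enrich = prefilter(raw, is_icp=lambda lead: 50 <= lead["employees"] <= 500)
print(len(raw), "->", len(to_enrich))
```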
3. Tiered Provider Strategy
Not every lead deserves premium enrichment:
- Tier 1 accounts (ABM targets): Use the full waterfall with premium providers. Spend $2-5 per contact for maximum coverage.
- Tier 2 accounts (good fit): Use standard waterfall (2 providers + verification). Spend $0.50-1.50 per contact.
- Tier 3 accounts (exploratory): Use single provider + verification. Spend $0.10-0.30 per contact.
Savings: 30-50% reduction vs. running premium enrichment on everything.
4. Negotiate Provider Contracts
Most enrichment providers offer volume discounts:
- Apollo: Annual plans save 20-30% vs. monthly
- ZoomInfo: Multi-year contracts can reduce per-credit cost by 40%
- Clay: Enterprise plans include better credit rates
- Bulk verification services (MillionVerifier): $0.0005-0.001 per email at high volume
5. Monitor Hit Rates by Provider
Track which providers are actually contributing unique results. If Provider C is only adding 2% incremental coverage, the cost might not justify keeping it in the waterfall.
Build a simple dashboard tracking:
- Hit rate per provider (% of queries that return a result)
- Incremental hit rate (% of results that no previous provider found)
- Cost per successful enrichment per provider
- Email verification pass rate per provider (data quality signal)
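If you log every provider call, these four metrics fall out of a single pass over the log. The sketch below assumes a sequential waterfall (so any hit is, by construction, one that no earlier provider found) and an illustrative log schema with `provider`, `found`, `verified`, and `cost` fields. Incremental coverage is expressed here as a share of all records entering the waterfall.

```python
from collections import defaultdict


def provider_stats(attempts, total_records):
    """Aggregate a per-call log into per-provider dashboard metrics."""
    agg = defaultdict(lambda: {"queries": 0, "hits": 0, "verified": 0, "cost": 0.0})
    for a in attempts:
        s = agg[a["provider"]]
        s["queries"] += 1
        s["cost"] += a["cost"]
        if a["found"]:
            s["hits"] += 1
            if a["verified"]:
                s["verified"] += 1

    report = {}
    for provider, s in agg.items():
        hits = s["hits"]
        report[provider] = {
            "hit_rate": hits / s["queries"],
            "incremental_coverage": hits / total_records,
            "cost_per_hit": s["cost"] / hits if hits else None,
            "verification_pass_rate": s["verified"] / hits if hits else None,
        }
    return report


# Four records: A is queried on all of them, B only on A's two misses.
log = [
    {"provider": "A", "found": True, "verified": True, "cost": 0.10},
    {"provider": "A", "found": True, "verified": False, "cost": 0.10},
    {"provider": "A", "found": False, "verified": False, "cost": 0.10},
    {"provider": "A", "found": False, "verified": False, "cost": 0.10},
    {"provider": "B", "found": True, "verified": True, "cost": 0.05},
    {"provider": "B", "found": False, "verified": False, "cost": 0.05},
]
stats = provider_stats(log, total_records=4)
for name, metrics in stats.items():
    print(name, metrics)
```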
Real-World Waterfall Performance Data
Here are actual results from three Clay waterfall implementations we built for clients.
Client A: Series B SaaS (Targeting Mid-Market)
- List size: 12,000 contacts/month
- Target: VP+ at companies with 100-1,000 employees
- Waterfall: Clay Built-in > Apollo > Hunter.io > MillionVerifier
| Metric | Before (Apollo only) | After (Waterfall) |
| --- | --- | --- |
| Email found rate | 52% | 81% |
| Email verified rate | 44% | 72% |
| Valid emails per month | 5,280 | 8,640 |
| Cost per valid email | $0.42 | $0.38 |
| Meetings booked/month | 18 | 34 |
The waterfall increased valid email output by 64% while actually reducing cost per valid email by 10% (because the incremental providers were cheaper than the primary).
Client B: Growth-Stage Agency (Targeting SMB)
- List size: 25,000 contacts/month
- Target: Founders and marketing leads at companies with 10-100 employees
- Waterfall: Clay Built-in > Apollo > Dropcontact > NeverBounce
| Metric | Before (Single provider) | After (Waterfall) |
| --- | --- | --- |
| Email found rate | 47% | 78% |
| Email verified rate | 39% | 69% |
| Bounce rate on sends | 8.2% | 1.4% |
| Cost per valid email | $0.31 | $0.29 |
Client C: Enterprise Sales Team (Targeting F500)
- List size: 3,000 contacts/month
- Target: C-suite and VP at Fortune 500 companies
- Waterfall: ZoomInfo > Clay Built-in > Cognism > ZeroBounce
| Metric | Before (ZoomInfo only) | After (Waterfall) |
| --- | --- | --- |
| Email found rate | 68% | 87% |
| Direct dial found rate | 41% | 58% |
| Email verified rate | 61% | 79% |
| Cost per valid email | $1.85 | $2.10 |
For enterprise targets, the waterfall cost slightly more per valid email, but the incremental coverage justified it: the additional 19 percentage points of email coverage were high-value enterprise prospects worth $50K+ in potential ACV.
Advanced Waterfall Techniques
Conditional Provider Selection
Instead of always running the same waterfall sequence, use conditional logic to pick providers based on the target:
- Tech companies (SaaS, software): Apollo as primary (strongest tech company coverage)
- Traditional industries (manufacturing, finance): ZoomInfo as primary (strongest enterprise coverage)
- European contacts: Cognism as primary (GDPR-compliant, strong EU data)
- Startups (under 50 employees): Clay + Apollo (best startup coverage)
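One way to express this routing is a small function that returns the provider order for each lead. In the sketch below, only the primary provider in each branch comes from the list above. The fallback order after the primary, the rule precedence (region first, then company size, then industry), the default order, and the field names are all assumptions made for the example.

```python
def pick_waterfall(lead):
    """Return provider names in query order based on simple lead attributes."""
    employees = lead.get("employees")
    industry = (lead.get("industry") or "").lower()

    if lead.get("region") == "EU":
        return ["Cognism", "Apollo", "ZoomInfo"]
    if employees is not None and employees < 50:
        return ["Clay", "Apollo"]
    if industry in {"saas", "software"}:
        return ["Apollo", "ZoomInfo", "Cognism"]
    if industry in {"manufacturing", "finance"}:
        return ["ZoomInfo", "Apollo", "Cognism"]
    return ["Clay", "Apollo", "ZoomInfo"]   # assumed default order


print(pick_waterfall({"region": "NA", "employees": 300, "industry": "SaaS"}))
print(pick_waterfall({"region": "EU", "employees": 300, "industry": "SaaS"}))
```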
Re-Enrichment Cycles
B2B data decays at 30-40% per year (people change jobs, companies pivot, phone numbers change). Build a re-enrichment cycle:
- Monthly: Re-enrich contacts that bounced in outbound campaigns
- Quarterly: Re-enrich your active pipeline and top accounts
- Twice a year: Full database re-enrichment
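A simple way to drive this cadence is to stamp each contact with the date it was last enriched and check it against an interval for its bucket. The sketch below is a hedged example: the field names are hypothetical, the day counts are approximations of monthly and quarterly, and the full-database cycle is assumed to mean roughly every six months.

```python
from datetime import date, timedelta

INTERVALS = {
    "bounced": timedelta(days=30),     # monthly: contacts that bounced
    "priority": timedelta(days=90),    # quarterly: active pipeline, top accounts
    "default": timedelta(days=182),    # full database: about every six months (assumed)
}


def reenrichment_bucket(contact):
    if contact.get("bounced"):
        return "bounced"
    if contact.get("in_active_pipeline") or contact.get("top_account"):
        return "priority"
    return "default"


def is_due(contact, today):
    """True when the last enrichment is older than the bucket's interval."""
    age = today - contact["last_enriched"]
    return age >= INTERVALS[reenrichment_bucket(contact)]


today = date(2025, 6, 1)
contacts = [
    {"email": "a@example.com", "bounced": True, "last_enriched": date(2025, 4, 25)},
    {"email": "b@example.com", "top_account": True, "last_enriched": date(2025, 3, 1)},
    {"email": "c@example.com", "last_enriched": date(2025, 4, 25)},
]
due = [c["email"] for c in contacts if is_due(c, today)]
print(due)
```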
Waterfall for Phone Numbers
Email waterfalls get the most attention, but phone number coverage is even worse across providers. A phone-specific waterfall is critical for teams that rely on cold calling:
- ZoomInfo (best direct dial coverage for enterprise)
- Apollo (good coverage for tech/SaaS)
- Cognism (strong for mobile numbers, especially EU)
- Lusha (crowdsourced - good for verifying numbers found elsewhere)
Expected results: 25-35% direct dial coverage with a single provider, 45-60% with a 3-provider waterfall.
Building vs. Buying Waterfall Infrastructure
Build It Yourself (Custom Code)
When it makes sense: You have engineering resources, need complete customization, or process 100K+ contacts per month where Clay credits become expensive.
Tech stack: Python/Node.js script that calls provider APIs in sequence, stores results in a database, and handles rate limiting, retries, and error handling.
Time to build: 2-4 weeks for a basic waterfall. 6-8 weeks for a production-grade system with caching, monitoring, and a UI.
Ongoing maintenance: 5-10 hours/month for API changes, provider updates, and bug fixes.
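Much of that build and maintenance time goes into the unglamorous parts: retries, backoff, and error handling around each provider call. As a taste of what that looks like, here's a minimal retry wrapper with exponential backoff. `TransientError` is a hypothetical exception your own provider clients would raise for retryable failures such as timeouts or HTTP 429 responses. Rate limiting and persistence are not shown.

```python
import time


class TransientError(Exception):
    """Raised by a provider client for failures worth retrying."""


def call_with_retries(fn, *args, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(*args), retrying on TransientError with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn(*args)
        except TransientError:
            if attempt == attempts - 1:
                raise                          # out of retries, surface the error
            sleep(base_delay * 2 ** attempt)   # waits 1x, 2x, 4x ... base_delay


# Example: a stub provider that fails twice, then succeeds.
state = {"calls": 0}

def flaky_lookup(lead):
    state["calls"] += 1
    if state["calls"] < 3:
        raise TransientError("rate limited")
    return "ada@example.com"

waits = []  # record delays instead of actually sleeping
email = call_with_retries(flaky_lookup, {"domain": "example.com"}, sleep=waits.append)
print(email, waits)
```

Passing `sleep` as a parameter keeps the wrapper testable without real delays.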
Use Clay (No-Code/Low-Code)
When it makes sense: Your volume is under 50K contacts/month, you don't have dedicated engineering, or you need to iterate quickly on waterfall logic.
Time to build: 2-4 hours for a basic waterfall. 1-2 days for a production-grade pipeline with scoring and routing.
Ongoing maintenance: 1-2 hours/month.
Cost: $349-1,000+/month depending on volume and plan.
Comparison
| Factor | Custom Build | Clay |
| --- | --- | --- |
| Setup time | 2-8 weeks | 2-8 hours |
| Monthly maintenance | 5-10 hours | 1-2 hours |
| Cost at 10K contacts/mo | $200-500 (API costs) | $349-500 |
| Cost at 50K contacts/mo | $800-2,000 (API costs) | $1,000-2,500 |
| Cost at 200K contacts/mo | $2,000-5,000 (API costs) | $3,000-8,000+ |
| Flexibility | Unlimited | High (within Clay's framework) |
| Engineering required | Yes | No |
| Time to iterate | Hours-days | Minutes |
For most B2B teams, Clay is the right choice until you're processing 100K+ contacts per month or need customization that Clay can't handle.
FAQ
What is a data waterfall in B2B enrichment?
A data waterfall is an architecture where multiple data providers are queried in sequence to maximize coverage. Instead of relying on one provider that might only find 50-60% of the data you need, the waterfall tries Provider A first, then Provider B for any gaps, then Provider C, and so on. This typically improves total data coverage from 40-65% (single provider) to 75-90% (multi-provider waterfall).
How many providers should be in a waterfall?
Three to four providers is the sweet spot for most B2B teams. The first provider typically covers 50-65% of records, the second adds 10-15 points of incremental coverage, and the third adds 5-8 more. A fourth provider rarely adds more than 3-5 points, and beyond four the extra complexity and cost usually aren't worth it. Always include a verification provider as the final step regardless of how many finders you use.
Does a waterfall cost more than using a single provider?
Not necessarily. In a sequential waterfall, you only pay for additional providers when the first one fails. If your primary provider has a 60% hit rate, you're only paying for the second provider on 40% of lookups, and the third on perhaps 25%. Many teams find that the cost per valid email actually decreases with a waterfall because the incremental providers are cheaper per lookup. The total budget increases because you're finding more valid contacts, but the unit economics improve.
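The arithmetic behind this is easy to check. The sketch below computes the expected cost per record and per found email for a sequential waterfall, assuming you are billed per lookup. The per-lookup prices ($0.10, $0.05, $0.03) are hypothetical. The coverage figures follow the example above: 60% from the first provider, 15 points from the second (so the third sees 25% of records), and an assumed 7 points from the third.

```python
def waterfall_unit_cost(steps):
    """Return (expected cost per record, cost per found email).

    steps: list of (cost_per_lookup, incremental_coverage) in query order,
    where incremental_coverage is the share of ALL records that this
    provider is the first to find.
    """
    covered = 0.0
    cost_per_record = 0.0
    for cost, incremental in steps:
        cost_per_record += (1.0 - covered) * cost   # only unfilled records reach this step
        covered += incremental
    per_found = cost_per_record / covered if covered else float("inf")
    return cost_per_record, per_found


single = waterfall_unit_cost([(0.10, 0.60)])
waterfall = waterfall_unit_cost([(0.10, 0.60), (0.05, 0.15), (0.03, 0.07)])
print(f"single provider: ${single[1]:.3f} per found email")
print(f"waterfall:       ${waterfall[1]:.3f} per found email")
```

With these made-up prices the waterfall spends more per record ($0.1275 vs. $0.10) but less per found email, which is the pattern described above. If your fallback providers cost as much as your primary, the result can flip, so run the numbers with your own rates.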
How do I choose which provider to query first in the waterfall?
Prioritize by: (1) data quality - the provider with the highest verified accuracy rate should go first, (2) coverage for your specific ICP - if you target tech companies, a provider strong in tech data goes first, (3) cost - among providers with similar quality, put the cheaper one first. Run a benchmark test with 1,000 contacts from your target market across all providers to establish baseline hit rates before finalizing your waterfall order.
Can I build a waterfall without Clay?
Yes. You can build waterfalls using Make or n8n (connecting provider APIs in sequence with conditional logic), custom code (Python/Node.js scripts calling APIs), or even spreadsheet-based approaches for small volumes. Clay simplifies the process significantly because waterfall logic is built into the platform, but it's not the only option. For volumes under 1,000 contacts per month, a Make workflow connecting 2-3 provider APIs is a cost-effective alternative.