Enrichment | 16 min read

Data Waterfall Architecture: How to Build Multi-Provider Enrichment Pipelines

Learn how data waterfall enrichment pipelines combine multiple providers to achieve 80%+ coverage - with architecture patterns, Clay implementation, and cost optimization.


A data waterfall (also called an enrichment waterfall or data cascade) is an architecture pattern where multiple data providers are queried in sequence - each one filling in gaps left by the previous provider - to maximize data coverage and accuracy. Instead of relying on a single enrichment source, the waterfall routes each data request through a prioritized chain of providers, stopping as soon as a valid result is found.

This pattern exists because no single B2B data provider covers the entire market. ZoomInfo might find 60% of the emails in your list, Apollo a different 55%, and a third provider 50%. But their results don't fully overlap: each provider has pockets of data the others miss. A waterfall architecture combines their strengths and achieves 75-90% total coverage, where any single provider tops out at 40-65%.

Why Single-Provider Enrichment Fails

Here's the reality of B2B data coverage that most vendors won't tell you:

The Coverage Gap

We ran an analysis across 50,000 B2B contacts targeted at mid-market SaaS companies (50-500 employees) in North America. Here's what each provider returned when used alone:

| Provider | Email Found Rate | Email Verified Rate | Phone Found Rate | Title Accuracy |
|---|---|---|---|---|
| Provider A (Premium) | 62% | 54% | 38% | 89% |
| Provider B (Mid-tier) | 51% | 44% | 29% | 82% |
| Provider C (Budget) | 43% | 35% | 21% | 76% |
| 3-Provider Waterfall | 83% | 74% | 52% | 91% |

The waterfall's coverage isn't simply the sum of the individual rates, because the providers overlap. But each additional provider still made a significant unique contribution:

  • Provider A alone: 62% email coverage
  • Adding Provider B: +14 percentage points (B found emails for 14% of contacts that A missed)
  • Adding Provider C: +7 percentage points (C found emails for 7% of contacts that both A and B missed)
  • Total: 83% coverage

Those incremental 21 percentage points represent roughly 10,500 of the 50,000 contacts: prospects you'd never reach with a single provider.
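To measure the same incremental numbers on your own list, one approach is to query every provider on a benchmark sample and then replay the results in waterfall order. The Python sketch below assumes you already have, for each provider, the set of contact IDs it returned an email for; the function name and data shape are illustrative, not part of any provider's API.

```python
def incremental_coverage(total_contacts, found_by_provider):
    """Replay benchmark results in waterfall order.

    total_contacts: size of the benchmark list.
    found_by_provider: list of (provider_name, set_of_contact_ids_with_email),
        in the order the providers would run in the waterfall.
    Returns a list of (provider_name, incremental_share, cumulative_share).
    """
    covered = set()
    rows = []
    for name, found in found_by_provider:
        newly_found = found - covered  # contacts no earlier provider found
        covered |= found
        rows.append((name, len(newly_found) / total_contacts, len(covered) / total_contacts))
    return rows


if __name__ == "__main__":
    # Toy benchmark of 10 contacts (IDs 0-9); a real run would use thousands.
    demo = [("A", set(range(6))), ("B", {3, 4, 5, 6, 7}), ("C", {7, 8})]
    for name, inc, cum in incremental_coverage(10, demo):
        print(f"{name}: +{inc:.0%} incremental, {cum:.0%} cumulative")
```

Reordering the providers changes each one's incremental share but not the final cumulative coverage, which is a quick way to see how much the order matters for your ICP.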

Why Providers Miss Different Data

Each data provider builds its database differently:

  • Web scraping providers excel at public data (job titles from LinkedIn, company info from websites) but miss private contact details
  • Intent-based providers are strong on active buyers but weak on dormant segments
  • Community-driven providers (like Apollo's contributor network) have better data on startups and tech companies but gaps in traditional industries
  • Premium providers (like ZoomInfo) have strong direct dial and organizational data for enterprise companies but thinner coverage for SMBs
  • Email verification providers don't find new emails but dramatically improve accuracy of emails found by other providers

Waterfall Architecture Patterns

Pattern 1: Sequential Waterfall (Most Common)

The simplest and most cost-effective pattern. Query providers one at a time, in priority order, stopping when you get a valid result.

```
Input: {name, company, domain}
      |
      v
[Provider A] --> Found valid email? --> YES --> Stop, return result
      |
      NO
      |
      v
[Provider B] --> Found valid email? --> YES --> Stop, return result
      |
      NO
      |
      v
[Provider C] --> Found valid email? --> YES --> Stop, return result
      |
      NO
      |
      v
Return: {enrichment_failed: true}
```

Pros: Lowest cost (you only pay for credits until a match is found). Simple to build and debug.

Cons: Slower (sequential API calls). Provider A's result is accepted without cross-checking.

Best for: High-volume enrichment where cost matters more than perfect accuracy.
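A minimal Python sketch of the sequential pattern. The provider lookups are hypothetical callables standing in for real API clients (each takes a lead dict and returns an email or None), and `is_valid` stands in for whatever validity check you use.

```python
from __future__ import annotations

from typing import Callable, Optional

# Hypothetical provider interface: takes a lead dict, returns an email or None.
Lookup = Callable[[dict], Optional[str]]


def sequential_waterfall(lead: dict,
                         providers: list[tuple[str, Lookup]],
                         is_valid: Callable[[str], bool]) -> dict:
    """Query providers in priority order and stop at the first valid email."""
    called = []  # which providers were billed for this lead
    for name, lookup in providers:
        called.append(name)
        email = lookup(lead)
        if email and is_valid(email):
            return {"email": email, "source": name, "providers_called": called}
    return {"enrichment_failed": True, "providers_called": called}


if __name__ == "__main__":
    stub_providers = [
        ("provider_a", lambda lead: None),              # miss
        ("provider_b", lambda lead: "jdoe@acme.com"),   # hit, so provider_c is never called
        ("provider_c", lambda lead: "other@acme.com"),
    ]
    print(sequential_waterfall({"name": "Jane Doe", "domain": "acme.com"},
                               stub_providers, is_valid=lambda e: "@" in e))
```

Because `is_valid` is checked inside the loop, an invalid email from an earlier provider falls through to the next one instead of ending the waterfall. The `providers_called` list records which providers were billed for each lead.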

Pattern 2: Parallel Waterfall with Consensus

Query all providers simultaneously and use a consensus mechanism to pick the best result.

```
Input: {name, company, domain}
      |
      v
[Provider A] --|
[Provider B] --|--> Consensus Engine --> Best result
[Provider C] --|
```

The consensus engine applies rules like:

  • If 2+ providers return the same email, confidence = high
  • If only 1 provider returns an email, run it through verification
  • Use the provider with highest historical accuracy for title/seniority
  • Merge results (email from A, phone from B, title from C)

Pros: Highest accuracy. Multiple data points for cross-validation. Fastest (parallel calls).

Cons: Most expensive (every provider is called for every record). Higher complexity.

Best for: High-value ABM campaigns where data quality is paramount and volume is lower (under 5,000 contacts).
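The consensus rules above can be sketched in Python as follows. The provider callables, the field names and `accuracy_rank` are assumptions for illustration, and the sketch also applies the accuracy-ranked rule to phone numbers, which the rules above only specify for title and seniority.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def consensus_enrich(lead, providers, accuracy_rank):
    """Call every provider in parallel, then merge the results field by field.

    providers: dict of name -> callable(lead) returning a dict such as
        {"email": ..., "phone": ..., "title": ...} (keys may be missing).
    accuracy_rank: provider names ordered by historical accuracy, best first.
    """
    with ThreadPoolExecutor(max_workers=max(1, len(providers))) as pool:
        futures = {name: pool.submit(fn, lead) for name, fn in providers.items()}
        results = {name: (future.result() or {}) for name, future in futures.items()}

    merged = {}

    # Email: 2+ providers agreeing -> high confidence; otherwise flag for verification.
    votes = Counter(r["email"].strip().lower() for r in results.values() if r.get("email"))
    if votes:
        email, count = votes.most_common(1)[0]
        if count < 2:
            # No agreement: prefer the email from the most accurate provider that returned one.
            email = next((results[n]["email"].strip().lower() for n in accuracy_rank
                          if results.get(n, {}).get("email")), email)
        merged["email"] = email
        merged["email_confidence"] = "high" if count >= 2 else "needs_verification"

    # Other fields: take each value from the most accurate provider that has it,
    # so the merged record can mix sources (email from A, phone from B, title from C).
    for field in ("title", "phone"):
        for name in accuracy_rank:
            value = results.get(name, {}).get(field)
            if value:
                merged[field] = value
                merged[field + "_source"] = name
                break
    return merged
```

As written, an exception from any provider propagates out of `future.result()`; a production version would catch errors per provider so that one failing API doesn't discard the other providers' results.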

Pattern 3: Tiered Waterfall

Group providers by cost/quality tier. Start with the cheapest tier and escalate to more expensive providers only when cheap options fail.

```
Input: {name, company, domain}
      |
      v
--- Tier 1: Free/cheap sources ---
[Your database cache]  [Free API endpoints]
      |
  Not found?
      |
      v
--- Tier 2: Mid-cost providers ---
[Apollo API]  [Hunter.io]
      |
  Not found?
      |
      v
--- Tier 3: Premium providers ---
[ZoomInfo]  [Cognism]
      |
      v
[Email Verification] --> Final result
```

Pros: Optimizes cost by exhausting cheap sources first. Premium credits are only used when necessary.

Cons: Slower for records that require premium providers. Requires more complex orchestration.

Best for: Teams with budget constraints enriching large lists (10,000+ contacts per month).
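A sketch of the tiered pattern, assuming each tier is a list of `(source_name, cost_per_call, lookup)` tuples and that your own cache can be modeled as a zero-cost Tier 1 source. Following the diagram, it escalates only when nothing is found and runs verification once on the final result; all source names and prices are hypothetical.

```python
def tiered_waterfall(lead, tiers, verify):
    """Escalate through cost tiers (cheapest first) and verify only the final result.

    tiers: list of tiers; each tier is a list of (source_name, cost_per_call, lookup)
        where lookup(lead) returns an email or None.
    verify: callable(email) -> status string such as "valid", "risky" or "invalid".
    """
    spend = 0.0  # finder spend for this lead; verification cost is not included
    for tier_number, tier in enumerate(tiers, start=1):
        for name, cost, lookup in tier:
            spend += cost
            email = lookup(lead)
            if email:  # "Not found?" is the only trigger for escalating
                return {"email": email, "source": name, "tier": tier_number,
                        "status": verify(email), "spend": round(spend, 4)}
    return {"enrichment_failed": True, "spend": round(spend, 4)}


if __name__ == "__main__":
    tiers = [
        [("local_cache", 0.0, lambda lead: None)],
        [("mid_cost_1", 0.02, lambda lead: None), ("mid_cost_2", 0.03, lambda lead: "jdoe@acme.com")],
        [("premium_1", 0.50, lambda lead: "jdoe@acme.com")],  # never reached here
    ]
    print(tiered_waterfall({"name": "Jane Doe"}, tiers, verify=lambda e: "valid"))
```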

Building a Waterfall in Clay

Clay is purpose-built for waterfall enrichment. Here's how to implement a production-grade pipeline.

Step 1: Set Up Your Input Table

Start with a Clay table containing your raw lead data. Minimum required fields:

  • First Name
  • Last Name
  • Company Name
  • Company Domain (if available)
  • LinkedIn URL (if available)

The more seed data you provide, the higher your match rates across all providers.

Step 2: Configure the Enrichment Waterfall

In Clay, you build waterfalls by adding enrichment columns in sequence. Each column can be configured to run only if the previous columns didn't return a result.

Column 1: Find Work Email (Primary Provider)

  • Use Clay's built-in "Find Work Email" enrichment
  • This queries Clay's aggregated database (75+ providers) using a waterfall internally
  • Expected hit rate: 60-70%

Column 2: Find Work Email (Secondary - Apollo)

  • Add an Apollo People Enrichment column
  • Set the "Only run if" condition to: Column 1 is empty
  • This catches leads that Clay's primary waterfall missed
  • Expected incremental hit rate: 10-15%

Column 3: Find Work Email (Tertiary - Hunter.io)

  • Add a Hunter.io Email Finder column
  • Set the "Only run if" condition to: Column 1 AND Column 2 are empty
  • Pattern-based email finding as a last resort
  • Expected incremental hit rate: 5-8%

Column 4: Email Verification

  • Add a verification column (MillionVerifier, ZeroBounce, or NeverBounce)
  • Run this on ALL emails found, regardless of which provider returned them
  • Mark emails as: Valid / Risky / Invalid
  • Only pass "Valid" emails to your outbound tools

Column 5: Consolidated Email

  • Create a formula column that picks the final email:
    - If Column 1 exists and Column 4 = Valid, use Column 1
    - Else if Column 2 exists and Column 4 = Valid, use Column 2
    - Else if Column 3 exists and Column 4 = Valid, use Column 3
    - Else: mark as "No Valid Email Found"
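The Column 5 branching, written as a Python function to make the logic explicit. Clay formula columns are written in Clay's own formula editor, so treat this as a description of the logic rather than something to paste in; the parameter names are illustrative.

```python
NO_EMAIL = "No Valid Email Found"


def consolidated_email(col1, col2, col3, col4_status):
    """Mirror the Column 5 logic: use the first finder column with an email,
    but only if the Column 4 verification result is "Valid".

    col1..col3: outputs of the three finder columns (None or "" when empty).
    col4_status: the verification result ("Valid", "Risky" or "Invalid").
    """
    if col1 and col4_status == "Valid":
        return col1
    if col2 and col4_status == "Valid":
        return col2
    if col3 and col4_status == "Valid":
        return col3
    return NO_EMAIL
```

Because Columns 2 and 3 only run when the earlier finder columns are empty, at most one of them holds an email, so a single Column 4 status is enough to decide.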

Step 3: Add Firmographic Enrichment

Beyond email, enrich with company and contact data:

  • Company data: Employee count, revenue, funding, industry, tech stack
  • Contact data: Job title, seniority, department, LinkedIn URL
  • Intent signals: Recent hiring (from job postings), technology changes, funding rounds

Step 4: Score and Filter

Add scoring columns that evaluate enriched data against your ICP:

  • ICP Fit Score (based on company size, industry, funding, tech stack)
  • Contact Relevance Score (based on title, seniority, department)
  • Combined Score = weighted average

Filter out records below your threshold before pushing to outbound.
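A minimal scoring-and-filter sketch. The weights, the title keywords, the 0-100 scale and the threshold of 70 are all hypothetical, and `icp_fit` is assumed to be computed upstream from the firmographic fields.

```python
# Hypothetical weights, title keywords and threshold: replace with your own ICP definition.
WEIGHTS = {"icp_fit": 0.6, "contact_relevance": 0.4}
SENIORITY_POINTS = {"vp": 100, "head": 90, "director": 80, "manager": 50}
THRESHOLD = 70


def contact_relevance(title):
    """Score a job title 0-100 by the most senior keyword it contains."""
    title = (title or "").lower()
    return max((pts for kw, pts in SENIORITY_POINTS.items() if kw in title), default=0)


def combined_score(record):
    """Weighted average of 0-100 sub-scores (icp_fit is assumed to be precomputed)."""
    subscores = {"icp_fit": record["icp_fit"],
                 "contact_relevance": contact_relevance(record.get("title"))}
    return sum(subscores[k] * w for k, w in WEIGHTS.items()) / sum(WEIGHTS.values())


def filter_for_outbound(records, threshold=THRESHOLD):
    """Keep only records at or above the threshold, highest score first."""
    scored = [(combined_score(r), r) for r in records]
    return [r for s, r in sorted(scored, key=lambda x: x[0], reverse=True) if s >= threshold]
```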

Step 5: Push to Destinations

Configure Clay's integrations to push enriched, scored, filtered leads to:

  • HubSpot - Create/update contacts with all enriched properties
  • Instantly/Smartlead - Add to outbound campaigns
  • Slack - Notify reps about high-score leads

Cost Optimization Strategies

Enrichment costs can spiral quickly. Here's how to keep them manageable.

1. Cache Everything

Build a local cache (or use Clay's deduplication) so you never pay to enrich the same person twice. At scale, 15-25% of your enrichment requests will be duplicates of previous lookups.

Savings: 15-25% reduction in enrichment costs.
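A minimal caching sketch in Python. It keys each person on a normalized LinkedIn URL when one exists, otherwise on first name, last name and domain; the 90-day expiry and the in-memory dict (standing in for a database table) are assumptions.

```python
import time

MAX_AGE_SECONDS = 90 * 24 * 3600  # hypothetical: treat results older than ~90 days as stale


def identity_key(lead):
    """Normalize a lead into a cache key: LinkedIn URL if present, else name + domain."""
    url = (lead.get("linkedin_url") or "").strip().lower().rstrip("/")
    if url:
        return ("linkedin", url)
    return ("name_domain",
            lead["first_name"].strip().lower(),
            lead["last_name"].strip().lower(),
            lead["domain"].strip().lower())


class EnrichmentCache:
    """In-memory stand-in for a persistent cache table."""

    def __init__(self, max_age_seconds=MAX_AGE_SECONDS):
        self.max_age_seconds = max_age_seconds
        self._store = {}  # key -> (stored_at, result)

    def get(self, lead, now=None):
        now = time.time() if now is None else now
        entry = self._store.get(identity_key(lead))
        if entry and now - entry[0] <= self.max_age_seconds:
            return entry[1]
        return None

    def put(self, lead, result, now=None):
        self._store[identity_key(lead)] = (time.time() if now is None else now, result)


def enrich_with_cache(cache, lead, enrich_fn):
    """Return (result, was_cache_hit); only call the paid waterfall on a miss."""
    cached = cache.get(lead)
    if cached is not None:
        return cached, True
    result = enrich_fn(lead)
    cache.put(lead, result)
    return result, False
```

Failed lookups are cached too, so a known miss isn't paid for again until the entry expires; the expiry also lets decayed records come back through the waterfall.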

2. Pre-Filter Before Enrichment

Don't enrich everyone. Apply cheap filters first:

  • Verify the company exists and matches your ICP (using free company data)
  • Check if the domain is a personal email domain (gmail.com, yahoo.com) - skip these
  • Deduplicate against your existing CRM database
  • Remove contacts already in active sequences

Savings: 20-40% reduction by avoiding enrichment on non-ICP records.
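The cheap checks above, expressed as a Python pre-filter that runs before any paid lookup. The personal-domain list is illustrative rather than complete, `is_icp_company` is a hypothetical callable built on free company data, and the in-batch duplicate check is an addition beyond the list above.

```python
# Illustrative, not exhaustive, list of free-mail domains.
PERSONAL_DOMAINS = {"gmail.com", "yahoo.com", "hotmail.com", "outlook.com", "aol.com", "icloud.com"}


def person_key(lead):
    return (lead["first_name"].strip().lower(),
            lead["last_name"].strip().lower(),
            lead.get("domain", "").strip().lower())


def prefilter(leads, crm_keys, active_sequence_keys, is_icp_company):
    """Split leads into (to_enrich, skipped) before any paid lookups.

    crm_keys / active_sequence_keys: sets of person_key() tuples exported from
        your CRM and sequencing tool.
    is_icp_company: callable(lead) -> bool using free company data.
    """
    to_enrich, skipped, seen = [], [], set()
    for lead in leads:
        key = person_key(lead)
        if key[2] in PERSONAL_DOMAINS:
            reason = "personal email domain"
        elif not is_icp_company(lead):
            reason = "not ICP"
        elif key in crm_keys:
            reason = "already in CRM"
        elif key in active_sequence_keys:
            reason = "in active sequence"
        elif key in seen:
            reason = "duplicate in this batch"
        else:
            seen.add(key)
            to_enrich.append(lead)
            continue
        skipped.append((lead, reason))
    return to_enrich, skipped
```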

3. Tiered Provider Strategy

Not every lead deserves premium enrichment:

  • Tier 1 accounts (ABM targets): Use the full waterfall with premium providers. Spend $2-5 per contact for maximum coverage.
  • Tier 2 accounts (good fit): Use standard waterfall (2 providers + verification). Spend $0.50-1.50 per contact.
  • Tier 3 accounts (exploratory): Use single provider + verification. Spend $0.10-0.30 per contact.

Savings: 30-50% reduction vs. running premium enrichment on everything.
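One way to encode the per-tier strategy is a plan table plus a budget check. Provider names and prices are placeholders, the caps are the top of each per-contact range above, and budgeting for the worst case (every finder in the chain being called) is a conservative assumption.

```python
# Hypothetical provider chains per account tier; names are placeholders and the
# caps are the top of each per-contact spend range.
TIER_PLANS = {
    1: {"finders": ["premium_1", "premium_2", "mid_cost_1", "pattern_finder"], "max_spend": 5.00},
    2: {"finders": ["mid_cost_1", "mid_cost_2"], "max_spend": 1.50},
    3: {"finders": ["budget_1"], "max_spend": 0.30},
}


def plan_for(account_tier, costs):
    """Return (finder_chain, worst_case_spend) for an account tier, trimming the chain
    so that calling every finder plus verification stays within the tier's cap.

    costs: dict of provider name -> price per lookup, including "verification".
    """
    plan = TIER_PLANS[account_tier]
    worst_case = costs["verification"]  # every tier still verifies its emails
    chain = []
    for name in plan["finders"]:
        if worst_case + costs[name] > plan["max_spend"]:
            break  # lower-priority finders would exceed the budget
        chain.append(name)
        worst_case += costs[name]
    return chain, round(worst_case, 4)
```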

4. Negotiate Provider Contracts

Most enrichment providers offer volume discounts:

  • Apollo: Annual plans save 20-30% vs. monthly
  • ZoomInfo: Multi-year contracts can reduce per-credit cost by 40%
  • Clay: Enterprise plans include better credit rates
  • Bulk verification services (MillionVerifier): $0.0005-0.001 per email at high volume

5. Monitor Hit Rates by Provider

Track which providers are actually contributing unique results. If Provider C is only adding 2% incremental coverage, the cost might not justify keeping it in the waterfall.

Build a simple dashboard tracking:

  • Hit rate per provider (% of queries to that provider that return a result)
  • Incremental hit rate (% of all records for which this provider found a result that no earlier provider had found)
  • Cost per successful enrichment per provider
  • Email verification pass rate per provider (data quality signal)
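These dashboard metrics can be computed from a simple call log. This sketch assumes a sequential waterfall and a hypothetical log schema with one row per provider call; because each provider only sees records the earlier providers didn't resolve, its hits are counted as incremental.

```python
from collections import defaultdict


def provider_metrics(call_log, total_records):
    """Aggregate per-provider stats from a sequential-waterfall call log.

    call_log rows (hypothetical schema): {"provider": str, "found": bool,
    "cost": float, "verified": bool}; "verified" is only read when found is True.
    """
    stats = defaultdict(lambda: {"calls": 0, "found": 0, "cost": 0.0, "verified": 0})
    for row in call_log:
        s = stats[row["provider"]]
        s["calls"] += 1
        s["cost"] += row["cost"]
        if row["found"]:
            s["found"] += 1
            s["verified"] += bool(row.get("verified"))

    report = {}
    for name, s in stats.items():
        report[name] = {
            "hit_rate": s["found"] / s["calls"],
            # Share of the whole list this provider resolved that no earlier provider did.
            "incremental_rate": s["found"] / total_records,
            "cost_per_success": s["cost"] / s["found"] if s["found"] else None,
            "verification_pass_rate": s["verified"] / s["found"] if s["found"] else None,
        }
    return report
```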

Real-World Waterfall Performance Data

Here are actual results from three Clay waterfall implementations we built for clients.

Client A: Series B SaaS (Targeting Mid-Market)

  • List size: 12,000 contacts/month
  • Target: VP+ at companies with 100-1,000 employees
  • Waterfall: Clay Built-in > Apollo > Hunter.io > MillionVerifier

| Metric | Before (Apollo only) | After (Waterfall) |
|---|---|---|
| Email found rate | 52% | 81% |
| Email verified rate | 44% | 72% |
| Valid emails per month | 5,280 | 8,640 |
| Cost per valid email | $0.42 | $0.38 |
| Meetings booked/month | 18 | 34 |

The waterfall increased valid email output by 64% while actually reducing cost per valid email by 10% (because the incremental providers were cheaper than the primary).

Client B: Growth-Stage Agency (Targeting SMB)

  • List size: 25,000 contacts/month
  • Target: Founders and marketing leads at companies with 10-100 employees
  • Waterfall: Clay Built-in > Apollo > Dropcontact > NeverBounce

| Metric | Before (Single provider) | After (Waterfall) |
|---|---|---|
| Email found rate | 47% | 78% |
| Email verified rate | 39% | 69% |
| Bounce rate on sends | 8.2% | 1.4% |
| Cost per valid email | $0.31 | $0.29 |

Client C: Enterprise Sales Team (Targeting F500)

  • List size: 3,000 contacts/month
  • Target: C-suite and VP at Fortune 500 companies
  • Waterfall: ZoomInfo > Clay Built-in > Cognism > ZeroBounce

| Metric | Before (ZoomInfo only) | After (Waterfall) |
|---|---|---|
| Email found rate | 68% | 87% |
| Direct dial found rate | 41% | 58% |
| Email verified rate | 61% | 79% |
| Cost per valid email | $1.85 | $2.10 |

For enterprise targets, the waterfall cost slightly more per valid email ($2.10 vs. $1.85), but the incremental coverage justified it: the additional 19 percentage points of email coverage (roughly 570 more contacts per month on a 3,000-contact list) were high-value enterprise prospects worth $50K+ in potential ACV.

Advanced Waterfall Techniques

Conditional Provider Selection

Instead of always running the same waterfall sequence, use conditional logic to pick providers based on the target:

  • Tech companies (SaaS, software): Apollo as primary (strongest tech company coverage)
  • Traditional industries (manufacturing, finance): ZoomInfo as primary (strongest enterprise coverage)
  • European contacts: Cognism as primary (GDPR-compliant, strong EU data)
  • Startups (under 50 employees): Clay + Apollo (best startup coverage)
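The routing rules above as a Python function. The order in which the rules are checked (region, then company size, then industry), the field names, and the fallback providers after each primary are assumptions.

```python
def choose_waterfall(record):
    """Return a provider order for one record based on who is being targeted.

    record (hypothetical fields): "region" of the contact, plus the company's
    "industry" and "employees".
    """
    region = (record.get("region") or "").upper()
    industry = (record.get("industry") or "").lower()
    employees = record.get("employees") or 0

    if region == "EU":
        return ["Cognism", "Clay", "Apollo"]       # European contacts
    if 0 < employees < 50:
        return ["Clay", "Apollo", "Hunter.io"]     # startups
    if industry in {"saas", "software"}:
        return ["Apollo", "Clay", "Hunter.io"]     # tech companies
    if industry in {"manufacturing", "finance"}:
        return ["ZoomInfo", "Clay", "Apollo"]      # traditional industries
    return ["Clay", "Apollo", "Hunter.io"]         # default when no rule matches
```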

Re-Enrichment Cycles

B2B data decays at 30-40% per year (people change jobs, companies pivot, phone numbers change). Build a re-enrichment cycle:

  • Monthly: Re-enrich contacts that bounced in outbound campaigns
  • Quarterly: Re-enrich your active pipeline and top accounts
  • Every six months: Full database re-enrichment
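A sketch of how a scheduled job might decide which contacts are due. Field names are hypothetical, the quarterly interval is approximated as 91 days, and the full-database cycle is assumed to run roughly every six months (182 days).

```python
from datetime import date, timedelta

QUARTERLY = timedelta(days=91)
FULL_REFRESH = timedelta(days=182)  # assumption: full-database cycle roughly every six months


def reenrichment_reason(contact, today):
    """Return why a contact should be re-enriched now, or None if it can wait.

    contact (hypothetical fields): "last_enriched" (date), "bounced" (bool),
    "in_active_pipeline" (bool), "is_top_account" (bool).
    """
    age = today - contact["last_enriched"]
    if contact.get("bounced"):
        return "bounced in outbound"  # picked up by the monthly sweep
    if (contact.get("in_active_pipeline") or contact.get("is_top_account")) and age >= QUARTERLY:
        return "quarterly pipeline refresh"
    if age >= FULL_REFRESH:
        return "full database refresh"
    return None


if __name__ == "__main__":
    contact = {"last_enriched": date(2024, 1, 15), "in_active_pipeline": True}
    print(reenrichment_reason(contact, date.today()))
```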

Waterfall for Phone Numbers

Email waterfalls get the most attention, but phone number coverage is even lower across providers. A phone-specific waterfall is critical for teams that rely on cold calling:

  1. ZoomInfo (best direct dial coverage for enterprise)
  2. Apollo (good coverage for tech/SaaS)
  3. Cognism (strong for mobile numbers, especially EU)
  4. Lusha (crowdsourced - good for verifying numbers found elsewhere)

Expected results: 25-35% direct dial coverage with a single provider, 45-60% with a 3-provider waterfall.

Building vs. Buying Waterfall Infrastructure

Build It Yourself (Custom Code)

When it makes sense: You have engineering resources, need complete customization, or process 100K+ contacts per month where Clay credits become expensive.

Tech stack: Python/Node.js script that calls provider APIs in sequence, stores results in a database, and handles rate limiting, retries, and error handling.

Time to build: 2-4 weeks for a basic waterfall. 6-8 weeks for a production-grade system with caching, monitoring, and a UI.

Ongoing maintenance: 5-10 hours/month for API changes, provider updates, and bug fixes.
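Rate limiting and retries are the parts of a custom build that most often need care. A minimal Python sketch of exponential backoff with jitter; `RateLimited` and `TransientError` are hypothetical exceptions your own API client would raise (for example on HTTP 429 or 5xx responses), and the delay settings are arbitrary defaults.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical: raised by your provider client on HTTP 429."""


class TransientError(Exception):
    """Hypothetical: raised by your provider client on timeouts or 5xx responses."""


def call_with_retries(fn, *args, max_attempts=5, base_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Call a provider function, retrying rate limits and transient errors with
    exponential backoff plus jitter. Any other exception propagates immediately."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn(*args)
        except (RateLimited, TransientError):
            if attempt == max_attempts:
                raise
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(delay * random.uniform(0.5, 1.0))  # jitter spreads out concurrent retries
```

If a provider sends a `Retry-After` header with its 429 responses, honoring it is usually better than a computed delay. The injectable `sleep` parameter keeps the function easy to test.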

Use Clay (No-Code/Low-Code)

When it makes sense: Your volume is under 50K contacts/month, you don't have dedicated engineering, or you need to iterate quickly on waterfall logic.

Time to build: 2-4 hours for a basic waterfall. 1-2 days for a production-grade pipeline with scoring and routing.

Ongoing maintenance: 1-2 hours/month.

Cost: $349-1,000+/month depending on volume and plan.

Comparison

| Factor | Custom Build | Clay |
|---|---|---|
| Setup time | 2-8 weeks | 2-8 hours |
| Monthly maintenance | 5-10 hours | 1-2 hours |
| Cost at 10K contacts/mo | $200-500 (API costs) | $349-500 |
| Cost at 50K contacts/mo | $800-2,000 (API costs) | $1,000-2,500 |
| Cost at 200K contacts/mo | $2,000-5,000 (API costs) | $3,000-8,000+ |
| Flexibility | Unlimited | High (within Clay's framework) |
| Engineering required | Yes | No |
| Time to iterate | Hours-days | Minutes |

For most B2B teams, Clay is the right choice until you're processing 100K+ contacts per month or need customization that Clay can't handle.

FAQ

What is a data waterfall in B2B enrichment?

A data waterfall is an architecture where multiple data providers are queried in sequence to maximize coverage. Instead of relying on one provider that might only find 50-60% of the data you need, the waterfall tries Provider A first, then Provider B for any gaps, then Provider C, and so on. This typically improves total data coverage from 40-65% (single provider) to 75-90% (multi-provider waterfall).

How many providers should be in a waterfall?

Three to four providers is the sweet spot for most B2B teams. The first provider typically covers 50-65% of records, the second adds 10-15% incremental, and the third adds 5-8% more. A fourth provider rarely adds more than 3-5% incremental coverage, and the complexity and cost usually aren't worth it beyond that. Always include a verification provider as the final step regardless of how many finders you use.

Does a waterfall cost more than using a single provider?

Not necessarily. In a sequential waterfall, you only pay for additional providers when the first one fails. If your primary provider has a 60% hit rate, you're only paying for the second provider on 40% of lookups, and the third on perhaps 25%. Many teams find that the cost per valid email actually decreases with a waterfall because the incremental providers are cheaper per lookup. The total budget increases because you're finding more valid contacts, but the unit economics improve.
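The arithmetic in this answer generalizes to an expected cost per record: each provider's price is weighted by the share of records that reach it. In this sketch the 60% primary hit rate and the 40%/25% fall-through shares come from the example above (the second provider resolving 37.5% of the records it sees, i.e. 15 points of the total); the prices and the third provider's hit rate are made up, and it assumes every lookup is billed whether or not it hits.

```python
def waterfall_economics(providers):
    """Expected spend and coverage per input record for a sequential waterfall.

    providers: list of (name, price_per_lookup, hit_rate) in waterfall order, where
        hit_rate is the share of the records *reaching* that provider it resolves.
    Assumes every lookup is billed, hit or miss.
    """
    reach = 1.0     # share of records that get as far as the current provider
    cost = 0.0      # expected spend per input record
    coverage = 0.0  # share of records resolved so far
    for _name, price, hit_rate in providers:
        cost += reach * price
        coverage += reach * hit_rate
        reach *= 1 - hit_rate
    return {"cost_per_record": cost,
            "coverage": coverage,
            "cost_per_found": cost / coverage if coverage else None}


if __name__ == "__main__":
    waterfall = [("primary", 0.10, 0.60), ("second", 0.05, 0.375), ("third", 0.03, 0.20)]
    print(waterfall_economics(waterfall))      # full waterfall
    print(waterfall_economics(waterfall[:1]))  # primary provider alone
```

With these illustrative prices, the waterfall spends about $0.1275 per input record for 80% coverage, or roughly $0.159 per found email, versus about $0.167 per found email for the primary provider alone.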

How do I choose which provider to query first in the waterfall?

Prioritize by: (1) data quality - the provider with the highest verified accuracy rate should go first, (2) coverage for your specific ICP - if you target tech companies, a provider strong in tech data goes first, (3) cost - among providers with similar quality, put the cheaper one first. Run a benchmark test with 1,000 contacts from your target market across all providers to establish baseline hit rates before finalizing your waterfall order.
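The three prioritization criteria can be turned into a sort key once you have benchmark numbers. This sketch assumes each provider's results are summarized as a verified rate, an ICP coverage rate and a price per lookup, and it treats verified rates that fall in the same 2-point band as "similar quality"; the band width is an arbitrary choice.

```python
def order_providers(benchmark, band=0.02):
    """Order providers by verified accuracy, then ICP coverage, then price.

    benchmark: dict of name -> {"verified_rate", "icp_coverage", "cost"} measured
        on the same sample (e.g. 1,000 contacts from your target market).
    band: verified rates that round to the same band count as similar quality.
    """
    def sort_key(item):
        _name, m = item
        quality_band = round(m["verified_rate"] / band)
        # Higher quality band first, then higher coverage, then lower cost.
        return (-quality_band, -m["icp_coverage"], m["cost"])

    return [name for name, _ in sorted(benchmark.items(), key=sort_key)]
```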

Can I build a waterfall without Clay?

Yes. You can build waterfalls using Make or n8n (connecting provider APIs in sequence with conditional logic), custom code (Python/Node.js scripts calling APIs), or even spreadsheet-based approaches for small volumes. Clay simplifies the process significantly because waterfall logic is built into the platform, but it's not the only option. For volumes under 1,000 contacts per month, a Make workflow connecting 2-3 provider APIs is a cost-effective alternative.

Need help implementing this?

GTME builds the systems described in this article. Book a call and we'll show you what it looks like for your business.
