Lead scraper with batch processing
1. Lead scraper with batch processing
Welcome back!

2. From feedback loops to batch processing
In the previous chapter, you built a self-improving conversion rate optimizer using feedback loops. Now we're going to tackle a different challenge: processing multiple items at scale using batch loops and parallel data streams.

3. The lead generation problem
Here's a common marketing scenario. You need to find potential customers, let's say plumbers in Miami. You could manually search Google Maps, visit each website, and hunt for contact emails. Or you could build a workflow that does all of this automatically, processing dozens of businesses in minutes.

4. External API integration
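Outside of n8n, the call-an-external-API pattern covered here can be sketched in plain JavaScript. The base URL, token parameter, and response fields below are placeholders, not the real Apify endpoints; the exercise instructions give the actual URLs and headers.

```javascript
// Sketch of the two-step "launch a run, then fetch its results" API pattern.
// BASE_URL and all field names are illustrative placeholders.
const BASE_URL = "https://api.example.com";

function buildLaunchRequest(query, location, token) {
  return {
    url: `${BASE_URL}/runs?token=${encodeURIComponent(token)}`,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ searchQuery: query, location: location }),
  };
}

async function launchAndFetch(query, location, token) {
  const req = buildLaunchRequest(query, location, token);
  // First request: start the scraper run.
  const run = await fetch(req.url, req).then((r) => r.json());
  // Second request: fetch the dataset the run produced (field name assumed).
  return fetch(`${BASE_URL}/datasets/${run.datasetId}/items?token=${token}`)
    .then((r) => r.json());
}
```

In n8n, each of these two requests becomes its own HTTP Request node, which is exactly the shape of the workflow you'll build.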
This workflow starts by connecting to an external service called Apify. Instead of using a dedicated node, you'll call their REST API directly using HTTP Request nodes. This is a transferable skill: the same pattern works for any API. You'll send a search query and location, and Apify returns a list of businesses from Google Maps with their names, addresses, and websites.

5. Batch processing with Split In Batches
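What Split In Batches does can be mimicked in plain JavaScript: walk the list one item at a time, pause between iterations, and collect the results. This is a sketch only; `scrape` stands in for the HTTP fetch node.

```javascript
// Sketch of a batch loop: one URL at a time, with a pause between iterations.
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function processInBatches(urls, scrape, waitMs = 1000) {
  const results = [];
  for (const url of urls) {          // batch size of 1, like Split In Batches
    results.push(await scrape(url)); // visit one site
    await delay(waitMs);             // rate limit: wait before the next request
  }
  return results;
}
```

The one-second pause is the polite part: without it, back-to-back requests can trip rate limits or get your scraper blocked.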
Once you have the website URLs, you need to visit each one and extract emails. This is where Split In Batches comes in. It processes items one at a time in a loop: scrape a website, wait one second to avoid rate limits, extract emails using regex, then move to the next URL. The loop continues until all websites have been processed.

6. Parallel data streams
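The email stream described in this section, filtering empties, splitting arrays into items, and removing duplicates, is easy to picture as plain JavaScript. Inside n8n these are separate nodes; this sketch just shows the data transformation.

```javascript
// Sketch of the email stream: drop empty results, flatten per-site arrays,
// and remove duplicate addresses.
function processEmailStream(perSiteEmails) {
  const flat = perSiteEmails
    .filter((list) => Array.isArray(list) && list.length > 0) // filter empties
    .flat();                                                  // split arrays into items
  return [...new Set(flat)];                                  // remove duplicates
}
```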
After the loop completes, data flows into two parallel streams, like an assembly line splitting into two conveyor belts. Stream A processes the business details coming from Google Maps: names, addresses, and phone numbers. Stream B processes the extracted emails from the websites: filtering empties, splitting arrays, and removing duplicates. Each stream aggregates its data into a single item.

7. Merging and final output
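The domain-matching merge described in this section might look like the following in a Code node. This is a sketch, not the exercise's exact code: it assumes each business item has a `website` field and that an email belongs to a business when the part after the `@` matches the website's hostname.

```javascript
// Sketch: attach each email to the business whose website shares its domain.
function domainOf(url) {
  return new URL(url).hostname.replace(/^www\./, "");
}

function mergeLeads(businesses, emails) {
  return businesses.map((biz) => ({
    ...biz,
    emails: emails.filter(
      (email) => email.split("@")[1] === domainOf(biz.website)
    ),
  }));
}
```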
Finally, both streams merge back together. A Code node matches emails to businesses by domain, and formats everything into clean JSON. The result? A complete lead list with company names, addresses, phone numbers, websites, and contact emails, all generated automatically from a simple search query!

8. New patterns you'll learn
By the end of this chapter, you'll know how to integrate an external API using HTTP nodes. You'll apply batch processing with Split In Batches for handling multiple items. And you'll build parallel data streams: splitting, processing, and merging data flows. These patterns are essential for any data enrichment or scraping workflow. Let me walk you through what you'll build in the first few exercises.

9. What you'll build first
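One of the steps previewed here is the regex email extractor. A minimal version looks like this; the exact regex in the exercise instructions may differ.

```javascript
// Sketch of regex email extraction from raw page HTML.
function extractEmails(html) {
  const pattern = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
  // match() returns null when nothing matches; dedupe with a Set.
  return [...new Set(html.match(pattern) || [])];
}
```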
You'll start with a Form Trigger that collects a search query and location. Then you'll call the Apify API using two HTTP nodes: one to launch the Google Maps scraper, and another to fetch the results. All the API URLs and headers are in the exercise instructions.

Once you have the business listings, you'll clean the data: a Set node extracts just the website URLs, a Remove Duplicates node eliminates repeats, and a Limit node caps the list at 10. This gives you a clean set of URLs ready for scraping.

Then comes the scraping loop. You'll use a Split In Batches node to process each URL one at a time: fetch the page, wait one second for rate limiting, and extract emails using a regex Code node. The loop runs until every website has been processed. As always, the exercise instructions contain all the code, API configurations, and expressions you need.

10. Let's practice!
Let's get started!