So you want to scrape like the big boys (2021)

#53 · 🔥 318 · 💬 160 · 11 days ago · incolumitas.com · aragonite · 📷
When I used to run a scraping service, I managed to scrape at most a couple of million Google SERPs per week. So how did I manage to scrape millions of Google SERPs? It will work for scraping Google, Bing, and Amazon, because they want to be scraped to a certain extent. The main reason bots are prone to detection is simple economics: to scrape millions of pages, bot programmers put their browsers into Docker containers and orchestrate them with Docker Swarm. What sane human being browses Instagram from within a Docker container on a Hetzner VPS?! Let's propose a scraping architecture that is not so easily detectable. We want diversity, after all, for fingerprinting reasons! You can also buy old Android devices. For each scraping server, run 50-100 emulated Android devices.
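To put the numbers in the excerpt side by side (a couple of million pages per week, 50-100 emulated devices per server), here is a rough back-of-the-envelope sketch. The function name `plan_fleet` and the per-device pacing parameter `seconds_per_page` are assumptions made for illustration; the article does not state how fast each device fetches pages.

```python
import math

SECONDS_PER_WEEK = 7 * 24 * 60 * 60


def plan_fleet(pages_per_week: int, devices_per_server: int,
               seconds_per_page: float) -> dict:
    """Estimate how many emulated devices and servers a weekly target needs.

    seconds_per_page is an assumed average pacing per device, not a figure
    from the article. Devices are assumed to run around the clock.
    """
    if pages_per_week <= 0 or devices_per_server <= 0 or seconds_per_page <= 0:
        raise ValueError("all inputs must be positive")

    # Pages one device can fetch in a week at the assumed pacing.
    pages_per_device = SECONDS_PER_WEEK / seconds_per_page

    # Round up: a fraction of a device or server still means a whole one.
    devices = math.ceil(pages_per_week / pages_per_device)
    servers = math.ceil(devices / devices_per_server)

    return {
        "pages_per_device_per_week": pages_per_device,
        "devices": devices,
        "servers": servers,
    }


if __name__ == "__main__":
    # Hypothetical pacing: one page per device per minute.
    for per_server in (50, 100):
        print(per_server, plan_fleet(2_000_000, per_server, 60.0))
```

Under the assumed pacing of one page per device per minute, two million pages per week works out to 199 devices, which is 4 servers at 50 devices each or 2 servers at 100 devices each. Slower, more human-like pacing scales the device count up proportionally.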