Local Scraper 4.154 – Yelp and Google Updates

Local Scraper has been updated! I am proud to announce that we are back up and running. So let's get started.

Yelp

The other day Yelp decided to hire Distil Networks for anti-scraper protection. This broke our old scraper and forced us to change how we do things. So here is what we are doing about it.

Your program will now have Yelp Quick and Yelp Full. Yelp Quick is our new high-speed scraper. It gets less data because it pulls from the search page instead of the listing page. Yelp Quick grabs the 10 points of data listed below. This scraper is super fast and probably will not require you to use proxies or the captcha services. No guarantee though; it will depend heavily on how much scraping you do.

  • Name
  • Address
  • Phone
  • Rating Number
  • Review Number
  • Price Range
  • Category
  • Description Snippet
  • Image URL
  • Listing URL

Yelp Full gets all of the details that our original scraper got. What has changed is how we do it and our limits. Yelp Full is single threaded and cannot be multithreaded. It will also require you to use proxies and solve captchas to complete a scrape. The scraper will not make it past 100 listings without using proxies, and the more proxies you have the better. With 8 IPs I was able to scrape 1000 results with 10-20 captchas. I would recommend the 25 IP address package that Proxy Bonanza sells. It’s $10 a month and will keep you up and running.
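To put those numbers in perspective, here is a quick back-of-the-envelope calculation in Python. It only uses the figures from my one run above, so treat it as a rough guide and not a promise. The variable names are made up for illustration and are not part of Local Scraper.

```python
# Figures reported above for one Yelp Full run (a single anecdote, not a benchmark).
results_scraped = 1000
ip_count = 8
captchas_low, captchas_high = 10, 20

# Average number of listings each IP handled in that run.
listings_per_ip = results_scraped / ip_count

# Listings scraped per captcha, from the worst case (20 captchas) to the best (10).
listings_per_captcha_worst = results_scraped / captchas_high
listings_per_captcha_best = results_scraped / captchas_low

print(f"Listings per IP: {listings_per_ip:.0f}")
print(f"Listings per captcha: {listings_per_captcha_worst:.0f} to {listings_per_captcha_best:.0f}")
```

In that run each IP covered roughly 125 listings, with one captcha every 50 to 100 listings. Your own results will depend on your proxies and how hard you scrape.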

So with Yelp Full you will get our original points of data, but you will be required to use proxies and you will need to solve captchas. Don’t blame me, blame Yelp for hiring Distil. The good news is that most of this can be automated, so you can just let the program run.

I have made a new blog post covering how to use 2Captcha and how to set it up with my scrapers. Click here to see the blog post about it.

Using proxies is covered on the ReadMe page for all products. Nothing has changed there, except that you are now required to use them.

Google

Google also made changes recently, and the scraper was updated to handle them. It seems they were no longer going to support the browser we were using, so we switched to a new one. What’s different is that this browser is hidden, so you will no longer get to see it in action like you used to. You do not need captchas or proxies, but I recommend proxies if you’re planning a large amount of scraping.

If the new browser is not working for you, try the “Use Chrome” option. This option will use Google Chrome as the browser and should work much better. If you don’t want to see it on your screen, click the Hide Browser option as well. To use these settings you will need the normal 32-bit version of Google Chrome installed.