Yelp Full vs Yelp Full Chrome
Yelp has rolled out a new site design, which means we needed to adjust the scraper for the new site. Normally this is a simple update, but this time Yelp changed the site so that pages now require JavaScript to display fully.
What this means is that our old system can no longer fully load the pages. The old system requests pages directly and does not support JavaScript; only a browser can run the JavaScript. Thankfully we can still get most of the data using the old system, but to get the full data we need to open the page in a browser. This is why we now have two systems for Yelp Full. “Yelp Full” is the older system with no browser that will get the majority of the data. “Yelp Full Chrome” is the new system that loads the pages one by one in Chrome to fully render each page and get the full details.
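To illustrate why a plain page request misses some data, here is a minimal Python sketch. The HTML is a made-up example, not real Yelp markup, and the names RAW_HTML, TextById and extract_without_browser are hypothetical. It only shows the general idea: text that a script writes into the page is not present when the page is read without a browser.

```python
from html.parser import HTMLParser

# Hypothetical page (NOT real Yelp markup): the name is in the static HTML,
# while the hours are filled in by a script that only runs in a browser.
RAW_HTML = """
<html><body>
<h1 id="name">Example Cafe</h1>
<div id="hours"></div>
<script>
document.getElementById("hours").textContent = "Mon-Fri 9am-5pm";
</script>
</body></html>
"""


class TextById(HTMLParser):
    """Collect the text found directly inside elements that have an id."""

    def __init__(self):
        super().__init__()
        self.current_id = None
        self.text = {}

    def handle_starttag(self, tag, attrs):
        element_id = dict(attrs).get("id")
        if element_id is not None:
            self.current_id = element_id
            self.text[element_id] = ""

    def handle_endtag(self, tag):
        # Leaving any element ends the capture for the current id.
        self.current_id = None

    def handle_data(self, data):
        if self.current_id is not None:
            self.text[self.current_id] += data.strip()


def extract_without_browser(html):
    """Parse the HTML as delivered; scripts are never executed."""
    parser = TextById()
    parser.feed(html)
    return parser.text


print(extract_without_browser(RAW_HTML))
```

The name comes back, but the hours entry stays empty because nothing ran the script. A real browser would execute it and fill the value in, which is the gap Yelp Full Chrome closes.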
Yelp Full will get Name, Address, City, State, Zip, Description, Owner, Reviews, Rating, Phone, Website, Category, Lat, Long, Listing URL. It will not get the hours, amenities, price range, or claimed status.
This scraper is still multi-threaded and is the fastest option.
Yelp Full Chrome will get all of the above details plus the business hours, claimed/unclaimed status, price range, and all of the amenities. The amenities are attributes such as “allows dogs”, “has wifi”, “takes reservations”, “good for kids”, etc.
If you need the amenities data, claimed status, price range, or business hours, you have to use the new Yelp Full Chrome. This scraper is not multi-threaded and uses a single Chrome browser, opening pages one by one. It is much slower than the other Yelp Full. If you do not need this extra data, use the old Yelp Full.
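The choice between the two scrapers can be summed up in a few lines of Python. This is only a sketch of the decision rule described above, not part of the software: the field names are taken from the lists above, and the function name choose_scraper is hypothetical.

```python
# Fields that, per the lists above, only Yelp Full Chrome collects.
CHROME_ONLY_FIELDS = {"hours", "amenities", "price range", "claimed status"}

# Fields that both scrapers collect.
COMMON_FIELDS = {
    "name", "address", "city", "state", "zip", "description", "owner",
    "reviews", "rating", "phone", "website", "category", "lat", "long",
    "listing url",
}


def choose_scraper(needed_fields):
    """Return the scraper name that covers every requested field."""
    needed = {field.strip().lower() for field in needed_fields}
    unknown = needed - COMMON_FIELDS - CHROME_ONLY_FIELDS
    if unknown:
        raise ValueError(f"Unknown fields: {sorted(unknown)}")
    # Any Chrome-only field forces the slower, single-browser scraper.
    if needed & CHROME_ONLY_FIELDS:
        return "Yelp Full Chrome"
    return "Yelp Full"


print(choose_scraper(["Name", "Phone", "Website"]))  # Yelp Full
print(choose_scraper(["Name", "Hours"]))             # Yelp Full Chrome
```

In short: pick the faster Yelp Full unless at least one field you need is on the Chrome-only list.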
Both systems support proxies, including StormProxies, which was recommended on the readme page. If you are going to scrape Yelp using Yelp Full or Yelp Full Chrome, you NEED to use a backconnect rotating proxy service. These services give you access to tens of thousands of IP addresses and change your IP each time you connect to them. Yelp will eventually catch and ban your proxies if you are using normal shared proxies. You need as many proxies as possible to scrape Yelp.
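For readers curious how a backconnect service looks from the client side, here is a rough Python sketch using only the standard library. It is not how this software is configured internally. The gateway address gateway.example.com:8000 is a placeholder (your provider gives you the real one) and build_proxy_opener is a hypothetical helper. The point is that you connect to a single gateway address and the service rotates the outgoing IP for you.

```python
import urllib.request


def build_proxy_opener(gateway):
    """Build an opener that sends all traffic through one proxy gateway.

    `gateway` is a "host:port" string supplied by the proxy provider.
    With a backconnect service the address stays the same while the
    provider changes the outgoing IP behind it.
    """
    proxy_url = f"http://{gateway}"
    handler = urllib.request.ProxyHandler(
        {"http": proxy_url, "https": proxy_url}
    )
    return urllib.request.build_opener(handler)


# Placeholder gateway address, for illustration only.
opener = build_proxy_opener("gateway.example.com:8000")

# Each request would then go out through the gateway, for example:
# opener.open("https://example.com/", timeout=30)
```

With normal shared proxies you would have to manage and rotate a list of addresses yourself, and each banned address shrinks the pool. With a backconnect service there is only the one gateway entry to configure.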