Google reviews hold valuable customer insights, but extracting them at scale presents real challenges. The official Places API caps results at just 5 reviews per location, making it impractical for serious data collection. Browser automation offers a workaround, though Google actively detects and blocks scraping attempts.
This tutorial walks through building a Python scraper using Selenium and rotating proxies to gather complete review datasets. The combination of browser automation with proxy rotation mimics genuine user behavior while distributing requests across multiple IP addresses. That approach significantly reduces detection risk and enables large-scale data extraction.
Prerequisites
Before diving into the code, ensure these components are ready:
- Python 3.8+ installed on the system
- Selenium library installed via pip install selenium
- ChromeDriver matching the installed Chrome browser version
- Proxy service account with residential IP rotation (providers like Decodo, Oxylabs, or Bright Data work well)
- Basic Python knowledge and familiarity with web scraping concepts
Residential proxies deserve special attention here. Unlike datacenter IPs, residential proxies route traffic through real user devices, making requests appear more legitimate. Google’s detection systems flag datacenter IPs more aggressively, so residential rotation is worth the extra cost. For a detailed comparison, check out the best residential proxy providers.
Step 1: Configure Selenium with Proxy Rotation
Setting up Selenium to route traffic through rotating proxies forms the foundation of any successful scraping operation. Without proper proxy configuration, Google will quickly identify and block requests after just a few dozen queries.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
def create_driver_with_proxy(proxy_address):
    chrome_options = Options()
    # Route all browser traffic through the rotating proxy endpoint
    chrome_options.add_argument(f'--proxy-server={proxy_address}')
    # Suppress the navigator.webdriver automation flag
    chrome_options.add_argument('--disable-blink-features=AutomationControlled')
    # Present a common desktop user agent instead of the automation default
    chrome_options.add_argument('--user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36')
    driver = webdriver.Chrome(options=chrome_options)
    return driver
Most premium proxy services provide a single endpoint that automatically rotates IPs with each session. The additional Chrome arguments help mask automated browser fingerprints that Google might otherwise detect.
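As a minimal usage sketch, assuming the provider exposes its rotating gateway as a plain host:port endpoint (the address below is a placeholder) and that access is authorized by IP whitelisting, since Chrome's --proxy-server flag does not accept embedded credentials:

# Hypothetical rotating gateway; replace with the endpoint your provider supplies
ROTATING_PROXY = "gate.example-proxy.com:7000"

driver = create_driver_with_proxy(ROTATING_PROXY)
driver.get("https://www.google.com/maps")
print(driver.title)
driver.quit()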
Step 2: Navigate to the Google Maps Listing
With the browser configured, navigation to the target business page comes next. Google Maps URLs follow predictable patterns that include the place name and coordinates.
import time
import random
def navigate_to_business(driver, business_url):
    driver.get(business_url)
    time.sleep(random.uniform(2, 4))
The random delay between actions serves an important purpose. Real users don’t navigate pages with machine-like precision. Introducing randomized pauses of 2-4 seconds helps the scraper blend in with normal traffic patterns.
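As a brief sketch of how this step could be exercised, using a hypothetical human_pause helper and a placeholder listing URL:

def human_pause(low=2.0, high=4.0):
    # Randomized pause to mimic human pacing between actions
    time.sleep(random.uniform(low, high))

# Placeholder URL; substitute the actual Google Maps listing to scrape
business_url = "https://www.google.com/maps/place/Example+Cafe/@40.7128,-74.0060,17z"
navigate_to_business(driver, business_url)
human_pause()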
Step 3: Click the “All Reviews” Button
Google Maps displays only a handful of reviews on the main business page. Accessing the complete review list requires clicking the reviews section, which opens a scrollable panel containing all submitted feedback.
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
def open_reviews_panel(driver):
    reviews_button = WebDriverWait(driver, 10).until(
        EC.element_to_be_clickable((By.CSS_SELECTOR, "button[jsaction*='reviews']"))
    )
    reviews_button.click()
    time.sleep(random.uniform(1, 2))
Keep in mind that Google frequently changes class names and element structures. Maintaining a scraper means periodically updating these selectors when they break.
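One way to soften that maintenance burden is to try a short list of candidate selectors in order and use the first one that works; the selectors below are illustrative assumptions, not guaranteed matches for the current markup:

from selenium.common.exceptions import TimeoutException

def click_first_matching(driver, selectors, timeout=10):
    # Attempt each candidate CSS selector until one becomes clickable
    for selector in selectors:
        try:
            element = WebDriverWait(driver, timeout).until(
                EC.element_to_be_clickable((By.CSS_SELECTOR, selector))
            )
            element.click()
            return True
        except TimeoutException:
            continue
    return False

# Candidate selectors; update after inspecting the live page
click_first_matching(driver, [
    "button[jsaction*='reviews']",
    "button[aria-label*='reviews']",
])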
Step 4: Infinite Scroll to Load All Reviews
Google implements lazy loading for reviews, meaning only a subset appears initially. The remaining reviews load dynamically as users scroll down the panel. Automating this scroll behavior is essential for capturing complete datasets.
def scroll_reviews_panel(driver):
    # Obfuscated class names change periodically; re-inspect the page if this selector fails
    scrollable_div = driver.find_element(By.CSS_SELECTOR, ".m6QErb.DxyBCb.kA9KIf.dS8AEf")
    last_height = 0
    while True:
        # Scroll the panel to its current bottom to trigger lazy loading of more reviews
        driver.execute_script("arguments[0].scrollTop = arguments[0].scrollHeight", scrollable_div)
        time.sleep(random.uniform(1.5, 3))
        new_height = driver.execute_script("return arguments[0].scrollHeight", scrollable_div)
        # Stop once the panel height no longer grows
        if new_height == last_height:
            break
        last_height = new_height
A note on CSS selectors: The class names like .m6QErb.DxyBCb appear random because Google generates them automatically through their build process (a technique called CSS obfuscation). These aren’t meaningful names chosen by developers. To find the current selectors, right-click any element in Chrome, select “Inspect,” and examine the HTML structure. Google changes these class names periodically, so expect to update them when the scraper stops working.
The loop continues scrolling until no new content loads, indicated when the scroll height stops increasing. For businesses with thousands of reviews, this process can take several minutes.
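For listings with very large review counts, it can also help to cap the number of scroll passes or stop once a target count is reached. A rough variant under those assumptions (the review-card selector is the same one used in Step 5 and is equally subject to change):

def scroll_reviews_panel_capped(driver, max_passes=200, target_reviews=None):
    scrollable_div = driver.find_element(By.CSS_SELECTOR, ".m6QErb.DxyBCb.kA9KIf.dS8AEf")
    last_height = 0
    for _ in range(max_passes):
        driver.execute_script("arguments[0].scrollTop = arguments[0].scrollHeight", scrollable_div)
        time.sleep(random.uniform(1.5, 3))
        # Stop early once enough review cards have loaded, if a target was given
        if target_reviews and len(driver.find_elements(By.CSS_SELECTOR, ".jftiEf")) >= target_reviews:
            break
        new_height = driver.execute_script("return arguments[0].scrollHeight", scrollable_div)
        if new_height == last_height:
            break
        last_height = new_height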
Step 5: Extract Review Data
Once all reviews have loaded into the DOM, parsing the HTML to extract relevant fields becomes straightforward. Each review contains the author name, star rating, review text, and posting date.
def extract_reviews(driver):
    reviews = []
    # Each .jftiEf element corresponds to a single review card
    review_elements = driver.find_elements(By.CSS_SELECTOR, ".jftiEf")
    for element in review_elements:
        try:
            author = element.find_element(By.CSS_SELECTOR, ".d4r55").text
            # The star rating is exposed through the element's aria-label attribute
            rating = element.find_element(By.CSS_SELECTOR, ".kvMYJc").get_attribute("aria-label")
            text = element.find_element(By.CSS_SELECTOR, ".wiI7pd").text
            date = element.find_element(By.CSS_SELECTOR, ".rsqaWe").text
            reviews.append({
                "author": author,
                "rating": rating,
                "text": text,
                "date": date
            })
        except Exception:
            # Skip reviews missing one of the expected fields
            continue
    return reviews
The try-except block handles cases where individual review elements might be missing certain fields. Some reviewers leave ratings without text, so wrapping extraction in exception handling prevents a single malformed review from crashing the entire scrape.
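A slightly more forgiving variant extracts each field independently, so a review that lacks text still keeps its author and rating; this safe_text helper is an illustrative addition, not part of the original code:

def safe_text(element, selector, attribute=None):
    # Return the requested field if present, or an empty string if the selector is missing
    try:
        target = element.find_element(By.CSS_SELECTOR, selector)
        return target.get_attribute(attribute) if attribute else target.text
    except Exception:
        return ""

# Inside the extraction loop, the fields could then be gathered as:
# author = safe_text(element, ".d4r55")
# rating = safe_text(element, ".kvMYJc", attribute="aria-label")
# text = safe_text(element, ".wiI7pd")
# date = safe_text(element, ".rsqaWe")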
Step 6: Export to CSV or JSON
The final step involves saving extracted data to a usable format. Both CSV and JSON work well depending on downstream analysis needs.
import csv
import json
def export_to_csv(reviews, filename):
    with open(filename, 'w', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=['author', 'rating', 'text', 'date'])
        writer.writeheader()
        writer.writerows(reviews)

def export_to_json(reviews, filename):
    with open(filename, 'w', encoding='utf-8') as f:
        json.dump(reviews, f, ensure_ascii=False, indent=2)
CSV files open easily in spreadsheet applications for quick analysis. JSON preserves data structure better for programmatic processing.
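Tying the pieces together, a short end-of-run sketch (the filenames are arbitrary):

reviews = extract_reviews(driver)
export_to_csv(reviews, "reviews.csv")
export_to_json(reviews, "reviews.json")
print(f"Saved {len(reviews)} reviews")
driver.quit()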
Pro Tips for Avoiding Detection
Even with proxies and Selenium, Google can still detect and block scrapers. These additional measures improve success rates:
- Randomize delays between all actions, not just page loads. Vary scroll speeds, click timing, and navigation patterns.
- Rotate user agents alongside proxy IPs. A single user-agent string appearing from hundreds of different IPs looks suspicious (see the sketch after this list).
- Run in non-headless mode when possible. Headless browsers have detectable fingerprints that Google’s systems recognize.
- Limit request volume to reasonable levels. Scraping 50 businesses per hour raises fewer flags than 500 per hour.
- Handle CAPTCHAs gracefully. When they appear, either solve them manually or integrate a CAPTCHA-solving service.
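As a rough sketch of the user-agent rotation tip above, a variant of the driver factory could pick a random string for each new session; the strings below are examples only and should be refreshed from current browser releases:

USER_AGENTS = [
    # Example desktop user agents; keep this list current
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36",
]

def create_stealthier_driver(proxy_address):
    chrome_options = Options()
    chrome_options.add_argument(f'--proxy-server={proxy_address}')
    chrome_options.add_argument('--disable-blink-features=AutomationControlled')
    # Pair a freshly chosen user agent with each new proxy session
    chrome_options.add_argument(f'--user-agent={random.choice(USER_AGENTS)}')
    return webdriver.Chrome(options=chrome_options)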
For large-scale operations, consider anti-detection browser frameworks like Playwright with stealth plugins, which specifically target the fingerprinting methods Google employs. Many of the same techniques carry over to other anti-bot systems, such as Akamai and similar protections.
Conclusion
Building a Google reviews scraper with Python and Selenium requires balancing technical implementation with evasion tactics. The code samples provided here offer a functional starting point, though real-world deployment demands ongoing maintenance as Google updates their defenses.
For those who prefer avoiding technical complexity, third-party scraping APIs handle these challenges automatically. Services like Outscraper or ZenRows abstract away proxy management, CAPTCHA solving, and selector maintenance. The tradeoff is cost, but for many projects the time savings justify the expense.
Whichever path makes sense, remember that scraping Google reviews exists in a legal gray area. Their Terms of Service prohibit automated access, even though the data itself is public. Use collected data responsibly, respect rate limits, and consider the ethical implications of large-scale data collection.