How to Scrape Google Reviews with Python, Selenium, and Proxies

Google reviews hold valuable customer insights, but extracting them at scale presents real challenges. The official Places API caps results at just 5 reviews per location, making it impractical for serious data collection. Browser automation offers a workaround, though Google actively detects and blocks scraping attempts.

This tutorial walks through building a Python scraper using Selenium and rotating proxies to gather complete review datasets. The combination of browser automation with proxy rotation mimics genuine user behavior while distributing requests across multiple IP addresses. That approach significantly reduces detection risk and enables large-scale data extraction.

Prerequisites

Before diving into the code, ensure these components are ready:

  • Python 3.8+ installed on the system
  • Selenium library via pip install selenium
  • ChromeDriver matching the installed Chrome browser version
  • Proxy service account with residential IP rotation (providers like Decodo, Oxylabs, or Bright Data work well)
  • Basic Python knowledge and familiarity with web scraping concepts

Residential proxies deserve special attention here. Unlike datacenter IPs, residential proxies route traffic through real user devices, making requests appear more legitimate. Google’s detection systems flag datacenter IPs more aggressively, so residential rotation is worth the extra cost. For a detailed comparison, check out the best residential proxy providers.

Step 1: Configure Selenium with Proxy Rotation

Setting up Selenium to route traffic through rotating proxies forms the foundation of any successful scraping operation. Without proper proxy configuration, Google will quickly identify and block requests after just a few dozen queries.

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

def create_driver_with_proxy(proxy_address):
    chrome_options = Options()
    # Route all browser traffic through the (rotating) proxy endpoint
    chrome_options.add_argument(f'--proxy-server={proxy_address}')
    # Prevent the navigator.webdriver automation flag from being exposed
    chrome_options.add_argument('--disable-blink-features=AutomationControlled')
    # Present a common desktop user-agent string instead of the default
    chrome_options.add_argument('--user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36')

    driver = webdriver.Chrome(options=chrome_options)
    return driver

Most premium proxy services provide a single endpoint that automatically rotates IPs with each session. The additional Chrome arguments help mask automated browser fingerprints that Google might otherwise detect.
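To make this concrete, here's a hedged usage sketch. The gateway address below is a placeholder rather than a real provider endpoint, and it assumes IP-whitelisted access, since Chrome's --proxy-server flag does not accept inline credentials (authenticated proxies typically require a browser extension or a tool like selenium-wire).

# Placeholder rotating-proxy gateway - substitute your provider's endpoint
PROXY_ENDPOINT = "http://gate.example-proxy.com:7000"

driver = create_driver_with_proxy(PROXY_ENDPOINT)
driver.get("https://www.google.com/maps")
print(driver.title)  # quick sanity check that the proxied session works
driver.quit()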

Step 2: Navigate to the Google Maps Listing

With the browser configured, navigation to the target business page comes next. Google Maps URLs follow predictable patterns that include the place name and coordinates.

import time
import random

def navigate_to_business(driver, business_url):
    driver.get(business_url)
    time.sleep(random.uniform(2, 4))

The random delay between actions serves an important purpose. Real users don’t navigate pages with machine-like precision. Introducing randomized pauses of 2-4 seconds helps the scraper blend in with normal traffic patterns.
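One convenient pattern is to wrap that randomized pause in a small helper and reuse it after every click, scroll, and navigation; a minimal sketch (the helper name and defaults are arbitrary):

def human_pause(min_seconds=2.0, max_seconds=4.0):
    # Sleep for a random interval to mimic human pacing between actions
    time.sleep(random.uniform(min_seconds, max_seconds))

# Example usage:
# navigate_to_business(driver, url)
# human_pause()          # 2-4 second pause after a page load
# human_pause(0.5, 1.5)  # shorter pause after a minor action like a click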

Step 3: Click the “All Reviews” Button

Google Maps displays only a handful of reviews on the main business page. Accessing the complete review list requires clicking the reviews section, which opens a scrollable panel containing all submitted feedback.

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

def open_reviews_panel(driver):
    reviews_button = WebDriverWait(driver, 10).until(
        EC.element_to_be_clickable((By.CSS_SELECTOR, "button[jsaction*='reviews']"))
    )
    reviews_button.click()
    time.sleep(random.uniform(1, 2))

Keep in mind that Google frequently changes class names and element structures. Maintaining a scraper means periodically updating these selectors when they break.
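One way to soften that maintenance burden is to try a short list of candidate selectors and fail loudly when none of them match. The alternate selectors in this sketch are illustrative guesses, not guaranteed to match Google's current markup:

from selenium.common.exceptions import TimeoutException

def open_reviews_panel_with_fallbacks(driver, selectors=None):
    # Candidate selectors, ordered from most to least likely to match
    selectors = selectors or [
        "button[jsaction*='reviews']",
        "button[aria-label*='reviews']",
    ]
    for selector in selectors:
        try:
            button = WebDriverWait(driver, 5).until(
                EC.element_to_be_clickable((By.CSS_SELECTOR, selector))
            )
            button.click()
            time.sleep(random.uniform(1, 2))
            return selector  # report which selector worked
        except TimeoutException:
            continue
    raise RuntimeError("No reviews button matched - selectors likely need updating")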

Step 4: Infinite Scroll to Load All Reviews

Google implements lazy loading for reviews, meaning only a subset appears initially. The remaining reviews load dynamically as users scroll down the panel. Automating this scroll behavior is essential for capturing complete datasets.

def scroll_reviews_panel(driver):
    scrollable_div = driver.find_element(By.CSS_SELECTOR, ".m6QErb.DxyBCb.kA9KIf.dS8AEf")
    last_height = 0

    while True:
        driver.execute_script("arguments[0].scrollTop = arguments[0].scrollHeight", scrollable_div)
        time.sleep(random.uniform(1.5, 3))

        new_height = driver.execute_script("return arguments[0].scrollHeight", scrollable_div)
        if new_height == last_height:
            break
        last_height = new_height

A note on CSS selectors: The class names like .m6QErb.DxyBCb appear random because Google generates them automatically through their build process (a technique called CSS obfuscation). These aren’t meaningful names chosen by developers. To find the current selectors, right-click any element in Chrome, select “Inspect,” and examine the HTML structure. Google changes these class names periodically, so expect to update them when the scraper stops working.

The loop continues scrolling until no new content loads, indicated when the scroll height stops increasing. For businesses with thousands of reviews, this process can take several minutes.
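For very large listings it can also be worth bounding the loop and tolerating brief stalls, since the scroll height can momentarily stop growing while content is still loading. A variant sketch with hypothetical max_scrolls and patience limits (same selector assumption as above):

def scroll_reviews_panel_capped(driver, max_scrolls=200, patience=3):
    scrollable_div = driver.find_element(By.CSS_SELECTOR, ".m6QErb.DxyBCb.kA9KIf.dS8AEf")
    last_height = 0
    stalled = 0  # consecutive passes with no height change

    for _ in range(max_scrolls):
        driver.execute_script("arguments[0].scrollTop = arguments[0].scrollHeight", scrollable_div)
        time.sleep(random.uniform(1.5, 3))

        new_height = driver.execute_script("return arguments[0].scrollHeight", scrollable_div)
        if new_height == last_height:
            stalled += 1
            if stalled >= patience:
                break  # height stable across several passes - assume all reviews loaded
        else:
            stalled = 0
        last_height = new_height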

Step 5: Extract Review Data

Once all reviews have loaded into the DOM, parsing the HTML to extract relevant fields becomes straightforward. Each review contains the author name, star rating, review text, and posting date.

def extract_reviews(driver):
    reviews = []
    review_elements = driver.find_elements(By.CSS_SELECTOR, ".jftiEf")

    for element in review_elements:
        try:
            author = element.find_element(By.CSS_SELECTOR, ".d4r55").text
            rating = element.find_element(By.CSS_SELECTOR, ".kvMYJc").get_attribute("aria-label")
            text = element.find_element(By.CSS_SELECTOR, ".wiI7pd").text
            date = element.find_element(By.CSS_SELECTOR, ".rsqaWe").text

            reviews.append({
                "author": author,
                "rating": rating,
                "text": text,
                "date": date
            })
        except Exception:
            continue

    return reviews

The try-except block handles cases where individual review elements might be missing certain fields. Some reviewers leave ratings without text, so wrapping extraction in exception handling prevents a single malformed review from crashing the entire scrape.
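If dropping a whole review feels too aggressive, a finer-grained option is a per-field fallback, so a missing text body doesn't also discard the author, rating, and date. A small sketch (the helper is hypothetical; the selectors match those used in extract_reviews above):

from selenium.common.exceptions import NoSuchElementException

def safe_text(element, selector, default=""):
    # Return the text of a child element, or a default when it's absent
    try:
        return element.find_element(By.CSS_SELECTOR, selector).text
    except NoSuchElementException:
        return default

# Inside the extraction loop, per-field fallbacks keep a text-less review:
# text = safe_text(element, ".wiI7pd")
# date = safe_text(element, ".rsqaWe", default="unknown")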

Step 6: Export to CSV or JSON

The final step involves saving extracted data to a usable format. Both CSV and JSON work well depending on downstream analysis needs.

import csv
import json

def export_to_csv(reviews, filename):
    with open(filename, 'w', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=['author', 'rating', 'text', 'date'])
        writer.writeheader()
        writer.writerows(reviews)

def export_to_json(reviews, filename):
    with open(filename, 'w', encoding='utf-8') as f:
        json.dump(reviews, f, ensure_ascii=False, indent=2)

CSV files open easily in spreadsheet applications for quick analysis. JSON preserves data structure better for programmatic processing.
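Putting the pieces together, a minimal end-to-end run might look like the sketch below. The proxy endpoint and business URL are placeholders to replace with real values:

if __name__ == "__main__":
    PROXY_ENDPOINT = "http://gate.example-proxy.com:7000"   # placeholder
    BUSINESS_URL = "https://www.google.com/maps/place/..."  # placeholder

    driver = create_driver_with_proxy(PROXY_ENDPOINT)
    try:
        navigate_to_business(driver, BUSINESS_URL)
        open_reviews_panel(driver)
        scroll_reviews_panel(driver)
        reviews = extract_reviews(driver)
        export_to_csv(reviews, "reviews.csv")
        export_to_json(reviews, "reviews.json")
        print(f"Saved {len(reviews)} reviews")
    finally:
        driver.quit()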

Pro Tips for Avoiding Detection

Even with proxies and Selenium, Google can still detect and block scrapers. These additional measures improve success rates:

  • Randomize delays between all actions, not just page loads. Vary scroll speeds, click timing, and navigation patterns.
  • Rotate user agents alongside proxy IPs. A single user-agent string appearing from hundreds of different IPs looks suspicious.
  • Run in non-headless mode when possible. Headless browsers have detectable fingerprints that Google’s systems recognize.
  • Limit request volume to reasonable levels. Scraping 50 businesses per hour raises fewer flags than 500 per hour.
  • Handle CAPTCHAs gracefully. When they appear, either solve them manually or integrate a CAPTCHA-solving service.

For large-scale operations, consider anti-detection browser frameworks like Playwright with stealth plugins, which specifically target the fingerprinting methods Google employs. Many of the same techniques also apply to other anti-bot systems, such as Akamai, and can further improve success rates.
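To illustrate the user-agent rotation tip above, one simple pattern is to pick a fresh user agent and proxy endpoint every time a new driver is created. The pools below are placeholders; in practice they should be larger and kept up to date:

import random
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36",
]
PROXY_ENDPOINTS = [
    "http://gate1.example-proxy.com:7000",  # placeholder
    "http://gate2.example-proxy.com:7000",  # placeholder
]

def create_randomized_driver():
    # Pair a random user agent with a random proxy for each new session
    chrome_options = Options()
    chrome_options.add_argument(f"--proxy-server={random.choice(PROXY_ENDPOINTS)}")
    chrome_options.add_argument(f"--user-agent={random.choice(USER_AGENTS)}")
    chrome_options.add_argument("--disable-blink-features=AutomationControlled")
    return webdriver.Chrome(options=chrome_options)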

Conclusion

Building a Google reviews scraper with Python and Selenium requires balancing technical implementation with evasion tactics. The code samples provided here offer a functional starting point, though real-world deployment demands ongoing maintenance as Google updates its defenses.

For those who prefer avoiding technical complexity, third-party scraping APIs handle these challenges automatically. Services like Outscraper or ZenRows abstract away proxy management, CAPTCHA solving, and selector maintenance. The tradeoff is cost, but for many projects the time savings justify the expense.

Whichever path makes sense, remember that scraping Google reviews exists in a legal gray area. Google's Terms of Service prohibit automated access, even though the data itself is public. Use collected data responsibly, respect rate limits, and consider the ethical implications of large-scale data collection.
