G2 Company Scraper
G2 Company Scraper extracts software company leads directly from G2 category pages — giving sales teams, agencies, and researchers structured lists of companies with ratings, review counts, employee size, headquarters, and pricing tier. Point it at any G2 category slug and get a clean, deduplicated dataset ready for outreach or CRM import.
Built on CheerioCrawler with session pooling and residential proxy support, the actor pages through G2 category listings automatically, applies quality filters on the fly, and stops the moment your spending limit is reached. No code required: configure categories in the UI, hit Start, and download results as JSON, CSV, or Excel.
What data can you extract?
| Data Point | Source | Example |
|---|---|---|
| 🏢 Company Name | G2 product listing card | Salesforce |
| 🌐 Website URL | G2 listing — productListingWebsite | https://salesforce.com |
| 🔗 Domain | Parsed from website URL | salesforce.com |
| 🔗 G2 Profile URL | G2 listing — productListingLink | https://www.g2.com/products/salesforce-crm |
| ⭐ G2 Rating | productListingRating + ratingValue itemprop | 4.4 |
| 💬 Review Count | productListingReviews + reviewCount itemprop | 23,451 |
| 👥 Employee Count | productListingEmployeeCount + company_size key | 1001+ |
| 📍 Headquarters | productListingHeadquarters + hq_location key | San Francisco, CA |
| 💰 Pricing Tier | productListingPricing + starting_price key | Freemium |
| 🏷️ Categories | productListingCategory tags | ["crm", "sales-force-automation"] |
| 📝 Description | productListingDescription text | AI-powered CRM for enterprise sales teams |
| 📂 Source Category | Category slug used to find this record | crm |
Why use G2 Company Scraper?
Building a list of software vendors in a given category by hand means clicking through page after page on G2, copying names into a spreadsheet, looking up websites separately, and manually recording ratings. For 50 companies that takes a skilled researcher 2-3 hours. For 200 companies across 5 categories, you are looking at a full day of tedious work — and the data goes stale within weeks.
This actor automates the entire process. Supply a list of G2 category slugs, set a minimum review threshold to filter out thin listings, and the actor pages through every result automatically, deduplicates companies that appear in multiple categories, and delivers a structured dataset in minutes.
- Scheduling — run weekly or monthly to keep your G2 lead lists fresh as new vendors enter each category
- API access — trigger runs from Python, JavaScript, or any HTTP client and pipe results directly into your CRM or data warehouse
- Proxy rotation — G2 aggressively fingerprints and blocks datacenter IPs; the actor defaults to Apify's residential proxy pool so requests look like real user traffic
- Monitoring — configure Slack or email alerts when runs fail or return zero results
- Integrations — connect to Zapier, Make, Google Sheets, HubSpot, or webhooks to automate downstream workflows
Features
- CheerioCrawler with session pooling — uses `persistCookiesPerSession: true` so each session maintains state across requests, reducing block rates on G2's fingerprinting layer
- Dual-selector extraction strategy — every field tries the primary `data-testid` selector first, then falls back to class-based and `itemprop` selectors, so the actor keeps working through minor G2 markup changes
- Automatic pagination — after each category page is processed, the actor enqueues the next page (`?page=N`) until the per-category limit is reached or no more product cards are found
- Per-category limits with global deduplication — `maxCompaniesPerCategory` stops collection per category independently; cross-category deduplication by parsed domain prevents the same vendor from appearing twice
- Quality filters on ingestion — `minReviews` and `minRating` filters are applied during extraction, not post-processing, so you only pay for companies that pass your criteria
- Employee size filter with OR logic — supply multiple size ranges (e.g. `["1-50", "51-1000"]`) and the actor includes companies matching any of them
- Pricing tier normalization — raw G2 pricing strings are mapped to clean labels: `Freemium`, `Free`, `Contact Vendor`, or the raw text when no mapping applies
- Domain extraction and normalization — website URLs are parsed to extract the registrable domain (e.g. `salesforce.com`), stripping `www.` and subpaths, ready for deduplication or enrichment lookups
- PPE-safe data ordering — data is pushed to the dataset before the PPE charge event fires, so you never pay for a record that was not saved
- Spending limit enforcement — when `Actor.charge()` returns `eventChargeLimitReached`, all category loops stop immediately and the actor exits cleanly
- Low concurrency by design — `maxConcurrency: 2` prevents G2's rate-limiting heuristics from triggering; each request retries up to 3 times with session rotation
- Run summary record — a `type: "summary"` record is appended to the dataset at the end of every run, showing total companies found per category and overall deduplication count
Use cases for G2 company scraping
Sales prospecting and SDR list building
Sales development reps at software companies need targeted lists of vendors in adjacent or competitive categories. Instead of buying a static list from a data broker, an SDR can scrape the G2 "marketing-automation" or "sales-engagement" categories weekly, filter for companies with at least 50 reviews and a rating above 4.0, and feed results directly into their sales engagement tool. The website field links directly to the company's homepage for contact scraping in the next step.
Marketing agency new business development
Digital agencies looking for new software clients can scrape categories relevant to their service offering — for example, a PPC agency scraping "ppc-management" or "advertising-networks" to build a list of software companies actively investing in paid media. The employee count and pricing tier fields help agencies qualify prospects by company size and budget signal before spending time on outreach.
Competitive intelligence and market mapping
Product managers and strategy teams can pull a full category like "crm" or "project-management" to map every active player, track review velocity over time by scheduling repeat runs, and monitor new entrants as they appear on G2. Comparing two successive dataset snapshots reveals which vendors are gaining or losing reviews — a leading indicator of market momentum.
Recruiting and talent sourcing
Recruiters sourcing candidates from the software industry can use G2 category pages as a company discovery tool. Scraping "human-resources" or "applicant-tracking-systems" returns a list of HR tech companies with their HQ location and employee size — useful for identifying companies in a hiring phase or in the right geography for candidate placement.
Investor deal flow and portfolio monitoring
Venture and growth investors track emerging software categories for deal flow. Scraping a category like "ai-writing-assistant" or "generative-ai" with a low minimum review threshold captures early-stage companies before they appear in traditional databases. Scheduling monthly runs creates a longitudinal view of category growth and validates market size assumptions.
Technology partner and integration discovery
Partnerships teams looking for integration partners can scrape categories adjacent to their product (e.g. a CRM company scraping "electronic-signature" or "contract-management") to identify vendors with high review counts and a complementary pricing tier. The G2 profile URL links directly to each vendor's full listing for deeper research.
How to scrape G2 company listings
1. Enter your G2 category slugs — find the slug in any G2 URL: `g2.com/categories/{slug}`. For example, `crm`, `email-marketing`, `project-management`. Add one or more slugs to the Categories field.
2. Set your quality filters — enter a minimum review count (e.g. 25) to skip thin listings, and a minimum rating (e.g. 3.5) to focus on well-regarded products. Leave both at 0 to get everything.
3. Run the actor — click Start. For 50 companies across 2 categories, expect a typical run to complete in 3-8 minutes depending on G2 response times and proxy latency.
4. Download results — go to the Dataset tab and export as JSON, CSV, or Excel. Every record includes the company name, website, domain, rating, review count, employee size, HQ, pricing tier, categories, and a timestamp.
Input parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `categories` | array | Yes | `["crm"]` | G2 category slugs to scrape (e.g. `crm`, `email-marketing`, `project-management`) |
| `maxCompaniesPerCategory` | integer | No | `50` | Max companies collected per category. Set to 0 for no limit (may run for a long time). Max: 1000 |
| `minReviews` | integer | No | `0` | Minimum G2 review count. Companies below this threshold are skipped and not charged |
| `minRating` | number | No | `0` | Minimum G2 average rating (1.0–5.0). Companies below this are skipped and not charged |
| `employeeSizeFilter` | array | No | `[]` | Employee size ranges to include (e.g. `["1-50", "51-1000"]`). OR logic — matches any. Empty = all sizes |
| `deduplicateByDomain` | boolean | No | `true` | Skip companies whose domain was already seen in a previous category in this run |
| `proxyConfiguration` | object | No | Residential | Proxy settings. G2 blocks datacenter IPs — residential proxies required for reliable results |
Input examples
Standard: scrape two categories, filter for established products:
```json
{
  "categories": ["crm", "email-marketing"],
  "maxCompaniesPerCategory": 50,
  "minReviews": 25,
  "minRating": 3.5,
  "deduplicateByDomain": true,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}
```
Batch research: five categories, mid-market company size focus:
```json
{
  "categories": [
    "project-management",
    "accounting",
    "marketing-automation",
    "hr-management-suites",
    "helpdesk"
  ],
  "maxCompaniesPerCategory": 100,
  "minReviews": 10,
  "minRating": 0,
  "employeeSizeFilter": ["51-1000"],
  "deduplicateByDomain": true,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}
```
Quick test: one category, top 10 only:
```json
{
  "categories": ["crm"],
  "maxCompaniesPerCategory": 10,
  "minReviews": 0,
  "minRating": 0,
  "deduplicateByDomain": false,
  "proxyConfiguration": {
    "useApifyProxy": true,
    "apifyProxyGroups": ["RESIDENTIAL"]
  }
}
```
Input tips
- Find category slugs from the G2 URL — navigate to any G2 category page and copy the slug from the URL path: `g2.com/categories/crm` → slug is `crm`. The slug must be lowercase and hyphenated.
- Set `minReviews` to filter noise — newly listed software often has 0-5 reviews. Setting `minReviews: 10` or higher removes thin listings and focuses your dataset on established products.
- Use residential proxies — G2 blocks datacenter IP ranges. The default proxy config uses Apify Residential proxies. Without this, most requests will return 403 or empty pages.
- Batch categories in one run — running 5 categories in a single run is faster and cheaper than 5 separate single-category runs, because session warm-up overhead is paid once.
- Set a spending limit before running large batches — configure a maximum spend per run in the Apify console to cap costs automatically. The actor stops cleanly when the limit is reached.
Output example
```json
{
  "companyName": "HubSpot",
  "website": "https://www.hubspot.com",
  "domain": "hubspot.com",
  "g2ProfileUrl": "https://www.g2.com/products/hubspot-crm/reviews",
  "rating": 4.4,
  "reviewCount": 12847,
  "employeeCount": "1001+",
  "headquarters": "Cambridge, MA",
  "pricingTier": "Freemium",
  "categories": ["crm", "marketing-automation", "sales-force-automation"],
  "description": "HubSpot CRM platform gives your sales team everything they need to be more productive, maintain pipeline visibility, and grow revenue.",
  "sourceCategory": "crm",
  "scrapedAt": "2026-03-22T09:14:32.418Z"
}
```
A `type: "summary"` record is appended as the final item in every dataset:
```json
{
  "type": "summary",
  "categoriesScraped": ["crm", "email-marketing"],
  "totalCompaniesFound": 87,
  "totalDeduplicated": 6,
  "companiesByCategory": {
    "crm": 50,
    "email-marketing": 43
  },
  "avgRating": null,
  "scrapedAt": "2026-03-22T09:21:04.113Z"
}
```
Output fields
| Field | Type | Description |
|---|---|---|
| `companyName` | string \| null | Full company name as listed on G2 |
| `website` | string \| null | Official website URL from the G2 listing |
| `domain` | string \| null | Registrable domain parsed from the website URL (e.g. `hubspot.com`). Useful for deduplication and enrichment |
| `g2ProfileUrl` | string \| null | Absolute URL to the company's G2 product listing page |
| `rating` | number \| null | Average G2 star rating, rounded to one decimal (1.0–5.0) |
| `reviewCount` | integer \| null | Total number of G2 reviews |
| `employeeCount` | string \| null | Employee size range as normalized by the actor (e.g. `51-1000`, `1001+`) |
| `headquarters` | string \| null | HQ location as listed on G2 (e.g. San Francisco, CA) |
| `pricingTier` | string \| null | Normalized pricing label: `Free`, `Freemium`, `Contact Vendor`, or raw text |
| `categories` | string[] | G2 category tags on the product; always includes `sourceCategory` |
| `description` | string \| null | Short product description from the G2 listing card |
| `sourceCategory` | string | The G2 category slug used to discover this company |
| `scrapedAt` | string | ISO 8601 timestamp of when the record was extracted |
How much does it cost to scrape G2 companies?
G2 Company Scraper uses pay-per-event pricing — you pay $0.05 per company found that passes your quality filters. Platform compute costs are included. Companies that fail your `minReviews`, `minRating`, or `employeeSizeFilter` checks are never charged.
| Scenario | Companies | Cost per company | Total cost |
|---|---|---|---|
| Quick test | 10 | $0.05 | $0.50 |
| One category | 50 | $0.05 | $2.50 |
| Two categories | 100 | $0.05 | $5.00 |
| Five categories | 400 | $0.05 | $20.00 |
| Full market map | 1,000 | $0.05 | $50.00 |
You can set a maximum spending limit per run in the Apify console. The actor stops cleanly the moment your budget is reached, so you never exceed your target spend.
Compare this to purchasing a G2 Buyer Intent data export or a ZoomInfo list at $300-1,000+ per month. Most users of this actor spend $5-25 per research project with no subscription commitment and no minimum seat requirement.
Scraping G2 companies using the API
Python
```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

run = client.actor("ryanclinton/g2-company-scraper").call(run_input={
    "categories": ["crm", "email-marketing"],
    "maxCompaniesPerCategory": 50,
    "minReviews": 10,
    "minRating": 3.5,
    "deduplicateByDomain": True,
    "proxyConfiguration": {
        "useApifyProxy": True,
        "apifyProxyGroups": ["RESIDENTIAL"]
    }
})

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    if item.get("type") == "summary":
        print(f"Summary: {item['totalCompaniesFound']} companies found")
    else:
        print(f"{item['companyName']} — {item['domain']} — {item['rating']} stars ({item['reviewCount']} reviews)")
```
JavaScript
```javascript
import { ApifyClient } from "apify-client";

const client = new ApifyClient({ token: "YOUR_API_TOKEN" });

const run = await client.actor("ryanclinton/g2-company-scraper").call({
  categories: ["crm", "email-marketing"],
  maxCompaniesPerCategory: 50,
  minReviews: 10,
  minRating: 3.5,
  deduplicateByDomain: true,
  proxyConfiguration: {
    useApifyProxy: true,
    apifyProxyGroups: ["RESIDENTIAL"]
  }
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
for (const item of items) {
  if (item.type === "summary") continue;
  console.log(`${item.companyName} | ${item.domain} | ${item.rating} stars | ${item.employeeCount} employees | ${item.headquarters}`);
}
```
cURL
```bash
# Start the actor run
curl -X POST "https://api.apify.com/v2/acts/ryanclinton~g2-company-scraper/runs?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "categories": ["crm", "email-marketing"],
    "maxCompaniesPerCategory": 50,
    "minReviews": 10,
    "minRating": 3.5,
    "deduplicateByDomain": true,
    "proxyConfiguration": {
      "useApifyProxy": true,
      "apifyProxyGroups": ["RESIDENTIAL"]
    }
  }'

# Fetch results (replace DATASET_ID with the defaultDatasetId value from the run response)
curl "https://api.apify.com/v2/datasets/DATASET_ID/items?token=YOUR_API_TOKEN&format=json"
```
How G2 Company Scraper works
Phase 1: URL construction and request queuing
For each category slug in the input (e.g. `crm`), the actor constructs the G2 category listing URL using `https://www.g2.com/categories/{slug}` for page 1 and `https://www.g2.com/categories/{slug}?page=N` for subsequent pages. All category page 1 requests are enqueued simultaneously at startup, so multiple categories are scraped in parallel up to the `maxConcurrency: 2` limit.
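The URL construction above can be sketched as follows. `buildCategoryUrl` and the `userData` shape are illustrative assumptions, not the actor's actual internals:

```javascript
// Hypothetical sketch of Phase 1: build listing URLs and seed the queue.
function buildCategoryUrl(slug, page = 1) {
  const base = `https://www.g2.com/categories/${slug}`;
  // Page 1 uses the bare category URL; later pages append ?page=N
  return page <= 1 ? base : `${base}?page=${page}`;
}

// Enqueue page 1 for every category up front so categories run in parallel
const startUrls = ["crm", "email-marketing"].map((slug) => ({
  url: buildCategoryUrl(slug),
  userData: { slug, page: 1 },
}));

console.log(startUrls[0].url); // https://www.g2.com/categories/crm
```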
Phase 2: HTML parsing with dual-selector fallback
Each category page is fetched by CheerioCrawler using a residential proxy session. The handler targets `[data-testid="productListing"]` containers — one per software product card. Within each card, every field attempts two or three selector patterns: the primary `data-testid` attribute selector, then a class-based fallback, then an `itemprop` attribute where available. This layered approach means the actor continues extracting data correctly through minor G2 layout changes. For example, the rating is extracted from `[data-testid="productListingRating"]` first, then `[itemprop="ratingValue"]`, then `.stars-container[data-rating]`.
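As a rough sketch of that fallback chain — the `firstText` helper and the stub card are hypothetical; the real handler operates on Cheerio elements:

```javascript
// Try each selector in priority order; return the first non-empty text match.
function firstText(card, selectors) {
  for (const sel of selectors) {
    const value = card.find(sel);
    if (value && value.trim().length > 0) return value.trim();
  }
  return null; // every selector missed — the field stays null in the output record
}

// Stub card mimicking a G2 listing where the primary data-testid selector
// is gone after a markup change but the itemprop fallback still matches.
const card = {
  find: (sel) => (sel === '[itemprop="ratingValue"]' ? " 4.4 " : ""),
};

const ratingText = firstText(card, [
  '[data-testid="productListingRating"]', // primary selector
  '[itemprop="ratingValue"]',             // schema.org fallback
  ".stars-container[data-rating]",        // class-based last resort
]);
// ratingText === "4.4"
```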
Phase 3: Transformation and filtering
Raw strings from the HTML are passed through dedicated parsing functions: `parseRating()` extracts the first numeric match from strings like "4.5 out of 5", `parseReviewCount()` strips all non-digit characters from strings like "1,234 reviews", and `cleanEmployeeCount()` normalizes G2 size labels like "Mid-Market (51-1000 emp.)" into the plain range `51-1000`. Pricing tiers are normalized by keyword matching: strings containing both "free" and "paid" become `Freemium`, strings containing "contact" become `Contact Vendor`. Domain extraction uses the WHATWG URL constructor to parse the website field, then strips the `www.` prefix for a clean registrable domain.
After transformation, each record is checked against `minReviews`, `minRating`, and `employeeSizeFilter` before being pushed to the dataset. Records that fail any filter are logged at debug level and skipped without a PPE charge.
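Hedged sketches of what those helpers might look like — the actor's real implementations may differ in edge-case handling:

```javascript
// Illustrative versions of the parsing helpers described above — sketches only.
function parseRating(text) {
  const match = String(text).match(/\d+(?:\.\d+)?/); // first numeric token
  return match ? parseFloat(match[0]) : null;
}

function parseReviewCount(text) {
  const digits = String(text).replace(/\D/g, ""); // drop commas and words
  return digits ? parseInt(digits, 10) : null;
}

function cleanEmployeeCount(text) {
  // "Mid-Market (51-1000 emp.)" -> "51-1000"; "Enterprise (1001+ emp.)" -> "1001+"
  const match = String(text).match(/\d[\d,]*(?:\s*-\s*[\d,]+|\+)/);
  return match ? match[0].replace(/[\s,]/g, "") : null;
}

function extractDomain(website) {
  try {
    // Naive registrable-domain extraction: hostname minus a leading "www."
    // (a real implementation may need a public-suffix list for multi-part TLDs)
    const host = new URL(website).hostname.toLowerCase();
    return host.startsWith("www.") ? host.slice(4) : host;
  } catch {
    return null; // not a parseable URL
  }
}
```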
Phase 4: Pagination and PPE charge management
If the per-category limit has not been reached and at least one product card was found on the current page, the next page URL is enqueued. This continues until either the limit is reached, no cards are found (end of category), or the spending limit fires. The PPE company-found charge event is fired after each successful `Actor.pushData()` call. If `chargeResult.eventChargeLimitReached` is returned, all category loops are flagged and the crawler exits cleanly after processing the current batch.
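The push-then-charge ordering can be sketched like this; `saveAndCharge` and both stubs are hypothetical stand-ins for the real `Actor.pushData()` / `Actor.charge()` calls:

```javascript
// Sketch of Phase 4's PPE-safe ordering: save first, then charge, then
// check whether the spending limit was hit. All names here are illustrative.
async function saveAndCharge(record, dataset, charger) {
  await dataset.push(record); // the record is saved before any charge fires
  const result = await charger.charge({ eventName: "company-found" });
  // When true, the caller flags all category loops to stop enqueuing pages
  return Boolean(result.eventChargeLimitReached);
}

// Stub dataset/charger: the second charge call reports the limit was reached
const saved = [];
const dataset = { push: async (r) => { saved.push(r); } };
let chargeCalls = 0;
const charger = {
  charge: async () => ({ eventChargeLimitReached: ++chargeCalls >= 2 }),
};
```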
Tips for best results
- Start with a test run of 10 companies. Run a single category with `maxCompaniesPerCategory: 10` to verify the output format and proxy performance before committing to a large batch.
- Always use residential proxies. G2 detects and blocks datacenter IP ranges at the CDN layer. Apify Residential proxies are the default and the only reliably effective option for production runs.
- Use `minReviews` to control data quality. Setting `minReviews: 25` focuses your dataset on established, actively used products. Setting it to 0 includes everything — useful for tracking new entrants, but expect more incomplete records.
- Combine categories in one run, not multiple runs. Cross-category deduplication only works within a single run. Running 5 categories as one job also reuses session warm-up, reducing proxy cost and total runtime.
- Pair with Website Contact Scraper for full lead records. G2 Company Scraper gives you the website domain; Website Contact Scraper turns each domain into email addresses and phone numbers. The `domain` field is ready to use as direct input.
- Schedule weekly runs for fast-moving categories. Categories like `generative-ai` or `ai-writing-assistant` gain new listings frequently. A weekly schedule with deduplication ensures your list stays current without re-processing known companies.
- Filter by employee size for ICP precision. If your ideal customer profile is mid-market (51-1000 employees), set `employeeSizeFilter: ["51-1000"]` to exclude SMB tools and enterprise-only platforms from the outset.
- Export to CSV for direct CRM import. The Apify dataset export creates a flat CSV with all fields as columns, ready for import into HubSpot, Salesforce, or any spreadsheet-based workflow.
Combine with other Apify actors
| Actor | How to combine |
|---|---|
| Website Contact Scraper | Feed the domain field from each G2 record into Website Contact Scraper to extract emails, phone numbers, and LinkedIn profiles for every company found |
| Website Contact Scraper Pro | Use the JS-rendering version for SaaS company websites that load contact details dynamically via React or Vue |
| Email Pattern Finder | Submit each domain to detect the company's email naming convention (e.g. [email protected]) before building outreach sequences |
| B2B Lead Qualifier | Score each G2 company 0-100 against 30+ ICP signals to prioritize the highest-value accounts before outreach |
| Waterfall Contact Enrichment | Run each domain through a 10-step enrichment cascade to find verified contact details across multiple data sources |
| Bulk Email Verifier | Verify emails found from downstream enrichment via MX and SMTP checks before importing into your sending tool |
| Website Tech Stack Detector | Detect what technologies each G2 company uses — useful for qualifying based on existing tech stack (e.g. "uses Salesforce + Marketo") |
| HubSpot Lead Pusher | Push the output dataset directly into HubSpot as contacts or companies without any intermediate steps |
Limitations
- Static HTML only. This actor uses CheerioCrawler, which parses server-rendered HTML. G2 category pages deliver initial content server-side, but dynamic features like filtered search, sort-by-score, or personalized rankings may not be reflected accurately.
- G2 markup changes break selectors. G2 periodically redesigns its category listing pages. The actor uses dual-selector fallback patterns but cannot guarantee coverage through major redesigns. Check the Issues tab if you start seeing empty results.
- Residential proxy required. G2 blocks datacenter IPs at the CDN level. Without Apify Residential proxies or equivalent, the majority of requests will return 403 errors or empty product card lists.
- No review content extraction. The actor extracts review counts and ratings but does not extract individual review text. For full review data, visit each company's G2 profile URL from the output.
- No advanced G2 filters. The actor scrapes category pages in default sort order. G2's UI-based filters (by industry, company size, deployment type) are not replicated — use the `employeeSizeFilter` and `minRating` inputs to approximate them.
- Pagination depth limits. G2 category pages typically show 25-30 products per page. Very large categories (e.g. `crm`) may have dozens of pages. Setting `maxCompaniesPerCategory` prevents unbounded runs.
- Employee count and HQ not always present. G2 only displays employee size and headquarters when the vendor has completed their profile. Expect 20-40% null rates on these fields for smaller or newer vendors.
- Rate limited at low concurrency. The crawler is intentionally limited to `maxConcurrency: 2` to avoid triggering G2's rate-limiting heuristics. This means large multi-category runs take longer than maximum-concurrency crawlers. Do not increase concurrency without testing against block rates first.
Integrations
- Zapier — trigger a G2 scrape on a schedule and push new companies automatically to a HubSpot contact list or Google Sheet
- Make — build a multi-step scenario that scrapes G2, enriches contacts via an API, and adds qualified leads to a CRM sequence
- Google Sheets — stream G2 company results into a shared spreadsheet for team review and manual qualification
- Apify API — trigger runs programmatically from your data pipeline or internal tooling and retrieve results in JSON
- Webhooks — receive a POST notification when a run completes, then pull the dataset into your own application
- LangChain / LlamaIndex — use G2 company datasets as structured context for AI agents building market research summaries or competitive analysis reports
Troubleshooting
Empty results despite entering a valid category slug. The most common cause is missing or misconfigured proxies. G2 blocks datacenter IPs at the network edge, returning empty HTML bodies or 403 responses. Confirm your proxyConfiguration includes "apifyProxyGroups": ["RESIDENTIAL"]. If results are still empty, the G2 category slug may be incorrect — verify by visiting g2.com/categories/{slug} directly in a browser.
Partial results: fewer companies than expected. If you see fewer companies than the maxCompaniesPerCategory limit, either the category has fewer products than expected, your minReviews or minRating filters are excluding many records, or your spending limit was reached mid-run. Check the run log for "Spending limit reached" messages and the summary record in the dataset for per-category counts.
employeeCount and headquarters are null for many records. These fields are only populated when vendors have completed their G2 profile. Null rates of 20-40% are normal. For enriched company firmographics, pipe the domain field into Waterfall Contact Enrichment which pulls from multiple data sources.
Run is slower than expected. The actor runs at maxConcurrency: 2 deliberately to avoid block rates on G2. Multi-category runs with 100+ companies per category may take 15-30 minutes. If speed is critical, split categories across separate runs triggered in parallel via the Apify API.
G2 profile URLs appear as relative paths. The normalizeG2Url() function converts relative hrefs (e.g. /products/hubspot-crm) to absolute URLs (https://www.g2.com/products/hubspot-crm). If you see relative paths in output, it means the fallback selector returned a non-standard href — report the category in the Issues tab.
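A plausible sketch of that normalization — the actor's actual normalizeG2Url() may differ in detail:

```javascript
// Resolve relative G2 hrefs against the site origin using the WHATWG URL API.
function normalizeG2Url(href) {
  if (!href) return null;
  // "/products/hubspot-crm" -> "https://www.g2.com/products/hubspot-crm";
  // already-absolute URLs pass through unchanged.
  return new URL(href, "https://www.g2.com").toString();
}
```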
Responsible use
- This actor accesses publicly available software listing data on G2's category pages.
- Respect G2's terms of service. Do not use scraped data to reproduce G2's product database commercially or to compete with G2's own data products.
- Review count and rating data should be attributed to G2 when published in research or reports.
- Comply with GDPR, CAN-SPAM, and applicable data protection regulations when using company data for outreach campaigns.
- Do not use this actor to scrape personal data from G2 reviewer profiles.
- For guidance on the legality of web scraping, see Apify's guide to web scraping law.
FAQ
How do I find a G2 category slug to use as input?
Navigate to any G2 category page in your browser. The slug is the path segment after /categories/ in the URL. For example, g2.com/categories/email-marketing has the slug email-marketing. Slugs are always lowercase and hyphenated. Common examples: crm, project-management, marketing-automation, accounting, helpdesk, video-conferencing, e-commerce-platforms.
How many G2 companies can I scrape per category in one run?
Up to 1,000 per category, controlled by the maxCompaniesPerCategory input. G2 category pages typically show 25-30 products per page, so scraping 1,000 companies requires approximately 33-40 page requests per category. For most categories, the practical limit is 200-500 unique products before pagination returns empty pages.
Does G2 Company Scraper extract individual review text?
No. The actor extracts the aggregate review count and average star rating from each product's listing card. It does not visit individual product profile pages or extract the text of individual reviews. For full review content, use the g2ProfileUrl field in the output to visit each product's G2 page directly.
Is it legal to scrape G2 company listings?
Scraping publicly accessible web pages is generally permitted under US law (see the Ninth Circuit's decision in hiQ Labs v. LinkedIn). G2 category listings are publicly viewable without authentication. However, you must respect G2's terms of service, avoid copying their database for commercial redistribution, and comply with applicable data protection laws. See Apify's web scraping legality guide for detailed guidance.
Why does the actor require residential proxies?
G2 uses CDN-level IP reputation filtering that blocks known datacenter IP ranges including AWS, GCP, and Azure cloud egress IPs. Residential proxies route requests through real consumer IP addresses, which pass G2's block lists. Without residential proxies, most requests return 403 errors or empty HTML with no product cards. The default proxy configuration (useApifyProxy: true, apifyProxyGroups: ["RESIDENTIAL"]) handles this automatically.
How accurate is the rating and review count data?
The actor extracts ratings and review counts directly from the HTML served by G2 at the time of the run. The values match what you see when you visit the same category page in a browser. G2 updates ratings and review counts in near-real-time as new reviews are submitted, so data is accurate to within the G2 cache refresh interval (typically a few minutes to a few hours).
How is G2 Company Scraper different from G2's own Buyer Intent data?
G2 Buyer Intent is a paid product that identifies which companies are actively researching your category, using behavioral signals from G2 users. G2 Company Scraper extracts the public vendor listings in a category — a completely different dataset. This actor tells you which software companies exist in a category; G2 Buyer Intent tells you which buyers are looking at those companies. The two datasets are complementary.
Can I scrape multiple G2 categories at the same time?
Yes. Add multiple slugs to the categories array and the actor scrapes them concurrently (up to the maxConcurrency: 2 limit). Deduplication is applied across all categories in a single run, so if a product like HubSpot appears in both crm and marketing-automation, it is only included once in the output.
Can I schedule G2 Company Scraper to run automatically?
Yes. Use the Apify platform's built-in scheduler to run this actor daily, weekly, or on a custom cron schedule. Each scheduled run produces a fresh dataset. Pair scheduling with the Google Sheets or HubSpot integrations to keep your prospect lists automatically updated.
What happens when G2 changes its page markup?
The actor uses dual-selector fallback patterns: primary data-testid selectors plus class-based and itemprop fallbacks for every field. Minor markup changes are typically absorbed by the fallback layer. Major redesigns that remove data-testid attributes entirely will cause empty results — open an issue in the Issues tab and include the category slug and a link to the affected page so the selectors can be updated.
How does deduplication work when I scrape multiple categories?
When deduplicateByDomain is enabled, the actor maintains an in-memory set of all domains seen during the run. The first time a domain is encountered (in any category), the company is saved and charged. Any subsequent card with the same domain — whether in the same category on a later page or in a different category — is skipped without pushing data or charging a PPE event.
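In code, the check described above amounts to a per-run Set keyed by domain — a sketch under assumed names, not the actor's exact implementation:

```javascript
// One Set per run: the first sighting of a domain wins; later cards are skipped.
const seenDomains = new Set();

function shouldSaveCompany(record) {
  if (!record.domain) return true; // no parseable domain — cannot dedupe, keep it
  if (seenDomains.has(record.domain)) return false; // duplicate: no push, no charge
  seenDomains.add(record.domain);
  return true;
}
```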
Can I use the output with other Apify actors for contact enrichment?
Yes, and this is the recommended workflow. The domain field in every output record is clean (e.g. hubspot.com) and ready to use directly as input to Website Contact Scraper, Email Pattern Finder, or Waterfall Contact Enrichment. Export the dataset as JSON, extract the domain array, and pass it to the next actor in your pipeline.
Help us improve
If you encounter issues, you can help us debug faster by enabling run sharing in your Apify account:
- Go to Account Settings > Privacy
- Enable Share runs with public Actor creators
This lets us see your run details when something goes wrong, so we can fix issues faster. Your data is only visible to the actor developer, not publicly.
Support
Found a bug or have a feature request? Open an issue in the Issues tab on this actor's page. For custom G2 data extractions, category monitoring pipelines, or enterprise integrations, reach out through the Apify platform.