Compliance Scanner
Assess web scraping compliance risk for any target URL. Analyses Terms of Service, robots.txt, PII exposure, and applicable regulations (GDPR, CCPA, CFAA).
Maintenance Pulse
90/100
Pricing
Pay Per Event model. You only pay for what you use.
| Event | Description | Price |
|---|---|---|
| compliance-scan | Charged per URL scanned. | $0.25 |
Example: 100 events = $25.00 · 1,000 events = $250.00
Documentation
Compliance Scanner gives you an instant compliance risk assessment for any Apify actor before you run it, publish it, or integrate it into a production pipeline. Point it at any actor by ID or username/actor-name, and it returns a structured risk report covering PII exposure, Terms of Service violations, authentication wall access, and applicable regulations — in seconds, for $0.15 per scan.
This actor reads publicly available metadata from the Apify API — it never runs the target actor or touches any scraped data. It analyses the actor's name, title, description, and categories against 18 PII indicators, 13 platform-specific ToS rules, 7 authentication wall patterns, and 6 major regulatory frameworks (GDPR, CCPA/CPRA, CFAA, ePrivacy Directive, CAN-SPAM, PIPEDA). Every finding is backed by a plain-English explanation and a specific, actionable recommendation.
What data does a compliance scan produce?
| Data Point | Source | Example |
|---|---|---|
| 📊 Overall risk level | Composite of all factors | HIGH |
| 🔒 PII risk level | 18-keyword metadata analysis | HIGH |
| 🔑 PII keywords detected | Keyword match list | ["email", "phone", "name", "contact"] |
| 🚧 ToS risk level | 13-platform lookup table | MEDIUM |
| 🏢 Platform ToS details | Per-platform risk + reason | LinkedIn: HIGH — actively litigates |
| 🔐 Authentication wall risk | 7-keyword auth pattern scan | LOW |
| ⚖️ Applicable regulations | Jurisdiction mapping | GDPR, CAN-SPAM, PIPEDA |
| 📋 Regulation details | Keyword-to-jurisdiction match | "Detected keywords: email, contact" |
| 📁 Category risk profile | Apify category risk table | LEAD_GENERATION: PII=HIGH, ToS=MEDIUM |
| 💡 Recommendations | Rule-based action items | "Add opt-out mechanism for email collection" |
| 🕐 Scan timestamp | ISO 8601 | 2026-03-20T14:22:00.000Z |
Why use Compliance Scanner for Apify actors?
Most developers think about compliance after the problem has already occurred — after data has been collected without a lawful basis, after a DMCA notice lands in the inbox, or after an enterprise client asks why their pipeline touches LinkedIn and Facebook data. Retroactively adding consent mechanisms, data retention policies, and opt-out workflows is expensive and stressful.
Compliance Scanner shifts that conversation to the start of the development cycle. A 15-second, $0.15 scan before you build, before you publish, and before you invoice a client is orders of magnitude cheaper than a legal review after the fact.
- Scheduling — run weekly scans against your entire actor portfolio to catch newly published actors with unchecked risk profiles
- API access — trigger scans programmatically from Python, JavaScript, or any HTTP client as part of your CI/CD pipeline
- Structured output — every report is machine-readable JSON, ready to feed into dashboards, Slack alerts, or compliance tracking spreadsheets
- Monitoring — configure Apify alerts to notify you when a scan returns a HIGH overall risk rating
- Integrations — pipe results into Google Sheets, HubSpot, Zapier, or Make for compliance workflow automation
Features
- 18-indicator PII keyword scan — checks actor name, title, description, and category fields for signals including email, phone, contact, name, address, personal, profile, employee, team member, person, lead, enrichment, people, identity, salary, resume, and CV
- 3-tier risk scoring — LOW / MEDIUM / HIGH risk levels across three independent dimensions (PII, ToS, and authentication wall), combined into a single overall risk rating using a worst-case aggregation function
- 13-platform ToS lookup table — LinkedIn and Facebook are flagged HIGH (active litigation history); Amazon, Google, YouTube, TikTok, Twitter/X, Indeed, and Glassdoor are flagged MEDIUM; Zillow, Yelp, and Reddit are flagged LOW. Each entry carries a human-readable reason
- Authentication wall detection — 7-keyword auth pattern analysis (login, auth, credential, session, cookie, signed-in, behind login) to flag potential CFAA exposure for US-based operations
- 6-regulation jurisdiction mapper — automatically identifies which of GDPR, CCPA/CPRA, CFAA, ePrivacy Directive, CAN-SPAM, and PIPEDA apply based on detected data types and keywords, with the specific triggering keywords cited in each finding
- Apify category risk profiles — LEAD_GENERATION and SOCIAL_MEDIA actors carry HIGH PII risk and MEDIUM-HIGH ToS risk; ECOMMERCE and TRAVEL actors carry LOW PII risk; each category profile includes a notes field explaining the risk rationale
- Actionable, non-generic recommendations — instead of "review applicable law," the actor outputs specific steps: "Add opt-out mechanism for email collection," "Ensure you have authorization to access authenticated content," "Document your lawful basis for processing personal data under GDPR"
- Metadata-only analysis — reads the Apify public actor API endpoint only; the target actor is never invoked and no scraped data is accessed
- Fast execution — typical scan completes in under 10 seconds; 128 MB memory allocation is sufficient for all scans
Use cases for Apify actor compliance scanning
Pre-publish compliance review
You are building a lead generation actor that scrapes email addresses from company websites. Before listing it on the Apify Store, run Compliance Scanner to understand your GDPR obligations, identify the specific regulations that apply, and confirm the exact steps you need to take — lawful basis documentation, data retention policy, opt-out mechanism — before your first user runs it.
Portfolio-wide compliance audit
You manage a catalogue of 40+ actors across lead generation, social media, and e-commerce categories. Running Compliance Scanner against every actor in your portfolio takes minutes and produces a structured JSON report for each one. Feed the results into a spreadsheet to identify which actors carry HIGH overall risk and need immediate attention versus which ones are LOW risk and require no action.
Third-party actor due diligence
Your enterprise client wants to integrate a third-party Apify actor into their data pipeline. Before approving it for use, run Compliance Scanner to assess whether the actor involves PII, which platforms it targets, and which regulations may apply. The structured output is ready to include directly in a procurement review or vendor risk assessment.
CI/CD compliance gate
Integrate Compliance Scanner into your actor deployment workflow. Before promoting an actor from staging to production, trigger a scan via the Apify API and parse the overallRisk field. Automatically block deployments where overallRisk === "HIGH" until the team has reviewed and documented mitigating controls.
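A minimal sketch of such a gate, assuming the `apify-client` Python package and the `ryanclinton/actor-compliance-scanner` actor name used in the API examples. The function names, the set of blocking levels, and the choice to block when no report comes back are illustrative assumptions, not part of the actor.

```python
BLOCKING_LEVELS = {"HIGH"}  # assumption: only HIGH stops a deployment


def should_block(report: dict) -> bool:
    """Return True when the report's overallRisk should stop a deployment."""
    # A missing overallRisk (empty or error report) is treated as HIGH: fail closed.
    return report.get("overallRisk", "HIGH") in BLOCKING_LEVELS


def fetch_report(target_actor_id: str, token: str) -> dict:
    """Run one scan and return the first dataset item (makes network calls)."""
    # Imported lazily so should_block() works without the client installed.
    from apify_client import ApifyClient

    client = ApifyClient(token)
    run = client.actor("ryanclinton/actor-compliance-scanner").call(
        run_input={"targetActorId": target_actor_id}
    )
    items = list(client.dataset(run["defaultDatasetId"]).iterate_items())
    return items[0] if items else {}


def gate(target_actor_id: str, token: str) -> int:
    """Exit code for a CI step: 1 blocks the deployment, 0 lets it through."""
    return 1 if should_block(fetch_report(target_actor_id, token)) else 0
```

In a pipeline step, something like `sys.exit(gate("username/actor-name", os.environ["APIFY_TOKEN"]))` would fail the job on a HIGH rating; the environment variable name is likewise an assumption.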
Client compliance reporting
Agencies building custom scraping pipelines for enterprise clients can use Compliance Scanner to generate a compliance report for every actor in the client's stack. The structured output — risk levels, applicable regulations, specific recommendations — maps cleanly onto common compliance frameworks and saves hours of manual documentation.
Regulatory change triage
When a new data protection law comes into effect (for example, a new US state privacy law), re-scan your existing actor portfolio to identify which actors now fall under the new jurisdiction. The regulation mapping in Compliance Scanner's output identifies the triggering keywords for each applicable regulation, making it straightforward to prioritise which actors need updated documentation.
How to run a compliance scan on an Apify actor
- Enter the target actor ID — Type the actor's ID in the format username/actor-name (for example, ryanclinton/website-contact-scraper) or use the full actor ID string. You can find this in the actor's URL on the Apify Store.
- Click Start — No other configuration is required. The scan runs against the Apify public API and completes in under 15 seconds.
- Review the risk report — The Dataset tab shows your structured compliance report with overall risk, per-dimension risk levels, applicable regulations, and specific recommendations.
- Download results — Export as JSON or CSV from the Dataset tab. Use the JSON output to feed dashboards, compliance tracking tools, or automated workflows.
Input parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| targetActorId | string | Yes | — | Actor ID or username/actor-name to scan. Example: ryanclinton/website-contact-scraper or the full Apify actor UUID. |
Input examples
Scan a single actor by username/name:
```json
{
  "targetActorId": "ryanclinton/website-contact-scraper"
}
```
Scan a high-risk social media actor:
```json
{
  "targetActorId": "apify/instagram-scraper"
}
```
Scan using a full actor UUID:
```json
{
  "targetActorId": "moJRLRc85AitArpNN"
}
```
Input tips
- Use username/actor-name format — this is the most reliable identifier. The actor slug appears in every Apify Store URL after apify.com/.
- Batch scans via the API — if you are auditing a portfolio of 20+ actors, trigger runs in sequence via the Apify API rather than through the UI; each run takes under 15 seconds.
- Actor UUIDs also work — if you have a UUID from the Apify API or your platform dashboard, paste it directly into targetActorId.
- Save results before re-running — each run overwrites the previous dataset by default; use the Apify API to fetch and archive results between runs if you are building a compliance history.
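The batching and archiving tips above could be combined roughly as follows. This is a sketch that assumes the `apify-client` Python package; `archive_report`, `scan_portfolio`, and the file-naming scheme are hypothetical helpers, not something the actor provides.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def archive_report(actor_id: str, report: dict, archive_dir: Path) -> Path:
    """Write one report to its own timestamped JSON file and return the path."""
    # Prefer the scan's own timestamp; fall back to "now" if the field is absent.
    stamp = report.get("scannedAt") or datetime.now(timezone.utc).isoformat()
    # "/" and ":" are awkward in file names, so swap them out.
    file_name = f"{actor_id.replace('/', '~')}__{stamp.replace(':', '-')}.json"
    archive_dir.mkdir(parents=True, exist_ok=True)
    path = archive_dir / file_name
    path.write_text(json.dumps(report, indent=2), encoding="utf-8")
    return path


def scan_portfolio(actor_ids: list[str], token: str, archive_dir: Path) -> list[Path]:
    """Scan each actor in sequence and archive every report (makes network calls)."""
    from apify_client import ApifyClient  # lazy import: only needed for real scans

    client = ApifyClient(token)
    scanner = client.actor("ryanclinton/actor-compliance-scanner")
    saved = []
    for actor_id in actor_ids:
        run = scanner.call(run_input={"targetActorId": actor_id})
        for report in client.dataset(run["defaultDatasetId"]).iterate_items():
            saved.append(archive_report(actor_id, report, archive_dir))
    return saved
```

Writing one file per scan keeps earlier reports intact, so the archive doubles as a simple compliance history.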
Output example
```json
{
  "actorName": "ryanclinton/website-contact-scraper",
  "actorId": "ryanclinton/website-contact-scraper",
  "piiRisk": "HIGH",
  "piiKeywords": ["email", "phone", "contact", "name", "address", "lead"],
  "authRisk": "LOW",
  "authKeywords": [],
  "tosRisk": "LOW",
  "tosDetails": [],
  "categoryRisks": [
    {
      "piiRisk": "HIGH",
      "tosRisk": "MEDIUM",
      "notes": "Typically collects PII (emails, phones, names). GDPR requires lawful basis."
    }
  ],
  "applicableRegulations": [
    {
      "name": "GDPR",
      "jurisdiction": "EU/EEA",
      "reason": "Detected keywords: email, phone, name, contact"
    },
    {
      "name": "CCPA/CPRA",
      "jurisdiction": "California, USA",
      "reason": "Detected keywords: email, phone, name, contact"
    },
    {
      "name": "ePrivacy Directive",
      "jurisdiction": "EU",
      "reason": "Detected keywords: email, contact"
    },
    {
      "name": "CAN-SPAM",
      "jurisdiction": "United States",
      "reason": "Detected keywords: email, contact"
    },
    {
      "name": "PIPEDA",
      "jurisdiction": "Canada",
      "reason": "Detected keywords: email, phone, name"
    }
  ],
  "overallRisk": "HIGH",
  "recommendations": [
    "Document your lawful basis for processing personal data under GDPR",
    "Add opt-out mechanism for email collection",
    "Implement data retention policy and document it",
    "Consult legal counsel regarding applicable regulations"
  ],
  "scannedAt": "2026-03-20T14:22:31.007Z"
}
```
Output fields
| Field | Type | Description |
|---|---|---|
| actorName | string | Resolved actor name in username/name format |
| actorId | string | The input actor ID exactly as provided |
| piiRisk | string | PII exposure risk: LOW, MEDIUM, or HIGH |
| piiKeywords | string[] | List of PII indicator keywords matched in metadata |
| authRisk | string | Authentication wall risk: LOW, MEDIUM, or HIGH |
| authKeywords | string[] | List of auth-pattern keywords matched in metadata |
| tosRisk | string | Platform Terms of Service risk: LOW, MEDIUM, or HIGH |
| tosDetails | object[] | Per-platform ToS findings with platform, level, and reason |
| tosDetails[].platform | string | Platform name (e.g., linkedin, amazon) |
| tosDetails[].level | string | Platform-specific risk level |
| tosDetails[].reason | string | Plain-English explanation of the ToS concern |
| categoryRisks | object[] | Risk profile for each Apify category the actor belongs to |
| categoryRisks[].piiRisk | string | PII risk for this category |
| categoryRisks[].tosRisk | string | ToS risk for this category |
| categoryRisks[].notes | string | Human-readable explanation of category risk |
| applicableRegulations | object[] | Regulations triggered by detected keywords |
| applicableRegulations[].name | string | Regulation name (e.g., GDPR, CCPA/CPRA) |
| applicableRegulations[].jurisdiction | string | Geographic jurisdiction (e.g., EU/EEA, California, USA) |
| applicableRegulations[].reason | string | Specific keywords that triggered this regulation |
| overallRisk | string | Worst-case aggregate across PII, ToS, and auth risk dimensions |
| recommendations | string[] | Specific, actionable compliance steps based on detected risks |
| scannedAt | string | ISO 8601 timestamp of when the scan was performed |
How much does it cost to run a compliance scan?
Compliance Scanner uses pay-per-event pricing — you pay $0.15 per scan. Platform compute costs are included. Each scan reads one API endpoint and completes in under 15 seconds; there are no per-minute charges.
| Scenario | Scans | Cost per scan | Total cost |
|---|---|---|---|
| Quick test | 1 | $0.15 | $0.15 |
| Small audit | 10 | $0.15 | $1.50 |
| Medium portfolio | 50 | $0.15 | $7.50 |
| Large portfolio | 200 | $0.15 | $30.00 |
| Enterprise fleet | 1,000 | $0.15 | $150.00 |
You can set a maximum spending limit per run to control costs. The actor stops when your budget is reached.
Compare this to dedicated legal review, which typically costs $300–$500/hour for specialist counsel. A full portfolio scan of 50 actors costs $7.50 with Compliance Scanner — a starting point that identifies which actors actually need legal attention before you spend a dollar on advice.
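For budgeting, the totals in the table are simply scans multiplied by the per-event price. The sketch below uses a hypothetical `scan_cost` helper and takes the price as a parameter, since the price charged is whatever the Pricing section of the listing shows at the time you run.

```python
from decimal import Decimal


def scan_cost(scans: int, price_per_scan: str = "0.15") -> Decimal:
    """Total cost in USD for a number of scans at a fixed per-scan price."""
    # Decimal avoids floating-point rounding in currency arithmetic.
    return Decimal(price_per_scan) * scans


# Reproduce the scenarios from the cost table above.
for label, scans in [("Quick test", 1), ("Small audit", 10), ("Medium portfolio", 50)]:
    print(f"{label}: {scans} scans = ${scan_cost(scans)}")
```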
Compliance scanning using the API
Python
```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

# Run one compliance scan and wait for it to finish
run = client.actor("ryanclinton/actor-compliance-scanner").call(run_input={
    "targetActorId": "ryanclinton/website-contact-scraper"
})

# Print the key fields of each report in the run's default dataset
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(f"Actor: {item['actorName']}")
    print(f"Overall risk: {item['overallRisk']}")
    print(f"PII risk: {item['piiRisk']} — keywords: {item['piiKeywords']}")
    print(f"ToS risk: {item['tosRisk']}")
    print(f"Applicable regulations: {[r['name'] for r in item['applicableRegulations']]}")
    print("Recommendations:")
    for rec in item["recommendations"]:
        print(f"  - {rec}")
```
JavaScript
```javascript
import { ApifyClient } from "apify-client";

const client = new ApifyClient({ token: "YOUR_API_TOKEN" });

// Run one compliance scan and wait for it to finish
const run = await client.actor("ryanclinton/actor-compliance-scanner").call({
  targetActorId: "ryanclinton/website-contact-scraper"
});

// Print the key fields of each report in the run's default dataset
const { items } = await client.dataset(run.defaultDatasetId).listItems();
for (const item of items) {
  console.log(`Actor: ${item.actorName}`);
  console.log(`Overall risk: ${item.overallRisk}`);
  console.log(`PII risk: ${item.piiRisk} — keywords: ${item.piiKeywords.join(", ")}`);
  console.log(`Applicable regulations: ${item.applicableRegulations.map(r => r.name).join(", ")}`);
  console.log("Recommendations:", item.recommendations);
}
```
cURL
```bash
# Start the compliance scan
curl -X POST "https://api.apify.com/v2/acts/ryanclinton~actor-compliance-scanner/runs?token=YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"targetActorId": "ryanclinton/website-contact-scraper"}'

# Fetch results (replace DATASET_ID with the defaultDatasetId from the run response above)
curl "https://api.apify.com/v2/datasets/DATASET_ID/items?token=YOUR_API_TOKEN&format=json"
```
How Compliance Scanner works
Phase 1 — Actor metadata retrieval
The actor calls the Apify REST API endpoint GET /v2/acts/{actorId} with your platform token. Actor IDs in username/actorname format are automatically converted to the username~actorname URL-safe encoding. The response includes the actor's name, title, description, categories, and owner metadata. The entire analysis runs against this metadata — no additional HTTP requests are made, the target actor is never invoked, and no scraped data is accessed.
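The ID conversion described here can be illustrated in a few lines. This is a sketch of the idea rather than the actor's actual code; `to_api_actor_id` and `metadata_url` are hypothetical names.

```python
API_BASE = "https://api.apify.com/v2/acts"


def to_api_actor_id(actor_id: str) -> str:
    """Turn 'username/actor-name' into 'username~actor-name'; bare IDs pass through."""
    # Only the first slash separates username from actor name.
    return actor_id.strip().replace("/", "~", 1)


def metadata_url(actor_id: str) -> str:
    """Build the GET /v2/acts/{actorId} URL for a target actor."""
    return f"{API_BASE}/{to_api_actor_id(actor_id)}"
```

An ID that contains no slash, such as the one in the input examples, is left unchanged.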
Phase 2 — Keyword and pattern matching
A composite search string is assembled from the actor name, title, description, and all category labels, converted to lowercase. This string is then evaluated against three independent pattern sets:
- PII indicators — 18 keywords including email, phone, contact, name, address, personal, profile, employee, team member, person, lead, enrichment, people, identity, salary, resume, and cv. Matching 3 or more keywords produces a HIGH PII risk; 1–2 keywords produce MEDIUM; zero matches produce LOW.
- Authentication wall patterns — 7 keywords: login, auth, credential, session, cookie, signed-in, and behind login. Matching 2 or more produces HIGH; 1 match produces MEDIUM; zero matches produce LOW. This dimension is relevant to CFAA exposure in US jurisdictions.
- Platform ToS lookup — The search string is tested for 13 platform names. Each platform has a pre-assigned risk level and reason derived from known enforcement history: LinkedIn and Facebook are HIGH (active litigation); Amazon, Google, YouTube, TikTok, Twitter/X, Indeed, and Glassdoor are MEDIUM; Zillow, Yelp, and Reddit are LOW. All matching platforms are returned in the tosDetails array.
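The keyword counting and thresholds for the PII and authentication dimensions might look roughly like this. The keyword lists are the ones named in this documentation (17 of the 18 PII indicators are listed, so only those appear here); plain substring matching is an assumption about how the matcher works, and it would also match "person" inside "personal" or "auth" inside "author".

```python
PII_KEYWORDS = [
    "email", "phone", "contact", "name", "address", "personal", "profile",
    "employee", "team member", "person", "lead", "enrichment", "people",
    "identity", "salary", "resume", "cv",
]
AUTH_KEYWORDS = ["login", "auth", "credential", "session", "cookie", "signed-in", "behind login"]


def build_search_text(name: str, title: str, description: str, categories: list[str]) -> str:
    """Join all metadata fields into one lowercase search string."""
    return " ".join([name, title, description, *categories]).lower()


def match_keywords(text: str, keywords: list[str]) -> list[str]:
    """Return the keywords that occur anywhere in the text (substring match)."""
    return [kw for kw in keywords if kw in text]


def level_from_count(count: int, high_at: int) -> str:
    """HIGH at or above the threshold, MEDIUM for any match below it, else LOW."""
    if count >= high_at:
        return "HIGH"
    return "MEDIUM" if count >= 1 else "LOW"


text = build_search_text(
    "website-contact-scraper",
    "Website Contact Scraper",
    "Extract emails and phone numbers from company sites",
    ["LEAD_GENERATION"],
)
pii_hits = match_keywords(text, PII_KEYWORDS)
pii_risk = level_from_count(len(pii_hits), high_at=3)    # PII: 3+ matches is HIGH
auth_hits = match_keywords(text, AUTH_KEYWORDS)
auth_risk = level_from_count(len(auth_hits), high_at=2)  # auth: 2+ matches is HIGH
```

The sample metadata in the sketch is invented for illustration and is not the real listing text of any actor.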
Phase 3 — Regulation mapping and category profiling
The combined list of matched PII and auth keywords is tested against 6 regulation keyword sets. GDPR and CCPA/CPRA trigger on personal data indicators (email, phone, name, profile). CFAA triggers on authentication keywords. The ePrivacy Directive and CAN-SPAM trigger on email and marketing signals. PIPEDA triggers on Canadian-relevant personal data. Each applicable regulation cites the specific keywords that triggered it.
The actor's Apify categories are also looked up in a category risk table covering LEAD_GENERATION, SOCIAL_MEDIA, ECOMMERCE, JOBS, and TRAVEL, each with independent PII and ToS risk scores and a human-readable notes field.
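As an illustration of the regulation mapping, the sketch below maps matched keywords to regulations and cites the triggering keywords in each finding. The trigger sets are assumptions reconstructed from the description above and the sample output; the actor's real keyword sets may differ.

```python
AUTH_TRIGGERS = ["login", "auth", "credential", "session", "cookie", "signed-in", "behind login"]

# (name, jurisdiction, trigger keywords): hypothetical trigger sets
REGULATIONS = [
    ("GDPR", "EU/EEA", ["email", "phone", "name", "contact", "profile"]),
    ("CCPA/CPRA", "California, USA", ["email", "phone", "name", "contact", "profile"]),
    ("CFAA", "United States", AUTH_TRIGGERS),
    ("ePrivacy Directive", "EU", ["email", "contact"]),
    ("CAN-SPAM", "United States", ["email", "contact"]),
    ("PIPEDA", "Canada", ["email", "phone", "name"]),
]


def applicable_regulations(matched_keywords: list[str]) -> list[dict]:
    """Return one finding per regulation whose trigger set overlaps the matches."""
    matched = set(matched_keywords)
    findings = []
    for name, jurisdiction, triggers in REGULATIONS:
        hits = [kw for kw in triggers if kw in matched]
        if hits:
            findings.append({
                "name": name,
                "jurisdiction": jurisdiction,
                "reason": "Detected keywords: " + ", ".join(hits),
            })
    return findings


findings = applicable_regulations(["email", "phone", "contact", "name", "address", "lead"])
```

With these assumed trigger sets, the keywords from the output example yield the same five regulations shown there, and CFAA is omitted because no authentication keyword matched.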
Phase 4 — Risk aggregation and recommendations
The overall risk level is computed as the worst-case value across all three dimensions (PII, ToS, auth) using a simple precedence function: HIGH > MEDIUM > LOW. Recommendations are generated by a rule set that fires based on the specific risk profile: non-LOW PII risk triggers documentation and retention recommendations; email keyword matches trigger opt-out recommendations; non-LOW auth risk triggers authorization reminders; HIGH ToS risk triggers a litigation-awareness warning.
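The worst-case aggregation and rule-based recommendations can be sketched as follows. The rule conditions follow the paragraph above and most recommendation strings are quoted from this documentation; the litigation warning text is hypothetical, because its exact wording is not given here.

```python
RISK_ORDER = {"LOW": 0, "MEDIUM": 1, "HIGH": 2}


def overall_risk(pii: str, tos: str, auth: str) -> str:
    """Worst-case aggregate: HIGH > MEDIUM > LOW."""
    return max((pii, tos, auth), key=RISK_ORDER.__getitem__)


def recommendations(pii: str, pii_keywords: list[str], tos: str, auth: str) -> list[str]:
    """Fire each rule whose condition matches the risk profile."""
    recs = []
    if pii != "LOW":
        recs.append("Document your lawful basis for processing personal data under GDPR")
        recs.append("Implement data retention policy and document it")
    if "email" in pii_keywords:
        recs.append("Add opt-out mechanism for email collection")
    if auth != "LOW":
        recs.append("Ensure you have authorization to access authenticated content")
    if tos == "HIGH":
        # Hypothetical wording for the litigation-awareness warning.
        recs.append("Review the platform's enforcement history before scraping")
    return recs
```

Taking the maximum means one HIGH dimension is enough to make the overall rating HIGH, which is why the output example reports HIGH overall despite LOW ToS and auth risk.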
Tips for best results
- Scan before you build, not after. The lowest-cost time to discover a compliance issue is during design, not after you have collected 100,000 records. Run a scan on a comparable actor in the same category before you write your first line of code.
- Use the overallRisk field as a triage gate. In automated workflows, parse overallRisk first. LOW-risk actors typically require no action. MEDIUM-risk actors need documentation. HIGH-risk actors need legal review before deployment.
- Pay attention to the applicableRegulations array. Each entry includes the specific triggering keywords. Use this to brief legal counsel precisely — "this actor matches email, phone, and contact keywords, which triggers GDPR, CCPA, and PIPEDA" — rather than asking for a general review.
- Cross-reference platform ToS details with the tosDetails reason field. Each platform entry explains why the risk exists. LinkedIn's HIGH rating reflects its active litigation history against scrapers. Reddit's LOW rating reflects that public posts are lower-risk despite API restrictions.
- Batch your portfolio scans weekly. Actor metadata — especially descriptions — changes over time. An actor that was LOW-risk when published may develop new PII features in a later version. Weekly scheduled scans catch these changes.
- Combine with Actor Quality Audit for a full readiness assessment. Quality Audit covers Store listing quality (README, input schema, test runs); Compliance Scanner covers legal and regulatory exposure. Run both before publishing.
- Store scan results with timestamps. The scannedAt field in every output record gives you an audit trail. Archive results to show that compliance was assessed at the time of publication — this is relevant documentation if a regulatory question arises later.
Combine with other Apify actors
| Actor | How to combine |
|---|---|
| Actor Quality Audit | Run Quality Audit for Store listing readiness and Compliance Scanner for legal risk — together they cover the full pre-publish checklist |
| Actor Schema Validator | Compliance Scanner flags which data types are being collected; Schema Validator confirms the output schema correctly reflects those types |
| Website Contact Scraper | Scan this actor with Compliance Scanner before deploying to confirm GDPR, CCPA, and CAN-SPAM obligations are understood |
| Google Maps Email Extractor | Scan before using for client campaigns to produce a due-diligence report on PII handling obligations |
| Waterfall Contact Enrichment | Multi-step enrichment pipelines have layered PII obligations; scan each stage actor individually to map the full regulatory surface |
| B2B Lead Qualifier | Lead scoring actors collecting personal attributes trigger multiple regulations; scan first to identify all applicable jurisdictions |
| Bulk Email Verifier | Verification of email addresses constitutes processing of personal data; scan the verifier before enterprise deployment |
Limitations
- Keyword-based analysis only. The scan analyses actor metadata — name, title, description, categories — not the actual source code. An actor that collects personal data without mentioning it in the description will receive a lower risk score than its actual risk warrants.
- Does not analyse actor source code. Compliance Scanner cannot inspect the JavaScript or TypeScript code that runs inside the target actor. Source code analysis requires access to the actor's repository, which is not available via the public API.
- Platform ToS coverage is limited to 13 platforms. Platforms not in the lookup table (Airbnb, Booking.com, Fiverr, etc.) will not be flagged in the ToS dimension even if they prohibit scraping. Check the platform's own ToS for unlisted sites.
- Regulation list is static. The 6 regulations in the mapper reflect the state of the law when the actor was built. New state privacy laws (Texas, Florida, Virginia, etc.) are not yet covered. The actor will be updated as the regulation list expands.
- Not legal advice. The compliance report identifies potential exposure based on keyword patterns. It does not constitute legal advice and should not be used as a substitute for qualified legal review on material decisions.
- Dependent on Apify API availability. If the Apify API is unavailable or rate-limited, the scan will fail with a descriptive error. The actor does not retry automatically.
- Metadata accuracy depends on the actor author. If the target actor's description does not accurately describe what it collects, the scan will reflect the description, not the reality. Actors with sparse or misleading descriptions may produce incomplete risk assessments.
- No historical tracking. Each run produces a point-in-time report. Compliance Scanner does not maintain a history of prior scans for a given actor; archive results externally if audit trails are required.
Integrations
- Zapier — trigger a compliance scan automatically when a new actor is added to your monitored portfolio and route HIGH-risk findings to a Slack channel or email alert
- Make — build a weekly compliance audit workflow that scans every actor in your catalogue and logs results to a Google Sheet
- Google Sheets — export scan results directly to a compliance tracking spreadsheet with risk levels, applicable regulations, and timestamps in separate columns
- Apify API — trigger scans programmatically as a CI/CD gate before promoting actors from staging to production
- Webhooks — receive a webhook notification when a scan completes with a HIGH overall risk rating for immediate triage
- LangChain / LlamaIndex — feed compliance scan output into an LLM-powered legal analysis workflow that drafts privacy policy clauses based on detected regulations
Troubleshooting
- "Actor not found (404)" error in output — The actor ID you provided does not resolve to a published actor. Check that the format is username/actor-name (not username/actor_name with underscores) and that the actor is publicly visible on the Apify Store. If the actor is private or draft-only, it will not be accessible via the public API with a standard token.
- All risk levels showing LOW for an actor that seems high-risk — The scan analyses metadata only. If the target actor has a sparse description that does not mention what it collects (e.g., a description that says "scrapes business data" without mentioning email or contact), the keyword matcher will not detect PII indicators. In this case, read the actor's README and input schema directly to assess risk manually.
- Run completes but Dataset tab is empty — This can happen if the actor run ended with an error before pushing data. Check the actor's Log tab for the error message. The most common cause is a missing or malformed targetActorId input.
- Scan does not flag a platform I know restricts scraping — Only 13 platforms are in the current lookup table. If the platform you are concerned about is not in the list (Airbnb, Booking.com, Etsy, etc.), the tosRisk dimension will not reflect it. Check the platform's ToS directly for unlisted sites.
Responsible use
- This actor only accesses publicly available actor metadata via the Apify API.
- Compliance Scanner does not run the target actor, access scraped data, or store any third-party data.
- Scan results are informational only and do not constitute legal advice.
- Comply with GDPR, CCPA, and all applicable data protection regulations when acting on findings from this tool.
- For guidance on web scraping legality, see Apify's guide.
FAQ
How do I scan an Apify actor for compliance risk?
Enter the actor's username/actor-name identifier (found in every Apify Store URL) as the targetActorId input and click Start. The scan completes in under 15 seconds and returns a structured report with risk levels for PII, Terms of Service, and authentication wall access, plus a list of applicable regulations and specific recommendations.
What does the Compliance Scanner actually analyse?
The scanner reads the actor's publicly available metadata from the Apify API: name, title, description, and categories. It evaluates this text against 18 PII indicator keywords, 13 platform ToS lookup entries, 7 authentication pattern keywords, and 6 regulatory jurisdiction mappers. It does not access the actor's source code, run the actor, or touch any data the actor has previously collected.
Is it legal to use a web scraping actor?
It depends on what is being scraped and where. Collecting publicly available business information generally carries lower legal risk than collecting personal data (names, emails, phone numbers), accessing content behind login walls, or scraping platforms that actively prohibit automated access in their Terms of Service. Compliance Scanner helps you identify which of these risk categories apply before you run an actor — see Apify's guide on web scraping legality for a broader treatment.
How accurate is the compliance risk assessment?
The accuracy of the scan is bounded by the quality of the target actor's metadata. Actors with detailed, accurate descriptions will produce more complete risk assessments. Actors with sparse or vague descriptions may understate their actual risk. The scan is a structured starting point for compliance review, not a definitive legal determination.
How is Compliance Scanner different from manual legal review?
Manual legal review by qualified counsel costs $300–$500/hour and requires detailed briefings. Compliance Scanner costs $0.15 per scan, runs in 15 seconds, and produces a machine-readable structured report that identifies specific applicable regulations and triggering keywords. Use Compliance Scanner to triage your portfolio and identify which actors need actual legal review — rather than paying for a review on every actor.
Does Compliance Scanner replace legal advice?
No. The scan identifies potential regulatory exposure based on keyword pattern analysis of publicly available metadata. It is a compliance awareness tool, not a legal opinion. For material decisions — publishing a commercial actor, processing personal data at scale, deploying in regulated industries — consult qualified legal counsel.
Can I scan a private or draft actor?
No. Compliance Scanner reads actor metadata via the Apify public API using your platform token. If the target actor is set to private or is in draft status, the API will return a 404 and the scan will output an error. The actor must be published and publicly accessible.
How many actors can I scan in one run?
Each run scans exactly one actor. To scan a portfolio of multiple actors, trigger multiple sequential runs via the Apify API or set up a scheduled task. Each scan takes under 15 seconds and costs $0.15.
Can I schedule Compliance Scanner to run weekly?
Yes. Use the Apify Scheduler to run the actor on a weekly or monthly schedule against each actor in your portfolio. Combined with the Apify API, you can loop through a list of actor IDs and trigger individual scans for each, then aggregate the results into a compliance tracking spreadsheet.
What regulations does Compliance Scanner cover?
The current version maps findings to six regulatory frameworks: GDPR (EU/EEA), CCPA/CPRA (California, USA), CFAA (United States — relevant for accessing content behind authentication), ePrivacy Directive (EU), CAN-SPAM (United States), and PIPEDA (Canada). Regulation coverage will expand as new laws enter force.
Which platforms are in the Terms of Service lookup table?
The current lookup covers 13 platforms, including LinkedIn and Facebook (HIGH risk); Amazon, Google, YouTube, TikTok, Twitter/X, Indeed, and Glassdoor (MEDIUM risk); and Zillow, Yelp, and Reddit (LOW risk). Each entry includes a plain-English explanation of why that platform carries that risk level.
What happens if the Apify API is unavailable during a scan?
The actor uses a 30-second timeout on the metadata fetch. If the API call fails or times out, the run will push an error object to the dataset with a descriptive message and exit cleanly. No charge event is fired if the metadata fetch fails.
Help us improve
If you encounter issues, you can help us debug faster by enabling run sharing in your Apify account:
- Go to Account Settings > Privacy
- Enable Share runs with public Actor creators
This lets us see your run details when something goes wrong, so we can fix issues faster. Your data is only visible to the actor developer, not publicly.
Support
Found a bug or have a feature request? Open an issue in the Issues tab on this actor's page. For custom compliance workflow integrations or enterprise portfolio scanning, reach out through the Apify platform.