Every URL filter claims millions of categorized domains. Few admins ever check whether the categories are right. Then a sales rep cannot reach a prospect’s site because it is filed under “gambling,” and the audit happens in a Slack war room instead of a planned review.
This post gives you a repeatable audit protocol so category accuracy becomes a measurable number at renewal time, not a surprise in the middle of the quarter.
The Short Answer
Category accuracy decays because the web changes faster than vendor taxonomies. A domain bought last year for SaaS onboarding might be parked today. A staging subdomain inherits a generic category that no longer fits. Vendors classify at scale, which means they miss the long tail that matters to your specific business.
An audit uses a test corpus of URLs drawn from your own logs, classifies each one by hand, and compares that ground truth to what the vendor reports. The gap is your accuracy score. Run this quarterly and the conversation with the vendor changes.
Why Category Accuracy Quietly Decays
Three forces pull a vendor’s categorization out of alignment with reality.
The first is drift. Domains change owners. SaaS products pivot. A URL that was “software downloads” in 2023 is “generative AI” in 2026, and the vendor’s crawler has not revisited it. Multiply by a few hundred domains your team actually uses and the stale entries add up.
The second is taxonomy fit. Vendor category trees are built for a general audience. Your company might care deeply about a distinction between “sanctioned SaaS” and “unsanctioned SaaS” that the vendor does not model. Everything ends up in “business” and the filter cannot enforce your real policy.
The third is the recategorization queue. When a user reports a miscategorized site, the request goes into a vendor backlog. Some vendors close it in hours. Others leave it for weeks while the same domain generates dozens of tickets. Audit cadence is what turns this from a black box into a measurable service level.
Phase 1: Building a Test Corpus of 200 URLs
A useful corpus is opinionated. It reflects the URLs your users actually hit and the categories your policy actually cares about. Random samples from crawler lists do not measure anything useful.
Pull the last 30 days of proxy logs. Sort by volume and by complaint frequency. Select URLs in the following mix; a sampling sketch follows the list:
- 40 high-volume business destinations across SaaS, CRM, and finance
- 40 marketing and research destinations sales and SE teams rely on
- 30 recently registered domains your team has encountered
- 30 generative AI services, sanctioned and unsanctioned
- 30 dev and infrastructure domains like staging subdomains and package registries
- 30 social, news, and media sites relevant to acceptable-use policy
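A minimal sketch of the sampling step, assuming the proxy log is exported as a CSV with url, hits, and a bucket column you tag while reviewing the log. The column names, file name, and bucket labels are illustrative, not any proxy's actual export schema:

```python
import csv
from collections import defaultdict

# Target mix from the list above: bucket -> number of URLs to sample.
QUOTAS = {
    "business": 40, "marketing": 40, "new_domains": 30,
    "genai": 30, "dev_infra": 30, "social_media": 30,
}

def build_corpus(log_export: str) -> list[dict]:
    """Take the highest-volume URLs per bucket until every quota is filled.

    Assumes columns: url, hits (30-day count), bucket (tag assigned
    during log review, not a vendor category).
    """
    with open(log_export, newline="") as f:
        rows = sorted(csv.DictReader(f), key=lambda r: int(r["hits"]), reverse=True)

    taken: dict[str, int] = defaultdict(int)
    corpus = []
    for row in rows:
        b = row["bucket"]
        if b in QUOTAS and taken[b] < QUOTAS[b]:
            taken[b] += 1
            # expected is filled in by hand during the two-reviewer pass
            corpus.append({"url": row["url"], "bucket": b, "expected": ""})
    return corpus

if __name__ == "__main__":
    corpus = build_corpus("proxy_30d.csv")  # illustrative file name
    print(f"{len(corpus)} URLs selected")
```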
Classify each URL yourself. Two reviewers, one tiebreaker. Write the expected category on every row. This is the ground truth your vendor will be measured against. A careful secure web gateway vendor will welcome the exercise because their numbers will hold up.
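If the reviewer labels live in a spreadsheet, the adjudication rule is worth making explicit in code so disagreements are surfaced rather than silently resolved. A minimal sketch, with names of my own choosing:

```python
def ground_truth(reviewer1: str, reviewer2: str, tiebreaker: str) -> str:
    """Agreement between the two reviewers is final; otherwise the
    tiebreaker's label stands."""
    return reviewer1 if reviewer1 == reviewer2 else tiebreaker

# e.g. the beta.aicoder.dev row from the table below:
print(ground_truth("Generative AI", "Dev tools", "Generative AI"))  # Generative AI
```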
Sample Test Corpus Rows
| URL | Reviewer 1 | Reviewer 2 | Ground Truth | Vendor Category | Result |
|---|---|---|---|---|---|
| onboarding.examplecrm.com | SaaS sanctioned | SaaS sanctioned | SaaS sanctioned | Business | Partial |
| beta.aicoder.dev | Generative AI | Dev tools | Generative AI | Uncategorized | Incorrect |
| mortgagerates.io | Finance | Finance | Finance | Gambling | Incorrect |
Phase 2: Scoring, Comparing, Reporting
Once the corpus is classified, run each URL through the vendor’s lookup tool or pull the category from the live proxy log. Record the vendor’s answer against your ground truth and score it with the rubric below.
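If the vendor exposes a lookup API, this pass can be scripted. The endpoint and response shape below are placeholders, since both vary by vendor:

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint -- substitute your vendor's real lookup API.
LOOKUP = "https://lookup.vendor.example/v1/category?url={}"

def vendor_category(url: str) -> str:
    """Return the vendor's category for one URL, or 'Uncategorized'."""
    quoted = urllib.parse.quote(url, safe="")
    with urllib.request.urlopen(LOOKUP.format(quoted)) as resp:
        return json.load(resp).get("category", "Uncategorized")
```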
Scoring Rubric
| Result | Definition | Points |
|---|---|---|
| Correct | Vendor category matches ground truth exactly | 1.0 |
| Partial | Vendor category is in the right family but wrong subcategory | 0.5 |
| Incorrect | Vendor category is wrong in a way that breaks policy | 0.0 |
| Uncategorized | Vendor returns no category at all | 0.0 |
Total the score, divide by 200, and you have a single accuracy percentage. A serious vendor lands above 85 percent on a well-built corpus. Below 75 percent is renewal-blocking in most enterprise environments.
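The totaling step is a few lines of script. A sketch assuming the audit sheet is exported as CSV with a Result column holding the rubric labels; the file and column names are illustrative:

```python
import csv

POINTS = {"Correct": 1.0, "Partial": 0.5, "Incorrect": 0.0, "Uncategorized": 0.0}

def accuracy(results_csv: str) -> float:
    """Average rubric points across the corpus."""
    with open(results_csv, newline="") as f:
        rows = list(csv.DictReader(f))
    return sum(POINTS[r["Result"]] for r in rows) / len(rows)

print(f"Accuracy: {accuracy('audit_2026q1.csv'):.1%}")  # e.g. Accuracy: 86.5%
```

Dividing by the row count rather than a hard-coded 200 keeps the number honest once the quarterly 30-URL additions grow the corpus.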
Repeat the exercise quarterly with the same corpus plus 30 new URLs from recent logs. The trend matters more than the snapshot.
Accuracy without a corpus is vibes. Accuracy with a corpus is a number you can put in a renewal conversation.
Phase 3: Turning the Audit Into Leverage
The audit score only matters if it feeds a process. Three outputs make the audit worth the hours it takes.
A Delta Report for Your Vendor
Send the miscategorized URLs back to the vendor with your ground-truth labels. Track the turnaround time for each recategorization. A vendor that closes requests in under 48 hours is investing in accuracy. A vendor that averages two weeks is telling you something about their roadmap.
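A sketch of the delta-report extraction, assuming the results CSV uses the column headers from the sample table above; the file names are illustrative:

```python
import csv
from datetime import date

def delta_report(results_csv: str, out_csv: str) -> int:
    """Pull every non-Correct row into a vendor-facing report, stamped
    with the submission date so turnaround can be measured later."""
    fields = ["URL", "Ground Truth", "Vendor Category", "Result", "Submitted"]
    with open(results_csv, newline="") as f:
        bad = [r for r in csv.DictReader(f) if r["Result"] != "Correct"]
    with open(out_csv, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()
        for row in bad:
            report_row = {k: row[k] for k in fields[:4]}
            report_row["Submitted"] = date.today().isoformat()
            writer.writerow(report_row)
    return len(bad)

print(f"{delta_report('audit_2026q1.csv', 'delta_2026q1.csv')} URLs reported")
```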
A Policy Override List for Your Users
For high-volume incorrect categorizations, publish a local override list so users are not blocked while the vendor catches up. An endpoint-first SWG applies overrides immediately, without waiting for a cloud sync.
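The override list itself can be as simple as a domain-to-category map consulted before the vendor verdict. This is an illustrative shape, not any product’s actual configuration format:

```python
# Local corrections from the audit, applied ahead of the vendor category.
OVERRIDES = {
    "mortgagerates.io": "Finance",        # vendor still says Gambling
    "beta.aicoder.dev": "Generative AI",  # vendor returns Uncategorized
}

def effective_category(domain: str, vendor_category: str) -> str:
    """Audit ground truth wins until the vendor recategorizes."""
    return OVERRIDES.get(domain, vendor_category)

print(effective_category("mortgagerates.io", "Gambling"))  # Finance
```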
A Renewal Negotiation Artifact
Bring the score, the delta report, and the turnaround metrics to the renewal meeting. Vendors negotiate on measurable service quality. Teams without measurements negotiate on feelings and lose.
FAQ
How do you configure URL filtering?
URL filtering is configured by selecting which categories to allow, warn, or block, then layering per-group and per-user exceptions on top. A modern deployment also defines policies for uncategorized URLs and for new TLDs. Products like dope.security push the full policy to the endpoint so configuration changes apply instantly, with no wait for a cloud sync.
What does URL filtering do?
URL filtering evaluates every outbound web request against a category database and a policy. The filter blocks, warns, or allows based on the match. Categories typically include malware, phishing, generative AI, gambling, and a long list of business topics. Good filters also support custom categories for organization-specific policy.
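Tying the two answers above together, here is a toy model of category-based policy with per-group exceptions and an explicit stance on uncategorized URLs. All category, group, and class names are illustrative, not any product’s configuration schema:

```python
from dataclasses import dataclass, field

@dataclass
class FilterPolicy:
    actions: dict[str, str]                       # category -> allow/warn/block
    group_overrides: dict[str, dict[str, str]] = field(default_factory=dict)
    uncategorized: str = "warn"                   # explicit default for unknown URLs

    def decide(self, category: str | None, group: str) -> str:
        if category is None:
            return self.uncategorized
        override = self.group_overrides.get(group, {})
        return override.get(category, self.actions.get(category, "allow"))

policy = FilterPolicy(
    actions={"Malware": "block", "Phishing": "block",
             "Gambling": "block", "Generative AI": "warn"},
    group_overrides={"ml-engineering": {"Generative AI": "allow"}},
)
print(policy.decide("Generative AI", "sales"))           # warn
print(policy.decide("Generative AI", "ml-engineering"))  # allow
print(policy.decide(None, "sales"))                      # warn (uncategorized)
```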
How accurate are URL filtering vendors?
General-purpose category accuracy usually lands between 75 and 90 percent when measured against a real-world corpus. Vendors that publish higher numbers are usually measuring against their own training set, not live user traffic. Auditing with your own 200-URL corpus is the only way to get a number that reflects your environment.
What counts as a miscategorization worth reporting?
Any URL where the vendor category produces the wrong policy outcome for your organization. A SaaS domain filed under generic “business” is fine unless your policy distinguishes sanctioned from unsanctioned. Report anything that either blocks legitimate work or allows a category your policy is supposed to control.