How we classify IP addresses in the IP to Privacy Detection Dataset

In the IP to Privacy Detection data, we classify anonymous IP addresses into five categories. A single address can fall into more than one category.

  • VPN
  • Proxy
  • Tor
  • Relay
  • Hosting
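Because the categories can overlap (the Proxy section below notes overlap with VPN and hosting addresses), it helps to think of each address as carrying one independent flag per category rather than a single label. The following is a minimal sketch assuming one boolean per category; the class name `PrivacyFlags` and its field names are illustrative, not a documented schema.

```python
from dataclasses import dataclass

# Hypothetical per-IP record: one independent boolean per anonymous-IP category.
CATEGORY_NAMES = ("vpn", "proxy", "tor", "relay", "hosting")


@dataclass(frozen=True)
class PrivacyFlags:
    vpn: bool = False
    proxy: bool = False
    tor: bool = False
    relay: bool = False
    hosting: bool = False

    def categories(self) -> list[str]:
        """Names of every category flagged for this address, in a fixed order."""
        return [name for name in CATEGORY_NAMES if getattr(self, name)]

    def is_anonymous(self) -> bool:
        """True if the address falls into at least one category."""
        return bool(self.categories())


# Flags are not mutually exclusive: a proxy on a hosting range carries both.
example = PrivacyFlags(proxy=True, hosting=True)
print(example.categories())    # ['proxy', 'hosting']
print(example.is_anonymous())  # True
```

Using separate booleans instead of a single enum keeps overlapping cases, such as a proxy running on hosting infrastructure, representable without inventing combined categories.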

Our IP privacy detection model draws on several data sources and methodologies, including behavior-based IP detection, a machine learning classifier built on network attributes, public internet records, and traffic patterns.

Tor

For Tor, we periodically collect the IP addresses published as Tor exit nodes. Because the Tor exit node list changes several times an hour while our IP to Privacy Detection database is updated daily, we aggregate every exit node IP address we observe over the course of the day and package the result into the daily release.
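The daily aggregation step amounts to taking the union of many intraday snapshots. The sketch below assumes each snapshot is a timestamp plus a set of exit node IP strings; how those snapshots are fetched is left out, and the function name `aggregate_daily` and the sample addresses (IETF documentation ranges) are hypothetical.

```python
from datetime import datetime, timezone


def aggregate_daily(snapshots):
    """Union every exit node IP seen in intraday snapshots, grouped by UTC date.

    snapshots: iterable of (timezone-aware datetime, set of IP strings),
    e.g. one entry per hypothetical fetch of the published exit node list.
    """
    daily: dict[str, set[str]] = {}
    for ts, ips in snapshots:
        day = ts.astimezone(timezone.utc).date().isoformat()
        daily.setdefault(day, set()).update(ips)
    # Lexical sort only to make the packaged output deterministic.
    return {day: sorted(ips) for day, ips in sorted(daily.items())}


snapshots = [
    (datetime(2024, 1, 1, 0, 10, tzinfo=timezone.utc), {"192.0.2.1", "192.0.2.2"}),
    (datetime(2024, 1, 1, 0, 40, tzinfo=timezone.utc), {"192.0.2.2", "192.0.2.3"}),
    (datetime(2024, 1, 2, 9, 0, tzinfo=timezone.utc), {"198.51.100.7"}),
]
print(aggregate_daily(snapshots))
# {'2024-01-01': ['192.0.2.1', '192.0.2.2', '192.0.2.3'], '2024-01-02': ['198.51.100.7']}
```

Note that 192.0.2.1 drops out of the second snapshot but still appears in the day's output: taking the union is what keeps short-lived exit nodes from being missed between daily releases.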

Proxy

We identify proxy IP addresses through a set of internal categories and supplement them with our coverage of proxy service providers. Proxy addresses can also overlap with VPN and hosting IP addresses.

Relay

Relay IP addresses, such as those used by Apple Private Relay, Cisco, Fastly, and Google One VPN, are flagged through our internal categorization of specific anonymous IP address ranges.
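Since relay flags come from categorized address ranges, checking an address comes down to a CIDR membership test. The sketch below uses Python's standard `ipaddress` module; the range table, the service labels, and the helper `relay_service` are hypothetical, and the prefixes are IETF documentation ranges, not real relay egress ranges.

```python
import ipaddress

# Hypothetical range table. Documentation prefixes only, NOT real relay ranges.
RELAY_RANGES = {
    ipaddress.ip_network("192.0.2.0/24"): "Example Relay A",
    ipaddress.ip_network("2001:db8:1::/48"): "Example Relay B",
}


def relay_service(ip: str):
    """Return the relay service whose range contains ip, or None."""
    addr = ipaddress.ip_address(ip)
    for network, service in RELAY_RANGES.items():
        # `in` returns False when IP versions differ, so mixed v4/v6 tables are safe.
        if addr in network:
            return service
    return None


print(relay_service("192.0.2.44"))     # Example Relay A
print(relay_service("2001:db8:1::5"))  # Example Relay B
print(relay_service("198.51.100.1"))   # None
```

A linear scan is fine for a handful of ranges; a large table would typically use a sorted or trie-based structure instead.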

VPN

We flag VPN IP addresses based on certain network behaviors exhibited by anonymous IP addresses, and we supplement this data with our service provider coverage. VPN detection is where we focus most of our privacy detection effort.

Hosting / Data Center / Cloud

The hosting flag is derived from public records, behavior modeling, and traffic information. We continuously fine-tune our machine learning classifier to better predict hosting IP addresses, including addresses in ranges that mix ISP and hosting services.
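When a range mixes ISP and hosting services, a single label for the whole range is too coarse. One common way to store finer-grained labels, shown here purely as an illustration of the data layout and not as our classifier, is longest-prefix matching: a more specific hosting sub-range overrides the broader ISP range that contains it. The ranges, labels, and the helper `classify` are hypothetical.

```python
import ipaddress
from typing import Optional

# Hypothetical labels: a broad ISP block containing a smaller hosting block.
# Documentation prefixes only; not real data.
LABELS = [
    (ipaddress.ip_network("198.51.100.0/24"), "isp"),
    (ipaddress.ip_network("198.51.100.128/26"), "hosting"),  # .128 to .191
]


def classify(ip: str) -> Optional[str]:
    """Longest-prefix match: the most specific range containing ip wins."""
    addr = ipaddress.ip_address(ip)
    matches = [(net, label) for net, label in LABELS if addr in net]
    if not matches:
        return None
    # Larger prefixlen means a smaller, more specific network.
    return max(matches, key=lambda m: m[0].prefixlen)[1]


print(classify("198.51.100.10"))   # isp
print(classify("198.51.100.150"))  # hosting
print(classify("203.0.113.5"))     # None
```

This is the same lookup rule IP routing tables use, which makes it a natural fit for nested ranges with different labels.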

To summarize, our IP to Privacy Detection model is largely based on identifying the behavioral attributes that certain IP addresses exhibit. We then validate our detection models against our extensive service provider coverage. While other providers mainly rely on WHOIS records, minimal scanning, and subscriptions to a handful of service providers, we focus on comprehensive research, improving our detection models, and continually fine-tuning our methodology to provide the most accurate, extensive, and reliable anonymous IP address dataset available.