We launched our IP to residential proxy data at the beginning of this year. Although the database is new, we worked on it internally for all of last year. Along the way, we gained a ton of insights into residential proxies: how they operate and the interesting patterns they show.
I will take this opportunity to explore the database and gather customer feedback on it.
Delivery methods
The residential proxy detection database is available through the following delivery methods:
- IP data downloads
  - CSV
  - JSON (NDJSON)
  - MMDB
  - Parquet
- Platform delivery
  - Snowflake
  - GCP / BigQuery
- Data push to storage buckets such as AWS S3, GCP GCS, Azure Blob Storage, etc.
Most of the queries run here will use our data on Snowflake Marketplace. You can make a request to obtain the data here: Snowflake
Documentation and Schema
The documentation for the Residential proxy database is available here: Documentation for IP to Residential Proxy Database - IPinfo.io
Database schema:
Field Name | Example | Data Type | Description |
---|---|---|---|
ip | 38.222.31.85 | TEXT | IPv4 or IPv6 address associated with a residential proxy. |
service | lightningproxies | TEXT | Name of the residential proxy service. Carrier/mobile services are suffixed with _mobile (e.g., soax_mobile). |
last_seen | 2024-09-07 | TEXT | Last recorded date when the residential proxy IP was active, formatted as YYYY-MM-DD (ISO-8601). Timezone is UTC. |
percent_days_seen | 2 | TEXT | Integer indicating the percentage of days the IP was active in the last 90-day period, reflecting how frequently the IP appears within a residential proxy pool. |
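As a minimal sketch of how these four fields might be loaded from the CSV download, the Python below parses each row into a typed record. The class name `ProxyRecord`, the helper `parse_rows`, and the assumption that the CSV header uses the same field names as the schema are illustrative; the sample row reuses the example values from the table above.

```python
import csv
import io
import ipaddress
from dataclasses import dataclass
from datetime import date
from typing import Union


@dataclass(frozen=True)
class ProxyRecord:
    """One row of the residential proxy data (hypothetical container)."""
    ip: Union[ipaddress.IPv4Address, ipaddress.IPv6Address]
    service: str
    last_seen: date
    percent_days_seen: int


def parse_rows(text):
    """Parse CSV text whose header matches the schema field names (assumed)."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        ProxyRecord(
            ip=ipaddress.ip_address(row["ip"]),
            service=row["service"],
            last_seen=date.fromisoformat(row["last_seen"]),
            percent_days_seen=int(row["percent_days_seen"]),
        )
        for row in reader
    ]


# Sample built from the example column of the schema table.
SAMPLE_CSV = """ip,service,last_seen,percent_days_seen
38.222.31.85,lightningproxies,2024-09-07,2
"""

records = parse_rows(SAMPLE_CSV)
print(records[0])
```

Converting the TEXT values to native types up front makes later date comparisons and numeric filters less error-prone.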
Column overview
IP
Unlike most IP address databases, we are using single IP addresses as the index column. We used to have IP ranges, and moved to a CIDR-based database this year. However, for this database, the index column consists of single IPs.
IP | SERVICE | LAST_SEEN | PERCENT_DAYS_SEEN |
---|---|---|---|
187.19.XXX.XXX | s*******y | 2025-01-21 | 12 |
2001:ee0:XXXX:XXXX:XXXX:XXXX:XXXX:XXXX | y*************y | 2024-12-06 | 2 |
86.8.XXX.XXX | p*****d | 2025-02-12 | 22 |
2600:1700:XXXX:XXXX:XXXX:XXXX:XXXX:XXXX | p*********s | 2025-01-15 | 1 |
125.163.XXX.XXX | h*******y | 2025-01-12 | 5 |
95.7.XXX.XXX | d******y | 2024-12-05 | 1 |
177.41.XXX.XXX | s*******y | 2025-02-18 | 4 |
73.171.XXX.XXX | y*************y | 2025-02-18 | 3 |
27.23.XXX.XXX | p*****d | 2024-11-25 | 1 |
2602:fe43:XXXX:XXXX:XXXX:XXXX:XXXX:XXXX | t*********y | 2025-01-25 | 1 |
The reason is that residential proxies do not operate in ranges; they operate individually, each with its own frequency metadata (last_seen and percent_days_seen). Clustering them in CIDRs would result in a large number of /32s and /128s, bloating the database.
Therefore, we opted to use individual IPs instead.
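Because the index is a single IP, a lookup can be a plain exact match rather than a range search. The sketch below illustrates this in Python, assuming the rows have been loaded into memory. The sample rows use documentation-reserved addresses and made-up service names, and `normalize` and `lookup` are hypothetical helpers.

```python
import ipaddress


def normalize(ip_text):
    """Return the canonical text form of an IPv4 or IPv6 address."""
    return str(ipaddress.ip_address(ip_text))


# Hypothetical rows: (ip, service, last_seen, percent_days_seen).
rows = [
    ("192.0.2.10", "example_service", "2025-01-21", 12),
    ("2001:db8::1", "example_service_mobile", "2024-12-06", 2),
]

# Exact-match index keyed by the canonical address string.
index = {normalize(ip): (service, last_seen, pct)
         for ip, service, last_seen, pct in rows}


def lookup(ip_text):
    """Return the metadata tuple for an IP, or None if it is not listed."""
    return index.get(normalize(ip_text))


# An expanded IPv6 spelling still matches after normalization.
print(lookup("2001:0db8:0000:0000:0000:0000:0000:0001"))
print(lookup("192.0.2.11"))  # a neighbouring IP is not implied by 192.0.2.10
```

Normalizing before the lookup matters mostly for IPv6, where the same address can be written in several equivalent ways.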
Service
With the residential proxy database, one interesting thing we debated for a while was the service details.
We can detect nearly 200 different proxy service providers. However, because proxy pools are shared between many different vendors, resellers, and rebranded organizations, we decided to stick with one prominent service provider per IP address.
Internally, our ‘service’ field is an array, but for simplicity, the commercial database returns a string with a single provider’s name.
Top residential proxy providers by IP count
SERVICE (IPV4) | IPV4_COUNT | SERVICE (IPV6) | IPV6_COUNT | SERVICE (TOTAL) | TOTAL_COUNT |
---|---|---|---|---|---|
s*******y | 2383233 | d******y | 674906 | s********y | 2383282 |
p*****d | 1973789 | t*********y | 617726 | d*******y | 2123475 |
l*********s | 1639183 | p********e | 582494 | p******d | 1975587 |
d******y | 1448569 | p*******o | 537826 | l**********s | 1639246 |
o****s | 1251672 | o*******y | 511711 | t**********y | 1357831 |
p****y | 1195433 | l******y | 487886 | p*****y | 1308598 |
s*********s | 942863 | t***********y | 449923 | o*****s | 1251702 |
b*******a | 823159 | s********e | 447886 | p*********e | 1109730 |
y*************y | 773270 | k******y | 355613 | s**********s | 1041452 |
t*********y | 740105 | p*********s | 320813 | p********o | 932769 |
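A ranking like the one above can be reproduced from a file download with a few counters. The following is a minimal Python sketch, not the query used to build the table; the function name `top_services` and the sample rows (documentation-reserved addresses, placeholder service names) are assumptions for illustration.

```python
import ipaddress
from collections import Counter


def top_services(rows, n=10):
    """Rank services by IPv4 count, IPv6 count, and total count."""
    v4, v6, total = Counter(), Counter(), Counter()
    for ip, service in rows:
        version = ipaddress.ip_address(ip).version
        (v4 if version == 4 else v6)[service] += 1
        total[service] += 1
    return v4.most_common(n), v6.most_common(n), total.most_common(n)


# Hypothetical (ip, service) pairs.
rows = [
    ("192.0.2.1", "alpha"),
    ("192.0.2.2", "alpha"),
    ("192.0.2.3", "beta"),
    ("2001:db8::1", "beta"),
    ("2001:db8::2", "beta"),
    ("2001:db8::3", "beta"),
    ("2001:db8::4", "alpha"),
]

top_v4, top_v6, top_total = top_services(rows)
print(top_v4)
print(top_v6)
print(top_total)
```

Even in this tiny sample the three rankings differ, which is why the table lists a separate service column for each count.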
Another aspect of the service field is that we categorize mobile/carrier residential proxy providers by adding the suffix “_mobile”.
Top carrier/mobile proxy providers by IP count
SERVICE | COUNT |
---|---|
s**x_mobile | 833355 |
i*************s_mobile | 484634 |
g*****y_mobile | 365095 |
r************s_mobile | 322612 |
d*********e_mobile | 232359 |
r*****g_mobile | 223690 |
p*********p_mobile | 223237 |
p*********e_mobile | 218792 |
o*****s_mobile | 115751 |
d*****y_mobile | 94941 |
Number of residential/ISP and mobile/carrier proxies
CATEGORY | COUNT |
---|---|
Total | 38743724 |
Mobile Proxy Count | 3212201 |
ISP Proxy Count | 35531523 |
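Since the mobile/carrier category is encoded in the service name, the split above can be derived from the suffix alone. Here is a small Python sketch under that assumption; the helper names are hypothetical, and the sample uses the two service names that appear in the schema table.

```python
MOBILE_SUFFIX = "_mobile"


def is_mobile(service):
    """True if the service name carries the mobile/carrier suffix."""
    return service.endswith(MOBILE_SUFFIX)


def base_service(service):
    """Strip the suffix so mobile and non-mobile pools share one provider name."""
    return service[:-len(MOBILE_SUFFIX)] if is_mobile(service) else service


def split_counts(services):
    """Count mobile and ISP entries in a list of service names."""
    mobile = sum(1 for s in services if is_mobile(s))
    return {"mobile": mobile, "isp": len(services) - mobile, "total": len(services)}


sample = ["soax_mobile", "lightningproxies", "lightningproxies"]
print(split_counts(sample))
print(base_service("soax_mobile"))

# Sanity check on the published figures: the two categories sum to the total.
print(3_212_201 + 35_531_523 == 38_743_724)
```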
last_seen
This is proxy frequency metadata: the last date on which the residential proxy IP was active, formatted as YYYY-MM-DD in the UTC timezone. The 90-day window applies here, which means that the oldest last_seen value in the database is always going to be about 3 months old.
We have the complete historical data of IPs we have seen over the last year or so, since we started building the product, but for our customer database, the timeframe window is limited to 90 days.
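One practical use of last_seen is to tighten the window yourself, for example to treat only recently active IPs as proxies. The sketch below shows one way to do that in Python. The function `seen_within`, the 7-day threshold, and the choice of reference day are illustrative assumptions; the rows are masked values taken from the sample table earlier in this post.

```python
from datetime import date, timedelta


def seen_within(last_seen, days, today):
    """True if the ISO-8601 last_seen date is at most `days` days before `today`."""
    age = today - date.fromisoformat(last_seen)
    return timedelta(0) <= age <= timedelta(days=days)


# Reference day for the comparison (an arbitrary choice for this example).
snapshot_day = date(2025, 2, 20)

# (masked ip, last_seen) pairs from the sample table.
rows = [
    ("86.8.XXX.XXX", "2025-02-12"),
    ("177.41.XXX.XXX", "2025-02-18"),
    ("27.23.XXX.XXX", "2024-11-25"),
]

recent = [ip for ip, last_seen in rows if seen_within(last_seen, 7, snapshot_day)]
print(recent)
```

Passing the reference day explicitly, rather than calling `date.today()` inside the function, keeps the filter reproducible against a given snapshot of the data.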
Dataset sorted by last_seen on Feb 20, 2025
IP | SERVICE | LAST_SEEN | PERCENT_DAYS_SEEN |
---|---|---|---|
105.161.XXX.XXX | i********************e | 2024-11-21 | 1 |
178.222.XXX.XXX | p******d | 2024-11-21 | 1 |
178.45.XXX.XXX | p******d | 2024-11-21 | 1 |
… | … | … | … |
38.137.XXX.XXX | t**********y | 2025-02-20 | 4 |
38.137.XXX.XXX | y**************y | 2025-02-20 | 2 |
38.137.XXX.XXX | l*******y | 2025-02-20 | 3 |
percent_days_seen
This is another piece of proxy frequency metadata. The percent_days_seen column gives the percentage of days, out of the last 90, on which an IP address was seen active as a residential proxy. The value is returned as a rounded integer (1-100).
Because of how residential proxy IPs are used and churned, the most frequent value for percent_days_seen is 1 within the 90-day timeframe.
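To make the percentage concrete, here is a small worked example in Python. It assumes the value is computed as active days divided by 90, times 100, rounded half up; the exact rounding rule is not stated above, so treat both helper functions as an approximation rather than the production formula.

```python
import math

WINDOW_DAYS = 90  # length of the observation window described above


def percent_days_seen(days_active, window=WINDOW_DAYS):
    """Percentage of days active in the window, rounded half up (assumed rule)."""
    return math.floor(100 * days_active / window + 0.5)


def approx_days_active(percent, window=WINDOW_DAYS):
    """Rough inverse: estimate the number of active days from the percentage."""
    return math.floor(percent * window / 100 + 0.5)


for days in (1, 2, 20, 90):
    print(days, percent_days_seen(days))

print(approx_days_active(22))
```

Under this assumed rule, the most common value of 1 would correspond to an IP that was seen on a single day within the window.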
Random queries
I was curious about how residential proxies are being used, how they are distributed, and what the most popular ASNs are. However, to keep things simple, I will run these queries on the IPv4 residential proxy IPs.
```sql
-- Keep IPv4 rows only: IPv4 addresses contain dots, while IPv6 addresses use colons.
CREATE TEMP TABLE ip_proxy_residential_ipv4 AS
SELECT *
FROM ip_proxy_residential
WHERE ip LIKE '%.%';
```
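For readers working from a file download rather than Snowflake, a rough Python equivalent of this filter is sketched below. It classifies each address by its parsed version instead of matching on a dot, so an IPv4-mapped IPv6 address such as ::ffff:192.0.2.1 would be treated as IPv6 if one ever appeared (whether the dataset contains any is not something this post establishes). The function name and sample addresses are illustrative.

```python
import ipaddress


def ipv4_only(ips):
    """Keep only the addresses that parse as IPv4."""
    return [ip for ip in ips if ipaddress.ip_address(ip).version == 4]


# Documentation-reserved sample addresses.
sample = ["192.0.2.10", "2001:db8::1", "::ffff:192.0.2.1"]
print(ipv4_only(sample))
```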
Countries with the most residential proxy IPs
Powered by our IP location database.
# | COUNTRY | COUNT_IP |
---|---|---|
1 | Brazil | 4373872 |
2 | United States | 3073021 |
3 | Russia | 1932029 |
4 | India | 1713312 |
5 | Vietnam | 1478303 |
6 | Great Britain | 902341 |
7 | Morocco | 764127 |
8 | South Africa | 691660 |
9 | Mexico | 654287 |
10 | Turkey | 624651 |
How many residential proxies operate in the largest ASNs
Powered by our ASN database.
# | ASN | NAME | COUNTRY | COUNT_IP_ASN | RES_IP_COUNT | ASN_RES_PERCENT |
---|---|---|---|---|---|---|
1 | AS4134 | CHINANET-BACKBONE | CN | 103919104 | 213032 | 0.205% |
2 | AS7018 | AT&T Services, Inc. | US | 90563584 | 325457 | 0.359% |
3 | AS7922 | Comcast Cable Communications, LLC | US | 69625600 | 604119 | 0.868% |
4 | AS4837 | CHINA UNICOM China169 Backbone | CN | 57441536 | 71333 | 0.124% |
5 | AS4766 | Korea Telecom | KR | 46453504 | 49833 | 0.107% |
6 | AS701 | Verizon Business | US | 40575744 | 174185 | 0.429% |
7 | AS17676 | SoftBank Corp. | JP | 36434176 | 18915 | 0.052% |
8 | AS3320 | Deutsche Telekom AG | DE | 34050560 | 131419 | 0.386% |
9 | AS4713 | NTT Communications Corporation | JP | 28691456 | 13611 | 0.047% |
10 | AS3356 | Level 3 Parent, LLC | US | 28098304 | 1763 | 0.006% |
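The ASN_RES_PERCENT column is simply the residential proxy IP count divided by the total number of IPs in the ASN. The short Python check below recomputes a few rows from the table; the function name and the three-decimal formatting are assumptions chosen to match how the table is displayed.

```python
def asn_res_percent(res_ip_count, count_ip_asn):
    """Share of an ASN's IPs seen as residential proxies, as a percent string."""
    return f"{100 * res_ip_count / count_ip_asn:.3f}%"


# (ASN, COUNT_IP_ASN, RES_IP_COUNT) taken from the table above.
rows = [
    ("AS4134", 103919104, 213032),
    ("AS7922", 69625600, 604119),
    ("AS3356", 28098304, 1763),
]

for asn, total_ips, res_ips in rows:
    print(asn, asn_res_percent(res_ips, total_ips))
```

Even for the ASN with the highest share in this list, fewer than 1 in 100 of its IPs show up as residential proxies.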
If you have any queries or questions you want me to explore, please let me know. Thanks!