[IPinfo Lite] How to generate find IP addresses for smaller ASNs

So, I created IPguessr as a meme of sorts, but it is grown to be a very fun project.

I was generating IP addresses randomly and testing them against our API service to eliminate bogon IP addresses.

function getRandomIP() {
  const octet = () => Math.floor(Math.random() * 256);
  return `${octet()}.${octet()}.${octet()}.${octet()}`;
}

Then another challenge was that I was getting some of the bigger ASNs way too frequently, so when I generated these random IP addresses, I would filter them by ASN:

But I still got way too many common ASNs. For a 5 IP per round game, it felt like these IP addresses came very frequently.

Using the IPinfo Lite DB to see what ASN’s I could exclude

Let’s first look at the ASNs that we can remove from the Lite data downloads service in Snowflake.

The Lite DB download is our free IP to Country ASN database. The data is currently available as data download and on Snowflake and BigQuery. Please let us know if you would like the data in Databricks, AWS, or elsewhere as well.

:link: Snowflake

Let’s do some vibe-coding (or AI-Assited coding) to generate the SQL query using Snowflake CoPilot Cortex AI:

Prompt:

The count_ip UDF takes start_ip and end_ip as input. Use start_ip LIKE ‘%.%’ to filter IPv4 addresses. From all active IPv4 addresses (type != ‘inactive’), group by asn, as_name, and as_domain (excluding NULL ASNs) from the lite table. Count how many IPs each ASN owns and calculate their percentage of the total. Return the top 100 ASNs, ordered by IP count.

/* Generated by Snowflake Copilot */
WITH ip_counts AS (
  SELECT
    asn,
    as_name,
    as_domain,
    SUM(COUNT_IP (start_ip, end_ip)) AS ip_count,
    SUM(SUM(COUNT_IP (start_ip, end_ip))) OVER () AS total_ips
  FROM
    lite
  WHERE
    start_ip LIKE '%.%'
    AND NOT asn IS NULL
  GROUP BY
    asn,
    as_name,
    as_domain
)
SELECT
  asn,
  as_name,
  as_domain,
  ip_count,
  ROUND((ip_count * 100.0 / total_ips), 4) AS percentage_of_total
FROM
  ip_counts
ORDER BY
  ip_count DESC
LIMIT
  100;

These 100 ASNs represent 58% of all the IP addresses in the Lite table. So, that means you kind of have to make at least 2 API calls to get 1 ASN that is not included here. However, that does not account for the Bogon IPs generated by the random IP function.

FYI, 15% of the IP addresses in the Lite DB have an asn marked as null. This is because we base our ASN information on BGP announcements. These IP addresses were not being announced by any ASN, hence they are null. However, if you are interested in ownership information, you can take a look at our WHOIS data.

Out of curiosity, let’s take a look at a percentile-based distribution of IP counts by ASN

Snowflake SQL query to calculate the cumulative ownership of IP addresses by ASN in percentile bands.

  SELECT *,
         count_ip(start_ip, end_ip) AS ip_count
  FROM lite
  WHERE start_ip LIKE '%.%'
        AND NOT asn IS NULL
),

asn_totals AS (
  SELECT asn, SUM(ip_count) AS total_ips
  FROM active_ipv4
  WHERE asn IS NOT NULL
  GROUP BY asn
),

total_ip_space AS (
  SELECT SUM(total_ips) AS total_ips
  FROM asn_totals
),

ranked_asns AS (
  SELECT 
    asn,
    total_ips,
    SUM(total_ips) OVER (ORDER BY total_ips DESC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total,
    ROW_NUMBER() OVER (ORDER BY total_ips DESC) AS asn_rank
  FROM asn_totals
),

percentile_cutoffs AS (
  SELECT 
    ranked_asns.asn_rank,
    ranked_asns.running_total,
    ROUND(100.0 * ranked_asns.running_total / total_ip_space.total_ips, 1) AS cumulative_percent
  FROM ranked_asns, total_ip_space
),

bucketed_percentiles AS (
  SELECT MIN(asn_rank) AS min_asns_needed,
         FLOOR(cumulative_percent / 10) * 10 AS percentile
  FROM percentile_cutoffs
  WHERE cumulative_percent <= 100
  GROUP BY FLOOR(cumulative_percent / 10) * 10
)

SELECT percentile, min_asns_needed
FROM bucketed_percentiles
WHERE percentile!=0
ORDER BY percentile;

PERCENTILE MIN_ASNS_NEEDED
10 2
20 5
30 12
40 28
50 57
60 107
70 216
80 533
90 2258
100 57107

An interesting pattern you can see here is that: From 10% to nearly 80%, the minimum ASN count keeps doubling.

Instead of ASN, what about we choose as_domain?

PERCENTILE MIN_AS_DOMAINS_NEEDED
10 2
20 4
30 8
40 15
50 32
60 65
70 134
80 329
90 1390
100 47566

This is rather interesting, as now we have a much smaller amount of IP metadata to evaluate if we want to filter them out.

To make the game more interesting, we will exclude the top 90% of ASNs from the game, which is represented by 1,390 as_domains (through 2258 ASNs).

WITH active_ipv4 AS (
  SELECT *,
         count_ip(start_ip, end_ip) AS ip_count
  FROM lite
  WHERE start_ip LIKE '%.%'
    AND as_domain is not null
),

as_domain_totals AS (
  SELECT as_domain, SUM(ip_count) AS total_ips
  FROM active_ipv4
  WHERE as_domain IS NOT NULL
  GROUP BY as_domain
),

ranked_as_domains AS (
  SELECT 
    as_domain,
    total_ips,
    ROW_NUMBER() OVER (ORDER BY total_ips DESC) AS as_domain_rank
  FROM as_domain_totals
)

SELECT as_domain 
FROM ranked_as_domains
WHERE as_domain_rank <= 1390
ORDER BY as_domain_rank;

msedge_Q7FykYH89j

In the next community post, we will look into how we can take advantage of the Lite API’s unlimited API requests feature to filter out these ASNs without building an API preprocessor.

1 Like