The lie in their WHOIS: IP geolocation reporting explored

As an IP geolocation provider, we take the idea of dogfooding to the next level. Aside from using our data in our cybersecurity process, we use the data to purchase servers for our probe network.

If you use IP geolocation data, you may think IP geolocation is either guesswork, repurposed public internet records, or essentially a black box. If you are an IPinfo user, you probably know it is mainly ping→RTT data at a global internet scale. The TL;DR of the probe network is that we use many servers across multiple locations to ping an IP address and identify its approximate location. Because it is based on RTT, it is one of the most reliable ways to identify the location of an IP address.
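To make the RTT idea concrete, here is a minimal sketch of why a ping bounds location. It is not IPinfo's actual method. It only assumes that signals in fiber travel at roughly 200 km per millisecond (about two thirds of the speed of light), so a round-trip time puts a hard upper limit on how far away the target can be. The function name and the sample RTT values are made up for illustration.

```python
# Assumption: signals in optical fiber cover roughly 200 km per millisecond.
# Real network paths are longer and slower, so this is only an upper bound.
FIBER_KM_PER_MS = 200.0


def max_distance_km(rtt_ms):
    """Upper bound on the distance between a probe and the pinged IP.

    The round trip covers the path twice, so only half of the RTT
    counts toward the one-way distance.
    """
    return (rtt_ms / 2.0) * FIBER_KM_PER_MS


# Hypothetical RTT values, for illustration only.
for rtt in (1.0, 10.0, 80.0):
    print(f"RTT {rtt:5.1f} ms -> target is within {max_distance_km(rtt):7.0f} km")
```

A 1 ms response from a probe means the target cannot be more than about 100 km from that probe, no matter what any self-published record says.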

Buying servers in the right place is one key aspect of building this 600+ server-strong global probe network infrastructure. All the servers need to be distributed strategically, and not all server locations are created equal. IPinfo’s probe network benefits mostly from exotic locations.

Now, when it comes to purchasing the servers, the more exotic they are, the higher the price we pay. We literally pay several times more for a server in the middle of the Sahara desert or on a Polynesian island than for one from the dozens of data centers and cloud providers in New York, SF or Amsterdam.

Where are the servers actually located

We repeatedly tell our users to avoid “trust me, bro” rhetoric regarding IP geolocation data. Ping the IPs yourself. Some services will ping an IP address from multiple servers and provide you with a list of response times. In buying these probe servers, that idea still holds. WHOIS data can be falsified or misleading, but ping data cannot. WHOIS and geofeed data are voluntary, usually unverifiable and self-published documents. Yet, some other providers take them as an all-encompassing universal truth. In the age of CDNs and VPNs, server vendors have an interest in falsifying the location information of their IP ranges. And the other IP geolocation providers are eating all that up.

Why are there geolocation discrepancies between IPinfo’s data and other providers? - Docs / Knowledgebase - IPinfo Community

An IP geolocation provider must provide reliable location data of an IP address. Still, relying absolutely on self-published location information from the IP range owners defeats the idea of fairness in the data they provide.

As the industry leader regarding accuracy, IPinfo is in a peculiar position: many IP geolocation providers blindly follow self-published, voluntary internet records, while we report IP geolocation based on evidence, something many providers do not do.

Amsterdam or Amsterdamn


We will come back to this diagram. We will add the label.

When buying a server in Europe, the cheapest location is often in Amsterdam, NL. Amsterdam hosts many hosting providers and data centers. However, Europe is a diverse continent, and our probe network prefers diversity.


IPinfo Netherlands IP country page

Now, let’s take the perspective of a hosting provider in a small European city. They know it is very hard to beat Amsterdam-based hosting providers as they are cheaper, and their entire infrastructure makes starting a data center efficient and optimized. For most customers, the location of the server does not matter. Moreover, in Western Europe, countries are packed so tightly together that the country where a server is located does not significantly impact website speed. So, the idea is that if you are in a smaller city, there is very little incentive to start a server hosting business as the data centers are clustered in major cities across Europe.

However, location diversity is a genuine incentive for VPN and hosting resellers. Hosting resellers who receive commissions based on the servers they sell must give something extra to the customer. A customer can always buy the server directly from the original server vendor, and it is often cheaper. But it is very hard to stand out in this industry with identical products.

From this need comes the idea of location diversity. The hosting reseller may get a server in a major city from a bigger provider and change the location of the IP range to any city they want. It is as simple as asking the AS owner to change the WHOIS and geofeed records of a particular prefix. Once the location information is updated, the reseller can charge more for the same server. IP geolocation services that rely on WHOIS data then act as an unpaid mediator and vouch for the stated server location.
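To show how little stands behind such a record, here is a small sketch that parses a geofeed file. The column order (prefix, country, region, city, postal code) follows the self-published geofeed format in RFC 8805. The feed content is entirely hypothetical and uses documentation-only prefixes, and `parse_geofeed` is an illustrative helper name, not part of any library.

```python
import csv
import io

# Hypothetical geofeed content using documentation-only prefixes.
# Column order per RFC 8805: prefix, country, region, city, postal code.
GEOFEED = """\
# self-published by the network operator
192.0.2.0/24,NL,NL-NH,Amsterdam,
198.51.100.0/24,CY,,Nicosia,
"""


def parse_geofeed(text):
    """Return one dict per geofeed entry, skipping comments and blank lines."""
    rows = []
    for record in csv.reader(io.StringIO(text)):
        if not record or record[0].startswith("#"):
            continue
        record += [""] * (5 - len(record))  # pad missing trailing fields
        prefix, country, region, city, postal = record[:5]
        rows.append({
            "prefix": prefix,
            "country": country,
            "region": region,
            "city": city,
            "postal": postal,
        })
    return rows


for entry in parse_geofeed(GEOFEED):
    print(entry["prefix"], "->", entry["country"], entry["city"])
```

Changing the declared country of a prefix is a one-line text edit in a file like this, and nothing in the format proves that the servers are actually there.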

From an Amsterdam server, they can make the server be located anywhere. Truly Amsterdamn.

The server on the other side of the continent

You might think it all sounds too much. If you think that, you are the type of user we want—the skeptical kind because we are also skeptical and always preach about evidence.

Now, let’s go through our recent experience. We came across a server vendor that showcased servers in 3 to 4 target locations for our probe network. The first red flag should have been the cheap pricing, and the second should have been that they were not responding to our emails inquiring about their services.

We signed up and got the IP address, and as soon as we SSHed into the server, we immediately ran curl ipinfo.io.

Before I saw the IP location fields, my eyes got stuck on the org field. I knew I had been duped within microseconds of seeing the API response. We know that certain AS organizations are notorious for having a track record of bad WHOIS/geofeed location information. This was one of them. The hosting provider was a reseller of their servers. Before I could reason with myself, I saw the country’s name. It was on the other side of the continent from the advertised location…

Within minutes of purchasing a server and paying the monthly payment in advance, the server was essentially worthless to us.

The proof is in the ping

Check multiple IP geolocation providers

First, check IPinfo. Aside from reliable data, we sometimes mention whether another provider’s location information is accurate.

Then, review the IP geolocation providers you trust to see where the IP could be located. Some services, like iplocation.net, check-host.net, etc., aggregate data from multiple IP geolocation providers onto one page. In this case, ALL the IP geolocation providers thought the server was in the advertised location, except for us.

Can you figure out which API response is coming from IPinfo? [Source: iplocation.net]

Seeing us being the odd one was odd. But for verification, you need to know where everyone stands. Even though we ask you to trust us, just be aware of the situation before we show you the evidence.

And about that, “Once upon a time, this IP address belonged to another data center” doesn’t fly with us as we, as an IP data provider, can access our historical data. The server was always in Amsterdam and is still there.

Ping the IP address

Okay, let’s go through the location verification process. Ping the IP address from a service that uses multiple servers. You can use ping.sx, check-host.net, etc. Then, sort the table by lowest average response time.

The server we purchased was pingable. The lowest average ping time came from the servers in and around Amsterdam. And there is the proof.

Pinged from multiple sources and sorted the data by average ping time [Source: ping.sx]
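The sorting step can be sketched in a few lines. The RTT samples below are hypothetical and are not the measurements from this investigation. They only illustrate the shape of the data you get from a multi-location ping service. `rank_probes` is an illustrative helper name.

```python
from statistics import mean

# Hypothetical RTT samples (ms) per probe city, for illustration only.
samples = {
    "Amsterdam": [1.2, 1.1, 1.4],
    "Frankfurt": [7.9, 8.3, 8.1],
    "London": [6.5, 6.8, 6.4],
    "Singapore": [160.2, 161.0, 159.7],
}


def rank_probes(rtt_samples):
    """Return (probe, average RTT) pairs, lowest average first."""
    averages = {probe: mean(values) for probe, values in rtt_samples.items()}
    return sorted(averages.items(), key=lambda item: item[1])


ranking = rank_probes(samples)
for probe, avg in ranking:
    print(f"{probe:<10} {avg:7.2f} ms")
```

The probe at the top of the list is the one physically closest to the target in network terms, which is why the lowest average ping time is the number to look at.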

Recognizing the limitation of ping-based geolocation

One point to note is that pinging IP addresses can generally provide a good context around geolocation information. However, it is not absolutely foolproof. There are some legitimate reasons why ping-based geolocation could disagree with the location information provided by the network operator. This usually happens with CG-NAT.

Also, if the user is tunneling their traffic through a VPN, ping information can only provide geolocation context on the surface-level IP address and not on the individual user IP addresses behind a VPN, proxy, Tor, etc. Ping gives the location of the infrastructure.

However, in the context of this article, where we are trying to identify the server location of a hosting provider, ping-based IP geolocation is quite reliable.

Checking out the WHOIS data

From the IPinfo IP data page, you should first visit the IP range page. The IP range page sits between the AS page and the IP address page, showing the IP addresses within the range but, more importantly, internet data records-based information. This page contains data parsed from the WHOIS and all WHOIS records for the range.

On this page, the IP range’s organizational country is listed. But that is not the same as where the IP addresses are located within that range. Luckily, you can visit a few IP addresses yourself from that page.

Or you can just evaluate the entire range. To do that, summarize the entire IP page.

And guess what? All the IPs are located outside of the WHOIS organization location or the advertised location of the IP addresses.

Taking a closer look at this AS

We have a policy of not giving a “reputation score” to IP addresses or calling out any organization, but we believe in education about good data. Know what you are dealing with and make your own judgment. Even though it could make business sense for us to tell everyone what IPs are good or bad, fraud or no fraud, abuse or no abuse, etc. We prefer to let our users decide what is good for them by providing reliable information.

Let’s take a closer look at the AS organization of this hosting provider who owns the IP address. This is where we are using the IPinfo ASN API. To run this investigation, you can also use our free IP to ASN database.

https://ipinfo.io/AS4*****?token=<TOKEN>

IPinfo ASN API request

First prefix country WHOIS (advertised) location

The prefixes key in the ASN API response provides a list of objects that contain the IP range prefixes of the ASN, the IP range size, the country, and more information. For this review, we are only interested in the country aspect of these ranges.

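Python Code
import requests

# <TOKEN> and the masked AS number are placeholders, as in the request above
url = "https://ipinfo.io/AS4*****?token=<TOKEN>"
response = requests.get(url, timeout=10)
response.raise_for_status()  # fail early on an HTTP error instead of parsing an error body
data = response.json()

prefixes = data['prefixes']  # list of prefix objects for the AS
prefix_number = len(prefixes)  # 1450

countries = [prefix['country'] for prefix in prefixes]
unique_prefix_countries = len(set(countries))  # 68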

The AS organization has 1450 IPv4 prefixes with 68 distinct prefix WHOIS country locations. Again, note that this is the WHOIS or self-reported location, and each prefix contains hundreds of IP addresses. We will come back to the actual locations of these IPs next.
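Beyond counting distinct countries, it can help to see how the prefixes are spread across them. The sketch below runs on a small hypothetical stand-in for the prefixes list. Only the country key is taken from the code above. The netblock key and all values are assumptions for illustration.

```python
from collections import Counter

# Hypothetical stand-in for data['prefixes']; values are made up and the
# "netblock" key is an assumption about the shape of each prefix object.
prefixes = [
    {"netblock": "192.0.2.0/24", "country": "NL"},
    {"netblock": "198.51.100.0/24", "country": "CY"},
    {"netblock": "203.0.113.0/24", "country": "NL"},
]

# Number of prefixes per self-reported (WHOIS) country, most common first.
country_counts = Counter(prefix["country"] for prefix in prefixes)

for country, count in country_counts.most_common():
    print(country, count)
```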

The actual locations of those prefixes

Now, I will enrich all the IPv4 IP addresses listed by this particular ASN with our IP geolocation data (you can also use the free IP to Country database). By combining the ASN information that contains WHOIS (advertised) location and IP geolocation data, I get this table:

The columns starting with the word prefix come from the ASN/WHOIS data, and the columns starting with the word loc come from our IP geolocation data. If you look closely at the image, you can already see the problem.
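The enrichment step can be sketched as follows. The column names prefix_netblock, prefix_country and loc_country match the ones used in the queries in this post. Everything else is an assumption for illustration: the documentation prefixes, the netblock key, and the toy lookup table that stands in for a real IP geolocation database.

```python
import ipaddress

# Hypothetical inputs: documentation prefixes with self-reported countries.
prefixes = [
    {"netblock": "192.0.2.0/30", "country": "CY"},
    {"netblock": "198.51.100.0/30", "country": "NL"},
]

# Toy lookup table standing in for a real IP geolocation database.
GEO_BY_NETWORK = {
    ipaddress.ip_network("192.0.2.0/24"): "NL",
    ipaddress.ip_network("198.51.100.0/24"): "NL",
}


def lookup_country(ip):
    """Return the geolocated country of an IP, or None if unknown."""
    for network, country in GEO_BY_NETWORK.items():
        if ip in network:
            return country
    return None


def enrich(prefix_list):
    """Expand each prefix into one row per IP with both country columns."""
    rows = []
    for prefix in prefix_list:
        for ip in ipaddress.ip_network(prefix["netblock"]):
            rows.append({
                "prefix_netblock": prefix["netblock"],
                "prefix_country": prefix["country"],
                "ip": str(ip),
                "loc_country": lookup_country(ip),
            })
    return rows


rows = enrich(prefixes)
for row in rows[:3]:
    print(row)
```

Each row now carries both the advertised country and the geolocated country, which is all the later comparisons need.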

We will try to get summary statistics of the discrepancies/mismatches between ASN/WHOIS/reported location vs IP geolocation.

SQL Query
SELECT
    percentage_mismatch,
    COUNT(*) AS prefix_mismatch_count
FROM (
    SELECT
        prefix_netblock,
        COUNT(*) AS total_count,
        SUM(CASE WHEN prefix_country <> loc_country THEN 1 ELSE 0 END) AS mismatch_count,
        100.0 * SUM(CASE WHEN prefix_country <> loc_country THEN 1 ELSE 0 END) / COUNT(*) AS percentage_mismatch -- 100.0 avoids integer division in some SQL dialects
    FROM
        "as_data.csv"
    GROUP BY
        prefix_netblock    
)
GROUP BY percentage_mismatch
ORDER BY prefix_mismatch_count DESC;


Well, that isn’t very reassuring. The AS organization is more wrong than they are right. 100.0 means none of the IPs within the range agree with IP geolocation, and 0.0 means all the IPs within the range agree with IP geolocation. Essentially, this is what the query is doing:

  • Grouping by prefix_netblock or parent IP blocks
  • Then, comparing the AS/WHOIS/Advertised country locations vs IP geolocation country
  • Counting for mismatches
  • Assigning a percentage value to mismatches
  • Grouping by the percentage mismatches
  • And showing the count
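The steps above can also be sketched in plain Python, which may be easier to follow than the nested query. The rows are hypothetical and only mimic the shape of as_data.csv (netblock, advertised country, geolocated country). `mismatch_histogram` is an illustrative name.

```python
from collections import Counter, defaultdict

# Hypothetical rows: (prefix_netblock, prefix_country, loc_country).
rows = [
    ("192.0.2.0/24", "CY", "NL"),
    ("192.0.2.0/24", "CY", "NL"),
    ("198.51.100.0/24", "NL", "NL"),
    ("198.51.100.0/24", "NL", "NL"),
    ("203.0.113.0/24", "SG", "SG"),
    ("203.0.113.0/24", "SG", "NL"),
]


def mismatch_histogram(records):
    """Count how many prefixes fall on each mismatch percentage."""
    totals = defaultdict(int)
    mismatches = defaultdict(int)
    for netblock, prefix_country, loc_country in records:
        totals[netblock] += 1
        if prefix_country != loc_country:
            mismatches[netblock] += 1
    # Percentage of IPs in each prefix whose advertised country is wrong.
    percentages = {
        netblock: 100.0 * mismatches[netblock] / totals[netblock]
        for netblock in totals
    }
    return Counter(percentages.values())


histogram = mismatch_histogram(rows)
for percentage, prefix_count in histogram.most_common():
    print(f"{percentage:5.1f}% mismatch: {prefix_count} prefix(es)")
```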

To simplify: for entire ranges, they provide entirely inaccurate location information. These ranges consist of hundreds of IP addresses each, and for 57% of these ranges, every reported location was inaccurate.

Breaking this down by individual IP addresses, the mismatch rate is 58.93%, which is surprisingly close to the number based on prefixes.

SQL Query
SELECT
    SUM(
        CASE WHEN prefix_country != loc_country THEN 1 ELSE 0 END
    ) AS mismatch_count,
    100.0 * SUM(CASE WHEN prefix_country != loc_country THEN 1 ELSE 0 END) / COUNT(*)
    AS percentage_mismatch -- 100.0 avoids integer division in some SQL dialects
FROM
    "as_data.csv"

Servers in one location being declared in multiple locations

Looking into the matter more closely, we found that servers in one IP geolocated country are being self-reported to multiple countries.

SQL Query
SELECT
    loc_country as geolocated_location,
    count(DISTINCT prefix_country) -1 as reported_location_count -- -1 for removing valid reporting
FROM
    "as_data.csv"
GROUP BY loc_country
ORDER BY reported_location_count DESC
LIMIT 5
Rank | Server IP geolocation | Number of countries mentioned in prefix WHOIS
1 | NL (Netherlands) | 27
2 | GB (Great Britain) | 20
3 | US (United States) | 15
4 | FR (France) | 11
5 | IT (Italy) | 5
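For readers who prefer Python over SQL, the same count can be sketched like this. The pairs are hypothetical. One design difference is worth noting: instead of subtracting 1, the sketch discards the matching country explicitly, so a geolocated country with no valid self-report at all is not undercounted.

```python
from collections import defaultdict

# Hypothetical (prefix_country, loc_country) pairs, for illustration only.
rows = [
    ("NL", "NL"),
    ("CY", "NL"),
    ("SG", "NL"),
    ("GB", "GB"),
    ("HR", "GB"),
]


def reported_elsewhere(pairs):
    """For each geolocated country, count the other countries it is reported as."""
    reported = defaultdict(set)
    for prefix_country, loc_country in pairs:
        reported[loc_country].add(prefix_country)
    # Remove the valid (matching) country rather than subtracting 1.
    counts = {loc: len(seen - {loc}) for loc, seen in reported.items()}
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


result = reported_elsewhere(rows)
for loc_country, reported_count in result:
    print(loc_country, reported_count)
```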

That is where Amsterdamn comes in:

Image created using sankeymatic

SQL Query
SELECT
    -- string formatting for Sankey Matic
    CONCAT(loc_country, ' [', COUNT(*), '] ', prefix_country) AS target
FROM
    "as_data.csv"
WHERE loc_country='NL' 
AND prefix_country != 'NL' -- focusing on location mismatch information only
GROUP BY
    prefix_country, loc_country;
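The same flow lines can be produced without SQL. This sketch builds strings in the "Source [amount] Target" shape that the query above formats for SankeyMATIC. The pairs are hypothetical and `sankey_lines` is an illustrative helper name.

```python
from collections import Counter

# Hypothetical (prefix_country, loc_country) pairs, one per IP address.
rows = [
    ("CY", "NL"),
    ("CY", "NL"),
    ("SG", "NL"),
    ("NL", "NL"),
    ("GB", "GB"),
]


def sankey_lines(pairs, located_in):
    """Build 'Source [amount] Target' lines for mismatched IPs in one country."""
    flows = Counter(
        prefix_country
        for prefix_country, loc_country in pairs
        if loc_country == located_in and prefix_country != located_in
    )
    return [
        f"{located_in} [{count}] {target}"
        for target, count in flows.most_common()
    ]


lines = sankey_lines(rows, "NL")
print("\n".join(lines))
```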

Flipping this idea upside down, we also look into the most desired country locations for this particular AS organization.

SQL Query
SELECT
    prefix_country as target_country,
    count(DISTINCT loc_country) - 1 as prefixes_from_another_country_countries
FROM
    "as_data.csv"
GROUP BY target_country
ORDER BY prefixes_from_another_country_countries DESC
LIMIT 5

What is the most desired target server/IP location

We can also flip this relationship and find which prefix country location is being reported from different IP geolocation countries. We can see the surprising popularity of Cyprus. So, if you buy a server in Cyprus, just be cautious. The server may not be located where you think it is.

An attempt to break down this relationship. This does not indicate the location reporting directions.

Tool used: datasmith.org

Not a single IP address located

We can also look at countries where the AS organization does not have a single IP address, yet declares them as prefix country locations.

SQL Query
SELECT
    prefix_country,
    count(*) as count_ips
FROM "as_data.csv"
WHERE prefix_country NOT IN (SELECT DISTINCT loc_country FROM "as_data.csv")
GROUP BY prefix_country
ORDER BY count_ips DESC;
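The logic of this query is a set difference: advertised countries minus countries where at least one IP was actually geolocated. A minimal Python sketch on hypothetical pairs, with `never_geolocated` as an illustrative name:

```python
from collections import Counter

# Hypothetical (prefix_country, loc_country) pairs, one per IP address.
rows = [
    ("NL", "NL"),
    ("CY", "NL"),
    ("CY", "NL"),
    ("SG", "NL"),
    ("GB", "GB"),
]


def never_geolocated(pairs):
    """Advertised countries with no geolocated IP, with their IP counts."""
    located = {loc_country for _, loc_country in pairs}
    counts = Counter(
        prefix_country
        for prefix_country, _ in pairs
        if prefix_country not in located
    )
    return counts.most_common()


phantom_countries = never_geolocated(rows)
for country, count_ips in phantom_countries:
    print(country, count_ips)
```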

Top 5 prefix country IP range assignment from IP geolocated countries.

SQL Query
SELECT
    CONCAT(loc_country, ' [', COUNT(*), '] ', prefix_country) AS target
FROM
    "as_data.csv"
WHERE prefix_country = 'SG'
OR prefix_country = 'TH'
OR prefix_country = 'ID'
OR prefix_country = 'HR'
OR prefix_country = 'BR'
-- OR prefix_country = 'NZ'
GROUP BY
    prefix_country, loc_country;


The education aspect of IP geolocation

We are so accurate that sometimes we just do not agree with anyone. Surprisingly, the IP geolocation industry is decades old now, yet nobody looks into the reliability of the data. But we would rather be right than agree with the industry.

Buying servers in exotic locations is not for every organization, but inaccurate IP geolocation data deeply impacts day-to-day business processes such as cybersecurity measures. It is truly surprising how falsified location information from hosting providers goes unnoticed to this extent. And it is even more surprising that third-party geolocation providers blindly vouch for this information.

This investigation requires processing a large amount of data, but you can run most of these queries using our free IP databases. When it comes to IP geolocation, just be conscious. Use a set of verification processes to decide where an IP address may be located.


Great read, Abdullah! That reminds me of a couple of years ago when “North Korea” VPNs were getting shared :grin:

Thanks, Emir. I will do a follow-up that will get into this matter as well :slight_smile: