The problem with nested and irregular data structures in IP databases

IPinfo’s flat, tabular data structure helps keep codebases clean and maintainable.

The issue with databases of legacy providers

In programming, consistency is one of the most valuable properties, especially consistency of input. When data serves as a reference or lookup database, any inconsistency or unpredictability makes bugs significantly more likely. IP databases are no exception. If an IP database uses a nested structure and lacks consistency, it leads to bugs, maintenance problems, and less readable code.

A nested data structure with unpredictable fields introduces reference errors or index errors. To avoid them, programmers have to wrap each data field lookup in try-catch error handling (try/except in Python). Moreover, mixing list/array values into a lookup database adds substantial complexity, because programmers usually don’t know how many elements a list contains before inspecting it.

The problem goes further: in some instances, the plaintext and binary formats of the same IP database return different types of lookup results. There is no uniform response across the file formats of the same IP database.

IPinfo’s consistent data structures

IPinfo takes this issue into account. All of IPinfo’s data downloads are tabular and flat. The data is not nested, and all the data field information is available at the primary level.

IPinfo’s data downloads also do not omit information fields when there is no corresponding value. If there is no value available for a field, we simply return an empty string.

Another aspect is that IPinfo’s data downloads do not mix objects and lists in their lookup responses. When you look up an IP address, the response is consistent across both CSV and MMDB file formats: it is flat and tabular data.
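To make the idea concrete, here is a minimal sketch of reading flat, tabular rows in Python. The two-row CSV text is a hypothetical extract that uses the field names from this article, not the actual download schema, and the helper name read_rows is made up for illustration.

```python
import csv
import io

# Hypothetical two-row extract in a flat layout (illustrative, not the real schema).
# The second row has no postal code, so that column is simply left empty.
CSV_TEXT = """city,country,latitude,longitude,postal_code,region,timezone
University City,US,38.65588,-90.30928,63105,Missouri,America/Chicago
Kafr ash Shaykh,EG,31.11174,30.93991,,Kafr el-Sheikh,Africa/Cairo
"""

FIELDS = ["city", "country", "latitude", "longitude",
          "postal_code", "region", "timezone"]


def read_rows(text):
    """Parse the CSV text into a list of dicts, one dict per row."""
    return list(csv.DictReader(io.StringIO(text)))


rows = read_rows(CSV_TEXT)
for row in rows:
    # Every field is a top-level key, so one access pattern works for every row.
    print([row[field] for field in FIELDS])
```

Because a missing value arrives as an empty string rather than a missing key, the same loop handles both rows without any error handling.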

Taking a closer look

Now, let’s take a closer look and compare the responses of IPinfo’s IP database and the IP database of a legacy provider. We will run this experiment using our IP geolocation database. We will evaluate the following fields:

  • City
  • Region
  • Geographic coordinates
  • Zip Code
  • Country
  • Timezone

I will be using Python for this experiment. With IPinfo’s database, the code to get specific field information is quite simple:

ip_data['city'] # city
ip_data['country'] # country
ip_data['latitude'] # latitude
ip_data['longitude'] # longitude
ip_data['postal_code'] # postal_code
ip_data['region'] # region
ip_data['timezone'] # timezone

On the legacy provider’s database, this code becomes:

ip_data['city']['names']['en'] # city
ip_data['country']['iso_code'] # country
ip_data['location']['latitude'] # latitude
ip_data['location']['longitude'] # longitude
ip_data['postal']['code'] # postal_code
ip_data['subdivisions'][0]['names']['en'] # region
ip_data['location']['time_zone'] # timezone

Off the bat, there are some crucial issues with the legacy provider’s data:

  • Sometimes, the first-level data is not available. This is common with fields like city, postal_code, etc.
  • There is a list element in the database that requires accessing the data through list indexing. Specifically, the region/subdivision field is a list element.
  • When a value is not present, the field itself does not exist.
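These failure modes can be reproduced with a small sketch. The record below is hypothetical: it is shaped like the nested access paths in the listing above, with the city and postal keys absent. The empty subdivisions list is an assumption made here to show the index-error case; the names LOOKUPS and probe are illustrative only.

```python
# Hypothetical nested record: no 'city' or 'postal' key, and an empty
# 'subdivisions' list (the empty list is an assumption for illustration).
record = {
    "country": {"iso_code": "US"},
    "location": {
        "latitude": 38.65588,
        "longitude": -90.30928,
        "time_zone": "America/Chicago",
    },
    "subdivisions": [],
}

# The same access paths as in the listing above, wrapped in functions
# so that each one can be tried independently.
LOOKUPS = {
    "city": lambda r: r["city"]["names"]["en"],
    "country": lambda r: r["country"]["iso_code"],
    "postal_code": lambda r: r["postal"]["code"],
    "region": lambda r: r["subdivisions"][0]["names"]["en"],
}


def probe(rec):
    """Return, per field, either the value or the name of the exception raised."""
    outcome = {}
    for field, getter in LOOKUPS.items():
        try:
            outcome[field] = getter(rec)
        except (KeyError, IndexError) as exc:
            outcome[field] = type(exc).__name__
    return outcome


print(probe(record))
```

For this record, the city and postal code lookups raise KeyError, the region lookup raises IndexError, and only the country lookup returns a value.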

Now, if you look up the IP address:

IPinfo returns:

{'city': 'University City',
 'country': 'US',
 'latitude': '38.65588',
 'longitude': '-90.30928',
 'postal_code': '63105',
 'region': 'Missouri',
 'timezone': 'America/Chicago'}

However, the legacy provider fails to return specific fields and raises errors such as:

KeyError: 'city'
KeyError: 'postal_code'
KeyError: 'subdivision'

Missing data is not, in itself, a major problem. The issue is that the object structure is so unpredictable that it raises a missing-key error and stops the program.

Now, let’s see another example. For the IP address, neither IPinfo nor the legacy provider has information about the postal code.

IPinfo returns an empty string for the postal code field. It does not omit the postal code. The IP lookup response is:

{'city': 'Kafr ash Shaykh',
 'country': 'EG',
 'latitude': '31.11174',
 'longitude': '30.93991',
 'postal_code': '',
 'region': 'Kafr el-Sheikh',
 'timezone': 'Africa/Cairo'}

The legacy provider, on the other hand, throws a key error:

KeyError: 'postal'

These are just a couple of examples, and these issues compound quickly. So, if you want to write working code against a legacy provider’s data, you must write code like this ↓
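What follows is a minimal sketch of such defensive code, assuming the nested access paths shown earlier. The function name flatten_legacy_record and the sample record are hypothetical, and the values are illustrative rather than actual provider output.

```python
def flatten_legacy_record(ip_data):
    """Flatten a nested record into seven fields, using '' for anything missing."""
    flat = {}

    try:
        flat["city"] = ip_data["city"]["names"]["en"]
    except KeyError:
        flat["city"] = ""

    try:
        flat["country"] = ip_data["country"]["iso_code"]
    except KeyError:
        flat["country"] = ""

    try:
        flat["latitude"] = ip_data["location"]["latitude"]
    except KeyError:
        flat["latitude"] = ""

    try:
        flat["longitude"] = ip_data["location"]["longitude"]
    except KeyError:
        flat["longitude"] = ""

    try:
        flat["postal_code"] = ip_data["postal"]["code"]
    except KeyError:
        flat["postal_code"] = ""

    try:
        # The region sits inside a list, so an empty list must be handled too.
        flat["region"] = ip_data["subdivisions"][0]["names"]["en"]
    except (KeyError, IndexError):
        flat["region"] = ""

    try:
        flat["timezone"] = ip_data["location"]["time_zone"]
    except KeyError:
        flat["timezone"] = ""

    return flat


# Hypothetical nested record with no 'postal' key (illustrative values only).
sample = {
    "city": {"names": {"en": "Kafr ash Shaykh"}},
    "country": {"iso_code": "EG"},
    "location": {
        "latitude": 31.11174,
        "longitude": 30.93991,
        "time_zone": "Africa/Cairo",
    },
    "subdivisions": [{"names": {"en": "Kafr el-Sheikh"}}],
}

print(flatten_legacy_record(sample))
```

Each of the seven fields needs its own guard, and the region lookup needs a second exception type because of the list.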

And to be honest, it does not look like good code. Yet you have to work with it if you choose the legacy provider’s data. With IPinfo, you don’t have to bother with any of this: you can use a specific key to look up each value in the response object.