Using IPinfo’s data downloads in PostgreSQL

Using our database in PostgreSQL involves the following steps:

  • Using the IP4R extension for IP address lookups.
  • Create a sample log table.
  • Create the table for the IPinfo data ingestion and ingest the data there.
  • Running the IP query operation.

Step 0 - Create a sample log table

We will create a sample log table for the demo and populate it with random IPv4 addresses. The data type of the IP column is set as text.

The log table consists of random IP addresses. Some of those IP addresses are bogon and will not have location data attached to them.

On psql, we are running:

CREATE TABLE log (
  id bigserial primary key,
  ip text
);

To populate the log table with 100000 random IP addresses, run the following:

INSERT INTO log(ip)
SELECT CONCAT(
  TRUNC(RANDOM() * 250 + 2),'.',
  TRUNC(RANDOM() * 250 + 2),'.',
  TRUNC(RANDOM() * 250 + 2),'.',
  TRUNC(RANDOM() * 250 + 2)
)::text
FROM GENERATE_SERIES(0, 99999);

image

Step 1 - Using the IP4R extension

To get started, you need to use the IP4R extension.

CREATE EXTENSION ip4r;

:link: PostgreSQL: Using and installing the IP4R extension

:link: PostgreSQL: Why use the IP4R extension and not Postgres’ network address types?

Step 2 - Create the table for the IPinfo data

We declare the IP address columns (start_ip, end_ip and join_key ) as ipaddress data type that comes from the IP4R extension. We generate an ip_range column as iprange format from the start_ip and end_ip columns.

I am using the IP geolocation database as an example for this guide.

-- Create the table for the IP to Geolocation data
CREATE TABLE ip_location (
  start_ip    ipaddress,
  end_ip      ipaddress,
  join_key    ipaddress,
  ip_range    iprange generated always as (iprange(start_ip, end_ip)) stored,
  city        text,
  region      text,
  country     text,
  latitude    text,
  longitude   text,
  postal_code text,
  timezone    text
);

The database schema for the data downloads is available here: Database Types - IPinfo.io

If you are using the extended databases, you can find the schema for them here: GitHub - ipinfo/sample-database: IPinfo database and API sample repository

You can check out this post where we have CREATE TABLE queries for all our databases: PostgreSQL: `CREATE TABLE` and `CREATE FUNCTION` (UDF) queries for IPinfo data downloads

Step 3 - Ingesting the data download in the Postgres table

You can use the following bash command to:

  • Download IP Geolocation database in the zipped CSV format
  • Unzip to get the CSV data
  • Ingest the CSV data into the IP_LOCATION table via PSQL

Please set the PostgreSQL connection string here:
postgresql://username:password@database:5432/postgres

Please set up the IPinfo token here:
<YOUR_TOKEN>

curl -sL https://ipinfo.io/data/location.csv.gz?token=<YOUR_TOKEN> 2>&1 | gzip -d | psql postgresql://username:password@database:5432/postgres -c "COPY ip_location FROM STDIN DELIMITER ',' CSV HEADER"

Step 4 - Indexing the IP data table by IP ranges

We can create the index for the IP_LOCATION table. Then we can update the log table with the location data.

-- Setting up the index for the IP_LOCATION table
-- This might take a while
CREATE INDEX ipl_ip_range
ON ip_location
USING gist (ip_range);

Step 5 - Running the IP query operation

On our lookup operation, we are casting the data types of IP addresses in the log as ipaddress data type from the IP4R extension.

-- Create a joined table from the IP addresses from the log table
-- with the IP to Location insights from the IP_LOCATION table
SELECT *
FROM log l
JOIN ip_location ipl
ON ipl.ip_range && l.ip::ipaddress; -- casting ip as ipaddress data type

This will output a joined data table containing the log’s IP addresses and their corresponding IP location data from the IP to Location database.

If for some reason the query operation is taking more than a few seconds, try running the command: SET enable_seqscan = OFF;
Postgres doc

Optional Step - Outputting only the relevant IP data columns from the IP query operation

To clean up the output table a bit, call the individual data columns instead of running the SELECT statement with *.

SELECT
  l.ip,
  ipl.city,
  ipl.region,
  ipl.country,
  ipl.latitude,
  ipl.longitude,
  ipl.postal_code,
  ipl.timezone
FROM log l
JOIN ip_location ipl
ON ipl.ip_range && l.ip::ipaddress;

2 Likes