Using IPinfo’s data on Snowflake through direct upload/ingestion

Snowflake IP enrichment with IPinfo’s standard IP data downloads

This document describes how to work with IPinfo’s database downloads on the Snowflake platform by ingesting/importing the standard database.

This document is not for users who have purchased our data from Snowflake Marketplace.

We highly recommend acquiring IPinfo’s data through the Snowflake Marketplace if you intend to use it inside Snowflake. However, we understand that IPinfo’s listings might not always be obtainable through the Snowflake Marketplace.

If you have already acquired IPinfo’s data through the Snowflake Marketplace, we recommend not recreating any of the steps mentioned here. Stick with the UDTF- and UDF-based solutions we have packaged with the listing. This document describes the inner mechanism of our Snowflake listings.

1. Data Ingestion of the IPinfo database into Snowflake

We recommend that customers get IPinfo’s data in CSV format for easy ingestion into Snowflake.

If you want to explore which data format is the right choice for you, we highly recommend you go through this article: How to choose the best file format for your IPinfo database?

Snowflake’s documentation goes through the details of CSV data loading in Snowflake: Load Data into Snowflake | Snowflake Documentation

Please note the name used in the table, which contains your IPinfo data. We will reference that as the ${ip_database_table_name} further down the document.
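As a rough sketch of the loading step, assuming a CSV export of the IP Geolocation database and hypothetical names (an ip_geolocation table and an ipinfo_stage stage; adjust everything to your setup), the flow could look like this:

```sql
-- Hypothetical names throughout: adjust table, stage, and file path to your setup.
CREATE TABLE ip_geolocation (
    start_ip TEXT,
    end_ip TEXT,
    join_key TEXT,
    city TEXT,
    region TEXT,
    country TEXT,
    latitude FLOAT,
    longitude FLOAT,
    postal_code TEXT,
    timezone TEXT
);

-- Stage the CSV file, then load it into the table.
CREATE STAGE ipinfo_stage;
-- PUT file://path/to/location.csv @ipinfo_stage;  (run PUT from SnowSQL or a client driver)
COPY INTO ip_geolocation
FROM @ipinfo_stage
FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1);
```

The Snowflake documentation linked above covers file formats, staging, and COPY INTO options in detail.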

2. Setting up IPinfo’s data inside Snowflake

Because of the unique nature of IP address data, you have to follow certain techniques for an efficient join, lookup, and log enrichment process. The exact technique varies from data platform to data platform based on each platform’s unique offerings.

For the Snowflake platform, we recommend replicating the setup we provide for our Snowflake Marketplace Listing.

2.1. The difference between standard IPinfo data downloads and Snowflake Marketplace download

There are certain differences between our standard database downloads and the Snowflake Marketplace databases.

| Standard Database Downloads | Snowflake Marketplace Database |
| --- | --- |
| Does not include the int/hex values of the start_ip and end_ip columns | Includes the hex/int values of the start_ip and end_ip columns as start_ip_int and end_ip_int columns |
| Has the join_key in IP address format | Has the join_key in hex/int format |

What are the hex/int values of the IP address? Why do we need them?

In our Snowflake listings, IPv4 addresses are converted to their equivalent integer values and IPv6 addresses are converted to their equivalent hexadecimal values. We have found that pre-converting the IP address to the hex/int values helped us to create the most efficient lookup solutions.
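As a rough Python analogue of this conversion (this is an illustration, not the UDF itself; exact formatting details such as letter case may differ from Snowflake’s PARSE_IP output):

```python
import ipaddress

def to_int_like(ip: str):
    """Rough analogue of the TO_INT UDF: IPv4 addresses become integers,
    IPv6 addresses become zero-padded hexadecimal strings."""
    if ':' in ip:
        return format(int(ipaddress.IPv6Address(ip)), '032x')
    return int(ipaddress.IPv4Address(ip))

print(to_int_like('180.80.160.128'))  # 3025182848
print(to_int_like('2001:db8::1'))
```

The IPv4 value above matches the START_IP_INT example in the listing schema table further down.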

Standard database schema for the IP Geolocation Database (Source)

| Field Name | Example | Data Type | Notes |
| --- | --- | --- | --- |
| start_ip | 1.253.242.0 | TEXT | Starting IP address of an IP address range |
| end_ip | 1.253.242.255 | TEXT | Ending IP address of an IP address range |
| join_key | 1.253.0.0 | TEXT | Special variable to facilitate the join operation |
| city | Yangsan | TEXT | City of the location |
| region | Gyeongsangnam-do | TEXT | Region of the location |
| country | KR | TEXT | ISO 3166 country code |
| latitude | 35.34199 | FLOAT | Latitude value of the location |
| longitude | 129.03358 | FLOAT | Longitude value of the location |
| postal_code | 50593 | TEXT | Postal code of the location |
| timezone | Asia/Seoul | TEXT | Local time zone |

Database schema of the IP Geolocation Snowflake listing:

| name | type | Example |
| --- | --- | --- |
| START_IP_INT | VARIANT | 3025182848 |
| END_IP_INT | VARIANT | 3025184255 |
| JOIN_KEY | VARCHAR(16777216) | 3025141760 |
| START_IP | VARCHAR(16777216) | 180.80.160.128 |
| END_IP | VARCHAR(16777216) | 180.80.165.255 |
| CITY | VARCHAR(16777216) | Seoul |
| REGION | VARCHAR(16777216) | Seoul |
| COUNTRY | VARCHAR(16777216) | KR |
| LAT | FLOAT | 37.6566 |
| LNG | FLOAT | 127.0769 |
| POSTAL | VARCHAR(16777216) | 1711 |
| TIMEZONE | VARCHAR(16777216) | Asia/Seoul |

2.2. Establishing the database name and schema name

The functions described in the following sections reference a database name and a schema name. You have to set these parameters yourself based on your Snowflake setup.

${database_name} → Database Name

${schema} → Schema

Please review the functions carefully for these variable placeholders.

2.3. Replicating the Snowflake Marketplace listing downloads

After loading the standard IPinfo database into Snowflake, we recommend creating the hex/int columns in your table, which involves the following:

| Standard Column | Target Column | Action | Note |
| --- | --- | --- | --- |
| start_ip | start_ip_int | Create new column | The existing column can be useful in understanding IP ranges. |
| end_ip | end_ip_int | Create new column | The existing column can be useful in understanding IP ranges. |
| join_key | join_key | Replace the previous column | The join_key column is a specialized column for IP lookups. It could be confusing to create a new column named join_key_int because, throughout our Snowflake documentation, we expect the join_key column to be in int/hex format. |

Note that even though we use the _int suffix in the column names, these columns will contain hex values for IPv6 addresses.

2.4. Converting the IP addresses to int/hex values using TO_INT UDF

Now, to create the hex/int values of the IP address values, you need to create the TO_INT UDF (User Defined Function).

TO_INT UDF

CREATE OR REPLACE SECURE FUNCTION
    ${database_name}.${schema}.TO_INT(ip text) COPY GRANTS
    RETURNS VARIANT
AS $$
    IFF(
        ip LIKE '%:%',
        PARSE_IP(ip, 'INET', 1):hex_ipv6,
        AS_INTEGER(PARSE_IP(ip, 'INET', 1):ipv4)
    )
$$;

Once you create this function, use it to convert the columns inside the table that contains IPinfo’s data. Convert the “Standard Columns” to their “Target Columns” described in the previous section.
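One possible sketch of that conversion, assuming your table is named ip_geolocation (adjust all names to your setup), rebuilds the table in place with CTAS:

```sql
-- Sketch: rebuild the table with the int/hex columns described above.
-- Assumes the table is named ip_geolocation and TO_INT exists in the
-- same database/schema; adjust every name to your setup.
CREATE OR REPLACE TABLE ${database_name}.${schema}.ip_geolocation AS
SELECT
    ${database_name}.${schema}.TO_INT(start_ip) AS start_ip_int,
    ${database_name}.${schema}.TO_INT(end_ip)   AS end_ip_int,
    ${database_name}.${schema}.TO_INT(join_key) AS join_key,  -- replaces the IP-formatted join_key
    start_ip,
    end_ip,
    city,
    region,
    country,
    latitude,
    longitude,
    postal_code,
    timezone
FROM ${database_name}.${schema}.ip_geolocation;
```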

After you have done that, you will have a database structured like our Snowflake listing.

2.5. Create the TO_JOIN_KEY function

The TO_JOIN_KEY function will help you with creating join_keys for your input IP addresses. We will use this function in the UDTF declaration.

TO_JOIN_KEY Function code

CREATE OR REPLACE SECURE FUNCTION
    ${database_name}.${schema}.TO_JOIN_KEY(ip text) COPY GRANTS
    RETURNS VARIANT
AS $$
    IFF(
        ip LIKE '%:%',
        PARSE_IP(CONCAT(ip, '/34'), 'inet', 1):hex_ipv6_range_start,
        AS_INTEGER(PARSE_IP(CONCAT(ip, '/16'), 'inet', 1):ipv4_range_start)
    )
$$;
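A quick way to sanity-check both helpers is to call them directly. For example (the expected values follow from the listing schema example in section 2.1):

```sql
-- Sanity check of the two helper UDFs.
SELECT
    ${database_name}.${schema}.TO_INT('180.80.160.128')      AS ip_int,       -- 3025182848
    ${database_name}.${schema}.TO_JOIN_KEY('180.80.160.128') AS ip_join_key;  -- 3025141760
```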

2.6. Creating the UDTF

The UDTF (User Defined Table Function) is our core IP lookup operation mechanism. We will use this function to perform efficient IP enrichment and joining operations.

If you want to learn more about this UDTF, we recommend checking out this article: How to get IP Data in Snowflake (ipinfo.io)

Let’s start with a real-life example: the UDTF for the IP Geolocation database, called IP_LOCATION:

CREATE OR REPLACE SECURE FUNCTION
    ${database_name}.${schema}.IP_LOCATION(ip text) COPY GRANTS
    RETURNS TABLE(city text, region text, country text, postal_code text, latitude float, longitude float, timezone text)
AS $$
    SELECT city, region, country, postal_code, latitude, longitude, timezone
    FROM ${database_name}.${schema}.ip_geolocation
    WHERE
        join_key = ${database_name}.${schema}.TO_JOIN_KEY(ip)
        AND ${database_name}.${schema}.TO_INT(ip) BETWEEN start_ip_int AND end_ip_int
$$;

Line-by-line breakdown of the example UDTF

| Code | Explanation |
| --- | --- |
| ${database_name}.${schema}.IP_LOCATION(ip text) | The UDTF is called IP_LOCATION, and it takes the parameter ip in text format, i.e. the IP address format. You don’t have to do any pre-conversion of the input IP addresses in your logs or other sources; just pass the IP addresses as-is. |
| RETURNS TABLE(city text, [.....] timezone text) | Returns the IP metadata information in table format. By IP metadata information, we mean the information associated with the IP address ranges in IPinfo’s database. For IP Geolocation, it is city, region, country, postal_code, latitude, longitude, and timezone. For the ASN database, it is asn, domain, name, type, and country. We highly recommend you review the schema of the database you have acquired against the schema in our documentation. |
| SELECT city, region, country, postal_code, latitude, longitude, timezone | Returns the selected IP metadata according to the table schema. |
| FROM ${database_name}.${schema}.ip_geolocation | Notice the ip_geolocation part: that is the name of the table containing the IPinfo IP Geolocation database loaded into Snowflake. We used ip_geolocation, but yours can differ; it depends on the name you used for the table where your IPinfo data is stored. |
| WHERE join_key = ${database_name}.${schema}.TO_JOIN_KEY(ip) AND ${database_name}.${schema}.TO_INT(ip) BETWEEN start_ip_int AND end_ip_int | The core join operation. It uses both the TO_JOIN_KEY and TO_INT functions and refers to the newly created columns join_key, start_ip_int, and end_ip_int. The enrichment lookup combines an equality match on the join key with a BETWEEN range check. |

That is how the UDTF backs the IP enrichment process.

IPinfo Lookup UDTF Template

CREATE OR REPLACE SECURE FUNCTION
    ${database_name}.${schema}.${UDTF_name}(ip text) COPY GRANTS
    RETURNS TABLE(${ip_meta_attribute_column_name} ${ip_meta_attribute_column_type})
AS $$
    SELECT ${ip_meta_attribute_column_name}
    FROM ${database_name}.${schema}.${ip_database_table_name}
    WHERE
        join_key = ${database_name}.${schema}.TO_JOIN_KEY(ip)
        AND ${database_name}.${schema}.TO_INT(ip) BETWEEN start_ip_int AND end_ip_int
$$;

Variables that need to be declared:

  • ${database_name}
  • ${schema}
  • ${UDTF_name}
  • ${ip_meta_attribute_column_name}
  • ${ip_meta_attribute_column_type}

3. Using the UDTF: Joining/Enriching IP addresses with IPinfo’s data

The final step! So far, you have:

  • Processed the standard database to replicate the Snowflake listing data
  • Declared the essential helper functions and UDTF
  • Brought in your input IP addresses

In this section, we will go through how you can pass IP addresses and enrich them with IPinfo’s IP database.

Protip: if you have a list of IP addresses, we recommend finding the distinct/unique IP addresses first and enriching only those with IPinfo’s data through the following process. Then join the enriched IP data back to your original database or log.
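A sketch of that flow, assuming a hypothetical logs table with an ip column and the IP_LOCATION UDTF from the example:

```sql
-- Sketch: enrich the distinct IPs once, then join the results back to the logs.
-- "logs" and its "ip" column are hypothetical; adjust all names to your setup.
CREATE OR REPLACE TEMPORARY TABLE enriched_ips AS
SELECT d.ip, t.*
FROM (SELECT DISTINCT ip FROM logs) d
JOIN TABLE(ipinfo.public.ip_location(d.ip)) t;

SELECT l.*, e.city, e.country
FROM logs l
JOIN enriched_ips e ON l.ip = e.ip;
```

Enriching distinct IPs first avoids running the range lookup repeatedly for IP addresses that appear many times in the log.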

Using the IP_LOCATION example UDTF, you need to run the following query.

SELECT *
FROM logs l
JOIN TABLE(ipinfo.public.ip_location(l.ip))

Line-by-line breakdown of the example join operation

| Code | Explanation |
| --- | --- |
| SELECT * | Returns the IP addresses and their associated information from IPinfo’s database. |
| FROM logs l | The logs table contains your input IP addresses. We use the alias l for the join operation. |
| JOIN TABLE(ipinfo.public.ip_location(l.ip)) | The final join operation. The TABLE function converts the UDTF return into a table, which we then JOIN against the input IP addresses from the log. Here, ipinfo is ${database_name}, public is ${schema}, and ip_location is ${UDTF_name}. |


Template for IP data enrichment using the IPinfo Snowflake UDTF

SELECT *
FROM ${input_table} ${table_alias}
JOIN TABLE(${database_name}.${schema}.${UDTF_name}(${table_alias}.${column_containing_ip_addresses}))

Variables that need to be declared:

  • ${column_containing_ip_addresses}
  • ${input_table}
  • ${table_alias}
  • ${database_name}
  • ${schema}
  • ${UDTF_name}

Summary


That is about it. Again, we highly recommend you go through the Snowflake Marketplace listing: in the listings, we not only take care of the database column conversions but also provide the UDTF associated with the listing, along with a number of helpful UDFs.

You can check out our documentation on Snowflake listing to have a technical overview of the provided resources: Snowflake Listing Documentation (ipinfo.io)


Addendum: Function References

TO_INT Function code

CREATE OR REPLACE SECURE FUNCTION
    ${database_name}.${schema}.TO_INT(ip text) COPY GRANTS
    RETURNS VARIANT
AS $$
    IFF(
        ip LIKE '%:%',
        PARSE_IP(ip, 'INET', 1):hex_ipv6,
        AS_INTEGER(PARSE_IP(ip, 'INET', 1):ipv4)
    )
$$;

TO_JOIN_KEY Function code

CREATE OR REPLACE SECURE FUNCTION
    ${database_name}.${schema}.TO_JOIN_KEY(ip text) COPY GRANTS
    RETURNS VARIANT
AS $$
    IFF(
        ip LIKE '%:%',
        PARSE_IP(CONCAT(ip, '/34'), 'inet', 1):hex_ipv6_range_start,
        AS_INTEGER(PARSE_IP(CONCAT(ip, '/16'), 'inet', 1):ipv4_range_start)
    )
$$;

IPinfo Lookup UDTF Template

CREATE OR REPLACE SECURE FUNCTION
    ${database_name}.${schema}.${UDTF_name}(ip text) COPY GRANTS
    RETURNS TABLE(${ip_meta_attribute_column_name} ${ip_meta_attribute_column_type})
AS $$
    SELECT ${ip_meta_attribute_column_name}
    FROM ${database_name}.${schema}.${ip_database_table_name}
    WHERE
        join_key = ${database_name}.${schema}.TO_JOIN_KEY(ip)
        AND ${database_name}.${schema}.TO_INT(ip) BETWEEN start_ip_int AND end_ip_int
$$;

Variables that need to be declared:

  • ${database_name}
  • ${schema}
  • ${UDTF_name}
  • ${ip_meta_attribute_column_name}
  • ${ip_meta_attribute_column_type}