Snowflake IP enrichment with IPinfo’s standard IP data downloads
This document describes how to work with IPinfo’s database downloads on the Snowflake platform by ingesting/importing the standard database.
This document is not for users who have purchased our data from the Snowflake Marketplace.
We highly recommend acquiring IPinfo’s data through the Snowflake Marketplace if you intend to use it inside Snowflake. However, we understand that the Snowflake Marketplace listings might not be an option for everyone.
If you have already acquired our IPinfo data through the Snowflake Marketplace, we recommend not recreating any of the steps mentioned here. Instead, stick with the UDTF- and UDF-based solutions we have packaged with the listing. This document describes the inner mechanism of our Snowflake listings.
1. Data Ingestion of the IPinfo database into Snowflake
We recommend that customers get IPinfo’s data in CSV format for easy ingestion into Snowflake.
If you want to explore which data format is the right choice for you, we highly recommend the article: How to choose the best file format for your IPinfo database?
Snowflake’s documentation covers the details of CSV data loading: Load Data into Snowflake | Snowflake Documentation
Please note the name of the table that contains your IPinfo data. We will refer to it as ${ip_database_table_name} throughout the rest of this document.
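As a minimal sketch of the ingestion step (the stage, file, and table names here are placeholder assumptions, not fixed values), a CSV load for the IP Geolocation database could look like this:

```sql
-- Sketch only: all names are placeholders; see Snowflake's data loading docs.
-- The columns mirror the standard IP Geolocation schema shown later.
CREATE TABLE IF NOT EXISTS ip_geolocation (
    start_ip TEXT, end_ip TEXT, join_key TEXT,
    city TEXT, region TEXT, country TEXT,
    latitude FLOAT, longitude FLOAT,
    postal_code TEXT, timezone TEXT
);

CREATE OR REPLACE STAGE ipinfo_stage;
-- Upload the file first, e.g. from SnowSQL:
--   PUT file:///path/to/location.csv @ipinfo_stage;

COPY INTO ip_geolocation
FROM @ipinfo_stage/location.csv
FILE_FORMAT = (TYPE = 'CSV' SKIP_HEADER = 1 FIELD_OPTIONALLY_ENCLOSED_BY = '"');
```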
2. Setting up IPinfo’s data inside Snowflake
Because of the unique nature of IP address data, you have to follow certain techniques for efficient join, lookup, and log-enrichment operations. These techniques vary from data platform to data platform based on their unique offerings.
For the Snowflake platform, we recommend replicating the setup we provide for our Snowflake Marketplace Listing.
2.1. The difference between standard IPinfo data downloads and the Snowflake Marketplace download
There are certain differences between our standard database downloads and the Snowflake Marketplace databases.

Standard Database Downloads | Snowflake Marketplace Database |
---|---|
Does not include the int/hex values of the IP addresses in the start_ip and end_ip columns | Includes the int/hex values of the start_ip and end_ip columns as the start_ip_int and end_ip_int columns |
Has the join_key in IP address format | Has the join_key in int/hex format |
What are the hex/int values of the IP address? Why do we need them?
In our Snowflake listings, IPv4 addresses are converted to their equivalent integer values and IPv6 addresses are converted to their equivalent hexadecimal values. We have found that pre-converting the IP addresses to these int/hex values helped us create the most efficient lookup solution.
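To make the conversion concrete, here is a small Python sketch of the same idea (an illustration only, not the Snowflake UDF itself; the exact case and padding of Snowflake’s hex output may differ): an IPv4 address becomes an integer, and an IPv6 address becomes a hexadecimal string.

```python
import ipaddress

def to_int(ip: str):
    """Illustrative int/hex conversion: IPv4 addresses become integers;
    IPv6 addresses become 32-digit uppercase hexadecimal strings."""
    addr = ipaddress.ip_address(ip)
    if addr.version == 4:
        return int(addr)                  # e.g. 180.80.160.128 -> 3025182848
    return format(int(addr), "032X")      # zero-padded to 128 bits

print(to_int("180.80.160.128"))  # 3025182848
print(to_int("2001:db8::1"))     # 20010DB8000000000000000000000001
```

The integer form lets Snowflake compare IP addresses with ordinary numeric operators (BETWEEN, =), which is what makes the range lookups later in this document efficient.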
Standard database schema for the IP Geolocation Database (Source)
Field Name | Example | Data Type | Notes |
---|---|---|---|
start_ip | 1.253.242.0 | TEXT | Starting IP address of an IP address range |
end_ip | 1.253.242.255 | TEXT | Ending IP address of an IP address range |
join_key | 1.253.0.0 | TEXT | Special variable to facilitate join operation |
city | Yangsan | TEXT | City of the location |
region | Gyeongsangnam-do | TEXT | Region of the location |
country | KR | TEXT | ISO 3166 country code |
latitude | 35.34199 | FLOAT | Latitude value of the location |
longitude | 129.03358 | FLOAT | Longitude value of the location |
postal_code | 50593 | TEXT | Postal code of the location |
timezone | Asia/Seoul | TEXT | Local time zone |
Database schema of the IP Geolocation Snowflake listing:
name | type | Example |
---|---|---|
START_IP_INT | VARIANT | 3025182848 |
END_IP_INT | VARIANT | 3025184255 |
JOIN_KEY | VARCHAR(16777216) | 3025141760 |
START_IP | VARCHAR(16777216) | 180.80.160.128 |
END_IP | VARCHAR(16777216) | 180.80.165.255 |
CITY | VARCHAR(16777216) | Seoul |
REGION | VARCHAR(16777216) | Seoul |
COUNTRY | VARCHAR(16777216) | KR |
LAT | FLOAT | 37.6566 |
LNG | FLOAT | 127.0769 |
POSTAL | VARCHAR(16777216) | 1711 |
TIMEZONE | VARCHAR(16777216) | Asia/Seoul |
2.2. Establishing the database name and schema name
In the functions described in the following sections, we refer to a database name and a schema name. You have to set these parameters yourself based on your Snowflake setup.

${database_name} → your database name
${schema} → your schema name

Please review the functions carefully for these variable names.
2.3. Replicating the Snowflake Marketplace listing downloads
After loading the standard IPinfo database into Snowflake, we recommend you create the int/hex columns in your table, which involves the following:
Standard Column | Target Column | Action | Note |
---|---|---|---|
start_ip | start_ip_int | Create new column | The existing column can be useful in understanding IP ranges. |
end_ip | end_ip_int | Create new column | The existing column can be useful in understanding IP ranges. |
join_key | join_key | Replace the previous column | The join_key column is a specialized column for IP lookups. It could be confusing to create a new column named join_key_int because, in all our Snowflake documentation, we expect the join_key column to be in int/hex format. |
Note that even though the column names use the _int suffix, the columns will contain hex values for IPv6 addresses.
2.4. Converting the IP addresses to int/hex values using the TO_INT UDF
Now, to create the int/hex values of the IP addresses, you need to create the TO_INT UDF (User Defined Function).
TO_INT UDF:
```sql
CREATE OR REPLACE SECURE FUNCTION
${database_name}.${schema}.TO_INT(ip text) COPY GRANTS
RETURNS VARIANT
AS $$
IFF(
    ip LIKE '%:%',
    PARSE_IP(ip, 'INET', 1):hex_ipv6,
    AS_INTEGER(PARSE_IP(ip, 'INET', 1):ipv4)
)
$$;
```
Once you create this function, use it to convert the columns inside the table that contains IPinfo’s data, turning the “Standard Columns” into their “Target Columns” as described in the previous section.
After you have done that, you will have a database shaped like our Snowflake listing.
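As a sketch, the conversion could look like the following (table and column names are assumptions; it also assumes both the TO_INT UDF above and the TO_JOIN_KEY UDF from the next section have been created). Deriving the new join_key via TO_JOIN_KEY(start_ip) keeps it consistent with the TO_JOIN_KEY(ip) lookup condition used later.

```sql
-- Sketch only: assumes your table is named ip_geolocation; adjust to your setup.
ALTER TABLE ${database_name}.${schema}.ip_geolocation
    ADD COLUMN start_ip_int VARIANT, end_ip_int VARIANT;

UPDATE ${database_name}.${schema}.ip_geolocation
SET start_ip_int = ${database_name}.${schema}.TO_INT(start_ip),
    end_ip_int   = ${database_name}.${schema}.TO_INT(end_ip),
    -- Replace the IP-format join_key with its int/hex form so it matches
    -- what TO_JOIN_KEY(ip) produces at lookup time.
    join_key     = ${database_name}.${schema}.TO_JOIN_KEY(start_ip)::VARCHAR;
```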
2.5. Creating the TO_JOIN_KEY function
The TO_JOIN_KEY function helps you create join keys for your input IP addresses. We will use this function in the UDTF declaration.
TO_JOIN_KEY function code:
```sql
CREATE OR REPLACE SECURE FUNCTION
${database_name}.${schema}.TO_JOIN_KEY(ip text) COPY GRANTS
RETURNS VARIANT
AS $$
IFF(
    ip LIKE '%:%',
    PARSE_IP(CONCAT(ip, '/34'), 'inet', 1):hex_ipv6_range_start,
    AS_INTEGER(PARSE_IP(CONCAT(ip, '/16'), 'inet', 1):ipv4_range_start)
)
$$;
```
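For intuition, here is a Python sketch of what TO_JOIN_KEY computes (an illustration, not the UDF itself): the range start of the enclosing /16 network for IPv4, or /34 for IPv6, in the same int/hex representation that TO_INT uses.

```python
import ipaddress

def to_join_key(ip: str):
    """Illustrative equivalent of TO_JOIN_KEY: the /16 (IPv4) or /34 (IPv6)
    network range start of the input address, as int (IPv4) or hex (IPv6)."""
    addr = ipaddress.ip_address(ip)
    prefix = 16 if addr.version == 4 else 34
    # strict=False masks off the host bits, giving the range start
    net = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    start = int(net.network_address)
    return start if addr.version == 4 else format(start, "032X")

print(to_join_key("180.80.160.128"))  # 3025141760 (the range start 180.80.0.0)
```

Because both the database rows and the lookup input are bucketed to the same range start, equality on join_key drastically narrows the candidate rows before the BETWEEN range check runs.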
2.6. Creating the UDTF
The UDTF (User Defined Table Function) is our core IP lookup mechanism. We will use this function to perform efficient IP enrichment and join operations.
If you want to learn more about this UDTF, we recommend the article: How to get IP Data in Snowflake (ipinfo.io)
Let’s start with a real-life example. The UDTF for the IP Geolocation database is called IP_LOCATION:
```sql
CREATE OR REPLACE SECURE FUNCTION
${database_name}.${schema}.IP_LOCATION(ip text) COPY GRANTS
RETURNS TABLE(city text, region text, country text, postal_code text, latitude float, longitude float, timezone text)
AS $$
SELECT city, region, country, postal_code, latitude, longitude, timezone
FROM ${database_name}.${schema}.ip_geolocation
WHERE
    join_key = ${database_name}.${schema}.TO_JOIN_KEY(ip)
    AND ${database_name}.${schema}.TO_INT(ip) BETWEEN start_ip_int AND end_ip_int
$$;
```
Line by line breakdown of the Example UDTF
Code | Explanation |
---|---|
${database_name}.${schema}.IP_LOCATION(ip text) | The UDTF is called IP_LOCATION and takes the parameter ip in text format, meaning the IP address format. You don’t have to do any pre-conversion of the input IP addresses from your logs or other sources. Just pass the IP addresses as-is. |
RETURNS TABLE(city text, [.....] timezone text) | This returns the IP metadata in table format. By IP metadata, we mean the information associated with the IP address ranges in IPinfo’s database. For IP Geolocation it is city, region, country, postal_code, latitude, longitude, and timezone. For the ASN database, it is asn, domain, name, type, and country. We highly recommend you review the schema of the database you have acquired and the database schema in our documentation. |
SELECT city, region, country, postal_code, latitude, longitude, timezone | We return the selected IP metadata according to the table schema. |
FROM ${database_name}.${schema}.ip_geolocation | Notice the ip_geolocation part. That is the name of the table holding the IPinfo IP Geolocation database in Snowflake. We used ip_geolocation, but it can be different for you; it depends on the name you used for the table where your IPinfo data is stored. |
WHERE join_key = ${database_name}.${schema}.TO_JOIN_KEY(ip) AND ${database_name}.${schema}.TO_INT(ip) BETWEEN start_ip_int AND end_ip_int | The core join operation. It uses both the TO_JOIN_KEY and TO_INT functions and refers to the newly created columns: join_key, start_ip_int, and end_ip_int. The enrichment join uses both the join-key equality and the range (BETWEEN) condition for the lookup. |
That is the UDTF mechanism that backs the IP enrichment process.
IPinfo Lookup UDTF Template
```sql
CREATE OR REPLACE SECURE FUNCTION
${database_name}.${schema}.${UDTF_name}(ip text) COPY GRANTS
RETURNS TABLE(${ip_meta_attribute_column_name} ${ip_meta_attribute_column_type})
AS $$
SELECT ${ip_meta_attribute_column_name}
FROM ${database_name}.${schema}.${ip_database_table_name}
WHERE
    join_key = ${database_name}.${schema}.TO_JOIN_KEY(ip)
    AND ${database_name}.${schema}.TO_INT(ip) BETWEEN start_ip_int AND end_ip_int
$$;
```
Variables that need to be declared:
- ${database_name}
- ${schema}
- ${UDTF_name}
- ${ip_meta_attribute_column_name}
- ${ip_meta_attribute_column_type}
3. Using the UDTF: Joining/Enriching IP addresses with IPinfo’s data
The final step! So far, you have:
- Processed the standard database to replicate the Snowflake listing data
- Declared the essential helper functions and UDTF
- Brought in your input IP addresses
In this section, we will go through how to pass IP addresses and enrich them with IPinfo’s IP database.
Pro tip: if you have a list of IP addresses, find the distinct/unique IP addresses first and enrich those with IPinfo’s data through the following process. Then join the enriched IP data back to the original database or log.
Using the IP_LOCATION example UDTF, you run the following query:
```sql
SELECT *
FROM logs l
JOIN TABLE(ipinfo.public.ip_location(l.ip));
```
Line-by-line breakdown of the example join operation

Code | Explanation |
---|---|
SELECT * | Returns the input IP addresses and their associated information from IPinfo’s database. |
FROM logs l | The logs table contains your input IP addresses. We use the alias l for the join operation. |
JOIN TABLE(ipinfo.public.ip_location(l.ip)) | The final join operation. We call the TABLE function to convert the UDTF’s return value to a table, and then we JOIN it against the IP addresses passed in from the log. |
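The pro tip from earlier (enrich distinct IPs first, then join the results back) can be sketched like this, assuming a logs table with an ip column and the IP_LOCATION UDTF defined above:

```sql
-- Sketch only: table and column names are assumptions.
WITH unique_ips AS (
    SELECT DISTINCT ip FROM logs      -- enrich each unique IP only once
),
enriched AS (
    SELECT u.ip, t.city, t.region, t.country
    FROM unique_ips u
    JOIN TABLE(${database_name}.${schema}.IP_LOCATION(u.ip)) t
)
SELECT l.*, e.city, e.region, e.country
FROM logs l
LEFT JOIN enriched e ON l.ip = e.ip;  -- join the results back to the raw log
```

This avoids running the UDTF once per log row when the same IP appears many times, which can matter for large logs.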
The output table will contain the columns of your logs table alongside the IPinfo metadata columns (city, region, country, and so on).
Template for IP data enrichment using the IPinfo Snowflake UDTF
```sql
SELECT ${column_containing_ip_addresses}
FROM ${input_table} ${table_alias}
JOIN TABLE(${database_name}.${schema}.${UDTF_name}(${table_alias}.${column_containing_ip_addresses}));
```
Variables that need to be declared:
- ${column_containing_ip_addresses}
- ${input_table}
- ${table_alias}
- ${database_name}
- ${schema}
- ${UDTF_name}
Summary
That is about it. Again, we highly recommend going through the Snowflake Marketplace listing: with the listing, we not only take care of the database column conversions but also provide the UDTF associated with the listing, along with a number of helpful UDFs.
You can check out our documentation on Snowflake listing to have a technical overview of the provided resources: Snowflake Listing Documentation (ipinfo.io)
Addendum: Function References
TO_INT function code:

```sql
CREATE OR REPLACE SECURE FUNCTION
${database_name}.${schema}.TO_INT(ip text) COPY GRANTS
RETURNS VARIANT
AS $$
IFF(
    ip LIKE '%:%',
    PARSE_IP(ip, 'INET', 1):hex_ipv6,
    AS_INTEGER(PARSE_IP(ip, 'INET', 1):ipv4)
)
$$;
```
TO_JOIN_KEY function code:

```sql
CREATE OR REPLACE SECURE FUNCTION
${database_name}.${schema}.TO_JOIN_KEY(ip text) COPY GRANTS
RETURNS VARIANT
AS $$
IFF(
    ip LIKE '%:%',
    PARSE_IP(CONCAT(ip, '/34'), 'inet', 1):hex_ipv6_range_start,
    AS_INTEGER(PARSE_IP(CONCAT(ip, '/16'), 'inet', 1):ipv4_range_start)
)
$$;
```
IPinfo Lookup UDTF Template

```sql
CREATE OR REPLACE SECURE FUNCTION
${database_name}.${schema}.${UDTF_name}(ip text) COPY GRANTS
RETURNS TABLE(${ip_meta_attribute_column_name} ${ip_meta_attribute_column_type})
AS $$
SELECT ${ip_meta_attribute_column_name}
FROM ${database_name}.${schema}.${ip_database_table_name}
WHERE
    join_key = ${database_name}.${schema}.TO_JOIN_KEY(ip)
    AND ${database_name}.${schema}.TO_INT(ip) BETWEEN start_ip_int AND end_ip_int
$$;
```
Variables that need to be declared:
- ${database_name}
- ${schema}
- ${UDTF_name}
- ${ip_meta_attribute_column_name}
- ${ip_meta_attribute_column_type}