Querying IPinfo IP databases inside BigQuery
We highly recommend that users not write their own code when querying our data inside BigQuery. These queries are often unoptimized and thus expensive.
IP databases are specialized and require specific SQL queries depending on the platform. These queries vary from platform to platform as they require functions and features that can support IP address data. For BigQuery, we highly recommend following this tutorial as it leverages BigQuery’s native functions.
Step 0: Prerequisites
This script assumes you have already ingested the IPinfo IP database in Big Query. Documentation: https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-csv.
- IPinfo IP database ingested in BigQuery
- A table, view or CTF containing IP addresses
Step 1: Exploring the log data containing the IP addresses
Start with a table, view or a CTF containing IP addresses. The IP addresses used in this example are contained in the log_data table. The table has one column called IP.
SELECT
*
FROM
log_data
log_data
We will join the IP addresses with the IPinfo metadata information in the IPinfo IP database.
Step 2: The IPinfo IP database inside BigQuery
We recommend you ingest the CSV format data into BigQuery for this. For this, follow the instructions available here: https://cloud.google.com/bigquery/docs/loading-data-cloud-storage-csv.
The IPinfo database we will use here is the IP to Geolocation database.
SELECT
*
FROM
`ipinfo-XXXX.ipinfo_data.location`
LIMIT
100
Here:
| Item | Note |
|---|---|
ipinfo-XXXX |
GCP project name |
ipinfo_data |
Resource name |
location |
Table name |
We also recommend setting up indexing or clustering mechanism. Here is a blog article from GCP: Optimizing BigQuery: Cluster your tables | by Felipe Hoffa | Medium. If you index the IPinfo IP database, your target columns will be the IP range columns: start_ip, end_ip and join_key.
Step 3: Running the query
Step 3.1: Identify IP metadata information from the IPinfo IP database
The IP metadata information will be any column containing no IP address range (start_ip, end_ip or join_key). The IP metadata information in the IP to Geolocation database will be the following:
cityregioncountrylatlongpostaltimezone
Step 3.2: The lookup code
This is the main section. Please note the syntax carefully.
The code first:
SELECT
l.ip, -- ips in the log_data
loc.city, -- ip metadata columns in the ip loc database
loc.region,
loc.country,
loc.lat,
loc.lng,
loc.postal,
loc.timezone
FROM
log_data AS l
JOIN
`ipinfo-XXXX.ipinfo_data.location` loc
ON
loc.join_key = NET.IP_TO_STRING(NET.IP_TRUNC(NET.IP_FROM_STRING(l.ip),
IF
(STRPOS(l.ip, ':') = 0, 16, 34)))
WHERE
NET.IP_FROM_STRING(l.ip)
BETWEEN
NET.IP_FROM_STRING(loc.start_ip)
AND
NET.IP_FROM_STRING(loc.end_ip)
Code breakdown:
- IP addresses from the log data are joined with IP metadata information from the IPinfo IP database.
- IP addresses in the
log_dataare converted from string data types to their IP address data type using BQ’s nativeNET.IP_FROM_STRING. - The
join_key(a truncated version of the IP address) facilitates one aspect of the join. This operation also considers IPv6 addresses. TheNET.IP_TRUNCcommand generates thejoin_keyfor the IP addresses in thelog_datatable. - The second aspect of the join operation is performed using the
BETWEENcommand. - Then, the IP metadata information is joined with the input IP addresses.
Query performance:
The log_data contained eight(8) thousand random IPv4 addresses and two(2) thousand random IPv6 addresses. The cache was cleared for this query.
If you need further clarifications, please drop a comment below. Thanks.



