IPinfo on Google Cloud Marketplace - Region challenge

Hi all, we’re attempting to use IPinfo in BigQuery (IPinfo’s data is now available in BigQuery) but running into the problem that the linked dataset it created is in the US region whilst our warehouse is in the EU region.

I can schedule a data transfer to move the data from one region to another, but the challenge I’m facing is that I cannot copy or recreate the handy UDF. Without access to its source code, I guess I’ll need to use the guidance below to perform IP lookups?

Hey Johann,

Thank you for posting in the community. Are you using our free database?

Although I appreciate your interest in transferring the data from the US to the EU region, that is not our intention in sharing our data on GCP. It should be universally available to all GCP customers across all regions.

Let me first investigate this issue with our partnership and engineering teams. If I cannot make the GCP share available to you in a reasonable amount of time, I will DM you the UDTF code.

Thank you, Abdullah, that’s very kind of you. And yes, we’re using the free database.

Hi,

I’m facing the same issue. The IPinfo dataset is created in the US region by default and as such cannot be used with EU data. It would be good if there were an option to select the dataset location when creating a dataset via Analytics Hub. I guess copying the dataset to a different location would not help much, as you would lose the updates - or you would need to do the copy daily.

We ended up having to follow a fairly convoluted approach to get this data into our EU region:

Challenge

  • The linked dataset gets created in the US region, and there is no way to choose a different region
  • Since we are in the EU and cannot run cross-region queries, we need to get this data into the EU
  • DTS does not support copying linked datasets, so I cannot schedule a transfer to copy this data to the EU

Workaround

  1. I created a “normal” US dataset (ipinfo_us)
  2. I created a table in this dataset from the linked dataset (ipinfo_linked), i.e. CREATE TABLE AS … (sketched below)
  3. This table can then be copied to our EU dataset (ipinfo) using a scheduled DTS transfer
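
For reference, step 2 boils down to something like the following (a minimal sketch - the project, dataset, and table names are placeholders, and the actual IPinfo table name may differ):

-- Materialize the linked (US) dataset into a plain US dataset so DTS can copy it.
-- `my-project`, `ipinfo_linked`, `ipinfo_us`, and `country_asn` are placeholder names.
CREATE OR REPLACE TABLE `my-project.ipinfo_us.country_asn` AS
SELECT *
FROM `my-project.ipinfo_linked.country_asn`;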

There’s more…

  • Since this data can change, I need to periodically recreate the table (step 2 above)
  • To do this I use our existing dbt project: I created a macro containing the CREATE TABLE AS … statement and call it from a dedicated dbt job (see the sketch after this list)
  • HOWEVER, since all our existing dbt environments use EU connections, I needed to create a dedicated dbt environment (which I brilliantly named Maintenance (US Region)) with a US connection, which is able to run that macro in the US region
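
A minimal sketch of such a macro, reusing the placeholder names from the earlier snippet (the macro name and file path are hypothetical):

-- macros/refresh_ipinfo_us.sql (hypothetical path and name)
{% macro refresh_ipinfo_us() %}
  {% set sql %}
    CREATE OR REPLACE TABLE `my-project.ipinfo_us.country_asn` AS
    SELECT * FROM `my-project.ipinfo_linked.country_asn`
  {% endset %}
  {% do run_query(sql) %}
{% endmacro %}

The dedicated job then just runs dbt run-operation refresh_ipinfo_us against the US-connected environment.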

@Abdullah - Hi, any update on this? You mentioned you would reach out to your team. Thanks

@Johann_De_Wet @Martin_Rode

Guys, my apologies for the delay. I have created the listing, but there were some permission-related issues on the UI. Let me check again. It is an identical database but shared in the EU region. Just hold on a bit, please.


@Abdullah - Thanks for that! This would be a real step forward, but it would only work for the EU.
Not sure how difficult it would be, but ideally the dataset location could be selected here via a new dropdown. Then you wouldn’t have to manage two listings, and the data would be available in all dataset locations (we also use the Singapore and Sydney locations).

Thanks for the ongoing support.

Thanks, Martin. I would appreciate your insights and thoughts here as we are in uncharted territory.

Analytics Hub, the platform through which GCP facilitates BigQuery-based data sharing, supports only EU- and US-based data sharing. So, from our end, we can support the US and EU until GCP develops more data-sharing features and options.

Now, that does not mean we are out of options with your case. We have a number of options, and I would like to hear your thoughts.

Snowflake

Snowflake has been more robust for us when it comes to data sharing through their data marketplace. I assume you are a GCP shop, but I just wanted to mention that with Snowflake we did not have as much friction as you are seeing. Their data sharing across regions is much more robust, and we have incorporated our UDTFs there as well.

Here is the listing if you want to check it out: Snowflake

Data Push Mechanism

The Data Push Mechanism is an offering where we push our database at regular intervals to a cloud storage bucket, which can be GCS, S3, Azure, etc. However, it is a paid service. If your organization is already a customer, ask your IPinfo account manager whether they can support the data push mechanism for the free dataset.

I think the data push mechanism has a particular perk in this case: it would let you build internal APIs and consequently take advantage of BQ’s remote function feature.
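
For context, a BigQuery remote function lets you call an HTTP endpoint from SQL. A minimal sketch, assuming you have wrapped an IP lookup in a Cloud Function and created a BigQuery connection (every name and the endpoint URL below are hypothetical):

-- Hypothetical names throughout; requires an existing BigQuery connection
-- (`my-project.eu.ipinfo-conn`) and a deployed Cloud Function endpoint.
CREATE OR REPLACE FUNCTION `my-project.ipinfo.ip_lookup`(ip STRING)
RETURNS STRING
REMOTE WITH CONNECTION `my-project.eu.ipinfo-conn`
OPTIONS (
  endpoint = 'https://europe-west1-my-project.cloudfunctions.net/ip_lookup'
);

-- Example usage:
-- SELECT ip, `my-project.ipinfo.ip_lookup`(ip) AS ipinfo FROM `my-project.logs.requests`;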

Download and upload the data yourself

I do not recommend this, as we already have a few ideas in the works (I can’t share them yet), but if you need an immediate solution and have the engineering capacity to dedicate to it, you can do it. The free database can be downloaded through a storage URI:

curl -L https://ipinfo.io/data/free/country_asn.csv.gz?token=$TOKEN -o country_asn.csv.gz

Then you have to unzip it, upload it to a GCS bucket, and load it into BQ from there. You would have to maintain that system and be responsible for the orchestration. It is quite complicated, and we believe that customers should not be responsible for that. So, allow us a bit of time while we work on a better developer experience.
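
For completeness, once the unzipped CSV is in GCS, the load step can be a single statement (a minimal sketch - the bucket, dataset, and table names are hypothetical):

-- Hypothetical bucket/dataset/table names; assumes country_asn.csv
-- has already been uploaded to the bucket.
LOAD DATA OVERWRITE `my-project.ipinfo.country_asn`
FROM FILES (
  format = 'CSV',
  skip_leading_rows = 1,
  uris = ['gs://my-bucket/ipinfo/country_asn.csv']
);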

Thanks for all the options. Unfortunately, GCP (or direct download) is the only option for us. On your screenshot of the Exchange Configuration there are actually two location types:

Region and Multi-Region.

So far only US and EU are offered as multi-regions. For other locations like Singapore or Sydney, you would need to switch to the Region location type and then select the proper region (more on this here: BigQuery locations | Google Cloud). That would mean having one listing per region - which is not impossible, but not ideal.

Important for us now is the EU, so once that is available it should do the trick. Long term, it would be ideal if the location type could be selected by the subscriber, which is something the GCP team would need to implement.

Thanks again.

Thank you very much! I did not realize that. I appreciate your guidance on this.


So, I can see the regions you requested are available here:

  • australia-southeast1 (Sydney)
  • asia-southeast1 (Singapore)

I am not sure if single-region distribution is something we can support for the free database at the moment, but I will see what our teammates say. If it were just ‘APAC’ as a multi-region, I think they would have readily approved it.