Database for local language / foreign language / altname translation for geographic regions

Hey,

I have been working on this repo: abdullahdevrel/geoname_altname on GitHub (Parsing and processing foreign language or alternative location names from Geonames.org).

The repository contains a single database of foreign/local translations of geographic region names, along with instructions for building it. The database is keyed on the geoname_id from geonames.org.
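To illustrate what "keyed on geoname_id" means in practice, here is a minimal sketch of a translation lookup. It assumes the data has been loaded into SQLite with a hypothetical `altnames(geoname_id, lang, name)` table; the repo's actual storage format, table name, and column names may differ, and the rows below are made up for the example.

```python
import sqlite3
from typing import Optional


def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory table with a HYPOTHETICAL schema and made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE altnames ("
        " geoname_id INTEGER NOT NULL,"
        " lang TEXT NOT NULL,"
        " name TEXT NOT NULL,"
        " PRIMARY KEY (geoname_id, lang))"
    )
    conn.executemany(
        "INSERT INTO altnames (geoname_id, lang, name) VALUES (?, ?, ?)",
        [(100, "fr", "Exempleville"), (100, "de", "Beispielstadt")],
    )
    return conn


def translate(conn: sqlite3.Connection, geoname_id: int, lang: str) -> Optional[str]:
    """Return the name for (geoname_id, lang), or None if no translation exists."""
    row = conn.execute(
        "SELECT name FROM altnames WHERE geoname_id = ? AND lang = ?",
        (geoname_id, lang),
    ).fetchone()
    return row[0] if row else None


conn = make_demo_db()
print(translate(conn, 100, "fr"))  # → Exempleville
print(translate(conn, 100, "ja"))  # → None (no Japanese translation stored)
```

The composite primary key keeps at most one name per location and language, which makes the lookup a simple indexed point query.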

Why does IPinfo not provide region name translations/altnames in the IP location data?

Our IP geolocation dataset has become so granular that no available location dataset offers full translation coverage for it across different languages. Because our foremost priority is providing accurate, predictable, and structured data, we cannot include alternate names or language translations of geographic regions in our dataset: doing so would introduce “gaps” in the database.

If we included a fixed set of translations in our location data, we would have to provide those translations for every location point, no matter how sparse the area is or how few IPs are located there. That is not possible without significantly compromising accuracy, for example by moving an IP's location to the nearest major geographic point that does have translations.

Another issue is size. Our geolocation data is extremely granular and, consequently, quite large. Adding regional name translations would increase the size of our location database substantially, and the increase grows with every language we choose to include.

Solution for providing region name translations

We chose instead to give our users the authority and flexibility to bring their own database of translated location names to complement our location database. For this repo, I used GeoNames, a crowd-sourced dataset. It is fantastic because it is free and openly licensed, and it offers a bunch of datasets that you can experiment with.
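As a starting point for building such a translation table yourself, here is a minimal sketch that reads the GeoNames alternateNamesV2.txt dump into a `{geoname_id: {language: name}}` lookup. It assumes the tab-separated column order described in the GeoNames download readme. The filtering rules (skip postal, airport, link, and similar non-language codes; skip historic and colloquial names; prefer names flagged as preferred) are illustrative choices of mine, not necessarily what the repo's notebook does.

```python
import csv
import io
from collections import defaultdict

# Column order assumed from the GeoNames readme for alternateNamesV2.txt.
FIELDS = [
    "alternate_name_id", "geoname_id", "isolanguage", "name",
    "is_preferred", "is_short", "is_colloquial", "is_historic", "from", "to",
]

# Codes in the isolanguage column that are not languages (postal codes,
# airport codes, links, Wikidata ids, abbreviations, French Revolution names).
NON_LANGUAGE = {"post", "iata", "icao", "faac", "link", "wkdt", "abbr", "fr_1793"}


def build_altnames(fh) -> dict:
    """Return {geoname_id: {lang: name}} from an alternateNamesV2-style stream."""
    table = defaultdict(dict)
    # QUOTE_NONE: names may contain quote characters that are not CSV quoting.
    for row in csv.reader(fh, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) < 4:
            continue  # skip malformed or short lines
        rec = dict(zip(FIELDS, row))
        lang = rec["isolanguage"]
        if not lang or lang in NON_LANGUAGE:
            continue
        if rec.get("is_historic") == "1" or rec.get("is_colloquial") == "1":
            continue
        gid = int(rec["geoname_id"])
        # Keep the first name seen per language; a preferred name replaces it.
        if lang not in table[gid] or rec.get("is_preferred") == "1":
            table[gid][lang] = rec["name"]
    return dict(table)


# Made-up sample rows in the assumed 10-column layout.
SAMPLE = (
    "1\t100\tfr\tVille-Exemple\t\t\t\t\t\t\n"
    "2\t100\tfr\tExempleville\t1\t\t\t\t\t\n"
    "3\t100\tde\tBeispielstadt\t\t\t\t\t\t\n"
    "4\t100\tpost\t12345\t\t\t\t\t\t\n"
    "5\t100\ten\tOld Example\t\t\t\t1\t\t\n"
)

print(build_altnames(io.StringIO(SAMPLE)))
```

For the real dump, pass `open("alternateNamesV2.txt", encoding="utf-8", newline="")` instead of the in-memory sample. The full file is large, so for production use you would likely stream it straight into a database rather than keep everything in a dict.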

You can also check the included Jupyter notebook and tweak the settings as you please.

Caveats

  • Currently, the geoname_altname database is keyed on geoname_id, which is only available in the “standard” variant of our IP geolocation database. I will look into supporting all of our databases that include any form of geographic identifier.
  • I have not yet set up regular data dumps in the repository. This is not an immediate priority, as the GeoNames database is relatively stable and does not change often.
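Because only records that carry a geoname_id can be joined to the translations, a lookup needs a fallback both for records from variants without that field and for locations that have no translation in the requested language (the “gaps” described earlier). A minimal sketch, assuming each IP location record is a dict with hypothetical `city` and `geoname_id` keys, and that translations were already loaded into a `{geoname_id: {lang: name}}` dict:

```python
# Hypothetical, made-up translation lookup: {geoname_id: {lang: name}}.
ALTNAMES = {100: {"fr": "Exempleville", "de": "Beispielstadt"}}


def localized_city(record: dict, lang: str, altnames: dict) -> str:
    """Return the city name in `lang`, falling back to the record's own name.

    Records without a geoname_id (non-"standard" variants) cannot be joined,
    so they always fall back; so do locations with no translation for `lang`.
    The field names "city" and "geoname_id" are assumptions for this sketch.
    """
    gid = record.get("geoname_id")
    if gid is None or gid == "":
        return record["city"]
    return altnames.get(int(gid), {}).get(lang, record["city"])


standard = {"city": "Example City", "geoname_id": "100"}
other_variant = {"city": "Example City"}  # no geoname_id column

print(localized_city(standard, "fr", ALTNAMES))       # → Exempleville
print(localized_city(standard, "ja", ALTNAMES))       # → Example City
print(localized_city(other_variant, "fr", ALTNAMES))  # → Example City
```

Falling back to the original name keeps the output predictable: every record still gets a name, and only the language of that name varies with translation coverage.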

Feedback needed, please

I would greatly appreciate feedback on the repo, with a focus on “stability.” If you can share feedback, instructions, or suggestions about the direction of the repo and the database it produces, I would be very grateful. Thank you very much.