Twofishes from Foursquare and Pelias from Mapzen are open source geocoders powered by open data. In this talk, we share our perspective on the challenges in building open geocoders that are competitive with those from commercial providers in terms of precision, recall, robustness and timeliness.
From Wikipedia to OpenStreetMap, global disparities in coverage and quality are obvious concerns with any community effort. We contend there are additional difficulties inherent to core geocoding data: accurately modeling administrative hierarchies, drawing boundaries, and tying addresses to streets. Further, we argue that, compared to rendering or routing data, mappers have fewer incentives to overcome these hurdles and improve geocoding data.
OpenStreetMap pragmatically favors ease of tagging for the mapper over data cohesion for software. While this philosophy is one of the reasons for OSM’s success, it also complicates geocoding. Specifically, the preponderance of unstructured tags over equivalent structured tags, and the tendency to prefer string-valued tags on nodes and ways over id-valued relations, put us at a disadvantage compared to our commercial counterparts.
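To make the distinction between string-valued tags and id-valued relations concrete, here is a minimal Python sketch. It is an illustration only, not how Twofishes or Pelias work: the element ids, street names, and the helper functions `normalize`, `streets_by_name`, and `street_by_relation` are all hypothetical. Only the tag keys (`addr:street`, `addr:housenumber`) and the `associatedStreet` relation type with its `street` and `house` roles follow OSM conventions.

```python
# Hypothetical OSM-like elements; every id and name here is made up.
ways = {
    101: {"highway": "residential", "name": "Main Street"},
    102: {"highway": "residential", "name": "Main St"},
}

# String-valued tagging: the house names its street in free text.
house_tagged = {
    "id": 1,
    "tags": {"addr:housenumber": "12", "addr:street": "Main St."},
}

# Id-valued tagging: an associatedStreet relation links house and street by id.
relation = {
    "tags": {"type": "associatedStreet"},
    "members": [
        {"type": "way", "ref": 101, "role": "street"},
        {"type": "node", "ref": 2, "role": "house"},
    ],
}


def normalize(name):
    """Crude normalization; a real geocoder needs far more than this."""
    return name.lower().replace(".", "").replace("street", "st").strip()


def streets_by_name(house, ways):
    """Return ids of every way whose normalized name matches addr:street."""
    wanted = normalize(house["tags"]["addr:street"])
    return sorted(
        way_id
        for way_id, tags in ways.items()
        if normalize(tags.get("name", "")) == wanted
    )


def street_by_relation(house_id, relation):
    """Return the street way id linked to house_id, or None if not a member."""
    members = relation["members"]
    if any(m["role"] == "house" and m["ref"] == house_id for m in members):
        for m in members:
            if m["role"] == "street":
                return m["ref"]
    return None


print(streets_by_name(house_tagged, ways))  # [101, 102]
print(street_by_relation(2, relation))      # 101
```

In this toy data the string match yields two candidate ways and needs ad hoc normalization to find even those, while the relation lookup resolves to a single way id with no text matching at all.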
We discuss how we work around these problems at present and explore novel ways in which we might further enrich OSM data for geocoding.