The amount of publicly available geo data is rapidly increasing, and OpenStreetMap is no exception. It is both interesting and telling for the community to understand the current state of the dataset and the path it took to get here. Fortunately, we have history dumps containing every revision to every object that has existed in OSM since the 2007 API 0.5 changeover.
By ingesting the entire OpenStreetMap history into Apache Spark, I will present a statistical analysis of OpenStreetMap. For example, I will compare the rates at which various features were added (which features were added most rapidly in the past year? Which were fastest overall?), discuss the most edited areas over time (how quickly has building coverage grown in San Francisco? New York? Berlin?), examine how edits correlate with world events, and investigate major issues arising from editors (competing tags, edit wars).