2-Oct-2018
I thought I would explore the latest hurricane data available from NOAA. This is a comprehensive rant that covers a number of things, from ingesting the raw data to modeling, indexing, and querying it as GeoJSON in MongoDB.
As with most public datasets, HURDAT2 is designed for "maximum ingestibility": it is a fixed-width but also comma-separated file. Below is a quick sample:
AL122005, KATRINA, 34,
20050823, 1800, , TD, 23.1N, 75.1W, 30, 1008, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
20050824, 0600, , TD, 23.8N, 76.2W, 30, 1007, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
20050824, 1200, , TS, 24.5N, 76.5W, 35, 1006, 60, 60, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
20050825, 0600, , TS, 26.1N, 78.4W, 50, 997, 60, 60, 0, 0, 15, 0, 0, 0, 0, 0, 0, 0,
20050825, 1800, , TS, 26.2N, 79.6W, 60, 988, 70, 70, 50, 60, 25, 25, 20, 20, 0, 0, 0, 0,
20050826, 0000, , HU, 25.9N, 80.3W, 70, 983, 70, 70, 50, 40, 20, 20, 20, 20, 10, 10, 10, 10,
...
This data can be easily slurped into practically any database and even some geo visualizers -- but can we do even more? Certainly!
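To make the "slurp" concrete, here is a minimal parsing sketch in Python -- my own illustration, not official NOAA or MongoDB tooling -- that turns the header/observation layout shown above into one document per observation, ready for insert_many() with pymongo. The field positions and the helper name parse_hurdat2 are assumptions based on the sample.

# A minimal sketch (not official NOAA tooling) that walks a HURDAT2 file:
# a header row (basin-cyclone id, name, row count) followed by that many
# observation rows, yielding one dict per observation suitable for
# coll.insert_many(parse_hurdat2("hurdat2.txt")) with pymongo.
import csv

def parse_hurdat2(path):
    storm_id, storm_name = None, None
    with open(path) as f:
        for row in csv.reader(f):
            row = [c.strip() for c in row]
            if not row:
                continue
            if len(row) <= 4:                  # header row: AL122005, KATRINA, 34,
                storm_id, storm_name = row[0], row[1]
                continue
            # observation row: date, time, record id, status, lat, lon,
            # max wind, min pressure, then 12 wind radii values
            lat = float(row[4][:-1]) * (1 if row[4].endswith('N') else -1)
            lon = float(row[5][:-1]) * (1 if row[5].endswith('E') else -1)
            yield {
                "storm_id": storm_id,
                "name": storm_name,
                "timestamp": row[0] + row[1],               # e.g. "200508231800"
                "status": row[3],                           # TD, TS, HU, ...
                "loc": {"type": "Point", "coordinates": [lon, lat]},
                "max_wind_kt": int(row[6]),
                "min_pressure_mb": int(row[7]),
                "radii_34kt": [int(x) for x in row[8:12]],  # NE, SE, SW, NW
                "radii_50kt": [int(x) for x in row[12:16]],
                "radii_64kt": [int(x) for x in row[16:20]]
            }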
                |
                |
                |
      ----------C----------
                |
                |
                |

C has winds up to 34 knots

                |
                |
              +---+ 34
      --------| C |--------
              +---+
                |
                |

C has winds >=34

                |
                |
             +-----+ 34
             |     |
      -------|  C  |-------
             |     |
             +-----+
                |
                |

C now is at, say 45. The 34 knot ring gets bigger.

                |
                |
           +----------+ 34
           |          |
           |  +---+ 50|
      -----|  | C |   |-----
           |  +---+   |
           |          |
           +----------+
                |
                |

C is 50 so we create a new inner ring for 50 knot
We're not going to do that here.
Instead, we will take a very simple approach. Consider this wind radii diagram and values:
                 |
                 a ...
              .  |    ...
       NW        |       f        NE
            .    |        ..
                 |          ..
      -----e-----+------------c--b-----
                 |           .
                 |         .
                 |       .    SE
                 d ....
                 |
NE = 11, SE = 6, SW = 0, NW = 8
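To make the "very simple approach" concrete, here is one way to turn those four quadrant radii into an approximate GeoJSON Polygon around the storm center. This is my sketch of the idea, not the post's exact math: it assumes radii in nautical miles, roughly 60 nm per degree of latitude, and uses the Katrina sample point as the center.

import math

NM_PER_DEG_LAT = 60.0   # rough conversion; good enough for a sketch

def wind_ring(center_lon, center_lat, radii_nm, step_deg=10):
    # radii_nm is {"NE": .., "SE": .., "SW": .., "NW": ..} in nautical miles.
    # Walk compass bearings clockwise from north, pick the radius for the
    # quadrant each bearing falls in, and emit [lon, lat] vertices.
    coords = []
    for bearing in range(0, 360, step_deg):
        quad = ("NE", "SE", "SW", "NW")[bearing // 90]
        r = radii_nm[quad]            # a 0 radius collapses that quadrant onto the center
        dlat = (r * math.cos(math.radians(bearing))) / NM_PER_DEG_LAT
        dlon = (r * math.sin(math.radians(bearing))) / (NM_PER_DEG_LAT * math.cos(math.radians(center_lat)))
        coords.append([center_lon + dlon, center_lat + dlat])
    coords.append(coords[0])          # GeoJSON rings must close on themselves
    return {"type": "Polygon", "coordinates": [coords]}

ring34 = wind_ring(-75.1, 23.1, {"NE": 11, "SE": 6, "SW": 0, "NW": 8})

A real implementation would smooth the quadrant-to-quadrant jumps and handle a zero radius more gracefully; the point is only that the four radii map cleanly to a queryable shape.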
A safer but less interesting option is to create rings with no holes, layer them, and use software like GeoTools to perform the diff/union operations.
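For the GeoTools route, the operations are ordinary polygon set algebra. Here is the same idea sketched in Python with shapely standing in for GeoTools; the 50-knot radii below are made-up illustration values, and ring34 comes from the wind_ring() sketch above.

# Sketch of the "rings with no holes, layered" option, with shapely as a
# stand-in for GeoTools.  outer34 and inner50 are plain closed rings; the
# difference gives the 34-but-not-50-knot band, the union the whole footprint.
from shapely.geometry import shape, mapping

outer34 = shape(ring34).buffer(0)    # buffer(0) cleans up the degenerate SW=0 edge
inner50 = shape(wind_ring(-75.1, 23.1, {"NE": 5, "SE": 3, "SW": 0, "NW": 4})).buffer(0)

band_34_only = outer34.difference(inner50)   # annulus-like band
footprint    = outer34.union(inner50)        # everything at >= 34 knots

print(mapping(band_34_only)["type"])         # e.g. Polygon (possibly with a hole)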
In either event, it is important to recognize that if you use
$geoIntersects to find windRings that touch a given point, you
must still do a bit of work to determine exactly which ring is
involved. This is essentially identical to matching material in an array
of ordinary scalars in MongoDB: if any element matches, the whole doc is
returned and you do not immediately know which elements in the array exactly
match. The standard approach is the double match pattern:
/*
  Approach with MultiPolygon:
  So in unwinding the coordinates array of the MultiPolygon, we "break"
  the MultiPolygon definition because we now have single Polygons; that is,
  there is one less level of nested arrays.  MongoDB will silently ignore
  the improperly constructed MultiPolygon and nothing will match.
  So... we have to be clever and change the name of the type to Polygon!
*/
var pt = [ -72.5, 40.5 ];

var theMatch = {$match: {loc: {$geoIntersects:
        {$geometry: {type: "Point", coordinates: pt }}}
    }};

db[collname].aggregate([
    theMatch                               // first pass, index optimized
    ,{$unwind: "$loc.coordinates"}
    ,{$addFields: {"loc.type":"Polygon"}}  // ah HA! Overwrite the type field!
    ,theMatch                              // second pass
]);
"windRings" : {
"type" : "GeometryCollection",
"geometries" : [
{
"type" : "Point",
"coordinates" : [
-76.8,
16.7
]
},
{
"type" : "MultiPolygon",
"coordinates" : [
[
...
It's good to have the flexibility of putting the center point and a MultiPolygon into one indexable field (a geometries array with 2 elements). Should our needs change, especially around nesting and holes, we could easily switch to a Point plus up to three discrete Polygons, for a total of 4 elements in the array. All will be indexed.
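For completeness, "all will be indexed" comes down to a single 2dsphere index over the whole windRings field. A quick pymongo sketch, with the database and collection names ("test", "hurricanes") being my placeholders:

# One 2dsphere index on windRings covers every geometry in the
# GeometryCollection (Point, MultiPolygon, or discrete Polygons alike),
# so $geoIntersects queries against that field can use the index.
from pymongo import MongoClient, GEOSPHERE

coll = MongoClient()["test"]["hurricanes"]
coll.create_index([("windRings", GEOSPHERE)])

pt = {"type": "Point", "coordinates": [-72.5, 40.5]}
for doc in coll.find({"windRings": {"$geoIntersects": {"$geometry": pt}}}):
    print(doc["_id"])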
The web is filled with commentary on the benefits and differences of FeatureCollection vs. GeometryCollection but I believe it comes down to context, history, and tool compatibility.
Before there was GeoJSON and MongoDB, there were Esri, ArcGIS, and shapefiles. A standard unto themselves, shapefiles have carried almost all academic and government geodata since the 1980s (yep). But owing to their age, shapefiles and the data environment in ArcGIS are not friendly to rich, flexible data. Cross-vendor data integration was also in a very different state 30 years ago, so shapefiles had to carry both geo data and other data, like properties for visualization and simple management purposes. This created a paradigm where the geo file and data structures were the boss, and any other auxiliary data had to be tucked into a standard slot in those structures.
FeatureCollection very much parallels this design, largely for compatibility. It is a single object that contains an array of Feature objects, and each Feature MUST contain these fields:
{"type": "FeatureCollection",
"features": [
{"type": "Feature",
"geometry": {"type": "Point", "coordinates": [-94.8, 28.0]},
"properties": {} }
]
}
With modern programming languages, the MongoDB document model, and a fully GeoJSON-oriented solution, you no longer have to "trap" auxiliary data with the geo data. In the windRings example above, note how no properties are required. We are free to design any structure we want -- including, if we so desire, adding an additional field called properties that may help describe the contents of the geodata:
"windRingModel": {
"version": 3,
"params": [
{"name":"R34", "speed": {v:34, m:"knots"}, bias: [ 0.4, 0.23, 0.12] },
{"name":"R50", "speed": {v:50, m:"knots"}, bias: [ 0.8, 0.40, 0.22] }
]
"windRings" : {
"type" : "GeometryCollection",
"geometries" : [
{
"type" : "Point",
"coordinates" : [
-76.8,
16.7
]
},
{
"type" : "MultiPolygon",
"coordinates" : [
[
...
It is very likely that one might create a FeatureCollection purely to drive a visualization tool outside of the MongoDB / GeoJSON manipulation space, such as geojson.io. In that case, depending on simplicity, clarity, data security, and other factors, there might be multiple sets of properties you want to expose with the geo data. This suggests an approach where a program starts with the GeometryCollection and other data in the document and assembles Features for the FeatureCollection, with the primary task being setting up the required properties field as appropriate for the target consumer and/or rendering technology. Here is an example:
# Get a doc from MongoDB:
import json

item = coll.find_one(params)

fwrap = {}
fwrap['type'] = "FeatureCollection"
fcoll = []

# This is the modest amount of bespoke code.  Basically, you are just
# iterating through the GeometryCollection and assembling the aux data.
# Note that the GeoJSON shape in the GeometryCollection is lifted straight
# into the 'geometry' field of the Feature object.
n = len(item['windRings']['geometries'])
for i in range(0, n):
    onef = {"type": "Feature"}
    onef['geometry'] = item['windRings']['geometries'][i]  # Easy! Set the whole shape at once!
    wrm = item['windRingModel']['params']
    onef['properties'] = {"name": wrm[i]['name'], "speed": wrm[i]['speed']}
    fcoll.append(onef)

fwrap['features'] = fcoll

print(json.dumps(fwrap))

# The output is copy-and-pasteable into geojson.io
"Unwrap" the features array and emit each shape as CR-delimited JSON.
Important to use the -c option here to make it CR-delimited:
$ jq -c '.features[] | del(.type)' featurecollection.json > geojson.json
Now simply import! Use of skipRow is not required but helpful when a very small
number of shapes have bad geometries and stop the whole load.
$ mongoimport --uri connectionString -d test -c myCollection --parseGrace skipRow geojson.json