# Geo Library for Amazon DynamoDB
This project is an unofficial port of [awslabs/dynamodb-geo][dynamodb-geo], bringing creation and querying of geospatial data to Node.js developers using [Amazon DynamoDB][dynamodb].
## Features
* **Box Queries:** Return all of the items that fall within a pair of geo points that define a rectangle as projected onto a sphere.
* **Radius Queries:** Return all of the items that are within a given radius of a geo point.
* **Basic CRUD Operations:** Create, retrieve, update, and delete geospatial data items.
* **Customizable:** Access to raw request and result objects from the AWS SDK for JavaScript.
* **Fully Typed:** This port is written in TypeScript and declaration files are bundled into releases.
## Installation
Using [npm] or [yarn]:
`npm install --save dynamodb-geo` or `yarn add dynamodb-geo`.
## Getting started
First you'll need to import the AWS SDK and set up your DynamoDB connection:
```js
const AWS = require('aws-sdk');
const ddb = new AWS.DynamoDB({ endpoint: new AWS.Endpoint('http://localhost:8000') }); // Local development
```
Next you must create an instance of `GeoDataManagerConfiguration` for each geospatial table you wish to interact with. This is a container for various options (see API below), but you must always provide a `DynamoDB` instance and a table name.
```js
const ddbGeo = require('dynamodb-geo');
const config = new ddbGeo.GeoDataManagerConfiguration(ddb, 'MyGeoTable');
```
You may modify the config to change defaults.
```js
config.longitudeFirst = true; // Use spec-compliant GeoJSON, incompatible with awslabs/dynamodb-geo
```
Finally, you should instantiate a manager to query and write to the table using this config object.
```js
const myGeoTableManager = new ddbGeo.GeoDataManager(config);
```
## Choosing a `hashKeyLength` (optimising for performance and cost)
The `hashKeyLength` is the number of most significant digits (in base 10) of the 64-bit geohash to use as the hash key. Larger numbers allow small geographical areas to be spread across DynamoDB partitions, but at the cost of performance, as more [queries][dynamodb-query] need to be executed for box/radius searches that span hash keys. See [these tests][hashkeylength-tests] for an idea of how query performance scales with `hashKeyLength` for different search radii.
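To make the definition concrete, here is a minimal sketch of deriving a hash key by keeping the leading base-10 digits of a geohash. The helper name `hashKeyFor` and the sample geohash value are hypothetical, and the library's own implementation may differ in detail. Two points share a hash key only if their geohashes share those leading digits, which is why a longer key splits the same area across more keys.

```js
// Illustrative only: `hashKeyFor` is a hypothetical helper, not part of dynamodb-geo.
// Keeps the `hashKeyLength` most significant base-10 digits of a 64-bit geohash.
function hashKeyFor(geohash, hashKeyLength) {
  const digits = geohash.toString(10);
  // Assumption: a leading '-' is not counted as a digit.
  const signLength = digits.startsWith('-') ? 1 : 0;
  return BigInt(digits.slice(0, signLength + hashKeyLength));
}

const sampleGeohash = 5221366118452580119n; // made-up value for illustration
console.log(hashKeyFor(sampleGeohash, 2)); // 52n
console.log(hashKeyFor(sampleGeohash, 6)); // 522136n
```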
If your data is sparse, a large number will mean more RCUs, since more empty queries will be executed and each has a minimum cost. However, if your data is dense and `hashKeyLength` is too short, more RCUs will be needed to read a hash key and a higher proportion of the items read will be discarded by server-side filtering.
From the [AWS `Query` documentation][dynamodb-query]:
> DynamoDB calculates the number of read capacity units consumed based on item size, not on the amount of data that is returned to an application. ... **The number will also be the same whether or not you use a `FilterExpression`**
Optimally, you should pick the largest `hashKeyLength` your usage scenario allows. The wider your typical radius/box queries, the smaller it will need to be.
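The trade-off can be sketched with some rough arithmetic. The estimator below is an approximation under stated assumptions (4 KB read units, eventually consistent reads at half cost, a one-unit minimum per query); `estimateQueryRcu` is a hypothetical name and the figures are illustrative, not measurements. The count of 8 queries per search is taken from the Limitations section of this document.

```js
// Rough estimate of the RCUs consumed by one Query call.
// Assumptions: 4 KB read units, eventually consistent reads at half cost,
// and a one-unit minimum even when nothing is read.
function estimateQueryRcu(bytesRead, consistentRead = false) {
  const units = Math.max(1, Math.ceil(bytesRead / 4096));
  return consistentRead ? units : units / 2;
}

// Sparse data: 8 queries that read nothing still pay the minimum each.
const sparse = 8 * estimateQueryRcu(0);
// Dense data: 8 queries each reading 40 KB, however much is filtered out afterwards.
const dense = 8 * estimateQueryRcu(40 * 1024);
console.log({ sparse, dense }); // { sparse: 4, dense: 40 }
```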
Note that the [Java version][dynamodb-geo] uses a `hashKeyLength` of `6` by default. The same value will need to be used if you access the same data with both clients.
This is an important early choice, since changing your `hashKeyLength` will mean recreating your data.
## Creating a table
`GeoTableUtil` has a static method `getCreateTableRequest` to help you prepare a [DynamoDB CreateTable request][createtable], given a `GeoDataManagerConfiguration`.
You can modify this request as desired before executing it using AWS's DynamoDB SDK.
Example:
```js
// Pick a hashKeyLength appropriate to your usage
config.hashKeyLength = 3;
// Use GeoTableUtil to help construct a CreateTableInput.
const createTableInput = ddbGeo.GeoTableUtil.getCreateTableRequest(config);
// Tweak the schema as desired
createTableInput.ProvisionedThroughput.ReadCapacityUnits = 2;
console.log('Creating table with schema:');
console.dir(createTableInput, { depth: null });
// Create the table
ddb.createTable(createTableInput).promise()
    // Wait for it to become ready
    .then(function () { return ddb.waitFor('tableExists', { TableName: config.tableName }).promise() })
    .then(function () { console.log('Table created and ready!') });
```
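If the setup code may run more than once, it can help to tolerate an existing table. The following is a sketch of one way to do that with `async`/`await`; `ensureGeoTable` is a hypothetical helper name, and `ddb`, `ddbGeo` and `config` are the objects created in the sections above, passed in as arguments.

```js
// Sketch: create the table if needed, then wait until it is ready.
// `ensureGeoTable` is a hypothetical helper, not part of dynamodb-geo.
async function ensureGeoTable(ddb, ddbGeo, config) {
  const createTableInput = ddbGeo.GeoTableUtil.getCreateTableRequest(config);
  try {
    await ddb.createTable(createTableInput).promise();
  } catch (err) {
    // DynamoDB reports an already existing table with this error code.
    if (err.code !== 'ResourceInUseException') throw err;
  }
  await ddb.waitFor('tableExists', { TableName: config.tableName }).promise();
  return createTableInput;
}
```

Call it as `ensureGeoTable(ddb, ddbGeo, config)` before the first write. Any error other than "table already exists" is rethrown to the caller.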
## Adding data
```js
myGeoTableManager.putPoint({
    RangeKeyValue: { S: '1234' }, // Use this to ensure uniqueness of the hash/range pairs.
    GeoPoint: { // An object specifying latitude and longitude as plain numbers. Used to build the geohash, the hash key and GeoJSON data
        latitude: 51.51,
        longitude: -0.13
    },
    PutItemInput: { // Passed through to the underlying DynamoDB.putItem request. TableName is filled in for you.
        Item: { // The primary key, geohash and GeoJSON data are filled in for you
            country: { S: 'UK' }, // Specify attribute values using { type: value } objects, like the DynamoDB API.
            capital: { S: 'London' }
        },
        // ... Anything else to pass through to `putItem`, e.g. ConditionExpression
    }
}).promise()
    .then(function() { console.log('Done!') });
```
See also [DynamoDB PutItem request][putitem]
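Writing `{ type: value }` objects by hand gets tedious for larger items. As a rough illustration, the sketch below converts a flat object of strings, numbers and booleans into DynamoDB attribute values. `toAttributeValues` is a hypothetical helper, not part of this library; the AWS SDK also ships its own converter utilities, which cover more types.

```js
// Hypothetical helper: converts a flat object of strings, numbers and booleans
// into DynamoDB attribute values. Other types are rejected rather than guessed.
function toAttributeValues(plain) {
  const item = {};
  for (const [name, value] of Object.entries(plain)) {
    if (typeof value === 'string') item[name] = { S: value };
    else if (typeof value === 'number') item[name] = { N: String(value) }; // numbers are sent as strings
    else if (typeof value === 'boolean') item[name] = { BOOL: value };
    else throw new TypeError(`Unsupported attribute type for "${name}"`);
  }
  return item;
}

console.log(toAttributeValues({ country: 'UK', capital: 'London', population: 8900000 }));
```

The result can be used as the `Item` of a `PutItemInput`.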
## Updating a specific point
Note that you cannot update the hash key, range key, geohash or geoJson. If you want to change these, you'll need to recreate the record.
You must specify a `RangeKeyValue`, a `GeoPoint`, and an `UpdateItemInput` matching the [DynamoDB UpdateItem][updateitem] request (`TableName` and `Key` are filled in for you).
```js
myGeoTableManager.updatePoint({
    RangeKeyValue: { S: '1234' },
    GeoPoint: { // An object specifying latitude and longitude as plain numbers.
        latitude: 51.51,
        longitude: -0.13
    },
    UpdateItemInput: { // TableName and Key are filled in for you
        UpdateExpression: 'SET country = :newName',
        ExpressionAttributeValues: {
            ':newName': { S: 'United Kingdom' }
        }
    }
}).promise()
    .then(function() { console.log('Done!') });
```
## Deleting a specific point
You must specify a `RangeKeyValue` and a `GeoPoint`. Optionally, you can pass a `DeleteItemInput` matching the [DynamoDB DeleteItem][deleteitem] request (`TableName` and `Key` are filled in for you).
```js
myGeoTableManager.deletePoint({
    RangeKeyValue: { S: '1234' },
    GeoPoint: { // An object specifying latitude and longitude as plain numbers.
        latitude: 51.51,
        longitude: -0.13
    },
    DeleteItemInput: { // Optional, any additional parameters to pass through.
        // TableName and Key are filled in for you
        // Example: Only delete if the point does not have a country name set
        ConditionExpression: 'attribute_not_exists(country)'
    }
}).promise()
    .then(function() { console.log('Done!') });
```
## Rectangular queries
Query by rectangle by specifying a `MinPoint` and `MaxPoint`.
```js
// Querying a rectangle
myGeoTableManager.queryRectangle({
MinPoint: {
latitude: 52.225730,
longitude: 0.149593
},
MaxPoint: {
latitude: 52.889499,
longitude: 0.848383
}
})
// Print the results, an array of DynamoDB.AttributeMaps
.then(console.log);
```
## Radius queries
Query by radius by specifying a `CenterPoint` and `RadiusInMeter`.
```js
// Querying 100km from Cambridge, UK
myGeoTableManager.queryRadius({
RadiusInMeter: 100000,
CenterPoint: {
latitude: 52.225730,
longitude: 0.149593
}
})
// Print the results, an array of DynamoDB.AttributeMaps
.then(console.log);
```
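To illustrate what "within a given radius" means, here is a minimal sketch of a great-circle distance check using the haversine formula. This is a swapped-in technique for illustration, assuming a spherical Earth with a mean radius of 6,371 km; the library's own distance calculation may use a different method. `distanceInMeters` is a hypothetical helper name.

```js
// Great-circle distance in meters using the haversine formula.
// Assumption: spherical Earth with a mean radius of 6,371 km.
const EARTH_RADIUS_METERS = 6371000;

function distanceInMeters(a, b) {
  const toRad = (deg) => (deg * Math.PI) / 180;
  const dLat = toRad(b.latitude - a.latitude);
  const dLon = toRad(b.longitude - a.longitude);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRad(a.latitude)) * Math.cos(toRad(b.latitude)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_METERS * Math.asin(Math.sqrt(h));
}

const cambridge = { latitude: 52.225730, longitude: 0.149593 };
const london = { latitude: 51.51, longitude: -0.13 };
// London lies inside the 100km radius used in the query above.
console.log(distanceInMeters(cambridge, london) <= 100000); // true
```

A helper like this can also be used client-side, for example to sort query results by distance from the center point.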
## Batch operations
TODO: Docs (see [the example][example] for an example of a batch write)
## Configuration reference
These are public properties of a `GeoDataManagerConfiguration` instance. After creating the config object you may modify these properties.
#### consistentRead: boolean = false
Whether queries use the [`ConsistentRead`][readconsistency] option (for strongly consistent reads) or not (for eventually consistent reads, at half the cost).
This can also be overridden for individual queries as a query config option.
#### longitudeFirst: boolean = true
This library will automatically add GeoJSON-style position data to your stored items. The [GeoJSON standard][geojson] uses `[lon, lat]` ordering, but [awslabs/dynamodb-geo][dynamodb-geo] uses `[lat, lon]`.
This fork allows you to choose between [awslabs/dynamodb-geo][dynamodb-geo] compatibility and GeoJSON standard compliance.
* Use `false` (`[lat, lon]`) for compatibility with [awslabs/dynamodb-geo][dynamodb-geo]
* Use `true` (`[lon, lat]`) for GeoJSON standard compliance. (default)
Note that this value should match the state of your existing data - if you change it you must update your database manually, or you'll end up with ambiguously mixed data.
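Code that reads the stored position back needs to apply the same ordering. The sketch below shows the idea; it assumes the point is stored as a JSON string in a DynamoDB string attribute (an assumption about the storage format, so check an item in your own table first), and `readGeoPoint` is a hypothetical helper name.

```js
// Hypothetical helper. Assumes the point is stored as a JSON string, e.g.
// { S: '{"type":"Point","coordinates":[-0.13,51.51]}' } when longitudeFirst is true.
function readGeoPoint(item, { geoJsonAttributeName = 'geoJson', longitudeFirst = true } = {}) {
  const { coordinates } = JSON.parse(item[geoJsonAttributeName].S);
  const [first, second] = coordinates;
  return longitudeFirst
    ? { latitude: second, longitude: first }
    : { latitude: first, longitude: second };
}

const stored = { geoJson: { S: '{"type":"Point","coordinates":[-0.13,51.51]}' } };
console.log(readGeoPoint(stored)); // { latitude: 51.51, longitude: -0.13 }
```

Reading the same item with `longitudeFirst: false` would silently swap the two values, which is why mixed data is hard to repair afterwards.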
#### geoJsonPointType: "Point" | "POINT" = "Point"
The value of the `type` attribute in recorded GeoJSON points. Should normally be `"Point"`, which is standards compliant.
Use `"POINT"` for compatibility with [awslabs/dynamodb-geo][dynamodb-geo].
This setting is only relevant for writes. This library doesn't inspect or set this value when reading/querying.
#### geohashAttributeName: string = "geohash"
The name of the attribute storing the full 64-bit geohash. Its value is auto-generated based on item coordinates.
#### hashKeyAttributeName: string = "hashKey"
The name of the attribute storing the first `hashKeyLength` digits (default 2) of the geohash, used as the hash (aka partition) part of a [hash/range primary key pair][hashrange]. Its value is auto-generated based on item coordinates.
#### hashKeyLength: number = 2
See [above][choosing-hashkeylength].
#### rangeKeyAttributeName: string = "rangeKey"
The name of the attribute storing the range key, used as the range (aka sort) part of a [hash/range primary key pair][hashrange]. Its value must be specified by you (hash/range pairs must be unique).
#### geoJsonAttributeName: string = "geoJson"
The name of the attribute which will contain the longitude/latitude pair in a GeoJSON-style point (see also `longitudeFirst`).
#### geohashIndexName: string = "geohash-index"
The name of the index to be created against the geohash. Only used for creating new tables.
## Example
See the [example on GitHub][example].
## Limitations
### No composite key support
Currently, the library does not support composite keys. You may want to add tags such as restaurant, bar, and coffee shop, and search locations of a specific category; however, it is currently not possible. You need to create a table for each tag and store the items separately.
### Queries retrieve all paginated data
Although low-level [DynamoDB Query][dynamodb-query] requests return paginated results, this library automatically pages through the entire result set. When querying a large area with many points, a lot of Read Capacity Units may be consumed.
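For reference, paging through a `Query` by hand looks roughly like the sketch below: each response's `LastEvaluatedKey` is fed back as the next request's `ExclusiveStartKey` until no key is returned. `queryAllPages` is a hypothetical name and this is not the library's internal code; `ddb` is an `AWS.DynamoDB` client and `queryInput` an ordinary `Query` request.

```js
// Sketch of paging through a DynamoDB Query by hand.
// `queryAllPages` is a hypothetical helper, not a dynamodb-geo function.
async function queryAllPages(ddb, queryInput) {
  const items = [];
  let exclusiveStartKey;
  do {
    const params = exclusiveStartKey
      ? { ...queryInput, ExclusiveStartKey: exclusiveStartKey }
      : queryInput;
    const page = await ddb.query(params).promise();
    items.push(...(page.Items || []));
    // DynamoDB sets LastEvaluatedKey only when more results remain.
    exclusiveStartKey = page.LastEvaluatedKey;
  } while (exclusiveStartKey);
  return items;
}
```

Every page read this way consumes capacity, whether or not its items survive later filtering.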
### More Read Capacity Units
The library retrieves candidate geo points from the cells that intersect the requested bounds. The library then post-processes the candidate data, filtering out the specific points that are outside the requested bounds. Therefore, the consumed Read Capacity Units will be higher than the size of the final result set would suggest. Typically 8 queries are executed per radius or box search.
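The post-processing step described above can be pictured as a simple bounds check over the candidates. The sketch below is illustrative only: `isInsideRectangle` is a hypothetical helper, it is not the library's own filter, and it assumes the rectangle does not cross the antimeridian (180° longitude).

```js
// Hypothetical post-filter: keep only candidate points inside the rectangle.
// Assumes the rectangle does not cross the antimeridian.
function isInsideRectangle(point, minPoint, maxPoint) {
  return (
    point.latitude >= minPoint.latitude &&
    point.latitude <= maxPoint.latitude &&
    point.longitude >= minPoint.longitude &&
    point.longitude <= maxPoint.longitude
  );
}

const minPoint = { latitude: 52.225730, longitude: 0.149593 };
const maxPoint = { latitude: 52.889499, longitude: 0.848383 };
const candidates = [
  { latitude: 52.5, longitude: 0.5 },    // inside the rectangle
  { latitude: 51.51, longitude: -0.13 }, // read from a matching cell, but outside
];
const results = candidates.filter((p) => isInsideRectangle(p, minPoint, maxPoint));
console.log(results.length); // 1
```

Both candidates were read from the table and paid for, but only one is returned.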
### High memory consumption
Because all paginated `Query` results are loaded into memory and processed, it may consume substantial amounts of memory for large datasets.
### Dataset density limitation
The geohash used in this library has roughly centimeter precision. Therefore, the library is not suitable if your dataset has a much higher density than that.
[npm]: https://www.npmjs.com
[yarn]: https://yarnpkg.com
[updateitem]: http://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_UpdateItem.html
[deleteitem]: http://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_DeleteItem.html
[putitem]: http://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_PutItem.html
[createtable]: http://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_CreateTable.html
[hashrange]: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.CoreComponents.html#HowItWorks.CoreComponents.PrimaryKey
[readconsistency]: http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.ReadConsistency.html
[geojson]: https://geojson.org/geojson-spec.html
[example]: https://github.com/rh389/dynamodb-geo.js/tree/master/example
[dynamodb-geo]: https://github.com/awslabs/dynamodb-geo
[dynamodb]: http://aws.amazon.com/dynamodb
[dynamodb-query]: http://docs.aws.amazon.com/amazondynamodb/latest/APIReference/API_Query.html
[hashkeylength-tests]: https://github.com/rh389/dynamodb-geo.js/blob/master/test/integration/hashKeyLength.ts
[choosing-hashkeylength]: #choosing-a-hashkeylength-optimising-for-performance-and-cost