Just got off a great call with @paul121 @gbathree @AmberS and others at Our Sci about strategies for keeping aggregated data up to date. I wanted to share some of the ideas and discussions from that call with the wider community and invite feedback.
Our Sci is building a “Digital Coffee Shop”, which aggregates data from many farmOS instances together (via the farmOS Aggregator) to create data benchmarks and visualizations. They are periodically checking each instance for new records, and pulling them into their own custom MongoDB cache, where they are formatted for use in the Coffee Shop.
One of the challenges of aggregating large amounts of data is keeping it updated as changes are made in all of the farmOS instances. And as this scales to more farms, the challenge scales with it.
Two farmOS feature ideas were proposed to help with this:
1. farmOS could provide a dedicated API endpoint that lists entities that were added/updated/deleted recently, filterable by date. This would improve the efficiency with which external systems (like the Coffee Shop) are able to determine what they need to update on their end.
2. farmOS could send webhook notifications to an endpoint in the external system to notify it immediately of changes.
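To make the first idea a bit more concrete, here is a rough Python sketch of how an external system might consume a "recent changes" list and apply it to its own cache. Everything in it is an assumption for illustration: the `Change` record shape, the operation names, and the `apply_changes` helper are hypothetical and do not exist in farmOS today.

```python
from dataclasses import dataclass


@dataclass
class Change:
    # Hypothetical shape of one entry in a "recent changes" list.
    entity_type: str  # e.g. "log" or "asset"
    uuid: str
    operation: str    # "create", "update" or "delete"
    timestamp: int    # Unix time of the change


def apply_changes(cache: dict, changes: list, since: int) -> int:
    """Apply changes newer than `since` to `cache`.

    Returns the newest timestamp seen, to be used as `since` next time.
    """
    latest = since
    for change in sorted(changes, key=lambda c: c.timestamp):
        if change.timestamp <= since:
            continue  # already processed in an earlier sync
        key = (change.entity_type, change.uuid)
        if change.operation == "delete":
            cache.pop(key, None)
        else:
            # A real consumer would re-fetch the record from the farmOS API
            # here; this sketch only marks it as needing a refresh.
            cache[key] = {"needs_refresh": True, "changed": change.timestamp}
        latest = max(latest, change.timestamp)
    return latest
```

The point of the design is that the consumer only stores one number (the last timestamp it processed) and asks for everything after it, rather than re-scanning every record on every farm.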
(1) reminds me of some earlier discussions we had around maintaining an “audit log” of changes to records in farmOS. With farmOS v2, we now have “revision logs”, which maybe cover 80% of these needs, but they miss some things. For example, there is no way to see that records have been deleted.
We talked about different approaches to this. A simple approach might be to just query the `log` and `asset` (and other) tables to look at `created` and `changed` timestamps. Another would be to add a new dedicated database table that tracked everything in one place (high level audit trails - not specifics that are already captured in revisions). This could also capture deletions, and be query-able by Views in a single SQL query (and also exposed as an API endpoint via Views pretty easily).
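As a rough illustration of the dedicated-table approach, the sketch below uses Python with an in-memory SQLite database purely so it is self-contained. The table name `entity_audit`, its columns, and the helper functions are all hypothetical; an actual farmOS implementation would live in a Drupal module and would likely look different.

```python
import sqlite3
import time


def create_audit_table(conn: sqlite3.Connection) -> None:
    # One row per create/update/delete event, across all entity types.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS entity_audit (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            entity_type TEXT NOT NULL,
            bundle TEXT NOT NULL,
            uuid TEXT NOT NULL,
            operation TEXT NOT NULL
                CHECK (operation IN ('create', 'update', 'delete')),
            timestamp INTEGER NOT NULL
        )
        """
    )


def record_change(conn, entity_type, bundle, uuid, operation, timestamp=None):
    # Only high-level facts are stored; field-level detail stays in revisions.
    ts = int(time.time()) if timestamp is None else timestamp
    conn.execute(
        "INSERT INTO entity_audit (entity_type, bundle, uuid, operation, timestamp)"
        " VALUES (?, ?, ?, ?, ?)",
        (entity_type, bundle, uuid, operation, ts),
    )


def changes_since(conn, since):
    # A single query covers every entity type, including deletions.
    cursor = conn.execute(
        "SELECT entity_type, bundle, uuid, operation, timestamp"
        " FROM entity_audit WHERE timestamp > ? ORDER BY timestamp, id",
        (since,),
    )
    return cursor.fetchall()
```

Because deletions are written as rows like any other event, they remain visible after the original record is gone, which is the piece that revisions alone do not cover.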
(2) reminds me of recent work and discussions with @Farmer-Ed @paul121 around the “Notifications” module in farmOS. We recently took a first step towards generalizing that module out of the “Data Stream Notifications” module. Currently there is a single “notification type plugin” for “email notifications”. Perhaps “webhook notifications” could be another type.
@paul121 raised some very good questions about security and information disclosure considerations with webhooks, so we should think carefully about that. One potential approach would be to make the webhook notification config entities configurable, so you could specify what information is included in it. Some might just say “something updated”! Others might include UUID and entity type/bundle information. Still others might send the entire entity as JSON in the webhook request. @gbathree also made the point that farmOS users should be aware of everything that’s happening behind the scenes like this, so that there is visibility.
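To illustrate the "configurable disclosure" idea, here is a minimal Python sketch of building a webhook body at three levels of detail. The level names (`minimal`, `reference`, `full`), the payload keys, and the `build_webhook_payload` function are assumptions made up for this example, not an existing farmOS or Notifications module API.

```python
import json

# Hypothetical disclosure levels a webhook notification config could offer.
DETAIL_LEVELS = ("minimal", "reference", "full")


def build_webhook_payload(entity: dict, operation: str, detail: str = "minimal") -> str:
    """Return the JSON body for a webhook at the configured detail level."""
    if detail not in DETAIL_LEVELS:
        raise ValueError(f"unknown detail level: {detail}")

    # "minimal": only says that something changed.
    payload = {"event": "entity_changed"}

    # "reference": enough for the receiver to fetch the record itself,
    # using its own API credentials.
    if detail in ("reference", "full"):
        payload.update(
            {
                "operation": operation,
                "entity_type": entity["entity_type"],
                "bundle": entity["bundle"],
                "uuid": entity["uuid"],
            }
        )

    # "full": the entire entity travels in the request body.
    if detail == "full":
        payload["entity"] = entity

    return json.dumps(payload, sort_keys=True)
```

Defaulting to the least revealing level means a misconfigured webhook leaks nothing beyond the fact that a change happened, and the `reference` level keeps the actual data behind the normal API permissions.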
We also talked about some ideas around adding caching layers to the farmOS Aggregator itself, although I feel like that should be a separate forum topic.