Adding API endpoints for averages, standard deviations, etc

Hi, I am working with @miriam to build out a benchmarking dashboard in farmOS. Here is a very mocked up version with notes, but basically it’s (on the left) a benchmarking table (comparing your soil test values to average values in the db) and on the right there’s a table showing your soil test scores over time by field.

We’re considering creating a kind of ‘empty’ dashboard component that we can put JS into, as JS has many easy to use graphing libraries. However, for the left graph, Drupal would need to create, store and provide (ideally via the API to be most available and most generalizable) average values for each soil test value. For example, let’s talk about soil carbon:

A very simple version of a single lab test log with a soil carbon quantity (not in farmOS structure, but hopefully you get the idea here).

log-lab_test:
  quantity:
    soil_carbon: 5

Maybe there are 20 of these in the database. So we need to generate an average value across all soil_carbon quantities connected to a lab_test log.

Question: how do we store this resulting average value, and where is the best place to make it available in the api? I don’t see an existing design pattern for this kind of information. There’s a few options:

  1. Add it to every lab test log as part of attributes (like attributes.calculation[0].soil_carbon // .value or something)`, a… I don’t like this as would be really calculation heavy, but it’s convenient because you don’t have to make a special query to get it. It also could be flexible to many calculations (standard deviation, etc.), and work across all entity types.
  2. Create a completely custom API endpoint as needed
  3. Create a new top-level entity for data of this type (sorry I may be saying this wrong… but read on hopefully it makes sense)… for example, imagine if in the api there was also a log--lab_test_calculations or something, which then contained this and any other calculations (for example, standard deviation) that you may want to apply to lab test logs and their associated quantities. This is nice because it generalizes to any other entities (assets, logs, etc.) and any other calculations you’d want to apply to a log. It would not handle calculations across different entity types.

Maybe these are all bad options, not sure. Would love any ideas or feedback on this.

1 Like

Based on the requirements described (a dashboard report), I think a custom API endpoint is perfectly fine for that!

This could be implemented simply as a Controller (aka a Drupal route) like we do for the “basic” data stream API endpoints (for posting and querying sensor data), which are essentially just custom API endpoints in the same way - they are not part of JSON:API. It’s super simple, we just declare the route path (/api/data_stream/{uuid}/data):

… and the controller class that builds it (\Drupal\data_stream\Controller\DataStreamController::dataStream):

Here is the controller class (specifically the dataStream() method referenced above):

We have a layer of abstraction there that I will gloss over (because we support different “types” of data streams defined as “plugins”) but ultimately the controller just calls the basic data stream plugin’s “apiHandleRequest()” method.

Here is that method:

… which, for a GET request basically just loads all the data and returns it as a JSON object:

So, long story short… you can very easily make a route that simply returns whatever JSON you want. :slight_smile:

The cool thing about all of this is: controller responses can be cached, which is a form of automatic “storage”! :slight_smile:

There are lots of options in Drupal for controlling how this caching works. You just need to figure out what kinds of events would invalidate it. So eg: if a new soil test log is added, it could invalidate all the API endpoints so they are recalculated on-demand when they are requested again. Or you can get more granular to only invalidate certain requests, if you want to improve performance.

You would need all the same logic anyway if you “stored” the data directly in the database. Drupal’s caching layer is already built for that exact purpose, though, so IMO there’s no reason to do something custom.

This API endpoint can then be easily queried by the JavaScript in your dashboard, and it will be lightning fast once the response has been generated the first time and cached. If it takes a little bit of time to build, put a little spinner in your dashboard UI to indicate that it’s loading. If you don’t want that, then write a cron job that automatically requests the endpoint to “warm up” the cache before a real user does.

Should be pretty easy!

2 Likes

Thanks @mstenta for such a detailed explanation!

3 Likes

Thanks @mstenta ! So I guess in the end /api/data_stream/{uuid}/data is the endpoint, and we could kind of create some standards in there so this is repeatable (or not?).

For example, there are many cases for benchmarking where you want to compare average or standard deviation of a quantity (like label: soil_carbon) contained in a log type (like log-lab_test). But we may want to do this for a tillage log and tillage depth, or harvest log and harvest amount. So we can create whatever types of logical structures we want in there to reasonably accommodate a repeat of this pattern… like…

data_stream.{uuid}.data.log-lab_test.soil_carbon <- show average in lab tests for soil carbon
data_stream.{uuid}.data.log-harvest.yield <- show average in harvest for yield
// later option
data_stream.{uuid}.data.log-lab_test.soil_carbon <- show average in lab tests for soil carbon

/api/data_stream/{uuid}/data is the endpoint for showing all data that is associated with that data stream. In that case, we have an entity (the data stream) which already has its own JSON:API endpoint (/api/data_stream/{uuid}), so it makes sense to just add a /data path/route to the end of that for serving up the data (this is also what sensors can push data to).

In your case, you’re not dealing with a single entity, but rather a computed value derived from many entities (quantities). So it’s a bit different, and personally I don’t think that should be tacked onto individual entity API endpoints. Since there isn’t any existing API endpoints for this kind of data (yet?) it can really be wherever you want. Standard convention would be to prefix it with /api/... but even that is optional. As long as it doesn’t conflict with current JSON:API endpoints (which are generally /api/[entity-type]/[bundle]/...) I think it’s OK to experiment, and then maybe as more and more of these things are implemented we’ll all converge on a best practice convention for these kinds of “custom” endpoints for serving derived/computed data that is not directly associated with a single entity.

One idea might be to think about these as “reports”, and consider centralizing them around some standard /api/reports/* endpoints. We created a placeholder module called farm_report which simply adds a “Reports” menu item that shows an index of all reports underneath it - but no modules currently make use of this. It was forward-thinking.

Perhaps the information you are synthesizing could be displayed in a /reports/soil-health-benchmarking path in the UI, and the dashboard pane you described could be a simplified version of this that links to the “full report” at that path. And then maybe it would make sense to follow a similar pattern for your custom API endpoint: /api/reports/soil-health-benchmarking (or whatever you call it).

1 Like

Ok… Yeah I love that actually

1 Like