Adding API endpoints for averages, standard deviations, etc

gbathree · July 21, 2025, 4:40pm

Hi, I am working with @miriam to build out a benchmarking dashboard in farmOS. Here is a very mocked up version with notes, but basically it’s (on the left) a benchmarking table (comparing your soil test values to average values in the db) and on the right there’s a table showing your soil test scores over time by field.

We’re considering creating a kind of ‘empty’ dashboard component that we can put JS into, as JS has many easy to use graphing libraries. However, for the left graph, Drupal would need to create, store and provide (ideally via the API to be most available and most generalizable) average values for each soil test value. For example, let’s talk about soil carbon:

A very simple version of a single lab test log with a soil carbon quantity (not in farmOS structure, but hopefully you get the idea here).

log-lab_test:
  quantity:
    soil_carbon: 5

Maybe there are 20 of these in the database. So we need to generate an average value across all soil_carbon quantities connected to a lab_test log.

Question: how do we store this resulting average value, and where is the best place to make it available in the api? I don’t see an existing design pattern for this kind of information. There’s a few options:

Add it to every lab test log as part of attributes (like attributes.calculation[0].soil_carbon // .value or something)`, a… I don’t like this as would be really calculation heavy, but it’s convenient because you don’t have to make a special query to get it. It also could be flexible to many calculations (standard deviation, etc.), and work across all entity types.
Create a completely custom API endpoint as needed
Create a new top-level entity for data of this type (sorry I may be saying this wrong… but read on hopefully it makes sense)… for example, imagine if in the api there was also a log--lab_test_calculations or something, which then contained this and any other calculations (for example, standard deviation) that you may want to apply to lab test logs and their associated quantities. This is nice because it generalizes to any other entities (assets, logs, etc.) and any other calculations you’d want to apply to a log. It would not handle calculations across different entity types.

Maybe these are all bad options, not sure. Would love any ideas or feedback on this.

mstenta · July 21, 2025, 5:05pm

Based on the requirements described (a dashboard report), I think a custom API endpoint is perfectly fine for that!

This could be implemented simply as a Controller (aka a Drupal route) like we do for the “basic” data stream API endpoints (for posting and querying sensor data), which are essentially just custom API endpoints in the same way - they are not part of JSON:API. It’s super simple, we just declare the route path (/api/data_stream/{uuid}/data):

github.com/farmOS/farmOS

modules/core/data_stream/data_stream.routing.yml

2b0b40f8b


      
          data_stream.data:
            path: '/api/data_stream/{uuid}/data'
            defaults:
              _controller: '\Drupal\data_stream\Controller\DataStreamController::dataStream'
            requirements:
              # There is no access restriction to this endpoint.
              _access: 'TRUE'

… and the controller class that builds it (\Drupal\data_stream\Controller\DataStreamController::dataStream):

github.com/farmOS/farmOS

modules/core/data_stream/data_stream.routing.yml

2b0b40f8b


      
          data_stream.data:
            path: '/api/data_stream/{uuid}/data'
            defaults:
              _controller: '\Drupal\data_stream\Controller\DataStreamController::dataStream'
            requirements:
              # There is no access restriction to this endpoint.
              _access: 'TRUE'

Here is the controller class (specifically the dataStream() method referenced above):

github.com/farmOS/farmOS

modules/core/data_stream/src/Controller/DataStreamController.php

2b0b40f8b


      
           * @param string $uuid
           *   The DataStream UUID.
           *
           * @return \Symfony\Component\HttpFoundation\Response
           *   The response.
           *
           * @throws \Drupal\Component\Plugin\Exception\InvalidPluginDefinitionException
           * @throws \Drupal\Component\Plugin\Exception\PluginException
           * @throws \Drupal\Component\Plugin\Exception\PluginNotFoundException
           */
          public function dataStream(Request $request, string $uuid) {
          
            // Load the data stream.
            $data_streams = $this->entityTypeManager()->getStorage('data_stream')->loadByProperties([
              'uuid' => $uuid,
            ]);
          
            // Bail if UUID is not found.
            if (empty($data_streams)) {
              throw new NotFoundHttpException();
            }

We have a layer of abstraction there that I will gloss over (because we support different “types” of data streams defined as “plugins”) but ultimately the controller just calls the basic data stream plugin’s “apiHandleRequest()” method.

github.com/farmOS/farmOS

modules/core/data_stream/src/Controller/DataStreamController.php

4.x


      
          
              // Get the request method.
              $method = $request->getMethod();
          
              // Bail if the plugin does not allow the method.
              $allowed_methods = $plugin->apiAllowedMethods();
              if (!in_array($method, $allowed_methods)) {
                throw new MethodNotAllowedHttpException($allowed_methods);
              }
          
              return $plugin->apiHandleRequest($data_stream, $request);
            }
          
          }

Here is that method:

github.com/farmOS/farmOS

modules/core/data_stream/src/Plugin/DataStream/DataStreamType/Basic.php

2b0b40f8b


      
          /**
           * {@inheritdoc}
           */
          public function apiAllowedMethods() {
            return [Request::METHOD_GET, Request::METHOD_POST];
          }
          
          /**
           * {@inheritdoc}
           */
          public function apiHandleRequest(DataStreamInterface $stream, Request $request) {
          
            // Get request method.
            $method = $request->getMethod();
          
            // Handle GET request.
            if ($method == Request::METHOD_GET) {
          
              // Bail if the sensor is not public and no private_key is provided.
              if (!$stream->isPublic() && !$this->requestHasValidPrivateKey($stream, $request)) {
                throw new AccessDeniedHttpException();

… which, for a GET request basically just loads all the data and returns it as a JSON object:

github.com/farmOS/farmOS

modules/core/data_stream/src/Plugin/DataStream/DataStreamType/Basic.php

2b0b40f8b


      
              // Bail if more than the max is requested.
              // Only allow 100k max data points to prevent exhausting PHP's memory,
              // which is a potential DDoS vector.
              if ($limit > $max_limit) {
                throw new UnprocessableHttpEntityException();
              }
            }
            $params['limit'] = $limit;
          
            $data = $this->storageGet($stream, $params);
            return new JsonResponse($data);
          }
          
          /**
           * Handle API POST requests.
           *
           * @param \Drupal\data_stream\Entity\DataStreamInterface $stream
           *   The data stream.
           * @param \Symfony\Component\HttpFoundation\Request $request
           *   The request.
           *

So, long story short… you can very easily make a route that simply returns whatever JSON you want.

The cool thing about all of this is: controller responses can be cached, which is a form of automatic “storage”!

There are lots of options in Drupal for controlling how this caching works. You just need to figure out what kinds of events would invalidate it. So eg: if a new soil test log is added, it could invalidate all the API endpoints so they are recalculated on-demand when they are requested again. Or you can get more granular to only invalidate certain requests, if you want to improve performance.

You would need all the same logic anyway if you “stored” the data directly in the database. Drupal’s caching layer is already built for that exact purpose, though, so IMO there’s no reason to do something custom.

This API endpoint can then be easily queried by the JavaScript in your dashboard, and it will be lightning fast once the response has been generated the first time and cached. If it takes a little bit of time to build, put a little spinner in your dashboard UI to indicate that it’s loading. If you don’t want that, then write a cron job that automatically requests the endpoint to “warm up” the cache before a real user does.

Should be pretty easy!

miriam · July 22, 2025, 10:37am

Thanks @mstenta for such a detailed explanation!

gbathree · July 22, 2025, 5:13pm

Thanks @mstenta ! So I guess in the end /api/data_stream/{uuid}/data is the endpoint, and we could kind of create some standards in there so this is repeatable (or not?).

For example, there are many cases for benchmarking where you want to compare average or standard deviation of a quantity (like label: soil_carbon) contained in a log type (like log-lab_test). But we may want to do this for a tillage log and tillage depth, or harvest log and harvest amount. So we can create whatever types of logical structures we want in there to reasonably accommodate a repeat of this pattern… like…

data_stream.{uuid}.data.log-lab_test.soil_carbon <- show average in lab tests for soil carbon
data_stream.{uuid}.data.log-harvest.yield <- show average in harvest for yield
// later option
data_stream.{uuid}.data.log-lab_test.soil_carbon <- show average in lab tests for soil carbon

mstenta · July 22, 2025, 5:39pm

/api/data_stream/{uuid}/data is the endpoint for showing all data that is associated with that data stream. In that case, we have an entity (the data stream) which already has its own JSON:API endpoint (/api/data_stream/{uuid}), so it makes sense to just add a /data path/route to the end of that for serving up the data (this is also what sensors can push data to).

In your case, you’re not dealing with a single entity, but rather a computed value derived from many entities (quantities). So it’s a bit different, and personally I don’t think that should be tacked onto individual entity API endpoints. Since there isn’t any existing API endpoints for this kind of data (yet?) it can really be wherever you want. Standard convention would be to prefix it with /api/... but even that is optional. As long as it doesn’t conflict with current JSON:API endpoints (which are generally /api/[entity-type]/[bundle]/...) I think it’s OK to experiment, and then maybe as more and more of these things are implemented we’ll all converge on a best practice convention for these kinds of “custom” endpoints for serving derived/computed data that is not directly associated with a single entity.

One idea might be to think about these as “reports”, and consider centralizing them around some standard /api/reports/* endpoints. We created a placeholder module called farm_report which simply adds a “Reports” menu item that shows an index of all reports underneath it - but no modules currently make use of this. It was forward-thinking.

Perhaps the information you are synthesizing could be displayed in a /reports/soil-health-benchmarking path in the UI, and the dashboard pane you described could be a simplified version of this that links to the “full report” at that path. And then maybe it would make sense to follow a similar pattern for your custom API endpoint: /api/reports/soil-health-benchmarking (or whatever you call it).

gbathree · July 22, 2025, 5:56pm

Ok… Yeah I love that actually