Hi all esp. @mstenta, have been working through the api switchboard and plant data service ontology work recently, and thinking a lot about how we can use the aggregator to better visualize results back to farmers and others interested in reviewing aggregated farm data.
In talking to Richard from Rothamsted, it turns out we are both doing similar work which is needed to move forward - we are creating a standard ‘interpretation’ layer on top of the core farmOS schema. What does that mean? It means:
1. Fully explaining how and why we use each variable… like log_category… what exactly is the end use of that variable? How do we represent ‘intention’ (tillage for weed control) versus ‘action’ (tillage is a machine operation), what is a ‘material’ versus a ‘quantity’? What is the intent behind the ‘quantity’ ‘name’ field… how will it get used?
2. Hard define certain lists so that comparing data across farms is more likely to be successful. So I would put log_category, units (in quantity fields) and a few others in this category.
3. Creating a framework for what this ‘interpretation’ layer is and how it’s implemented.
On item (3)… here’s what I mean:
Who is this data going to be used by (ie who’s interpreting it?).
Modelers (they care about actions, minimal but very specific details)
Farmers looking at their own farm data (they also care about actions, need all details)
Farmers comparing data from their farm to others (they are interested in intent, need to summarize things for effective comparison)
Certifiers looking at many farms (they are interested in actions, but a limited set, and may also need the ability to summarize differently than others)
Purchasers or marketers comparing many farms (they are interested in highly summarized data, and the organization of logs / assets may need to be different).
How will the farmOS data be merged or organized by the person using it
modeler, API to Python for example
dashboard, API from aggregator to JS for example
If we can agree on a process by which we can evaluate this ‘interpretation’ layer, then those already doing it (RFC, Rothamsted) could begin to create a common interpretation layer and maybe even implement it as a … farmOS module (Mike, is that how we’d do it???).
I could be totally off here… any thoughts or ideas would be great.
I think having a well-specified data schema separate from the actual farmOS implementation of that schema could be very transformative to the software ecosystem around farmOS.
Some of the uses I can imagine:
Representing complex test data to be loaded into farmOS for integration-level testing and/or demo/tutorial purposes
Drupal-version-agnostic backup/migration tooling - e.g. the farmOS 2.x migration could use a well-defined schema as part of the migration path
These might be obvious or captured elsewhere already, but here’s a stab at a few axioms for defining such a schema:
Three categories of data types model all data in the schema: data-field types, top-level data types, and views.
Three top-level data types exist in the schema: assets, areas, and logs.
Top-level data types have two immutable attributes: type and primary-id.
Assets represent physical things or logical collections of things.
Areas represent logical - possibly geospatial - locations where assets can be or logs can take place.
Logs hold data associated with a point in time - and possibly associated with one or more assets or areas.
Logs are mutable, but should rarely be changed after creation except where the data therein was in error at the time it was created. Generally, new information should just go in a new log to supersede the old one.
Data-fields apply to one or more types of assets, areas, or logs.
Data-fields of assets and areas are modeled via logs which reference that asset or area, thereby providing both the current value and a history of the values of those data-fields.
The manner in which the logs for an asset or area are combined to determine its current data-fields is defined on a field-by-field basis - i.e. some fields might just take the most recent log value for that field whereas others might be the sum of the log values for that field.
Data-field types are defined in a manner which is format/language agnostic while being conducive to easy implementations in common formats/languages - e.g. ISO8601 dates as UTF8 strings or numbers conforming to RFC 7159 but implemented as either a numeric or string type depending on format support.
Views represent the current state of an asset, area, or log including all its data-fields.
A log is effectively a view of itself.
The view of an asset or area is the current value for each data-field from its logs.
The format of the core schema specification encourages extension via new types of assets, areas, or logs and new data-field types.
These extensions to the core schema are specified in the same format as the core schema and implementations are encouraged to support use of the core schema plus any number of extensions.
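To make the axioms above a bit more concrete, here is a minimal Python sketch of logs plus a field-by-field view of an asset. Everything in it is an assumption for illustration only: the `Log` class, the `REDUCERS` table, and the field names `location` and `harvested_kg` are hypothetical and are not part of farmOS or any existing schema.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class Log:
    primary_id: str   # treated as immutable by convention in this sketch
    type: str         # treated as immutable by convention in this sketch
    timestamp: str    # ISO 8601 UTC string; same-format strings sort chronologically
    refs: List[str] = field(default_factory=list)       # primary-ids of referenced assets/areas
    data: Dict[str, Any] = field(default_factory=dict)  # data-field name -> value


def latest(values: List[Any]) -> Any:
    """Combine a field's history by taking the most recent value."""
    return values[-1]


# Hypothetical per-field combination rules: some fields take the latest
# value, others are the sum of all logged values.
REDUCERS: Dict[str, Callable[[List[Any]], Any]] = {
    "location": latest,
    "harvested_kg": sum,
}


def view(entity_id: str, logs: List[Log]) -> Dict[str, Any]:
    """Build the current view of an asset or area from the logs that reference it."""
    history: Dict[str, List[Any]] = {}
    for log in sorted(logs, key=lambda l: l.timestamp):
        if entity_id in log.refs:
            for name, value in log.data.items():
                history.setdefault(name, []).append(value)
    # Fields without an explicit rule fall back to "latest value wins".
    return {name: REDUCERS.get(name, latest)(values) for name, values in history.items()}


logs = [
    Log("log-1", "observation", "2020-05-01T08:00:00Z", ["asset-1"], {"location": "field-a"}),
    Log("log-2", "harvest", "2020-07-10T09:00:00Z", ["asset-1"], {"harvested_kg": 12.5}),
    Log("log-3", "harvest", "2020-07-20T09:00:00Z", ["asset-1"],
        {"harvested_kg": 7.5, "location": "field-b"}),
]
print(view("asset-1", logs))  # {'location': 'field-b', 'harvested_kg': 20.0}
```

The history of each field stays available in the logs themselves; the view is only a derived summary, which is why a correction can go in a new log rather than an edit to an old one.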
This is much lower level than I was thinking - and therefore probably much better and more thoughtful
@Symbioquine what do you think about the idea of starting from the ‘end user’… the person using the data… to drive the interpretation layer. Like… a farmer comparing to other farmers needs to have consistent log_categories so that farms’ activities can be easily compared, making that comparison experience easy, simple and faster. How does starting from the ‘end user’ integrate with the ideas you laid out?
I’m a big fan of working backwards in general. @gbathree the problem absolutely needs to be attacked from both - and maybe even additional - angles.
I haven’t dug too deeply into what additional requirements and/or end-user experiences are suggested by the resources you linked, but I think that farmOS itself has evolved in a fairly agile manner to satisfy a broad agricultural record-keeping niche.
It is very easy for this kind of project to become a huge pie-in-the-sky sort of thing, but it might not need to be. farmOS embodies a lot of lessons in how to model this sort of data. Further a major logical portion of your requirements are likely to come from easy interoperability - including with farmOS. Thus - at least as a mental exercise - I think it’s valuable to define the axioms of a schema derived from how farmOS models data and see how well the requirements/user-stories match up with it.
I think that is a good guiding principle, but is too broad to be a very useful user story.
The user stories need to be specific to particular measurement/comparison tasks otherwise you risk implicitly encoding that the schema can never be extended or changed over time. (Or making it too generic to provide much value.) For example, if two farms are comparing their harvest yields for legumes and this implies a ‘harvest’ log category, it probably wouldn’t be relevant if one of the farms also had some additional forestry-related log categories.
Thus we can refine the above guiding principle somewhat; “A farmer performing a comparison between farms needs all those farms to be actively using the log categories which affect the comparison in question to ensure the data can be easily/accurately compared.” (I’d leave “simple” and “faster” out of this particular one since presumably other guiding principles would speak to the general simplicity and speed/efficiency of the system and it’s otherwise just feel-good fluff.)
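As a rough illustration of the refined principle, the sketch below only checks the log categories that a given comparison depends on and ignores everything else a farm records. The comparison name `legume_yield`, the category names, and the farm identifiers are made up for the example.

```python
from typing import Dict, Set

# Hypothetical mapping from a comparison task to the log categories it depends on.
REQUIRED_CATEGORIES: Dict[str, Set[str]] = {
    "legume_yield": {"harvest"},
}


def comparable_farms(comparison: str, farm_categories: Dict[str, Set[str]]) -> Set[str]:
    """Return the farms that actively use every category the comparison needs."""
    required = REQUIRED_CATEGORIES[comparison]
    # Extra categories (e.g. forestry-related ones) do not affect the result.
    return {farm for farm, cats in farm_categories.items() if required <= cats}


farms = {
    "farm_a": {"harvest", "tillage"},
    "farm_b": {"harvest", "forestry_thinning"},
    "farm_c": {"tillage"},
}
print(sorted(comparable_farms("legume_yield", farms)))  # ['farm_a', 'farm_b']
```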
So it’s less about two arbitrary farms and more about several groups of 50 - 100 farmers inputting data into farmOS instances through farmOS or another service (like SurveyStack). In this case, if we add an interpretation layer (like everyone using these log_categories), the groups are inter- and intra-comparable by default… given the numbers, we are better off simply deciding ahead of time which categories we agree on, rather than trying to ‘line them up’ on a one-off basis. Also, we can control that input experience to ensure comparability from the get-go, so we probably should.
It sounds like we might be talking about solving very different problems.
As I described initially, I think there’s a strong case for an interchange schema which has near-parity with the data model of farmOS.
However, presumably the more condensed/distilled you can make the data that is extracted from farmOS (or other systems) to feed the “Digital Coffee Shop” the cheaper the standard will be to define and the easier it will be to prove that the data is sufficiently anonymized. To that end you wouldn’t need/want parity with the richness of data that can be modeled in farmOS.
Similarly, for the aforementioned example it would totally make sense to limit the log categories supported by the aggregator/interchange-layer. That said, I think it might be counter-productive to be too prescriptive about the ways farmers can/should use their own farmOS instances. It’s hard to overstate the importance of the “farmer-controlled” qualifier from your own definition of farmOS:
FarmOS is a well developed, privacy-first, farmer-controlled farm management system. Farmers can then choose to combine their data together through the Aggregator tool, which then feeds the Digital Coffee Shop.
You can probably achieve your goal of interoperability by providing easy tooling for entering data. Then for those farmers using farmOS directly, it might make sense to provide some sort of tooling which can show whether an area/asset/log is in a compatible format and preview how the data would get aggregated. This has the advantage that it allows for incremental migration of existing data and permits the farmOS instances to be used for use-cases which are a superset of those supported by the aggregator/coffee-shop.
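A compatibility check with an aggregation preview could look something like the sketch below. It is only a sketch under stated assumptions: logs are plain dicts, and the keys (`id`, `log_category`, `quantity`, `value`, `unit`) and the supported lists are hypothetical, not the farmOS or Aggregator API.

```python
from typing import Dict, List, Tuple

# Hypothetical lists the aggregator side has agreed to support.
SUPPORTED_CATEGORIES = {"harvest", "tillage"}
SUPPORTED_UNITS = {"kg"}


def check_log(log: dict) -> List[str]:
    """Return a list of reasons this log would not be aggregated (empty if compatible)."""
    problems = []
    if log.get("log_category") not in SUPPORTED_CATEGORIES:
        problems.append(f"unsupported log_category: {log.get('log_category')!r}")
    qty = log.get("quantity")
    if qty is not None and qty.get("unit") not in SUPPORTED_UNITS:
        problems.append(f"unsupported unit: {qty.get('unit')!r}")
    return problems


def preview(logs: List[dict]) -> Tuple[Dict[str, float], List[Tuple[str, List[str]]]]:
    """Preview per-category totals and list the logs that would be skipped, with reasons."""
    totals: Dict[str, float] = {}
    skipped: List[Tuple[str, List[str]]] = []
    for log in logs:
        problems = check_log(log)
        if problems:
            skipped.append((log["id"], problems))
            continue
        qty = log.get("quantity")
        if qty is not None:
            category = log["log_category"]
            totals[category] = totals.get(category, 0) + qty["value"]
    return totals, skipped


example_logs = [
    {"id": "1", "log_category": "harvest", "quantity": {"value": 10, "unit": "kg"}},
    {"id": "2", "log_category": "harvest", "quantity": {"value": 5, "unit": "kg"}},
    {"id": "3", "log_category": "forestry", "quantity": {"value": 3, "unit": "kg"}},
    {"id": "4", "log_category": "harvest", "quantity": {"value": 2, "unit": "lb"}},
]
totals, skipped = preview(example_logs)
print(totals)                               # {'harvest': 15}
print([log_id for log_id, _ in skipped])    # ['3', '4']
```

Incompatible logs are reported rather than rejected, so the instance can keep holding data outside the aggregator's scope and be migrated incrementally.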
Thanks for getting this conversation started @gbathree - and thanks for all the great thoughts @Symbioquine!
I spoke with @gbathree on the phone a bit about these ideas yesterday. I think a lot of what we talked about is covered above, but I’ll summarize some of the thoughts…
I’m excited to see the community move towards a more formal process for these kinds of things! Since the beginning I’ve felt that there are two sides to designing the farmOS data model. On the one hand, as developers, we are coding a flexible data architecture that can be used to represent the full breadth of possibilities in farm production systems. This requires limiting how “opinionated” the architecture is, so that we don’t create barriers. On the other hand, as a community of users, we are honing in on “conventions” on top of the data architecture, which standardize HOW the available data structures are used in practice.
I see this as a necessary process to work through together, and I believe that these conventions will naturally bubble up over time as more people use farmOS and share what they are doing. And especially as we start to build some of the bigger cross-instance features (eg: sharing and aggregating data). In order for data to be “comparable”, it needs to be following the same conventions.
We’ve been taking small steps already, with quick forms and surveys that constrain user input, to ensure that it goes into the database in a standard format. This is the best way to encourage convention alignment moving forward, I believe, and a lot of the work will simply be building these kinds of data collection tools for various use-cases.
But the idea of formulating higher-level documented “Conventions” (with a capital C) is a great idea @gbathree - as a way of documenting some of the little decisions we’ve been making in these case-by-case input forms. I could see a whole section of the forum (or other platform) dedicated to this kind of process and documentation.
And I think we should expect multiple overlapping Conventions to develop in parallel. Over time, these “rules” can evolve, merge, split, and reformulate themselves as more and more use-cases and requirements need to be accommodated.
+1 for the word Conventions - I think that terminology helps clarify the goal. The possible elements in a Convention are:
Enforcing a data field selection type. In normal farmOS the Material Name field is text input, but I may want to enforce a selection from a list in my Convention
Provide lists to choose from. Often conventions are about using common lists of data. So having those lists available is important.
Agreeing on where a certain type of data should go. In farmOS, some farm operations could be inputted under a variety of logs… but in my Convention, I may want to be opinionated. For example, you could call irrigation an activity_log or an input_log depending on how you think about irrigation. In my Convention, we may want to ensure irrigation is an input_log.
Conventions are largely enforced in other locations like SurveyStack or Field Kit, and as we discussed it may be less important to make it ‘enforceable’ in farmOS per se. TBD I guess.
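One way to picture those three elements is as a small piece of data plus a validator, which could live in SurveyStack, Field Kit, or anywhere else input is collected. This is a sketch, not an existing farmOS feature: the `Convention` class, the record keys (`operation`, `log_type`, `material_name`), and the example lists are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Convention:
    name: str
    # Fields that must be chosen from a list, together with the list itself.
    allowed_values: Dict[str, Set[str]] = field(default_factory=dict)
    # Opinionated placement: which log type a given operation must be recorded as.
    log_type_for: Dict[str, str] = field(default_factory=dict)

    def validate(self, record: dict) -> List[str]:
        """Return the ways a record departs from this convention (empty if it conforms)."""
        errors = []
        for fname, allowed in self.allowed_values.items():
            if fname in record and record[fname] not in allowed:
                errors.append(f"{fname}={record[fname]!r} is not in the convention's list")
        expected = self.log_type_for.get(record.get("operation"))
        if expected and record.get("log_type") != expected:
            errors.append(f"{record['operation']} should be recorded as {expected}")
        return errors


my_convention = Convention(
    name="example",
    allowed_values={"material_name": {"compost", "lime"}},
    log_type_for={"irrigation": "input_log"},
)

ok = {"operation": "irrigation", "log_type": "input_log", "material_name": "compost"}
bad = {"operation": "irrigation", "log_type": "activity_log", "material_name": "Compost tea"}
print(my_convention.validate(ok))        # []
print(len(my_convention.validate(bad)))  # 2
```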
The analogy might not be exactly the same… but you could think about the “lower level” data architecture as the “configuration” and the “higher level” decisions about standard patterns as the “conventions”…
In case my sentiments above could be construed differently, I’m very much in favor of defining a “common library” of patterns (opinionated conventions) for how standard data should be mapped to the data model!
In my ideal world, it would be possible to “subscribe” a given farmOS instance to a (versioned) set of conventions which would cause that farmOS instance to enable tooling that validates existing data and helps enter new data in accordance with the convention subscription(s).
I’ve begun drafting a new doc page that tries to summarize the basic “convention” idea, what it is, why it is, and how it relates to the farmOS data model itself. This is a very rough first pass, and I see it as the “introduction page” for a potentially larger section of documentation in general. Welcome all thoughts and feedback!
Edit: changed the link to point directly to the file in the GitHub branch, rather than my GitHub Pages fork, which is going away.
(Note that this is not ready to be shared broadly. It will be officially published to https://2x.farmOS.org when it is ready, which will eventually become https://docs.farmOS.org when 2.x is released. There are also data model docs that are not finished, so be aware that this is very much a work in progress.)