Documenting conventions

mstenta · June 7, 2022, 8:21pm

I’d like to propose we establish some standard processes for documenting conventions used in farmOS, in the form of a new Git repository managed by the farmOS community.

For background on what we mean by “conventions”, see: Conventions | farmOS

To quote from an earlier discussion:

Creating a standard 'interpretation' layer on top of core farmOS schema

I’m excited to see the community move towards a more formal process for these kinds of things! Since the beginning I’ve felt that there are two sides to designing the farmOS data model. On the one hand, as developers, we are coding a flexible data architecture that can be used to represent the full breadth of possibilities in farm production systems. This requires limiting how “opinionated” the architecture is, so that we don’t create barriers. On the other hand, as a community of users, we are honing in on “conventions” on top of the data architecture, which standardize HOW the available data structures are used in practice.

I see this as a necessary process to work through together, and I believe that these conventions will naturally bubble up over time as more people use farmOS and share what they are doing. And especially as we start to build some of the bigger cross-instance features (eg: sharing and aggregating data). In order for data to be “comparable”, it needs to be following the same conventions.

We’ve been taking small steps already, with quick forms and surveys that constrain user input, to ensure that it goes into the database in a standard format. This is the best way to encourage convention alignment moving forward, I believe, and a lot of the work will simply be building these kinds of data collection tools for various use-cases.

But the idea of formulating higher-level documented “Conventions” (with a capital C) is a great idea @gbathree - as a way of documenting some of the little decisions we’ve been making in these case-by-case input forms. I could see a whole section of the forum (or other platform) dedicated to this kind of process and documentation.

And I think we should expect multiple overlapping Conventions to develop in parallel. Over time, these “rules” can evolve, merge, split, and reformulate themselves as more and more use-cases and requirements need to be accommodated.

A lot of the discussions thus far have been focused on how to implement conventions in different ways (eg: with quick forms, config entities, JSON schemas, RDF/JSON-LD, tooling for validation, etc). But as a first step, I think we can start by creating some standard specifications for established conventions, and some community processes for proposing/updating them.

I’m imagining high-level documents, that describe specific types of data/processes, and how to record them in farmOS, using RFC 2119 keywords (MUST/SHOULD/MAY/etc).

We can develop a process for proposing/discussing/refining/accepting new conventions over time using GitHub pull requests.

These documents/conventions can also be versioned, and have a process for updating them over time as necessary.

This can follow a similar approach to module development, where the community can experiment and document their own conventions outside of farmOS core, and then propose them for inclusion “upstream” as an “official” convention. This can embody and standardize the community-based consensus process we’ve been using informally thus far.

These documents can then be used as references when writing actual code. Ideally the documents would also include code examples in PHP, Python, and JavaScript (and ideally ideally with automated tests so we can be sure they don’t break).

To kickstart this process, I’ve started a farmOS-conventions repository with a proposed roadmap and template structure. This is just a brain-dump, but if others agree that it’s a good start then perhaps we can move it to the farmOS organization and start working collaboratively on it.

I’ve also started a “work in progress” branch with draft “soil test” and “water test” convention specifications as an example. These would be opened as pull requests for discussion/inclusion if we think the general approach makes sense.

Example soil test convention: https://github.com/mstenta/farmOS-conventions/blob/wip/conventions/farmOS/soil-test/1.0/index.md
Example water test convention: https://github.com/mstenta/farmOS-conventions/blob/wip/conventions/farmOS/water-test/1.0/index.md

I’d love to hear thoughts on this and see if it would make sense to move it to the official farmOS GitHub organization.

Once we have some conventions established, we could pull them into farmOS.org via Gatsby like we do with the other docs (cc @jgaehring).

Related discussions/context:

mstenta · June 7, 2022, 8:28pm

Imagine adding a new convention_id field to assets/logs which lets you reference the convention ID (eg: farmOS:soil-test:1.0) that you intended to follow. This could be used for quick searching/filtering of records by convention, as well as potentially building validation code (to confirm that it actually does follow the convention, or even prevent certain things from being saved if they don’t).
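A minimal sketch of how such a field could drive both filtering and save-time validation. Everything here is hypothetical: `Record` is a simplified stand-in for a farmOS asset or log, `VALIDATORS` is an invented registry, and the soil test rules (a `lab_test` log with `lab_test_type` set to `soil`) are assumptions for illustration, not the contents of the draft soil test convention.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Record:
    """Simplified, hypothetical stand-in for a farmOS asset or log."""
    entity_type: str                      # e.g. "log" or "asset"
    bundle: str                           # e.g. "lab_test"
    name: str
    convention_id: Optional[str] = None   # the proposed convention field
    fields: dict = field(default_factory=dict)


def filter_by_convention(records: List[Record], convention_id: str) -> List[Record]:
    """Quick searching/filtering: keep records that declare this convention."""
    return [r for r in records if r.convention_id == convention_id]


def check_soil_test(record: Record) -> List[str]:
    """Assumed rules for illustration only, not the actual soil test spec."""
    problems = []
    if (record.entity_type, record.bundle) != ("log", "lab_test"):
        problems.append("MUST be a lab_test log")
    if record.fields.get("lab_test_type") != "soil":
        problems.append("lab_test_type MUST be 'soil'")
    return problems


# Hypothetical registry mapping convention IDs to validation functions.
VALIDATORS: Dict[str, Callable[[Record], List[str]]] = {
    "farmOS:soil-test:1.0": check_soil_test,
}


def save(record: Record) -> Record:
    """Refuse to save a record that declares a convention it does not follow."""
    check = VALIDATORS.get(record.convention_id)
    problems = check(record) if check else []
    if problems:
        raise ValueError(f"{record.name}: " + "; ".join(problems))
    return record
```

Records without a `convention_id` pass straight through, so the field stays entirely optional for people who never use conventions.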

Also, @jgaehring and @gbathree have been putting a lot of thought into “convention schemas” (JSON schemas that can be used to validate records pulled from the API).

Lots of possible next steps that could come from this!

mstenta · June 7, 2022, 8:49pm

@Farmer-Ed had some good questions and ideas in chat, linking for reference: IRC logs for #farmOS, 2022-06-07 (GMT) | irc.farmos.org

I like this one a lot:

[16:43:13] <FarmerEd[m]> Yea, that’s pretty cool actually, could be particularly useful for regional stuff too where farms may have to record certain data in a certain way for various quality assurance schemes.
[16:43:25] <mstenta[m]> yea! good idea!
[16:43:43] <mstenta[m]> perhaps that could be part of the ID namespacing potentially
[16:43:48] <FarmerEd[m]> Exactly

Farmer-Ed · June 8, 2022, 7:56am

I will be following this with some interest; honestly, I probably need to subscribe to it as much as, if not more than, contribute to it. In many ways, just the template for documenting conventions may be most useful, since my farmOS may be less conventional than some, with customizations (for better or worse, I just can’t leave things alone).

Perhaps the same conventions document should be supplied when appropriate with custom modules.

Could that field reference some sort of entity that stores the convention and/or a link to its repository?

That in itself could do with a convention! Lots of possibilities from referencing enterprise type (fruit/veg/beef/dairy/poultry etc) to region (EU/US/IRL etc) or a specific QA scheme.

mstenta · June 8, 2022, 11:47am

Potentially! Although maybe it would be better to have a directory of conventions that is searchable by ID. Or heck, maybe the ID itself could contain a URL, in cases where it isn’t part of the “official” set.

I worry that managing another set of entities within each instance could become an unnecessary burden (and may just not work conceptually). A couple reasons come to mind:

You may never use conventions at all, and if the “core” list grows to hundreds, then they are taking up space for no reason.
In the future, I’m imagining more abilities to “clone” records from one instance to another. If you used a non-standard convention, it would need to clone that entity as well (which would necessitate installing the module that provides the convention - even if that module doesn’t add any additional fields/functionality).

I’m thinking of a simple string ID that references a central list somewhere, like the way schema.org works (I think we can learn a lot from the structure and processes they’ve developed), but also with the ability to have “federated” lists hosted by other groups/individuals (schema.org has experimented with this idea too, although the needs may be a bit different for us: Extending Schemas - schema.org).

Open to ideas on all of this! There are plenty of other similar things out there on the web already we can draw/learn from.

That in itself could do with a convention! Lots of possibilities from referencing enterprise type (fruit/veg/beef/dairy/poultry etc) to region (EU/US/IRL etc) or a specific QA scheme.

Exactly! Might be worth opening a dedicated issue in the proposed repo/roadmap for the ID specification itself.
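The ID specification is still undecided, so the following is only a sketch of one possible format, assuming IDs of the form `namespace:name:version`, where the namespace is `farmOS` for official conventions and could be a group name or a URL for federated ones. `ConventionId` and `parse_convention_id` are hypothetical names. Splitting from the right keeps any colons inside a URL namespace intact, and parsing the version into integers lets `1.10` sort after `1.9`.

```python
from typing import NamedTuple, Tuple


class ConventionId(NamedTuple):
    namespace: str              # "farmOS" for official conventions, else a group or URL
    name: str                   # e.g. "soil-test"
    version: Tuple[int, ...]    # e.g. (1, 0)

    @property
    def is_official(self) -> bool:
        return self.namespace == "farmOS"

    @property
    def is_url(self) -> bool:
        return self.namespace.startswith(("http://", "https://"))


def parse_convention_id(value: str) -> ConventionId:
    """Parse a hypothetical "namespace:name:version" convention ID."""
    # rsplit() only splits on the last two colons, so "https://..." survives.
    parts = value.rsplit(":", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"not a convention ID: {value!r}")
    namespace, name, version = parts
    try:
        numbers = tuple(int(n) for n in version.split("."))
    except ValueError:
        raise ValueError(f"bad version in {value!r}") from None
    return ConventionId(namespace, name, numbers)
```

Regional or QA-scheme namespaces would just be additional namespace strings under this scheme; whether they should be plain names or URLs is exactly the open question.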

mstenta · June 8, 2022, 12:12pm

Opened an issue specifically for the ID itself, and added some comments to share my initial thoughts on it: Decide on a convention ID specification · Issue #3 · mstenta/farmOS-conventions · GitHub

jgaehring · June 8, 2022, 7:00pm

I think I said this to you recently on a call, @mstenta, but I think another important step in this will be to understand the kind of “inheritance” model that exists between entities, bundles and conventions, where:

All data MUST be represented as one of the core entity types (eg, log, asset, etc), which are standard to all farmOS implementations and share certain core fields.
All entities MUST have a particular bundle, which extends the core fields with additional fields for that bundle.
An entity MAY also have a convention applied to it, which further constrains the fields for the entity and bundle.
The convention MAY be specific to a bundle, or it may be generic, being applicable to all entities of that type (eg, it can be applied to only input logs, or it can be applied to any log bundle).

So in essence, conventions inherit from the bundle, and the bundle inherits from the entity.
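To make that layering concrete for programmers, here is a rough sketch with made-up field names: a bundle extends its entity type's core fields, while a convention adds no fields and only constrains them (here, simply by requiring some to be filled in). A convention with no bundle applies to every bundle of its entity type. The classes and `check_entity` are hypothetical, not farmOS code.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class EntityType:
    name: str                       # e.g. "log"
    core_fields: FrozenSet[str]     # fields shared by every bundle of this type


@dataclass(frozen=True)
class Bundle:
    entity_type: EntityType
    name: str                       # e.g. "input"
    extra_fields: FrozenSet[str]    # bundle-specific fields

    @property
    def fields(self) -> FrozenSet[str]:
        # A bundle extends its entity type's core fields.
        return self.entity_type.core_fields | self.extra_fields


@dataclass(frozen=True)
class Convention:
    entity_type: EntityType
    required: FrozenSet[str]          # a convention only constrains, it adds no fields
    bundle: Optional[Bundle] = None   # None: applies to every bundle of entity_type

    def applies_to(self, bundle: Bundle) -> bool:
        if self.bundle is not None:
            return bundle == self.bundle
        return bundle.entity_type == self.entity_type


def check_entity(values: dict, bundle: Bundle,
                 convention: Optional[Convention] = None) -> List[str]:
    """Check values against the bundle, then against the optional convention."""
    errors = [f"unknown field: {f}" for f in sorted(set(values) - bundle.fields)]
    if convention is not None:
        if not convention.applies_to(bundle):
            errors.append(f"convention does not apply to {bundle.name}")
        else:
            missing = sorted(convention.required - set(values))
            errors += [f"missing field: {f}" for f in missing]
    return errors
```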

That may be a fairly intuitive way to convey it to programmers, but I think it will be helpful to have a non-technical explanation of that hierarchy that all users will be able to grok. Specifically, so they understand how and when it is useful to comply with both a shared bundle and convention, like the work we’re doing with SurveyStack currently, versus when it’s sufficient to just conform to the bundle. If that makes sense…?

mstenta · June 9, 2022, 12:18am

This is a really important point to consider, I agree @jgaehring! We talked this through a bit on today’s monthly call, so let me see if I can try to summarize for others first…

A lot of the things we’ve been discussing around conventions thus far (with @gbathree, SurveyStack, and others) have been focused on “validation” of data, because that is one of the most relevant near-term requirements.

There are other aspects to the idea of “conventions” that we should highlight too: “interpretation” of data is one on my mind (for example: if you have different variations of “valid” data - there may be various ways to interpret it, so making implicit assumptions explicit is necessary), as well as the ability to make “recommendations” in the UI in order to comply more closely with certain conventions (eg: “since you are aiming for this convention, it would be great if you could also fill in fields X, Y, and Z! they aren’t required but would improve your data”).
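One way the “recommendations” idea could work is to group a convention's fields by RFC 2119 keyword, so that missing MUST fields become validation errors while missing SHOULD fields only produce a friendly prompt. This is a sketch under that assumption; the field lists in `WATER_TEST` are placeholders, not the draft water test convention.

```python
from typing import Dict, List, Optional

# Hypothetical convention, with field names grouped by RFC 2119 keyword.
WATER_TEST = {
    "id": "farmOS:water-test:1.0",
    "MUST": ["timestamp", "lab_test_type"],
    "SHOULD": ["lab", "location", "notes"],
}


def missing(record: Dict[str, object], names: List[str]) -> List[str]:
    """Field names that are absent or empty in the record."""
    return [n for n in names if record.get(n) in (None, "")]


def review(record: Dict[str, object], convention: dict) -> Dict[str, List[str]]:
    # MUST fields block validation; SHOULD fields only produce suggestions.
    return {
        "errors": missing(record, convention.get("MUST", [])),
        "recommendations": missing(record, convention.get("SHOULD", [])),
    }


def recommendation_message(record: Dict[str, object], convention: dict) -> Optional[str]:
    """Text a UI could show when recommended fields are still empty."""
    recs = review(record, convention)["recommendations"]
    if not recs:
        return None
    return (f"Since you are aiming for {convention['id']}, it would be great "
            f"if you could also fill in: {', '.join(recs)}.")
```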

In my mind, I’ve been thinking about a “convention” as being a bit broader than just a single record. I’m imagining some that might encompass multiple records, as well - in which case a single schema isn’t enough. There are potentially multiple types of entities to validate. But together these entities create a larger meaning (“interpretation”).

A simple example of this might be: “how do we record a planting in farmOS”? The “convention” we generally recommend is that you use a plant asset, and then optionally use a seeding log to denote details about when/where/how much of it was seeded, and/or optionally a transplanting log to describe transplanting(s) of it. You may use different combinations of these two logs in different scenarios, and the specifics of how you do that implies meaning (was this plant started from seed, or was it purchased? was it raised in a greenhouse and then moved out, or was it direct seeded?). Some of this might feel too obvious to even discuss/document, but I consider all of it to be a “convention” nonetheless. If you wanted to, you could bypass assets altogether and just record your seeding and harvest logs (but you lose a lot of the richer “meaning” and relationships if you do).

So perhaps there’s still some discussion to be had around what a “convention” is, more broadly, and where the lines are.

That said, I think we can also achieve this broader idea of conventions AND still have entity-level validation.

What if: we make it a requirement that conventions include BOTH a written language specification (in markdown, using RFC 2119 keywords to describe both the specific data structures AND the interpretations of data and relationships) AND one OR MORE code specification documents (in JSON Schema or RDF/JSON-LD) that correspond to specific entity types/bundles.

So taking the “planting” convention as an example, if we assume the overarching “convention ID” is something like farmOS:planting:1.0, then we can save that ID to the convention_id field of ALL the entities (plant asset AND seeding and transplanting logs). Then, we also provide multiple schemas that correspond to each of the entities involved (one for plant assets, and either one each for seeding and transplanting logs or perhaps a combined one that covers both). Thus, when it comes time to validate, the validation logic would look at the plant asset entity, see that it references farmOS:planting:1.0 and use that to find the schema that corresponds to asset entities. At the same time, if the validation were applied to the seeding or transplanting log(s), it would see the same farmOS:planting:1.0 convention referenced on those, and use that to find the schema that corresponds to seeding/transplanting log entities. (There would be some things to figure out technically in that “discovery” process, but let’s assume that is solvable for a moment…)

In other words, the convention is a “lens” to look through - and depending on what type of entity you are looking at might change which schema gets applied during validation.
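The “lens” amounts to a lookup keyed on the convention ID plus the entity's type and bundle. Below is a sketch assuming a hypothetical in-memory registry in which each schema is reduced to a list of required fields (standing in for a real JSON Schema), and a `None` bundle means the schema covers any bundle of that entity type. The schema contents and the `example:notes-required:1.0` convention are invented; the `type--bundle` notation mirrors farmOS JSON:API resource names such as `log--seeding`.

```python
from typing import Dict, List, Optional, Tuple

SchemaKey = Tuple[str, str, Optional[str]]  # (convention ID, entity type, bundle)

# Hypothetical registry; a bundle of None means "any bundle of this entity type".
# Each schema is reduced to a list of required fields, standing in for JSON Schema.
SCHEMAS: Dict[SchemaKey, dict] = {
    ("farmOS:planting:1.0", "asset", "plant"): {"required": ["name", "plant_type"]},
    ("farmOS:planting:1.0", "log", "seeding"): {"required": ["timestamp", "asset"]},
    ("farmOS:planting:1.0", "log", "transplanting"): {
        "required": ["timestamp", "asset", "location"]},
    ("example:notes-required:1.0", "log", None): {"required": ["notes"]},
}


def find_schema(convention_id: str, entity_type: str, bundle: str) -> Optional[dict]:
    """Look through the convention "lens": pick the schema for this kind of entity."""
    schema = SCHEMAS.get((convention_id, entity_type, bundle))
    if schema is None:
        # Fall back to a schema that covers every bundle of the entity type.
        schema = SCHEMAS.get((convention_id, entity_type, None))
    return schema


def validate_through_lens(entity: dict) -> List[str]:
    """Validate one entity against whichever schema its convention ID selects."""
    cid, etype, bundle = entity["convention_id"], entity["type"], entity["bundle"]
    schema = find_schema(cid, etype, bundle)
    if schema is None:
        return [f"{cid} defines no schema for {etype}--{bundle}"]
    return [f"missing {f}" for f in schema["required"] if f not in entity["fields"]]
```

The same ID on a plant asset and on its seeding log selects two different schemas, which is the “discovery” step in its simplest form.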

I think this would keep the idea of “conventions” nice and broad - as it’s really a bridge between human understanding (which naturally includes multiple entities and their relationships) and machine understanding (which has to be more specific to individual entities). A single written language specification PLUS (potentially) multiple code specifications feels like the best of all worlds. I feel like that would be a great requirement to include in our process for approving new conventions! Both a human readable document and machine readable document(s).

Certainly hard to put all of this into words in a succinct way - and this probably deserves its own dedicated thread to discuss/refine - but hopefully it makes some sense.

mstenta · June 9, 2022, 1:14pm

Still thinking about some of these things, and just wanted to say…

… I don’t mean to imply that we necessarily need to document conventions for everything (like this example) - it was just the first example that came to mind that involves multiple entity types under the umbrella of one “convention”.

And I think this speaks to another important point about conventions (as I’ve been thinking about them) worth highlighting: I see the line between “data model” and “convention” as a somewhat blurry one. The asset and log types that we provide in farmOS core are themselves a form of convention, ones that we have adopted implicitly, but conventions nonetheless.

Imagine installing farmOS and ONLY enabling the “Activity” log module - no asset types or other log types. It’s entirely possible to record everything with just activity logs! (Or you could create your own foobar log type and use that instead!) The fact that we provide some sane default types is a form of convention encouragement in itself, in my view.

So it might be helpful to think of “conventions” as an even deeper aspect of the overall data model. That is how I’ve been thinking about them anyway. Perhaps it is overly relativistic thinking on my part.

But… I think it also helps to illustrate how conventions “harden” over time. They may start as just an idea - a pattern that is followed - but as those patterns get used more, and are adopted in actual CODE (eg: quick forms, interpretational logic, asset/log types, field definitions), they become the data model. This is the ever-evolving flow of consensus into code!

This is also why I started the Log module as a general Drupal contrib module back in 2014, alongside farmOS, but not part of farmOS. Someone else could take the same module and develop a record keeping system for something entirely different than farming. And their log types and base/bundle fields could be completely different. The log entity type is the common denominator, but everything we build on top of that is, more-or-less, a “convention”.

aislinnpearson · June 10, 2022, 9:08am

I think this is a great idea, totally on board with it as I need to start documenting the Rothamsted conventions (as much for myself as anyone else!) so the timing is perfect if we are going to implement a ‘convention’ for documenting ‘conventions’.

Reading through the discussions, here are a few things I would reinforce from a UK/academic perspective:

[16:43:13] <FarmerEd[m]> Yea, that’s pretty cool actually, could be particularly useful for regional stuff too where farms may have to record certain data in a certain way for various quality assurance schemes.

I couldn’t agree with this more. In the UK, we have both legal and quality assurance obligations (Red Tractor is the industry standard here). Conventions which help enforce that compliance tend to be readily adopted and are often useful.

For the experiments, we have similar requirements, but they tend to be more academic. For example, they need to follow external data standards like the ISA Framework, and the farmOS conventions help us implement this and ensure the data is reusable and interoperable with other processes. As an aside, ontology annotation will be a standard requirement for this experiment work.

Very happy to use the RFC 2119 keywords (MUST/SHOULD/MAY/etc). Great idea!

What if: we make it a requirement that conventions include BOTH a written language specification (in markdown, using RFC 2119 keywords to describe both the specific data structures AND the interpretations of data and relationships) AND one OR MORE code specification documents (in JSON Schema or RDF/JSON-LD) that correspond to specific entity types/bundles.

As someone who isn’t that proficient at coding, but who has to document conventions in such a way that they can be understood by our farm and science teams, I think the written language specification is as important as the code specification. Also, it helps to be aware that sometimes these are two different skill sets, especially in this ‘tech’/agriculture collaboration space. For example, in the Rothamsted collaboration most of the code documentation is done by @paul121, but the convention documentation is done by me.

I think the idea of creating a convention ID is great. That would massively reduce some of my concerns about transitioning between versions of the quick forms/experiment module we are building.

All in all, very supportive @mstenta - thanks for raising the issue.

mstenta · June 10, 2022, 2:10pm

Thanks @aislinnpearson! I think this point alone is enough to convince me that the code specification should be optional. There is value in adopting written language specs for reasons beyond just validation. And then the code specifications can be contributed as they are written and needed. Because it’s true: “writing code” will always be a bigger bottleneck than “writing words”, and in this case I don’t think code needs to hold up adoption of conventions.

jgaehring · June 10, 2022, 5:18pm

We discussed this briefly on yesterday’s call, but the nice thing about those RDF formats is that code can then be used to render them in plain language, as well as providing more helpful visuals like tables or diagrams. Perhaps most importantly, however, similar code can also be used by a GUI to generate new conventions without any knowledge of RDF, or even familiarity with the RFC 2119 keywords, for that matter. And to my mind, that is the best of all worlds!

So yea, I guess I agree with this in the sense that as long as there’s a “code specification” (although technically, “structured data format” or RDF, not necessarily executable code), the plain language version can be derived programmatically, which is not necessarily true in reverse (although, GPT-3 anyone?). Maybe it would be better to say the structured data format is preferred, but a plain language version is the minimal requirement?
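As a small illustration of deriving plain language from structured data, here is a sketch that turns a list of rules into RFC 2119 sentences. It uses plain Python dictionaries rather than RDF or JSON Schema, and the rule format (`level`, `entity`, `field`, optional `value`) is invented for this example.

```python
from typing import Dict, List

# RFC 2119 keywords this sketch accepts.
KEYWORDS = ("MUST", "MUST NOT", "SHOULD", "SHOULD NOT", "MAY")


def render_rule(rule: Dict[str, str]) -> str:
    """Turn one structured rule into a plain-language RFC 2119 sentence."""
    level = rule["level"]
    if level not in KEYWORDS:
        raise ValueError(f"not an RFC 2119 keyword: {level}")
    entity_type, bundle = rule["entity"].split("--", 1)
    article = "An" if bundle[0].lower() in "aeiou" else "A"
    sentence = f"{article} {bundle} {entity_type} {level} include the {rule['field']} field"
    if "value" in rule:
        value = rule["value"]
        sentence += f', set to "{value}"'
    return sentence + "."


def render_spec(convention_id: str, rules: List[Dict[str, str]]) -> str:
    """Render a whole convention as a heading line followed by a bullet list."""
    return "\n".join([convention_id] + [f"- {render_rule(r)}" for r in rules])
```

The reverse direction (plain language back into structured rules) is much harder, which is the asymmetry described above.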

In any event, it will be great to see @aislinnpearson’s plain language conventions. That will bring us so much closer to the GUI dream, because I’m sure it will illuminate more edge cases and things we have not considered, making it more intuitive and powerful at the same time.

All exciting stuff!

mstenta · June 10, 2022, 5:41pm

I like this. And “structured data format” is a better way to say it.

gbathree · July 6, 2022, 6:30pm

Just FYI I’m trying this out, using some of the new JSON schema structure that @jgaehring and Octavio discussed. I’m using the separate .md file as Mike suggested, and took most of his headers (though I made some small changes too, and I’m keeping track of them to post here for discussion once I have enough).

I think a key difference is really that a convention is a bucket of schemas… so we need to clarify that a tillage convention isn’t really a convention, it’s a set of schemas (a tillage log schema, a tillage_stir quantity, and a tillage_depth quantity, etc.) and relationships. A convention is a broader concept around a group’s use of a set of schemas for some purpose / application or with some explainable intent / conceptual framing.

I’ll share more, but working on this this week.

mstenta · January 24, 2023, 5:55pm

Update: I opened a feature request to add a convention field to asset and log entities in farmOS core.

I see this as an important first step as we continue this conversation. It will allow module developers and API users to begin experimenting with conventions in a more explicit way, by declaring which conventions their assets/logs adhere to.

I also hope to finalize an initial set of “core conventions” as well as the template for convention documents soon, in hopes that we can publish them to farmOS.org as a way to kickstart the process and inspire others to begin documenting/publishing their own conventions. More to come…

mstenta · February 1, 2023, 10:46pm

I’ve made a few adjustments to the structure and template in my farmOS-conventions repo. Described in these comments:

Decide on a convention ID specification · Issue #3 · mstenta/farmOS-conventions · GitHub
Design a template structure for convention documentation · Issue #2 · mstenta/farmOS-conventions · GitHub

aislinnpearson · February 2, 2023, 12:52pm

Hi Mike,

I’m following all this with interest as we are really going to have to document our own conventions soon. One comment on this GitHub comment:

I wonder if the namespace could also be used to point to a repository, in the case of non-core conventions? I don’t know if including a full URL would be overkill for this…

As academics, this would count as research output, so at some point we might want to assign a DOI, at least for the Rothamsted conventions. Having never had to do that, I am not sure what the process is, but thought it was worth mentioning as a side note.

mstenta · February 2, 2023, 1:18pm

Great point @aislinnpearson! I wonder if that’s something that could be included in the template (optionally of course)? Feel free to comment on the template issue so we keep that in the considerations! Design a template structure for convention documentation · Issue #2 · mstenta/farmOS-conventions · GitHub

mstenta · February 8, 2023, 8:51pm

FYI on the monthly call today we decided to make a dedicated forum category for conventions: Conventions - farmOS

As well as a pinned “Quick Links” topic: Wiki: Conventions Quick Links - #3
