Documenting conventions

I’d like to propose we establish some standard processes for documenting conventions used in farmOS, in the form of a new Git repository managed by the farmOS community.

For background on what we mean by “conventions”, see: Conventions | farmOS

To quote from an earlier discussion:

A lot of the discussions thus far have been focused on how to implement conventions in different ways (eg: with quick forms, config entities, JSON schemas, RDF/JSON-LD, tooling for validation, etc). But as a first step, I think we can start by creating some standard specifications for established conventions, and some community processes for proposing/updating them.

I’m imagining high-level documents that describe specific types of data/processes and how to record them in farmOS, using RFC 2119 keywords (MUST/SHOULD/MAY/etc).

We can develop a process for proposing/discussing/refining/accepting new conventions over time using GitHub pull requests.

These documents/conventions can also be versioned, and have a process for updating them over time as necessary.

This can follow a similar approach to module development, where the community can experiment with and document their own conventions outside of farmOS core, and then propose them for inclusion “upstream” as “official” conventions. This can embody and standardize the community-based consensus process we’ve been using informally thus far.

These documents can then be used as references when writing actual code. Ideally the documents would also include code examples in PHP, Python, and JavaScript (and, ideally, automated tests so we can be sure they don’t break).

To kickstart this process, I’ve started a farmOS-conventions repository with a proposed roadmap and template structure. This is just a brain-dump, but if others agree that it’s a good start then perhaps we can move it to the farmOS organization and start working collaboratively on it.

I’ve also started a “work in progress” branch with draft “soil test” and “water test” convention specifications as an example. These would be opened as pull requests for discussion/inclusion if we think the general approach makes sense.

I’d love to hear thoughts on this and see if it would make sense to move it to the official farmOS GitHub organization.

Once we have some conventions established, we could pull them into farmOS.org via Gatsby like we do with the other docs (cc @jgaehring). :smiley:

Related discussions/context:


Imagine adding a new convention_id field to assets/logs which lets you reference the convention ID (eg: farmOS:soil-test:1.0) that you intended to follow. This could be used for quick searching/filtering of records by convention, as well as potentially building validation code (to confirm that it actually does follow the convention, or even prevent certain things from being saved if they don’t).
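To make the filtering idea concrete, here is a minimal sketch in Python. It assumes a `namespace:name:version` ID format (the real format is still undecided) and treats records as plain dicts with a hypothetical `convention_id` key; neither reflects an existing farmOS API.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass(frozen=True)
class ConventionId:
    """A parsed convention ID such as "farmOS:soil-test:1.0".

    Assumes a "namespace:name:version" format; the real format is still
    under discussion.
    """

    namespace: str
    name: str
    version: str

    @classmethod
    def parse(cls, value: str) -> "ConventionId":
        parts = value.split(":")
        if len(parts) != 3 or not all(parts):
            raise ValueError(f"expected namespace:name:version, got {value!r}")
        return cls(*parts)

    def __str__(self) -> str:
        return f"{self.namespace}:{self.name}:{self.version}"


def filter_by_convention(
    records: Iterable[dict], namespace: str, name: str, version: Optional[str] = None
) -> Iterator[dict]:
    """Yield records whose (hypothetical) convention_id field matches.

    With version=None, every version of the named convention matches.
    Records without a convention_id are skipped.
    """
    for record in records:
        raw = record.get("convention_id")
        if not raw:
            continue
        cid = ConventionId.parse(raw)
        if (cid.namespace, cid.name) == (namespace, name) and version in (None, cid.version):
            yield record


# Example: pick out soil tests from a mixed list of log records.
logs = [
    {"name": "North field sample", "convention_id": "farmOS:soil-test:1.0"},
    {"name": "Well sample", "convention_id": "farmOS:water-test:1.0"},
    {"name": "Untagged observation"},
]
soil_tests = list(filter_by_convention(logs, "farmOS", "soil-test"))
```

Keeping the version as a separate part means a search can match "any version" of a convention, while validation code can still pin an exact version.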

Also, @jgaehring and @gbathree have been putting a lot of thought into “convention schemas” (JSON schemas that can be used to validate records pulled from the API).

Lots of possible next steps that could come from this!


@Farmer-Ed had some good questions and ideas in chat, linking for reference: IRC logs for #farmOS, 2022-06-07 (GMT) | irc.farmos.org

I like this one a lot:

[16:43:13] <FarmerEd[m]> Yea, that’s pretty cool actually, could be particularly useful for regional stuff too where farms may have to record certain data in a certain way for various quality assurance schemes.
[16:43:25] <mstenta[m]> yea! good idea!
[16:43:43] <mstenta[m]> perhaps that could be part of the ID namespacing potentially
[16:43:48] <FarmerEd[m]> Exactly


I will be following this with some interest, honestly I probably need to subscribe to it as much if not more than contribute to it. In many ways just the template for documenting conventions may be most useful since my farmOS may be less conventional than some with customizations (for better or worse I just can’t leave things alone :rofl:).

Perhaps the same conventions document should be supplied when appropriate with custom modules.

Could that field reference some sort of entity that stores the convention and/or a link to its repository?

That in itself could do with a convention! Lots of possibilities from referencing enterprise type (fruit/veg/beef/dairy/poultry etc) to region (EU/US/IRL etc) or a specific QA scheme.


Potentially! Although maybe it would be better to have a directory of conventions that is searchable by ID. Or heck, maybe the ID itself could contain a URL, in cases where it isn’t part of the “official” set.

I worry that managing another set of entities within each instance could become an unnecessary burden (and may just not work conceptually). A couple reasons come to mind:

  1. You may never use conventions at all, and if the “core” list grows to hundreds, then they are taking up space for no reason.
  2. In the future, I’m imagining more abilities to “clone” records from one instance to another. If you used a non-standard convention, it would need to clone that entity as well (which would necessitate installing the module that provides the convention - even if that module doesn’t add any additional fields/functionality).

I’m thinking of a simple string ID that references a central list somewhere, the way schema.org works (I think we can learn a lot from the structure and processes they’ve developed), but also with the ability to have “federated” lists hosted by other groups/individuals (schema.org has experimented with this idea too, although the needs may be a bit different for us: Extending Schemas - schema.org).
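A rough sketch of how a string ID could be resolved without storing convention entities in each instance: check the central list, then federated lists (which could carry the regional or QA-scheme namespaces mentioned earlier), and pass through IDs that are already URLs. The registries, namespaces, and URLs below are placeholders, not real endpoints, and the `namespace:name:version` format is an assumption.

```python
from urllib.parse import urlparse

# Hypothetical registries: the "official" central list plus federated lists
# hosted by other groups. Keys are namespaces; values map convention names
# to documentation URLs. All URLs are placeholders.
CENTRAL_REGISTRY = {
    "farmOS": {"soil-test": "https://example.org/conventions/farmOS/soil-test"},
}
FEDERATED_REGISTRIES = {
    "example-qa-scheme": {"cattle-movement": "https://example.net/qa/cattle-movement"},
}


def resolve_convention(convention_id: str) -> str:
    """Return a documentation URL for a convention ID (sketch only).

    IDs that are already URLs (conventions outside the official set) are
    returned unchanged. Other IDs are assumed to be "namespace:name:version"
    and are looked up in the central registry first, then federated ones.
    """
    if urlparse(convention_id).scheme in ("http", "https"):
        return convention_id
    namespace, name, version = convention_id.split(":")
    for registry in (CENTRAL_REGISTRY, FEDERATED_REGISTRIES):
        docs = registry.get(namespace, {})
        if name in docs:
            return f"{docs[name]}/{version}"
    raise LookupError(f"unknown convention: {convention_id}")
```

Because only the string travels with each record, cloning a record to another instance would not require cloning a convention entity or installing the module that defines it.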

Open to ideas on all of this! There are plenty of other similar things out there on the web already we can draw/learn from.

That in itself could do with a convention! Lots of possibilities from referencing enterprise type (fruit/veg/beef/dairy/poultry etc) to region (EU/US/IRL etc) or a specific QA scheme.

Exactly! Might be worth opening a dedicated issue in the proposed repo/roadmap for the ID specification itself.


Opened an issue specifically for the ID itself, and added some comments to share my initial thoughts on it: Decide on a convention ID specification · Issue #3 · mstenta/farmOS-conventions · GitHub


I think I said this to you recently on a call, @mstenta, but I think another important step in this will be to understand the kind of “inheritance” model that exists between entities, bundles and conventions, where:

  • All data MUST be represented as one of the core entity types (eg, log, asset, etc), which are standard to all farmOS implementations and share certain core fields.
  • All entities MUST have a particular bundle, which extends the core fields with additional fields for that bundle.
  • An entity MAY also have a convention applied to it, which further constrains the fields for the entity and bundle.
  • The convention MAY be specific to a bundle, or it may be generic, being applicable to all entities of that type (eg, it can be applied to only input logs, or it can be applied to any log bundle).

So in essence, conventions inherit from the bundle, and the bundle inherits from the entity.

That may be a fairly intuitive way to convey it to programmers, but I think it will be helpful to have a non-technical explanation of that hierarchy that all users will be able to grok. Specifically, so they understand how and when it is useful to comply with both a shared bundle and a convention, like the work we’re doing with SurveyStack currently, versus when it’s sufficient to just conform to the bundle. If that makes sense…?
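For the programmer-facing side, the entity → bundle → convention layering can be sketched in a few lines of Python. The field names and the `Convention` class are hypothetical illustrations, not farmOS code; the point is that a convention only narrows what the bundle already allows (making fields required or pinning values) and never adds fields.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical field lists, for illustration only.
ENTITY_FIELDS = {"log": {"name", "timestamp", "status"}}
BUNDLE_FIELDS = {("log", "input"): {"material", "method"}}


@dataclass
class Convention:
    entity_type: str
    bundle: Optional[str] = None  # None: applies to any bundle of entity_type
    required: set = field(default_factory=set)
    fixed_values: dict = field(default_factory=dict)


def allowed_fields(entity_type: str, bundle: str) -> set:
    """Core entity fields plus the bundle's additional fields."""
    return ENTITY_FIELDS[entity_type] | BUNDLE_FIELDS.get((entity_type, bundle), set())


def check(record: dict, entity_type: str, bundle: str, convention: Optional[Convention] = None) -> list:
    """Return a list of problems; an empty list means the record passes.

    The convention only narrows what the bundle already allows: it can make
    fields required or pin their values, but never introduces new fields.
    """
    problems = [f"unknown field: {k}" for k in record if k not in allowed_fields(entity_type, bundle)]
    if convention is None:
        return problems
    if convention.entity_type != entity_type or convention.bundle not in (None, bundle):
        return problems + ["convention does not apply to this entity/bundle"]
    problems += [f"missing required field: {f}" for f in sorted(convention.required) if f not in record]
    problems += [
        f"{k} must be {v!r}" for k, v in convention.fixed_values.items() if k in record and record[k] != v
    ]
    return problems
```

A bundle-specific convention would be `Convention("log", "input", required={"material"})`, while a generic one such as `Convention("log", fixed_values={"status": "done"})` applies to every log bundle.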


This is a really important point to consider, I agree @jgaehring! We talked this through a bit on today’s monthly call, so let me see if I can try to summarize for others first…

A lot of the things we’ve been discussing around conventions thus far (with @gbathree, SurveyStack, and others) have been focused on “validation” of data, because that is one of the most relevant near-term requirements.

There are other aspects to the idea of “conventions” that we should highlight too: “interpretation” of data is one on my mind (for example: if you have different variations of “valid” data, there may be various ways to interpret it, so making implicit assumptions explicit is necessary), as well as the ability to make “recommendations” in the UI in order to comply more closely with certain conventions (eg: “since you are aiming for this convention, it would be great if you could also fill in fields X, Y, and Z! they aren’t required but would improve your data”).
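The "recommendations" idea could be as simple as comparing a record against the fields a convention marks as MUST versus SHOULD. This is a minimal sketch; the `required` and `recommended` sets are hypothetical inputs that a real convention would have to supply.

```python
def recommend(record: dict, required: set, recommended: set) -> list:
    """Build UI hints for a record that is aiming at a convention (sketch).

    required: fields the convention says MUST be present.
    recommended: fields the convention says SHOULD be present.
    """
    present = set(record)
    messages = []
    missing_required = sorted(required - present)
    missing_recommended = sorted(recommended - present)
    if missing_required:
        messages.append("Required: " + ", ".join(missing_required))
    if missing_recommended:
        messages.append(
            "Recommended (not required, but would improve your data): "
            + ", ".join(missing_recommended)
        )
    return messages
```

For example, `recommend({"date": "2022-06-07"}, {"date"}, {"depth", "lab"})` passes validation but still suggests filling in `depth` and `lab` (hypothetical field names).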

In my mind, I’ve been thinking about a “convention” as being a bit broader than just a single record. I’m imagining some that might encompass multiple records, as well - in which case a single schema isn’t enough. There are potentially multiple types of entities to validate. But together these entities create a larger meaning (“interpretation”).

A simple example of this might be: “how do we record a planting in farmOS?” The “convention” we generally recommend is that you use a plant asset, and then optionally use a seeding log to denote details about when/where/how much of it was seeded, and/or optionally a transplanting log to describe transplanting(s) of it. You may use different combinations of these two logs in different scenarios, and the specifics of how you do that imply meaning (was this plant started from seed, or was it purchased? was it raised in a greenhouse and then moved out, or was it direct seeded?). Some of this might feel too obvious to even discuss/document, but I consider all of it to be a “convention” nonetheless. If you wanted to, you could bypass assets altogether and just record your seeding and harvest logs (but you lose a lot of the richer “meaning” and relationships if you do).
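The "interpretation" side of the planting example can be sketched as a function that reads meaning out of which log types reference a plant asset. These rules are a simplified, hypothetical reading of the pattern described above (they ignore timestamps and locations); a written convention would spell them out with RFC 2119 keywords.

```python
def interpret_planting(log_types: list) -> str:
    """Infer how a plant asset was established from its related log types.

    log_types: the bundles of logs referencing one plant asset,
    e.g. ["seeding", "transplanting", "harvest"]. Hypothetical rules only.
    """
    seeded = "seeding" in log_types
    transplanted = "transplanting" in log_types
    if seeded and transplanted:
        return "started from seed, then transplanted"
    if seeded:
        return "direct seeded"
    if transplanted:
        return "acquired as a start (e.g. purchased), then transplanted"
    return "unknown: no seeding or transplanting logs"
```

The same records could be read differently under another convention, which is why making these assumptions explicit matters.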

So perhaps there’s still some discussion to be had around what a “convention” is, more broadly, and where the lines are.

That said, I think we can also achieve this broader idea of conventions AND still have entity-level validation.

What if: we make it a requirement that conventions include BOTH a written language specification (in markdown, using RFC 2119 keywords to describe both the specific data structures AND the interpretations of data and relationships) AND one OR MORE code specification documents (in JSON Schema or RDF/JSON-LD) that correspond to specific entity types/bundles.

So taking the “planting” convention as an example, if we assume the overarching “convention ID” is something like farmOS:planting:1.0, then we can save that ID to the convention_id field of ALL the entities (plant asset AND seeding and transplanting logs). Then, we also provide multiple schemas that correspond to each of the entities involved (one for plant assets, and either one each for seeding and transplanting logs or perhaps a combined one that covers both). Thus, when it comes time to validate, the validation logic would look at the plant asset entity, see that it references farmOS:planting:1.0 and use that to find the schema that corresponds to asset entities. At the same time, if the validation were applied to the seeding or transplanting log(s), it would see the same farmOS:planting:1.0 convention referenced on those, and use that to find the schema that corresponds to seeding/transplanting log entities. (There would be some things to figure out technically in that “discovery” process, but let’s assume that is solvable for a moment…)
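The "lens" idea can be sketched as a lookup keyed by convention ID, entity type, and bundle. Everything here is hypothetical: the registry layout, the entity dict shape, and the field names. A bundle of `None` stands in for a combined schema covering both seeding and transplanting logs. The check only looks at JSON Schema's `required` keyword so the sketch runs with the standard library alone; a real implementation would likely pass the schema to a full JSON Schema validator.

```python
# Hypothetical schema registry: one convention ID, several schemas, keyed by
# (convention_id, entity_type, bundle). A bundle of None is a fallback that
# covers any bundle of that entity type.
SCHEMAS = {
    ("farmOS:planting:1.0", "asset", "plant"): {"required": ["name", "plant_type"]},
    ("farmOS:planting:1.0", "log", None): {"required": ["timestamp", "asset"]},
}


def find_schema(convention_id: str, entity_type: str, bundle: str):
    """Look through the convention "lens" to pick a schema for this entity."""
    return SCHEMAS.get((convention_id, entity_type, bundle)) or SCHEMAS.get(
        (convention_id, entity_type, None)
    )


def validate(entity: dict) -> list:
    """Return errors for an entity; only the "required" keyword is checked."""
    schema = find_schema(entity["convention_id"], entity["type"], entity["bundle"])
    if schema is None:
        return [f"no schema for {entity['type']}--{entity['bundle']}"]
    return [f"missing {name}" for name in schema["required"] if name not in entity["attributes"]]


plant = {"type": "asset", "bundle": "plant", "convention_id": "farmOS:planting:1.0",
         "attributes": {"name": "Kale", "plant_type": "kale"}}
seeding = {"type": "log", "bundle": "seeding", "convention_id": "farmOS:planting:1.0",
           "attributes": {"timestamp": 1654560000}}
```

Both entities carry the same convention ID, yet `validate(plant)` and `validate(seeding)` end up applying different schemas, which is the "discovery" step in its simplest form.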

In other words, the convention is a “lens” to look through - and depending on what type of entity you are looking at might change which schema gets applied during validation.

I think this would keep the idea of “conventions” nice and broad - as it’s really a bridge between human understanding (which naturally includes multiple entities and their relationships) and machine understanding (which has to be more specific to individual entities). A single written language specification PLUS (potentially) multiple code specifications feels like the best of all worlds. I feel like that would be a great requirement to include in our process for approving new conventions! Both a human readable document and machine readable document(s).

Certainly hard to put all of this into words in a succinct way - and this probably deserves its own dedicated thread to discuss/refine - but hopefully it makes some sense. :slight_smile:


Still thinking about some of these things, and just wanted to say…

… I don’t mean to imply that we necessarily need to document conventions for everything (like this example) - it was just the first example that came to mind that involves multiple entity types under the umbrella of one “convention”.

And I think this speaks to another important point about conventions (as I’ve been thinking about them) worth highlighting: I see the line between “data model” and “convention” as a somewhat blurry one. The asset and log types that we provide in farmOS core are themselves a form of convention, ones that we have adopted implicitly, but conventions nonetheless.

Imagine installing farmOS and ONLY enabling the “Activity” log module - no asset types or other log types. It’s entirely possible to record everything with just activity logs! (Or you could create your own foobar log type and use that instead!) The fact that we provide some sane default types is a form of convention encouragement in itself, in my view.

So it might be helpful to think of “conventions” as an even deeper aspect of the overall data model. That is how I’ve been thinking about them anyway. Perhaps it is overly relativistic thinking on my part. :slight_smile:

But… I think it also helps to illustrate how conventions “harden” over time. They may start as just an idea - a pattern that is followed - but as those patterns get used more, and are adopted in actual CODE (eg: quick forms, interpretational logic, asset/log types, field definitions), they become the data model. This is the ever-evolving flow of consensus into code! :smiley:

This is also why I started the Log module as a general Drupal contrib module back in 2014, alongside farmOS, but not part of farmOS. Someone else could take the same module and develop a record keeping system for something entirely different than farming. And their log types and base/bundle fields could be completely different. The log entity type is the common denominator, but everything we build on top of that is, more-or-less, a “convention”.


I think this is a great idea, and I’m totally on board with it, as I need to start documenting the Rothamsted conventions (as much for myself as anyone else!), so the timing is perfect if we are going to implement a ‘convention’ for documenting ‘conventions’. :slight_smile:

Reading through the discussions, here are a few things I would reinforce from a UK/academic perspective:

  1. [16:43:13] <FarmerEd[m]> Yea, that’s pretty cool actually, could be particularly useful for regional stuff too where farms may have to record certain data in a certain way for various quality assurance schemes.

I couldn’t agree with this more. In the UK, we have both legal and quality assurance obligations (Red Tractor is the industry standard here). Conventions which help enforce that compliance tend to be readily adopted and are often useful.

For the experiments, we have similar requirements, but they tend to be more academic. For example, they need to follow external data standards like the ISA Framework, and the farmOS conventions help us implement this and ensure the data is reusable and interoperable with other processes. As an aside, ontology annotation will be a standard requirement for this experiment work.

Very happy to use the RFC 2119 keywords (MUST/SHOULD/MAY/etc). Great idea.

What if: we make it a requirement that conventions include BOTH a written language specification (in markdown, using RFC 2119 keywords to describe both the specific data structures AND the interpretations of data and relationships) AND one OR MORE code specification documents (in JSON Schema or RDF/JSON-LD) that correspond to specific entity types/bundles.

As someone who isn’t that proficient at coding, but who has to document conventions in such a way that they can be understood by our farm and science teams, I think the written language specification is as important as the code specification. It also helps to be aware that these are sometimes two different skill sets, especially in this ‘tech’/agriculture collaboration space. For example, in the Rothamsted collaboration most of the code documentation is done by @paul121, but the convention documentation is done by me.

I think the idea of creating a convention ID is great. That would massively reduce some of my concerns about transitioning between versions of the quick forms/experiment module we are building.

All in all, very supportive @mstenta - thanks for raising the issue.


Thanks @aislinnpearson! I think this point alone is enough to convince me that the code specification should be optional. There is value in adopting written language specs for reasons beyond just validation. And then the code specifications can be contributed as they are written and needed. Because it’s true: “writing code” will always be a bigger bottleneck than “writing words”, and in this case I don’t think code needs to hold up adoption of conventions.


We discussed this briefly on yesterday’s call, but the nice thing about those RDF formats is that code can then be used to render them in plain language, as well as providing more helpful visuals like tables or diagrams. Perhaps most importantly, however, similar code can also be used by a GUI to generate new conventions without any knowledge of RDF, or even familiarity with the RFC 2119 keywords, for that matter. And to my mind, that is the best of all worlds!

So yea, I guess I agree with this in the sense that as long as there’s a “code specification” (although technically, “structured data format” or RDF, not necessarily executable code), the plain language version can be derived programmatically, which is not necessarily true in reverse (although, GPT-3 anyone?). Maybe it would be better to say the structured data format is preferred, but a plain language version is the minimal requirement?
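As a rough illustration of deriving plain language from a structured format, the sketch below turns a small JSON-Schema-style object into RFC 2119 sentences. It assumes required properties map to MUST and all other listed properties to MAY; a real convention would need richer rules (SHOULD, relationships, interpretation), and the soil test fields shown are made up.

```python
def to_plain_language(label: str, schema: dict) -> list:
    """Render a small JSON-Schema-style object as RFC 2119 sentences.

    Only "properties", "required", and per-property "description" are read;
    everything else in the schema is ignored in this sketch.
    """
    required = set(schema.get("required", []))
    sentences = []
    for name, prop in schema.get("properties", {}).items():
        keyword = "MUST" if name in required else "MAY"
        sentence = f"A {label} {keyword} include {name}"
        if "description" in prop:
            sentence += f" ({prop['description']})"
        sentences.append(sentence + ".")
    return sentences


# Hypothetical soil test schema, for illustration only.
soil_test_schema = {
    "properties": {
        "timestamp": {"description": "when the sample was taken"},
        "lab": {},
    },
    "required": ["timestamp"],
}
sentences = to_plain_language("soil test log", soil_test_schema)
```

Going in this direction is mechanical; going from the written sentences back to a schema is not, which is the asymmetry described above.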

In any event, it will be great to see @aislinnpearson’s plain language conventions. They will bring us so much closer to the GUI dream, because I’m sure they will illuminate edge cases and things we have not considered, helping to make it more intuitive and powerful at the same time.

All exciting stuff!


I like this. And “structured data format” is a better way to say it. :slight_smile:


Just FYI I’m trying this out, using some of the new JSON schema structure that @jgaehring and Octavio discussed. I’m using the separate .md file as Mike suggested, and took most of his headers (though I made some small changes too, and I’m keeping track of them to post here in discussion once I have enough).

I think a key difference is really that a convention is a bucket of schemas… so we need to clarify that a tillage convention isn’t really a convention, it’s a set of schemas (a tillage log schema, a tillage_stir quantity, and a tillage_depth quantity, etc.) and relationships. A convention is a broader concept around a group’s use of a set of schemas for some purpose / application or with some explainable intent / conceptual framing.

I’ll share more, but working on this this week.
