Generalizing Log-based Asset Attributes

Symbioquine · May 24, 2024, 3:37pm

TL;DR; We have an opportunity to generalize how farmOS models log-based data. Short-term it would allow for arbitrary log-based attributes. Eventually it could lead to a simpler, more consistent state where most of the data in farmOS is captured via logs.

Please feel free to critique or otherwise contribute to this proposal in this thread. I will do my best to maintain this original post as the summation of the overall discussion and proposal status.

Background

Right now, data for assets is defined in a few ways:

farmOS core asset fields - name, status, flags, intrinsic geometry, etc
farmOS core asset bundle fields - birthdate, sex, plant type, etc
Specific log-based attributes - location, geometry, inventory, and group membership

Folks use farmOS in different ways.

Some seem to be mainly concerned with capturing the current state of their farm - so most of the key information is stored directly on assets and maybe in pending logs. For others, the history of what was done when, why, where, and by whom is just as important as the current state.

The tension between these two categories of use-cases is reflected in the way data for assets is modeled. It is why there is both intrinsic and log-based geometry. It also causes frequent confusion and feature-requests where the nature of the data-model for a given field is unexpected or undesirable.

For example, several folks have been surprised that changes to the asset status or is_castrated state aren’t modeled via logs. It is easy to see how these lead to apparent inconsistencies when one looks back in time - leading to questions like “was animal A still in X location or was it already archived (slaughtered?/dead?/sold?) before animal B was moved there?”, or “How does the timing of determining the sex or sterilizing these animals coincide with movement, group, co-habitation changes?”

Animal assets seem particularly fraught with these kinds of challenges, but we can imagine others, such as changes to the group membership or inventory of archived assets in general. Similarly, some flags or id tag changes may not make sense when they aren’t modeled in a way that reflects when they occurred (were applied) in time.

Inconsistencies aside, there is also the question of how to capture information that goes along with a given state change. For example, how do you make it clear that the log describing the slaughter of some animals or the harvest of a crop is associated with the archived state change?

farmOS has revision history for assets and logs, but a textual revision comment is not a good substitute for the structured data that can be captured in a log. The revision history also captures changes for different kinds of “intents”. Looking back in time, it is not possible to tell whether changes to the fields of an asset in the revision history are there because that is when the change happened or because that is just when it got entered into farmOS. Some churn in the revision history may even reflect errors and fixes to the data that never reflected what happened on the real farm. The revision history is a useful feature for other reasons, but should not be relied on for accurate historical data.

Inspiration & Challenges

We can learn from the success of how the inventory system is modeled in farmOS and generalize a bit further to achieve arbitrary log-based attributes.

Specifically, logs can have any number of inventory adjustments, each of which separately describes which asset is being affected and how the inventory is being adjusted. This allows a single log to capture complex changes to multiple assets’ inventory as a single logical event. From a data-model perspective it is also beautiful because it allows those changes to occur atomically - one database write (of the log) finalizes and instantly applies the described inventory adjustments to all the affected assets.
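
To make the inventory analogy concrete, here is a minimal Python sketch (not farmOS code) of how a series of adjustments can be folded into a value at a point in time. The `Adjustment` class and `compute_inventory` function are hypothetical names for illustration; the three adjustment kinds are the increment, decrement, and reset that the inventory system uses.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Adjustment:
    """One inventory adjustment carried by a log (hypothetical shape)."""
    timestamp: int   # timestamp of the log that carries the adjustment
    asset_id: str
    kind: str        # "increment", "decrement", or "reset"
    value: Decimal


def compute_inventory(adjustments, asset_id, as_of):
    """Fold the adjustments for one asset, in time order, up to `as_of`."""
    total = Decimal("0")
    relevant = (
        a for a in adjustments
        if a.asset_id == asset_id and a.timestamp <= as_of
    )
    for adj in sorted(relevant, key=lambda a: a.timestamp):
        if adj.kind == "reset":
            total = adj.value
        elif adj.kind == "increment":
            total += adj.value
        elif adj.kind == "decrement":
            total -= adj.value
        else:
            raise ValueError(f"unknown adjustment type: {adj.kind}")
    return total
```

Because the value is always derived from the adjustments, asking for the inventory "as of" an earlier date is just a matter of passing a different `as_of`.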

There are several hurdles to generalizing the inventory system to arbitrary attributes, though:

How does farmOS know what attributes are possible for a given asset?
- Can log-based attributes conflict or otherwise overlap with other data?
- What are the performance implications of supporting arbitrary log-based attributes? Can we avoid always needing to load all logs to determine the data for an asset?
How do different data types work?
- Numeric increment, decrement, and reset don’t make sense for attributes of type boolean, string, list<string>, etc
How does the UX work to help this be less frustrating to interact with?
- There are a lot of clicks involved in adding inventory adjustments.
- How do users know where to go to modify which fields/attributes?
How does this affect farmOS’ maintenance posture?
- This presumably adds yet another implementation of the log-based attribute logic - is it realistic to merge/share some of the implementation details?
- Is there a realistic path to using this system to model most/all asset attributes?
  - Could that make farmOS more maintainable?
  - Would that be a good outcome for all/most users? With the right UX maybe?

Proposal Overview

Make a new entity type asset_attribute_operation with no base fields.
Make bundles for the asset_attribute_operation which define each attribute datatype - including the parameters of the operation and the logic for combining the set of operations to produce the current/historical attribute value for an asset.
- Generally, the bundle fields would describe:
  - Operation Type
  - Asset(s?)
  - Operation Data
- Examples:
  - asset_attribute_operation--boolean_operation
  - asset_attribute_operation--numeric_operation
  - asset_attribute_operation--string_list_operation
  - asset_attribute_operation--id_tag_operation
Make a new config entity type which actually defines the asset attributes.
- Attribute name: is_sterilized, Attribute Type: asset_attribute_operation--boolean_operation, Asset Bundles: ['animal']
- Attribute name: flags, Attribute Type: asset_attribute_operation--string_list_operation, Asset Bundles: ['*']
- Attribute name: id_tags, Attribute Type: asset_attribute_operation--id_tag_operation, Asset Bundles: ['*']
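
As a rough illustration of what the config entities above declare, here is a Python sketch that models the three example definitions and looks up which attributes apply to a given asset bundle. `AttributeDefinition`, `DEFINITIONS`, and `attributes_for` are hypothetical names; in farmOS these would be Drupal config entities, not Python objects.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class AttributeDefinition:
    """Stand-in for the proposed config entity (hypothetical shape)."""
    name: str
    attribute_type: str             # bundle of asset_attribute_operation
    asset_bundles: Tuple[str, ...]  # "*" means every asset bundle

    def applies_to(self, asset_bundle: str) -> bool:
        return "*" in self.asset_bundles or asset_bundle in self.asset_bundles


DEFINITIONS = [
    AttributeDefinition(
        "is_sterilized", "asset_attribute_operation--boolean_operation", ("animal",)
    ),
    AttributeDefinition(
        "flags", "asset_attribute_operation--string_list_operation", ("*",)
    ),
    AttributeDefinition(
        "id_tags", "asset_attribute_operation--id_tag_operation", ("*",)
    ),
]


def attributes_for(asset_bundle: str):
    """Names of the log-based attributes declared for an asset bundle."""
    return [d.name for d in DEFINITIONS if d.applies_to(asset_bundle)]
```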

Examples

is_sterilized

{
  "type": "asset_attribute_operation--boolean_operation",
  "id": "8dee5028-ea58-4a82-973a-b631a8771edc",
  "attributes": {
    "drupal_internal__id": 200,
    "operation_type": "set_boolean",
    "attribute_name": "is_sterilized",
    "value": true
  },
  "relationships": {
    "asset": {
        "data": [
            {
              "type": "asset--animal",
              "id": "df0e26d5-6b3b-4fe7-976c-2114e1942991"
            },
            {
              "type": "asset--animal",
              "id": "0637d633-d171-474f-b6c6-8b91af4eeeff",
            }
        ]
    }
  }
}
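
A sketch of how the current or historical value of a boolean attribute like this might be computed, in Python rather than farmOS code. It assumes things the proposal has not pinned down: that each operation is referenced by a log, that the log's timestamp orders the operations, and that only operations on "done" logs take effect. `BooleanOperation` and `resolve_boolean` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class BooleanOperation:
    """Flattened view of a boolean_operation plus the log referencing it."""
    log_timestamp: int          # assumed: timestamp of the referencing log
    log_status: str             # assumed: only "done" logs take effect
    attribute_name: str
    asset_ids: Tuple[str, ...]
    value: bool


def resolve_boolean(ops: List[BooleanOperation], asset_id: str,
                    attribute_name: str, as_of: int) -> Optional[bool]:
    """Return the most recent value set at or before `as_of`, or None."""
    applicable = [
        op for op in ops
        if op.log_status == "done"
        and op.attribute_name == attribute_name
        and asset_id in op.asset_ids
        and op.log_timestamp <= as_of
    ]
    if not applicable:
        return None  # the attribute was never set for this asset
    return max(applicable, key=lambda op: op.log_timestamp).value
```

Returning `None` for "never set" is one possible choice; a real implementation might instead fall back to a default declared alongside the attribute.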

id_tags

{
  "type": "asset_attribute_operation--id_tag_operation",
  "id": "fd82001c-3693-498b-bfe8-b217232622f6",
  "attributes": {
    "drupal_internal__id": 200,
    "operation_type": "add_id_tag",
    "tag_location": "ear",
    "tag_type": "RFID",
    "tag_number": "1234"
  },
  "relationships": {
    "asset": {
        "data": [
            {
              "type": "asset--animal",
              "id": "df0e26d5-6b3b-4fe7-976c-2114e1942991"
            },
            {
              "type": "asset--animal",
              "id": "0637d633-d171-474f-b6c6-8b91af4eeeff",
            }
        ]
    }
  }
}
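
For a list-valued attribute like this, the combining logic could look something like the following Python sketch. It takes operations already sorted by time, each shaped like the "attributes" object above. Note that `remove_id_tag` is a hypothetical operation type that I am adding purely for illustration; only `add_id_tag` appears in the example.

```python
def resolve_id_tags(operations):
    """Fold time-ordered id tag operations into the resulting list of tags."""
    tags = []
    for op in operations:
        tag = {
            "tag_location": op["tag_location"],
            "tag_type": op["tag_type"],
            "tag_number": op["tag_number"],
        }
        if op["operation_type"] == "add_id_tag":
            # Adding the same tag twice leaves a single copy.
            if tag not in tags:
                tags.append(tag)
        elif op["operation_type"] == "remove_id_tag":  # hypothetical operation
            tags = [t for t in tags if t != tag]
        else:
            raise ValueError(f"unsupported operation: {op['operation_type']}")
    return tags
```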

FAQ

How are additional attributes defined?

If the relevant datatype exists, they can be defined just by creating a config entity for them. To start, these config entities would probably be YAML, but eventually a UI could be created to add new log-based attributes.

How are additional log-based attribute datatypes defined?

These are defined in modules which provide bundles for asset_attribute_operation. Along with the fields of the operation, they also define the logic for combining the set of operations (of that datatype) to produce the current/historical attribute value(s) for an asset.
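
One way to picture "a module provides a bundle plus its combining logic" is a small registry keyed by bundle name. This is a Python sketch under my own assumptions, not a description of how Drupal plugins or hooks would be used; `COMBINERS`, `combiner`, and `compute` are hypothetical names.

```python
from typing import Any, Callable, Dict, List

# Maps an asset_attribute_operation bundle name to the function that combines
# a time-ordered list of that bundle's operations into a value.
COMBINERS: Dict[str, Callable[[List[dict]], Any]] = {}


def combiner(bundle: str):
    """Decorator a datatype module could use to register its logic."""
    def register(func):
        COMBINERS[bundle] = func
        return func
    return register


@combiner("boolean_operation")
def combine_boolean(operations):
    # The last set_boolean wins; None means the attribute was never set.
    value = None
    for op in operations:
        if op["operation_type"] == "set_boolean":
            value = op["value"]
    return value


def compute(bundle: str, operations: List[dict]):
    """Look up the datatype's combining logic and apply it."""
    return COMBINERS[bundle](operations)
```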

Why does `asset_attribute_operation` not define any base fields?

Some fields like the operation_type or the asset reference(s) would seem obvious candidates to be base fields, but doing this unnecessarily (IMHO) limits the type of operations without (again IMHO) providing much in return. It doesn’t simplify the definition of new attribute datatypes much and doesn’t enable any important optimizations.

Leaving the operation_type up to the bundle means that the bundle can define the enumeration of possible operation types. Leaving the asset relationship up to the bundle means that the bundle can constrain the cardinality of the asset(s) that the operation applies to - e.g. it might not ever make sense to add the same id tag to more than one asset so it can limit the cardinality to N=1.

What about outside implementations? How can they be expected to keep up with an explosion of special log-based attribute operation logic?

Hopefully, the set of attribute datatypes is fairly small so this isn’t vastly more complex than keeping up with the implementations of location, geometry, group membership and inventory today.

Further, only very specific applications should actually need to do these calculations outside farmOS. The farmOS API includes the computed values of those existing attributes and would include the computed value of these generalized ones too. Only applications that need to work offline or predict the effect of modifications would need to replicate the logic.

A future area for exploration could be a platform/language agnostic description of the logic that could be interpreted by outside applications, but that is not part of this proposal.

How would this proposal handle conflicts between these log-based attributes and existing fields - or fields defined by newly installed modules?

This is an area that needs additional work, but some combination of hooks and namespacing the log-based attributes should work.

Could additional modules expand the operations defined for a given datatype?

In theory, the operations might be extensible, but that is not recommended by this proposal.

The reason is that it is already fairly complex to implement the logic to combine the operations for a given datatype to compute the current/historical value even when one knows the possible operations at the time code for that logic is being written. Needing to write that logic to support arbitrary operations probably isn’t worth it.

What’s the point of the config entity declaring the log-based attributes?

This allows farmOS to know the set of possible attributes for a given asset - and where to look for the implementation to compute the current/historical value(s) for that attribute.

It also provides a key validation point for farmOS to check for conflicts and helps reduce the number of places errors like typos can creep into the attribute names - i.e. only when declaring the log-based attribute, not each time it is operated on.
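
A minimal sketch of those two checks in Python, assuming the declared attribute names and the existing field names are both available as plain collections of strings. `find_conflicts` and `validate_operation` are hypothetical helpers, not farmOS APIs.

```python
def find_conflicts(declared_names, existing_field_names):
    """Return declared attribute names that collide with existing fields."""
    return sorted(set(declared_names) & set(existing_field_names))


def validate_operation(operation, declared_names):
    """Reject an operation whose attribute_name was never declared."""
    name = operation["attribute_name"]
    if name not in declared_names:
        raise ValueError(f"undeclared log-based attribute: {name!r}")
```

With this shape, a typo such as `is_sterilised` in an operation fails immediately instead of silently creating a second attribute.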

How does the cardinality of these log-based attributes work?

The attribute datatype bundles define how the cardinality for each datatype works. Some may be a single boolean (like is_sterilized), others may effectively be a list of objects (like id_tags - with the three pieces of information for each tag). It would even be possible to define a general key-value attribute this way. The operations for each datatype need to be crafted to effectively manipulate the desired structure/cardinality.

One cool thing is that this could also allow for state-machines and other higher-level datatypes as well.

TODO… more FAQs…

BOTLFarm · May 27, 2024, 12:30am

@Symbioquine this is so well thought out, at least with my low level of understanding of programming. I think this concept can really open up new doors throughout the platform, and am excited to see how it develops.

mstenta · May 28, 2024, 2:38pm

Agreed - it was great to talk through some of the details on the last dev call! I shared some of the ideas with @paul121 afterwards. Looking forward to talking more about it. This could provide a nice way to capture additional asset data without needing to add custom fields, too.

It seems to me like the biggest open questions are around UX/DX. As mentioned, the UX of manually recording inventory adjustments is tedious (although second-layer UIs like the Inventory Quick Form ease some of that). I wonder what the UX would look like for something even more general. And similarly, the DX (developer experience) of recording these attribute adjustments is worth thinking through. Right now adding a log with quantities via the API requires creating multiple entities. This would increase that burden.

I assume we would also need to add a new field to logs (like quantity) for referencing these asset_attribute_operation entities.

Exciting to think about what this will enable! And a lot to figure out for an MVP…

Symbioquine · May 28, 2024, 2:51pm

Yes, you’re right. I forgot to mention that in my proposal overview - will add it…

WHFarms · July 13, 2024, 7:40am

Just wondering if this has gotten any traction yet?

Symbioquine · July 13, 2024, 1:27pm

@WHFarms “traction” is an interesting word…

I wrote up the proposal because I think this might be a good direction for the farmOS data model and nobody has come out and said they think it’s a bad idea yet.

That said, it also needs a bit more fleshing out at the design (this) stage, then quite a lot of work at the implementation stage. So far this isn’t work that anybody (including myself) has a timeline or funding to implement.

WHFarms · July 13, 2024, 2:56pm

Understood. I have a large animal module that I want to build but want this done first.

mstenta · July 15, 2024, 2:52pm

@WHFarms If you have the resources to invest in this, and don’t want to be blocked waiting for farmOS core (which has to move slower and more carefully to consider backwards compatibility) you can always start working on things in contrib modules. The flexibility of the Drupal/farmOS module system allows for a lot of power to make these kinds of big changes outside of core. The only consideration is that you are then maintaining your own code and need to make sure it remains compatible as farmOS itself changes.

Just want to make it clear that you don’t have to wait for others if you have the resources to forge ahead on your needs.
