CSV Importers in v2.x

Continuing the discussion from Example farmOS.py CSV import script, and following @paul121’s advice to develop my own script for uploading CSV data related to our farm’s planting & harvest data, i want to begin this process with a question for the User Community, to wit:

As a user of farmOS v1.x, is the “Import CSV” function one you use regularly, and/or consider to be essential?

If so, then any information you might care to share about how you use it and why, what aspects of the feature you like and don’t like, and any thoughts you have about what you’d like to see in v2 of this feature would be most helpful to hear at this point.

NB: As concerns the technology involved in porting this feature to v2.x, there’s an interesting thread about farmOS 2.x Importers on the Drupal developers board, including a pointer to some encouraging recent developments. But what i find most relevant to us end-users is what @mstenta states as the bottom line of his Problem/Motivation statement:

The key requirement here, and the ultimate deciding factor, will be user experience. Many of the “import” options for Drupal are built with the assumption that they will be used by a site administrator. In farmOS, CSV imports are something that non-administrators can also use.

Amen! (emphasis mine :wink: )

/walt

3 Likes

In response to the question i posed above, i guess it’s only fair that i should share something about my own use of the CSV Import feature. To that end, the best starting point i can provide is this brief “User Story” from the field, collected when we began this project to integrate farmOS into our production management workflow:

“In my Role as PM:OMG (Production Manager for the Organic Market Garden), the harvest data that i gather 2x /week from the OMG team in spreadsheet form needs to be uploaded to farmOS on a weekly basis with as little friction as possible.”

Details: Rows in this spreadsheet of harvested Assets are defined in terms of 3 key attributes: Year/season + Crop/variety + Row/bed; additional attributes must include Quantity/units and Date of harvest, plus optional notes (e.g. qualitative factors, exceptional conditions, staff ID).
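
For illustration, one row of that spreadsheet might look something like this (column names and values here are hypothetical, just to make the structure concrete):

Year/Season,Crop/Variety,Row/Bed,Quantity,Units,Date,Notes
2021/Spring,Carrot/Nantes,Bed-07,12.5,kg,2021-05-18,"tops lightly rusted; picked by J.D."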

Essentially: What we want from farmOS is:

  • fast & easy upload of data
  • fast & secure storage without any loss of granular detail
  • a responsive online interface for all users to enjoy low-latency access to whatever info is wanted (and nothing more), as appropriate to their role w/in the system

Details of this UseCase are still TBD, depending on system capabilities, but i attach below a Use Case Diagram to illustrate how this particular function of the PM:OMG role fits into the bigger picture.

4 Likes

The biggest problem i have with this feature lies in these two explanatory points near the top of the Import form (on the Planting page, in this case):

  1. Import CSV files with one or more of these columns: Name, Archived, Description, Parent IDs, Parent names, Crop/variety, Season.
  2. No columns are unique. The import will only create new items, no items will be updated.

Regarding point (1) above: That list of columns omits some that are important to me (e.g. Flag and Description) for reasons i don’t understand, and one other (i.e. ID) for reasons i DO understand (it could break the referential integrity of the database, if misused), but which i nonetheless experience as problematic, for reasons related to the following.

Point (2) above is the most problematic for me, because it results in duplicate records (w/r/t all attributes except ID) whenever my upload includes a change -as it must whenever some aspect of a Planting harvested differs from what was originally planned (typically, when the farmer winds up planting it in a different bed).

We could avoid the potential Referential Integrity problem by downloading the current list of Planting Assets (which does include the ID column), leaving that value intact for all records we want to edit, and leaving that field blank on all the new records we want to append. Such update transactions are permitted via the API, i understand… But why not via CSV as well? Presuming the backend RDBMS features declarative RI (Postgres certainly does), it could be handled in the standard way: decline any illegal transactions and return a list of those exceptions, while processing all the rest. Could it not work that way?
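
By way of illustration, here is roughly what such an update-by-ID transaction looks like via the API today, sketched in Python with the requests library (hostname, token and UUID below are placeholders; i’m assuming v2.x’s JSON:API conventions, including its application/vnd.api+json content type):

import requests

# Placeholders: substitute your own instance, OAuth2 token, and record UUID.
HOST = "https://farm.example.com"
TOKEN = "..."
PLANTING_UUID = "00000000-0000-0000-0000-000000000000"

# PATCH updates the existing record in place; rows with a blank ID
# would be POSTed as new records instead.
payload = {
    "data": {
        "type": "asset--plant",
        "id": PLANTING_UUID,
        "attributes": {
            "name": "2021/Spring Carrot Bed-12"  # e.g. correcting the bed after a change of plan
        }
    }
}

response = requests.patch(
    f"{HOST}/api/asset/plant/{PLANTING_UUID}",
    json=payload,
    headers={
        "Authorization": f"Bearer {TOKEN}",
        "Content-Type": "application/vnd.api+json"
    }
)
response.raise_for_status()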

3 Likes

Great thoughts @walt ! I just wanted to quickly point to this related issue for reference too: Update records via CSV importers [#2968909] | Drupal.org - It seems that we will be exploring that in the 2.x importers!

1 Like

Thanks @mstenta -this sounds good to me.

Only -ignorant as i am of the development process- i feel the need to have some idea, however rough, of how far in the future this v2 CSV Import feature is likely to be. I know there’s no way this can happen before v2 gets out of alpha and into beta at least… But if you could give us a date before which it certainly can NOT occur, that could help to inform some practical decisions out here in the field.

For my part, in light of positive developments on the API front, i need to weigh the cost of investing energies in that track against the opportunity cost of delaying migration of our operational data into farmOS -the latter of which escalates in proportion to the length of the delay. So: to whatever extent you can give us a reliable “not before x” date, that would be much appreciated!

1 Like

I don’t believe the default CSV importer will be able to create an asset, create harvest logs and create multiple quantities in a single upload. But it could likely be done in multiple uploads.

Doing this in a single step would require a custom Drupal module & PHP code to parse such a CSV, or a script to process the CSV externally and upload via the API.
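
For illustration, that second option might be sketched in Python along these lines (the file name, column names, and the omissions noted in comments are all assumptions, not a finished importer):

import csv
import requests

HOST = "https://farm.example.com"  # placeholder instance
HEADERS = {
    "Authorization": "Bearer ...",  # OAuth2 token, obtained separately
    "Content-Type": "application/vnd.api+json"
}

def post(path, payload):
    """POST one JSON:API record and return its server-assigned UUID."""
    r = requests.post(f"{HOST}{path}", json=payload, headers=HEADERS)
    r.raise_for_status()
    return r.json()["data"]["id"]

with open("harvests.csv", newline="") as f:  # hypothetical CSV file
    for row in csv.DictReader(f):
        # 1. Create the plant asset.
        #    (A real import would likely also need a plant_type taxonomy
        #    term relationship; omitted here for brevity.)
        asset_id = post("/api/asset/plant", {
            "data": {
                "type": "asset--plant",
                "attributes": {"name": row["Crop/Variety"] + " " + row["Row/Bed"]}
            }
        })
        # 2. Create a harvest log referencing the new asset.
        #    (Quantities are separate records in 2.x and would be created
        #    and referenced similarly.)
        post("/api/log/harvest", {
            "data": {
                "type": "log--harvest",
                "attributes": {
                    "name": "Harvest " + row["Crop/Variety"],
                    "timestamp": row["Date"] + "T00:00:00+00:00"
                },
                "relationships": {
                    "asset": {"data": [{"type": "asset--plant", "id": asset_id}]}
                }
            }
        })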

2 Likes

@walt Yea I think the custom Python CSV importer track might be most fruitful in the near term (next few months at least) - and maybe even better for you longer term as well. As @paul121 said, the initial v2 CSV importers will be pretty simple, like the v1 importers, and will focus first on importing individual types of records. So to import crop plans (including assets AND logs) would require multiple CSVs (as it does in v1).

But if you take the custom python script route: a) you can build exactly what you need in your own timeframe, b) the API is solid and will not be changing, so what you build will be durable, c) it will be more capable than the “simple” CSV importers we’ll provide initially, d) perhaps it could be used to populate actual “Crop PLANS” when that module comes together, e) it could be useful to others who have similar needs, if you want to share what you build!

Ah, OK @mstenta: if we’re talking at least a few months for such simple CSV importers, that helps me to rope expectations back into line. Moreover, given what you say:

i’m getting the distinct impression that the need to maintain important relations between different entities in the system (e.g. Assets and Logs) is pretty much inconsistent with the idea of using farmOS as a store of operational data maintained principally via CSV Import. Does that assertion ring essentially true, would you say?

I am still in the process of evaluating the feasibility of developing our own custom API scripts… But, as i have no intention of getting into proprietary software development, you can count on me to share whatever i might develop.

For that to be of any practical use to anyone else, of course, it would sure help to hear from other users in this community who are using the CSV Import feature -or perhaps DID use it, and gave it up for whatever reason. That was actually my purpose in posting this as a Topic; hopefully it will engage some interest.

CSVs are inherently limited, because they can only represent two dimensions (columns and rows). farmOS represents things in a more complex way, with many-to-one relationships. That said, it is still possible to design CSV importers/exporters that can go back and forth - within those limitations. And in many cases that’s all you need!

For example, a Plant asset in farmOS can have multiple seedings, transplantings, harvests, etc. But perhaps 99% of the time you only need 1 of each. Therefore you can create a CSV that has a column for “seeding date”, “transplanting date”, etc, and that can work just fine. It’s only when you need more complicated relations that CSVs begin to fall short (or need to be represented with multiple CSVs).
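
For instance, the “only need 1 of each” case flattens into a single CSV along these lines (a made-up illustration):

name,crop/variety,seeding date,transplanting date,harvest date
Carrot Bed-07,Carrot/Nantes,2021-03-02,2021-04-10,2021-06-20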

So, yes I agree: CSVs are limited in what they can represent. But they can still serve a lot of use-cases quite well.

such simple CSV importers

The goal of this first pass at CSV importers is to match what we have in 1.x, which are pretty simple: each record type has its own flat CSV importer.

However, the new system we’re building upon (Drupal’s Migrate API) gives us some new powers to explore and experiment with! So I’m hopeful that, after the first pass, we can also begin to explore more complex CSV templates, which can assist in creating multiple record types at once perhaps.

That said, the APIs will still always provide the most flexibility, because you can work with the raw records in whatever way you need. With the CSV importers, we basically need to write “plugins” for more complex processing of CSV column data. So some things come “for free” and others require development to enable them.

Going back to the Plant+Seeding example above, it’s “easy” to have a Plant CSV importer, which takes a Plant name and Plant type. And it’s “easy” to have a Seeding CSV importer, which takes a seeding date and a Plant asset name/ID. These two are “easy” because they write directly to the record being created, so the Drupal Migrate API gives us that “out of the box”. However, in order to have a single CSV importer for creating Plant assets AND a Seeding log, we need to write a plugin that takes a seeding date (and maybe other columns too, for seeding quantity, location, etc) and uses them to generate a Seeding log, which can then be associated with the Plant asset that is also being created at the same time. This goes above and beyond what Drupal’s Migrate API can do on its own, and requires us to write plugins, which also means we need to define exactly how they should work, which columns they require, etc.
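
To make the “out of the box” part concrete, a flat importer of the first kind boils down to a short migration definition -roughly like this sketch, which assumes the contrib CSV source plugin (exact keys vary by module version, and this is not the actual farmOS importer config):

id: csv_plant_asset
label: 'Plant asset CSV importer (sketch)'
source:
  plugin: csv            # provided by the migrate_source_csv module
  path: 'plantings.csv'  # hypothetical input file
  ids: [name]
process:
  # A direct column-to-field write: this is what Migrate gives us for free.
  name: name
destination:
  plugin: 'entity:asset'
  default_bundle: plant

Anything beyond such direct writes -taxonomy references, or generating an associated Seeding log- is where the custom plugins come in.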

Perhaps we can spend some time on one of the next development calls looking at the underlying considerations to provide a deeper understanding of how these things work behind the scenes. I am excited to explore adding more and more of these features over time! But I expect these enhancements to be a slow trickle of new developments over the coming months and years (unless we get some dedicated funding JUST to focus on CSV importers). That’s why I suggest Python + API calls in the near term to solve your immediate problems. :slight_smile:

you can count on me to share whatever i might develop.

:raised_hands:

2 Likes

Importers for the animal register will be an important requirement for me.
I’ve been playing around with farmOS 1.x for the past month or so; unfortunately I didn’t realise how close 2.x was before I started.

I’m based in Ireland, and all cattle here must be registered with the Department of Agriculture’s online register. I can export my herd profile from the Department’s register as CSV, and was able to quickly add my herd to farmOS 1.x by renaming a few headers. I started dabbling with 2.0 this week and obviously there is no CSV importer. At first I thought: how could you be pushing 2.0 already without such a basic function? That is, until I started actually looking at the documentation on farmOS.org. farmOS.py will hopefully offer what I want and more (when I get my head around it).

With 1.x I was planning on writing a script to reformat the department’s CSV file to suit farmOS and to exclude animals already imported, to avoid duplicate records; but now, with farmOS.py, I can hopefully put together something a little more integrated. I’ve a bit more reading to do first though.
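
For the “exclude animals already imported” part, I’m thinking something along these lines (a rough Python sketch against the v2 API; hostname, token and file name are placeholders, and the id_tag structure follows what the API returns for asset--animal records):

import csv
import requests

HOST = "https://farm.example.com"          # placeholder instance
HEADERS = {"Authorization": "Bearer ..."}  # OAuth2 token

# Collect the tag numbers of animals already in farmOS, following pagination.
existing_tags = set()
url = f"{HOST}/api/asset/animal"
while url:
    page = requests.get(url, headers=HEADERS).json()
    for animal in page["data"]:
        for tag in animal["attributes"].get("id_tag") or []:
            existing_tags.add(tag["id"])
    url = page.get("links", {}).get("next", {}).get("href")

# Keep only the department's rows whose tag number is not already imported.
with open("herd_profile.csv", newline="") as f:  # hypothetical export file
    new_rows = [row for row in csv.DictReader(f)
                if row["Tag Number"] not in existing_tags]

print(f"{len(new_rows)} animals to import")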

3 Likes

That’s great @Farmer-Ed ! Please do share anything you end up hacking together!

I think we’ll see a lot more specialized tooling for common use cases like that. It would be great to have a collection of Python scripts for different needs (eg: one specifically formatted for the Dept of Ag’s CSV export format where you are).

2 Likes

Hi @walt
I cannot speak from experience, as I’m only starting to prepare for a pilot of farmOS for our farm (and it will be difficult to really start until the offline FieldKit app is ready, as connection is an issue where we are), but I hope this may be of interest to you.

Unfortunately I cannot code, and I’m not sure the future Quick Form function will enable users to build their own custom Quick Forms (will it?).
As we’ll have some cacao-specific post-harvest processing to log, I was thinking that maybe I could use an open-source mobile data collection solution such as Kobo Toolbox to build my own forms, for more straightforward use by the field team, and then export the logs into farmOS through CSV.

I’ve no idea if that’s feasible without too much processing (and the explanations from @mstenta make me a bit worried, but I still think it may be possible?)
Of course it would be way better to do it all in farmOS, but I was thinking of that as a potential workaround(?) and thought it might be of interest to you.

Also, solutions like Kobo Toolbox give the opportunity to work with QR codes, so I was thinking that maybe it could also help us track batches from plots all the way through processing to bags of dry beans??
Actually, I’m now wondering if anybody has been using Kobo Toolbox (or ODK) in conjunction with farmOS??

3 Likes

Interesting UseCase you have there, @brunocacao. I am no developer either, and can’t answer your Q about QuickForms, but i can tell you that i have uploaded CSV sets of records pertaining to field observations in v1.x of farmOS (still my “production” instance); this is the UseCase that motivated my FieldKit trials, but that is on hold for me too, as i am likewise waiting for a version that syncs with farmOS v2.0 (soon to be my production instance) in off/online mode.

NB: Until CSV importers for v2.x are released, the only way to get offline-collected data into a v2.x farmOS instance is via an API script, if i am understanding the situation correctly.

Meanwhile: i am indeed interested in your post-harvest processing UseCase, as it seems relevant to some UseCases we are also trying to address on my farm, where we likewise need to log some post- (and pre-) harvest processes. For me, such records can all be classified as “Observations,” attributes of which need to include a date/time stamp, geolocated photo(s), and a QR or UPC code scan. Can Kobo Toolbox support forms that include all these datatypes, can you say?

NB: I’ve not used Kobo, but i have used GlideApps to build & deploy apps for field data collection, tho w/o the code-scanning & geolocation features -but maybe that is also possible; don’t know, never tried. I can say that Glide really lives up to its low (even no) code promise… But it’s a commercial product (“freemium” -but you can really do a lot at the free subscription level), so Kobo seems a better choice from the FOSS perspective.

2 Likes

Just last night I started working on a CSV importer for my herd using Node-RED.

It actually looks surprisingly straightforward to import CSV data using Node-RED, as there is a CSV-to-JSON node available. I’ve still a little to do, and will test later.


It’s not a no-code solution, but when I’m finished only a very small amount of JavaScript in the function node will need to be changed to create different CSV importers.

function node code

//Declare variables
let outputMsgs = [];
var tag, gender, dob, breed;
var asset;

//Loop over the parsed CSV, one row at a time
msg.payload.forEach(data => {

    //Extract data from the CSV row
    tag = data["Tag Number"];
    gender = data.Gender;
    dob = data["Date of Birth"];
    breed = data.Breed;

    //Reformat the date from DD/MM/YYYY to YYYY-MM-DD
    function padZero(i) {
        return i < 10 ? "0" + i : i;
    }
    const parts = dob.split("/");
    const date = new Date(Number(parts[2]), Number(parts[1]) - 1, Number(parts[0]));
    const newDateString = date.getFullYear() + '-' + padZero(date.getMonth() + 1) + '-' + padZero(date.getDate());

    //Create the asset object
    asset = {
        "data": {
            "type": "asset--animal",
            "attributes": {
                "name": tag.slice(-3), //name is last 3 digits of tag number
                "data": "CSV import", //optional hidden data field
                "sex": gender.charAt(0), //reduce Male/Female to M/F
                "birthdate": newDateString + "T00:00:00+00:00", //formatted date @ midnight
                "tag": [
                    tag //ID tag in array
                ]
            }
        }
    };

    outputMsgs.push({ payload: asset }); //push assets to the next node one at a time
}); //End loop

return [outputMsgs];

This is not a complete flow; there are a few other nodes for authentication and to post the data to farmOS, but they will be the same as ones I’ve already used in other flows. I can share the complete flow later if it is any use to you.

I haven’t used tools like Kobo either but will definitely be giving them a look now.

I’ll be saving all my Node-RED flows here: GitHub - Farmer-Eds-Shed/FarmOS-Node-Red-Flows

Also discussed a bit here: farmOS, NodeRed, Home Assistant etc - #21 by Farmer-Ed

4 Likes

Wow: for a farmer, @Farmer-Ed, it seems you sling code as well as manure; good on ya, mate!

/me: not so much… But i’ll be following this project with interest, as it looks like a clever approach to the problem!

3 Likes

Maybe it’s just the isolation due to Covid, but that is the best compliment I’ve received in a while! :grin:

2 Likes

@mstenta I’m having trouble creating a new Animal Asset using the API. Is there a minimum number of fields that need to be filled? Or some other difference compared to posting logs? Or something stupid that I’m overlooking?

I’ve tried posting the following json object to /api/asset/animal

{
	"data": {
		"type": "asset--animal",
		"attributes": {
			"name": "234",
			"status": "active",
			"id_tag": [
				{
					"id": "1234Test1234",
					"type": "ear_tag",
					"location": ""
				}
			],
			"birthdate": "2017-03-16T00:00:00+00:00",
			"sex": "F"
		},
		"relationships": {
			"asset_type": {
				"data": {
					"type": "asset_type--asset_type",
					"id": UUID,
				}
			}
		}
	}
}

But I’m just getting 422 status code returned.

Posting the following to /api/log/observation creates a log with no issue; the return code is 201.

{
  "data": {
    "type": "log--observation",
    "attributes": {
      "name": "Due Calving " + cowid,
      "data": "Google",
      "notes": notes,
      "timestamp": newDateString + "T00:00:00+00:00",
      "flag": [
        "review"
      ]
    },
    "relationships": {
      "asset": {
        "data": [
          {
            "type": "asset--animal",
            "id": UUID
          }
        ]
      }
    }
  }
}

1 Like

@Farmer-Ed the data key in relationships.asset_type and relationships.asset is slightly different between those two examples. The first one has data as an object: data: { ... }, but I believe that needs to be an array: data: [ {...} ].

If you’re able to inspect the 422 response it should give you a bit more detail about what is wrong.

2 Likes

Thanks @paul121. Browsing through the API with Firefox, the animal_type is not in an array in relationships for asset--animal; however, the relationship in the log is in an array.

This is from the logs

Drupal\jsonapi\Exception\UnprocessableHttpEntityException: Unprocessable Entity: validation failed. in Drupal\jsonapi\Controller\EntityResource::validate() (line 59 of /opt/drupal/web/core/modules/jsonapi/src/Entity/EntityValidationTrait.php).

    if (count($violations) > 0) {
      // Instead of returning a generic 400 response we use the more specific
      // 422 Unprocessable Entity code from RFC 4918. That way clients can
      // distinguish between general syntax errors in bad serializations (code
      // 400) and semantic errors in well-formed requests (code 422).
      // @see \Drupal\jsonapi\Normalizer\UnprocessableHttpEntityExceptionNormalizer
      $exception = new UnprocessableHttpEntityException();
      $exception->setViolations($violations);
      throw $exception;

Do you know if there is a minimum number of required fields to create an Asset?
The following returns the same error, which I read as: the syntax is fine, but it doesn’t like the data.

{
  "data": {
    "type": "asset--animal",
    "attributes": {
      "name": "234",
      "status": "active",
      "birthdate": "2017-03-16T00:00:00+00:00"
    }
  }
}

1 Like

Ahh…
I’m an idiot; I just found out how to read the responses.

In my wisdom, having read the documentation on the HTTP Response Node, I connected a debug node to listen for msg.statusCode, which returns the status code number. If I’d just left it as the standard msg.payload, it actually returns a fairly detailed description of why it’s not working.

And the issue, to start with, was that I copied and pasted the wrong relationship: asset_type--asset_type instead of taxonomy_term--animal_type.
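
So, for anyone else who hits the same 422, the corrected relationship block looks like this (UUID being the animal_type taxonomy term’s ID):

"relationships": {
  "animal_type": {
    "data": {
      "type": "taxonomy_term--animal_type",
      "id": UUID
    }
  }
}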

2 Likes