Example farmOS.py CSV import script

On the last monthly call there was some interest in importing data from CSV/spreadsheets into farmOS - ping @walt @Kamitor @varunity ?

I hope this example can help serve as a starting point for other custom CSV importers using Python + farmOS.py. Code + installation and usage instructions are here: farmOS.py 2.x Import example · GitHub

This example reads a name, crop, variety and date from a CSV and creates a Plant asset with a seeding log on the specified date. The Plant asset’s plant_type is either loaded from an existing term, or a new plant_type will be created with a reference to the crop_family term (which is also created if one does not already exist). I think this incorporates both simple and more complicated concepts that will be run into when creating custom importers with Python:

  • Python
    • Command line arguments
    • Logging
    • CSV file loading + parsing
    • Date parsing
  • farmOS.py
    • Client instantiation + authorization
    • Create an asset with a plant_type/taxonomy term relationship
    • Create a log with asset relationship
    • Request existing entity with query filters (example is of taxonomy terms filtered by name)

While farmOS does have CSV importers, they require a strict format and are limited in what fields are supported. Even a fairly simple script allows for many more options: flexibility with the CSV format, the ability to create additional fields, multiple entities + entity relationships (assets + logs referencing assets), conditional logic, etc… but they also require customization & some programming knowledge!

Happy to answer any questions & would love to hear ideas for improvements!

2 Likes

Hey @paul121 : Excellent initiative! Being a near-total python n00b, i have my technical issues to sort out (1st related question posted to project repo), but at the higher business-case level, this sounds like a solid first step toward solving my big challenge of migrating my farm operational data from spreadsheets into farmOS.

So: my 1st hi-level question is: Could this script be modified and maintained with reasonable ease to work with v1.x API as well as v2?

I ask because, in light upcoming v2 release of farmOS, and these two related advisories @mstenta gave on last night’s Dev call- (1) to consider all data put into v2 Alfa instances as throwaway (it WILL be overwritten with each successive release); and (2) to not waste development effort against the v1.x API- i find myself in a bit of a quandry about strategy.

To explain: i am investing resources right now in this data migration challenge- by that i mean not farmOS v1.x to v2 migration, but rather from my existing spreadsheet-based model to farmOS- and for this project to be anything other than a bust, it needs to yield business results (i.e. farmOS as a useful tool in production, not just a R&D /playtime project). So: we need to be turning our existing spreadsheets into a CSV format that v1.x can understand now… But then the process of posting and updating new records is something i would really prefer to accomplish via API script.

SO: If you were in my position- i.e. a n00b w/o any in-house programming resources -i wonder how you might resolve this quandry? I guess the answer depends largely on how long it will be until farmOS comes out of alpha testing and into stable beta release (i know it’s bad form to ask this of FOSS developers, but even a “90% certain it will be before xyz date” would really help me in scoping this issue!).

2 Likes

Good question @walt. I think this script could be modified to work with 1.x pretty easily.

The 1.x client methods and structure of the data payloads and responses would look a little different, but the script would be structured the same. It makes sense that you will need this for your situation! I’ll try and draft up a 1.x version over the weekend.

Once you turn existing spreadsheets into a CSV format for 1.x, I think the same CSV should work with a 2.x server. This is because the farmOS data model is largely the same, it’s just the API changes that are different. So the structure of your script will largely stay the same as well, but farmOS.py client methods will change & return data in a slightly different format.

The mechanics of the 2.x API should be pretty stable, it’s just the farmOS data schemas that may change some before a beta release. So you should be fine to start developing scripts that work with a 2.x instance.

But yes, since you are hosted on Farmier and need to interact with your “canonical” database right now… my advice would be to start working on converting existing spreadsheets to a format that’s optimized for importing to farmOS. Then you could build a script for 1.x, and later on converting it to 2.x should be fairly easy. (Assuming this is not a one time thing and you’ll want to import/sync data in a similar format later on!)

2 Likes

If you can do that, @paul121 , that will be awesome!
Meanwhile, i will transform our spreadsheets into CSVs that farmOS 1.x can readily accept.
/w

1 Like

@walt I added an example import_plantings_1x.py file to the gist.

This led to me discovering a bug in the farmOS.py 1.0 beta that did not have the correct header for 1.x servers which prevented creation of records. That is fixed in a new 1.0.0b3 release - so you will need to install the latest version to create records on your 1.x server!

1 Like

So @paul121 : Only took us >2 months (busy time on farm!), but we finally have our harvest worksheets rendered into form that should be uploadable into farmOS, if i’m understanding this right:

OK… So i’ve downloaded the list of plantings from my v1.x instance, and tried to run your import_plantings.py script against my v2 instance, but it fails at run time on line 37 where you define the function: _get_plant_asset_plant_type(crop_name, plant_type_name) , returning the error message: SyntaxError: unexpected EOF while parsing.

Absent any idea what that might mean, i figured i’d better check to ensure compatibility of the schema on v1.x plantings vs v2 -column headers of which are as follows:

  • v1.x: Farm asset Asset ID Name Crop/variety Season Description Flags Group Location Archived
  • v2: Bulk update image_target_id ID Asset name Asset type Flags Parents Group Location Status

As you can see, the v2 schema downloaded from …/assets/plant page is designed to hold records of any “Asset type” (which indeed it does in my instance), while the v1.x schema is designed to hold only records of the “Plant(ing”, v1.x-speak) type -so if this were to cause an error on upload, i should not be surprised.

More generally, i have to ask at this point: is it reasonable to assume that we can download a list of our plantings from farmOS, edit records as appropriate offline (so long as we keep the same “Asset ID”, the names may change if we plant in a different bed than originally planned), append new records as appropriate (leaving Asset ID field blank for assignment by farmOS), and then upload the lot via this script of yours, such that all records are created OR updated as appropriate, without any duplication of data?

If this is a process that could benefit from debugging in real time, let me know -i’d love to get this sorted ASAP, so i’m available, anytime!- but if there’s a more fundamental problem of misaligned expectations on my part as to what is possible here… Well, that would also be good to know ASAP, so i can be thinking about some other way to skin this cat (e.g. a more agile front end for entering harvest data directly via API?)

/walt

1 Like

Hmmm… I think this means there is a syntax error somewhere before line #37. An un-closed string could cause this issue - is it possible you modified one of the lines/variables above that (like the hostname or username) and have extra and/or missing parenthesis? The original file runs OK for me.

So this just depends how you design your import script! You could design the script to be hard coded for a specific asset type, read the asset type from a column (like the current 2.x CSV export has), or even take the asset type as an argument to the script.

The import_plantings.py example is quite simple - it only reads a few columns from the CSV file and the asset type is hard coded to be plant: farmOS.py 2.x Import example · GitHub

Not with this example script - but with a script it would be possible, yes! That said, I think that ideally the use case you are describing could be covered with the core farmOS CSV importers UI - we are working on this here: https://www.drupal.org/project/farm/issues/3151244

But it also seems like it could be useful to provide some farmOS.py/farmOS.js import scripts that would be compatible with the format of the current CSV exports. This script could do what you are describing, update existing OR create new. Although I think the utility of such a script wouldn’t match the fully functioning CSV importer we are building.

Just to be clear - I imagine you’ll need to create custom script(s) for importing these. The CSV importer UI would likely work for a subset of the fields you are importing but the more complicated pieces might require a script (things like quantities, entity references, special logic, etc). A hybrid approach might work too! It really depends on the format of your worksheets and what all you are trying to do. I think it would be worthwhile for you to work on a custom script right now.

Maybe you could open a new thread for specific questions regarding these worksheets? An example would help too!

2 Likes

Thanks, @paul121, for the quick & helpful response. Skipping first to the bottom line:

-i accept the challenge, have begun process of scripting my own CSV Importer (pseudocode only at this point), and will initiate a new thread on that topic directly.
PS: Done -see this thread in farmOS Users. /w

As a first step tho, i want to make sure i understand this little import script example you have provided, and can use it for its intended purpose. Do i understand correctly that this script- modified only for access to my v2 instance at farmier host- should enable upload of this little plantings.csv table you have set up as an example?

NB: considering the content of this table:

name,crop,variety,date
Big Corn planing,corn,big,5/5/2021
Pumpkins,squash,pumpkin,8/5/2021
Green beans #1,beans,green beans,6/5/2021
Green beans #2 (future),beans,green beans,10/5/2021

-i still don’t understand how the import can work, since schema of that little table is not the same as schema in my farmOS instance (referenced in my previous msg above), but perhaps farmOS/API is somehow able to interpret those column names & map to schema as appropriate (?)

Notwithstanding that little detail: If i am correct in the more general assumption above, then there is still some problem(s) here, although i did pass the line 37 hitch mentioned above (something in the client authorization script, not sure what, but i fixed it by copying in the auth codes from my farmos.py login to API, which works fine). The two sections of code that are failing in my case are:

  1. File command line args in lines 17-19: thinking that maybe this block is superfluous (since i am not running script at the command line), i skipped over this block, ran the function def in lines 37-91 just fine, but then…
  2. Something in the CSV processing is not working right; traceback appended below the fold.

Is there anything you can suggest to get this working as it should?

/walt

8<------(snip)------>8

%tb
Traceback (most recent call last):

  File "/var/folders/hc/46157fbs5t1cscm8c38y3fd80000gn/T/ipykernel_98970/2364323154.py", line 1, in <module>
    runfile('/Users/walterludwick/Library/Mobile Documents/com~apple~CloudDocs/WLcode/py_farmOS/import_plantings.py', wdir='/Users/walterludwick/Library/Mobile Documents/com~apple~CloudDocs/WLcode/py_farmOS')

  File "/Users/walterludwick/opt/anaconda3/lib/python3.8/site-packages/debugpy/_vendored/pydevd/_pydev_bundle/pydev_umd.py", line 167, in runfile
    execfile(filename, namespace)

  File "/Users/walterludwick/opt/anaconda3/lib/python3.8/site-packages/debugpy/_vendored/pydevd/_pydev_imps/_pydev_execfile.py", line 25, in execfile
    exec(compile(contents + "\n", file, 'exec'), glob, loc)

  File "/Users/walterludwick/Library/Mobile Documents/com~apple~CloudDocs/WLcode/py_farmOS/import_plantings.py", line 25, in <module>
    args = parser.parse_args()

  File "/Users/walterludwick/opt/anaconda3/lib/python3.8/argparse.py", line 1768, in parse_args
    args, argv = self.parse_known_args(args, namespace)

  File "/Users/walterludwick/opt/anaconda3/lib/python3.8/argparse.py", line 1800, in parse_known_args
    namespace, args = self._parse_known_args(args, namespace)

  File "/Users/walterludwick/opt/anaconda3/lib/python3.8/argparse.py", line 2034, in _parse_known_args
    self.error(_('the following arguments are required: %s') %

  File "/Users/walterludwick/opt/anaconda3/lib/python3.8/argparse.py", line 2521, in error
    self.exit(2, _('%(prog)s: error: %(message)s\n') % args)

  File "/Users/walterludwick/opt/anaconda3/lib/python3.8/argparse.py", line 2508, in exit
    _sys.exit(status)

SystemExit: 2
1 Like

PS to the following:

SOLVED: as follows:

  1. Going on presumption that this is not needed, i commented-out lines 17-19, proceeding to…
  2. Applying the debug feature of Spyder IDE (a new one on me), i saw that those three instances of ‘hostname’ in line 154 were undefined -so i changed all three instances to “client” (as defined upstream in the script)… Whereupon the script runs w/o a hitch.

SO: checking back into my v2 instance at Farmier i can now see that 4 new items have been added to the “Plant assets” list -specifically:

213 [Green beans #2 (future)](https://vdl.farmos.dev/asset/213) green beans Active
212 [Green beans #1](https://vdl.farmos.dev/asset/212) green beans Active
211 [Pumpkins](https://vdl.farmos.dev/asset/211) pumpkin Active
210 [Big Corn planing](https://vdl.farmos.dev/asset/210) big

Cool! Dunno what happened to the ‘date’ values, nor how it is that ‘crop’ and ‘variety’ got concatenated into one Crop/Variety field (?), but… Hey: minor details of schema aside, the script delivers as promised, so thanks a heap @paul121 for this!

/walt

2 Likes

Amazing! In my excitement about uploading Plant Assets via API, i failed to notice (what i boldfaced above) those related Seeding logs (which are not so clear to see from the Asset detail view as in v1.x; in v2, you must drill-down to see them). Am still noodling over this code to understand how best to apply this magic -which leads me to:

Dunno if this is an improvement idea, or just me failing to see what should be obvious, but: Generally speaking, start dates can pertain to either a seeding OR a transplanting. On my farm, trees are all transplants as a rule, while annual plants can be either transplants (almost always the case w/r/t/ our row crops) or else direct-sown (i.e. field crops). This is a nuance that i’d like to introduce into this script, tho i’m not sure exactly how; if you can shed any light @paul121, i’d be much obliged!

The other thing i’m wondering about is: where does that date specified in the CSV get stored? Nice that the one w/ date later than today- Green Beans #2 (Future) -get assigned status of “Pending,” while the others are assigned “Done” status… But then the only date i can find anywhere in the UI is the Timestamp assigned to all records when uploaded; can’t find that date specified in CSV anywhere in UI of farmOS v2. Am i missing something here?

PS: I also can’t see in the UI where the “crop” attribute is hiding (failed to notice at first, since it’s embedded in the “Name” field of test CSV); what shows in the “Crop/variety” column of UI is just the ‘variety’ attribute -which appears with a 3-digit numeric code in parentheses (i.e. “green beans (193)”,“pumpkin(291)”,“big(289)”) concatenated on the edit screen. Where does that numeric code come from, i wonder, since it is not in the CSV table, and not in database of my farmOS instance (AFAIK)?

1 Like

Sure! So instead of creating a seeding log you need to modify the script to make a transplanting log. That would be simple, but it sounds like you want to do this conditionally based on the crop family and/or variety (depending how you model this in your farmOS). Something like this might be sufficient:

# Use log_type variable in place of hard coded 'seeding' log type.
log_type = 'transplanting' if crop == 'tree' else 'seeding'

The example only uses the date for the seeding log timestamp and will only be seen when you export seeding longs. You could specify the asset created attribute to have the same date as well.

That 3-digit number is the ID of the plant_type taxonomy term. Every plant asset must have a plant_type, and each plant_type term can optionally be specified a family from the crop_family vocabulary. You can see these plant types at hostname/admin/structure/taxonomy/plant_type

2 Likes

Thanks @paul121 for this more nuanced explanation -especially:

Sounds good, but… I see an issue here, in that my data (what’s in v1.x production instance, and v2 as well, that holding just ported-over data) does not have separate fields for ‘crop’ and ‘variety’; instead, there is one column for “Crop/variety”, which i have populated with values like “Apples Anna”, “Apples Fuji” (both trees, obviously, tho that is implicit) and veg products like “Tomatoes Benache” and “Carrots winter” -again, plant types being implicit, not declared anywhere that i can see. Moreover: your script has stuff related to entities & attributes, e.g.- "type": "taxonomy_term--crop_family" -what my database knows nothing about.

Still: can i just insert that line of code somewhere (where seeding gets created between lines 136 and 152: in there somewhere i guess <8-) i wonder? And then it will just instantiate those entity-attributes i’ve not even used thus far? Sounds too good to be true, but no harm in trying i guess!

/walt

1 Like

Sorry the naming on that is a little confusing, but I tried to explain this in my first post. This script is using separate plant_type and crop_family taxonomies. Plant assets have as single Crop/Variety field (machine name plant_type) that references a plant_type term. Plant types can be assigned to a crop_family.

Using these separate taxonomies is completely optional, but useful when a crop_family might have different attributes or behaviors than a plant_type. This distinction adds some complexity but makes it easier to program this kind of logic like you have described, seeding vs transplanting.

I should also point out this issue:https://www.drupal.org/project/farm/issues/3191115

Nope, that won’t work. Fields (called “attributes” and “relationships” in JSONAPI) can only be added with Drupal modules.

2 Likes

Thanks for the explainer, @paul121, both here an on last Thu pm Devs call; i think i understand about this, and will probably steer clear of taxonomies, once i start uploading production data.

In fact i have just done my first upload of a production data set into v2a2, using your script faithfully (i.e. following your plantings.csv schema of “name,crop,variety,date” exactly, with no other properties added); script appears to have run w/o a hitch for the first 48 records (of 148 in the target CSV file), but then it choked on or after the 49th (ID=288 -last record to come thru OK in my instance) for reasons i can’t quite figure out from the traceback appended below.

Can you perhaps shed some light on this?

8<-----(snip)----->8

INFO:root:Imported planting asset: 2021 Beetroot week16 HVW3 - <farmOS.farmOS object at 0x1295eff40>/asset/287
INFO:root:Imported planting asset: 2021 Tomatos varied week33 HVW3-N - <farmOS.farmOS object at 0x1295eff40>/asset/288
Traceback (most recent call last):

File “/var/folders/hc/46157fbs5t1cscm8c38y3fd80000gn/T/ipykernel_86762/2364323154.py”, line 1, in
runfile(’/Users/walterludwick/Library/Mobile Documents/com~apple~CloudDocs/WLcode/py_farmOS/import_plantings.py’, wdir=’/Users/walterludwick/Library/Mobile Documents/com~apple~CloudDocs/WLcode/py_farmOS’)

File “/Users/walterludwick/opt/anaconda3/lib/python3.8/site-packages/debugpy/_vendored/pydevd/_pydev_bundle/pydev_umd.py”, line 167, in runfile
execfile(filename, namespace)

File “/Users/walterludwick/opt/anaconda3/lib/python3.8/site-packages/debugpy/_vendored/pydevd/_pydev_imps/_pydev_execfile.py”, line 25, in execfile
exec(compile(contents + “\n”, file, ‘exec’), glob, loc)

File “/Users/walterludwick/Library/Mobile Documents/com~apple~CloudDocs/WLcode/py_farmOS/import_plantings.py”, line 117, in
plant_type_id = _get_plant_asset_plant_type(crop_name, variety_name)

File “/Users/walterludwick/Library/Mobile Documents/com~apple~CloudDocs/WLcode/py_farmOS/import_plantings.py”, line 100, in _get_plant_asset_plant_type
new_plant_type = client.term.send(‘plant_type’, plant_type)

File “/Users/walterludwick/opt/anaconda3/lib/python3.8/site-packages/farmOS/client_2.py”, line 150, in send
return self.resource_api.send(

File “/Users/walterludwick/opt/anaconda3/lib/python3.8/site-packages/farmOS/client_2.py”, line 81, in send
response = self.session.http_request(

File “/Users/walterludwick/opt/anaconda3/lib/python3.8/site-packages/farmOS/session.py”, line 109, in http_request
return self._http_request(

File “/Users/walterludwick/opt/anaconda3/lib/python3.8/site-packages/farmOS/session.py”, line 165, in _http_request
response.raise_for_status()

File “/Users/walterludwick/opt/anaconda3/lib/python3.8/site-packages/requests/models.py”, line 953, in raise_for_status
raise HTTPError(http_error_msg, response=self)

HTTPError: 422 Client Error: Unprocessable Entity for url: https://vdl.farmos.dev/api/taxonomy_term/plant_type

In [2]:
8<-----(snip)----->8

… And, while i’m still trying to figure out what caused the aforementioned failure, I would also like to know if v2 will accept an import (via API) of all those attributes in the “download.csv” template, i.e:
Bulk update,image_target_id,ID,Asset name,Asset type,Flags,Parents,Group,Location,Status
Yes?

Moreover: i need to know if- presuming my import contains Asset records whose Name and ID match existing assets in the db -i can PUT those records to overwrite their match in the db, instead of just appending them as duplicates (as would happen in a v1x Import operation) ? I think you said so, @paul121 -unless i heard it from @mstenta -but in any case, i want to make sure i’ve understood this right -yes?

@walt Yes the API can accept everything that the UI can accept. CSV importers/exporters are more limited in what they can do.

That said, many of the columns listed actually require separate API requests in addition to the asset being imported… (and the names are different than the API names in some cases).

Bulk update,image_target_id,ID,Asset name,Asset type,Flags,Parents,Group,Location,Status

Asset name, Asset type, Flags, Parents, and Status can all be set when importing an asset.

image_target_id is a reference to a File entity - so you would need to make a separate request to create the File entity before you can reference it on the asset.

Group and Location are special - because these are pseudo-properties that are calculated based on logs that reference the asset. You cannot set an assets “Group” or “Location” directly on the asset record via API. Instead, you must create a log that references the asset and sets the is_movement or is_group_membership boolean value on the log. The log must also be marked as “done” and have a date <= present for it to show as the asset’s current Group/Location.

(And ignore Bulk update - that’s just the checkbox for performing bulk actions on records - probably shouldn’t be included in the CSV export…)

Thanks @mstenta, all good to know… But i’m still in doubt about the ID column (i.e. PK of the asset), and if that can be used during Import process to determine whether to PUT an overwrite on preexisting asset, or rather to POST a new record. ?

1 Like

Sure! In your Python script, you can vary your request type based on the presence of the ID column in the CSV you are importing.

If you know the asset type, you can also perform a GET request to see if an asset with a given ID exists. For example, to check whether or not an Animal asset with ID 16 exists, you can check /api/asset/animal?filter[drupal_internal__id]=16. The data array in the response will either be empty, or will contain a single asset record.

As for the future CSV importers, here is our open feature request for “updating existing records”: Update records via CSV importers [#2968909] | Drupal.org

I think the only reason validation would fail when creating a taxonomy term is because the name field is empty. What is the variety for the 49th record in your CSV? Maybe it is missing or has some errant commas around that column?

If you have a debugger you can actually check the content of that 422 response and it should include what exactly the validation error is. You could also modify the script to catch the error and print the response.

Ahhh… Here is insight:

You hit the nail on the head, @paul121 : tho record 49 was OK, this record #50 is the dud:
2020 Swiss chard Autumn HVW4-N,Swiss chard Autumn,,12/28/2020
Where the comma after Crop “Swiss chard” got skipped until after Variety “Autumn” -which resulted in a a double comma where Variety should have been.

I guess i must do a better job of validating CSV’s before running the script on them… And it would be nice if there were a way to “undo” (rollback, in RDBMS terms) a bulk update process like this that somehow goes wrong. Might this be somehow enabled by that “Bulk update” checkbox attribute @mstenta advised me above to ignore, i wonder?

2 Likes