In the course of my stumbling around yesterday on this CSV upload challenge, i came to a deeper realisation of what you wrote, @mstenta, i.e.:
Just be aware of the way “file storage” works with JupyterLite, so you don’t lose anything! All files you upload or create in JupyterLite are stored IN YOUR BROWSER SESSION … they are NOT stored on the farmOS server. If you clear your browser cache or change browsers, they are gone!
Yes; in fact, to debug my problem, i had to switch from Chrome to Firefox, and of course the subject file had to be re-uploaded… Which got me wondering about a couple of things, in the context of this UseCase:
A primary benefit of this application architecture is having the freedom to work from any machine (e.g. farm office and/or home), but given the differences in state that will inevitably arise across those two machines, how can we mitigate the confusion that will consequently arise?
Memory management: This little (26 KB) CSV is small enough to be of no concern… But this being a workflow we plan to run at least weekly (and sometimes on much larger files), what might the negative impact on browser/system performance be, and what should be done to mitigate it?
Given these (and other?) limitations, a JupyterLite NB that references files is not suitable for sharing a replicable result, whether in the interest of tech support (as @Symbioquine and i experienced yesterday) or in the larger context of Replicable Data Science.
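Re the memory worry above: assuming the processing is pandas-based (my assumption, not stated anywhere in this thread), one standard mitigation is to stream a large CSV in chunks so the whole file never sits in memory at once. A minimal sketch (the file name and chunk size are hypothetical):

```python
import pandas as pd

def count_rows_chunked(path, chunksize=10_000):
    """Process a CSV in fixed-size chunks instead of loading it whole.

    Here we merely count rows; real per-chunk work (filtering,
    aggregating, posting to farmOS) would go in the loop body.
    """
    total = 0
    for chunk in pd.read_csv(path, chunksize=chunksize):
        total += len(chunk)
    return total

# Hypothetical usage:
# count_rows_chunked("big_export.csv")
```

Whether this helps inside JupyterLite specifically depends on how the browser backs the file storage, so treat it as a general pattern rather than a guaranteed fix.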
Obviously i don’t understand this technology well enough… So i did a little digging yesterday, from which i gathered that localStorage is a subtype of Web Storage (f.k.a. DOM Storage), along with two other forms that are more familiar to me. What complicates matters further is the different ways in which browser-makers implement the standard (that’s the point at which my head started to hurt, so i quit digging), but i did find this little table (pictured below) that helped me understand the essential similarities & differences.
Bottom-line: There’s enough deep voodoo about this stuff that (to avoid sliding into even deeper doodoo!) i think it will be wise to store any files referenced in the JupyterLite NB in an online archive, and link them explicitly in the document.
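For what it’s worth, a sketch of that “link files explicitly” idea, assuming pandas and a hypothetical GitHub raw URL. One wrinkle: inside JupyterLite (Pyodide) ordinary Python sockets don’t exist, so fetching over HTTP goes through `pyodide.http.open_url`; outside the browser, pandas can read the URL (or a local path) directly:

```python
import pandas as pd

# Hypothetical archive URL -- substitute the real location of your CSV.
CSV_URL = "https://raw.githubusercontent.com/yourname/farmOS-notebooks/main/CROPS.csv"

def load_remote_csv(url):
    """Return a DataFrame for a CSV stored in an online archive."""
    try:
        # Inside JupyterLite, Pyodide provides open_url() as a
        # browser-backed replacement for ordinary HTTP access.
        from pyodide.http import open_url
        return pd.read_csv(open_url(url))
    except ImportError:
        # Outside the browser, pandas can fetch the URL (or path) itself.
        return pd.read_csv(url)

# df = load_remote_csv(CSV_URL)
```

That way the notebook names its data source explicitly, and nothing important lives only in one browser’s storage.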
Otherwise, I think you’re making some great points - most of which don’t have concrete answers.
I will say though that many of those issues are mitigated just by changing how we think of the “storage” in JupyterLite. If we consider the storage in JupyterLite like a sandbox environment or temporary work area, then we can treat those things as advantages.
I would argue that it is better for both the tech support and replicable data science scenarios.
I was able to start with a clean slate and bring in just the files I needed to try and reproduce the problem you were having.
I was able to modify the files without any risk of losing data or breaking things for anyone else.
It forces me to be intentional about sharing just the versions/changes which are important.
Conversely, it helps ensure the collective workspace isn’t increasingly littered with semi-relevant experiments.
I would also argue that one of the foundations of truly replicable data science is going to be consistent and disciplined use of version control technologies like Git to manage the source code - and in some cases sample data. Perhaps in the future JupyterLite could help with that part, but in a way it’s kind of beautiful that it doesn’t. Its job is just to be a place that’s reproducible (but not replicated) between users to run some scripts in a slightly more interactive environment than a text editor and a command line.
Thanks @Symbioquine for providing a more nuanced perspective. I can see how what i was inclined to view as bugs might be considered features… Just so long as we (a) treat it as a “sandbox,” and (b) employ that “more consistent and disciplined use of version control tech” for both sources and sample data.
Also: since you had me open Developer Tools yesterday (in search of that CROPS.csv file), and as that article you linked explains in more detail (illustrated by screenshot below), i can now easily navigate to where these files are stored (IndexedDB indeed, in both Chrome and Firefox), confirm the keys and drill down into values… But only by navigating the JSON tree, into which form my nice tidy tables have been rendered.
One day i hope to get over my allergy to these so-deeply-nested JSON trees; still, being more of a rows&columns kinda guy, i have to ask: is there any easy way to translate this JSON tree back into tabular .CSV form?
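One pandas-based possibility (with the caveat that the record shape below is my guess at what such a storage entry looks like, not the exact JupyterLite schema): if the stored value carries the raw CSV text in a field, it can be read straight back into a table; and for genuinely nested JSON, `pandas.json_normalize` flattens the tree into rows & columns:

```python
from io import StringIO
import pandas as pd

# Illustrative record, roughly the shape of a file entry in browser
# storage; the real JupyterLite schema may differ.
record = {
    "name": "CROPS.csv",
    "type": "file",
    "content": "crop,area\nbarley,12\noats,7\n",  # the CSV text itself
}

# If the value holds raw CSV text, just read it back:
df = pd.read_csv(StringIO(record["content"]))

# For genuinely nested JSON, json_normalize flattens the tree:
nested = [
    {"id": 1, "meta": {"name": "barley", "area": 12}},
    {"id": 2, "meta": {"name": "oats", "area": 7}},
]
flat = pd.json_normalize(nested)  # columns: id, meta.name, meta.area
flat.to_csv("flat.csv", index=False)  # back to tabular .CSV form
```

So the round trip is possible in principle; the fiddly part is exporting the values out of IndexedDB in the first place.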
It’s an empty sandbox for the moment, as i don’t have more than this 1/4hr to play right now, but, thinking this may enable one of you dev dudes to create something wonderful, i thought i should nail it up here straightaway.
Let a thousand flowers bloom, in form of yourname.github.io/farmOS notebooks!
PS: “If it sounds too good to be true,” as the saying goes, “it probably is”, and that is certainly the case here.
After a whole lot of diddling around in GitHub, i must admit defeat (for today at least): i can’t find any way to upload files into the above-linked repo such that they will actually BE there for somebody coming in with another browser.
So all this affords me is another place to access a JupyterLite runtime; not a bad thing, but not what i was hoping it might provide. Is it just me, i wonder, or is the facility really this limited?
D’oh! And here i was, trying everything i could think of in the “gh-pages” branch, thinking that’s the branch made for publishing to username.github.io!
Yes! After many cache-flushes, an eventual switch to a different browser, plus significant time elapsed, i now see that my files uploaded to the ‘main’ branch are indeed there (yay!).
NB: On return to my default browser (Firefox), it took me some time to make yesterday’s version of the .ipynb go away; flushing the browser cache alone doesn’t do it; you have to go into the “Clear Recent History” dialog and tick ALL the boxes, including Data: Site Settings and Offline Website Data. Bit of a pain… Which is why i am glad to hear this:
Oh good point @walt ! You had the right idea, actually, and I had to double-check but I see how it works now… it looks like the repo has a GitHub Actions workflow that automatically publishes any changes from the main branch to the gh-pages branch for you!
An HTTP 500 error is supposed to mean something went wrong on the server side (farmOS, in this case). Check the farmOS logs at /admin/reports/dblog on your farmOS instance. Hopefully that will give us a clue about what went wrong.
If you’re comfortable doing so, can you open your browser dev tools’ Network tab, right-click on the request that is failing with the 500 error, then click “Copy as cURL”? The resulting clipboard contents can be pasted into a terminal (Mac or Linux); add “ -i” after the “curl” part of the command (e.g. “curl -i …”) to see the response headers along with the response body. That may give us a clue why it is failing…
I haven’t been able to reproduce this problem, but I’m happy to help troubleshoot further if you feel like opening another thread @Farmy. I’m guessing this doesn’t really have anything to do with JupyterLite per se.
Yeah, it’s hard to tell how far afield to let the topic get. I think it can be useful to engage a bit on issues here, but probably not to go super deep on them…