Share your JupyterLite Examples

In the course of my stumbling around yesterday, related to this CSV upload challenge, i came to a deeper realisation of what you wrote yesterday @mstenta, i.e.:

Just be aware of the way “file storage” works with JupyterLite, so you don’t lose anything! All files you upload or create in JupyterLite are stored IN YOUR BROWSER SESSION … they are NOT stored on the farmOS server. If you clear your browser cache or change browsers, they are gone!

Yes; in fact, to debug my problem, i had to switch from Chrome to Firefox, and of course the subject file had to be re-uploaded… Which put me to wonder about a coupla things, in context of this UseCase:

  1. A primary benefit of this application architecture is having the freedom to work from any machine- e.g. farm office and/or home -but, given the diffs that will inevitably arise in state across those two machines, how can we mitigate the confusion that will consequently arise?
  2. Memory management: This little (26 kB) CSV is small enough to be of no concern… But this being a workflow we plan to run at least weekly- and sometimes on much larger files -what might be the negative impact on browser/system performance, and what should be done to mitigate that problem?
  3. Given these (and other?) limitations, a JupyterLite NB that references files is not suitable for sharing a replicable result -whether in the interest of tech support (as @Symbioquine and i experienced yesterday) or in the larger context of Replicable Data Science.

Obviously i don’t understand this technology enough… So i did a little digging yesterday, from which i gathered that localStorage is a subtype of Web Storage (f.k.a. DOM Storage), along with two other forms that are more familiar to me. What complicates matters further is the different ways in which browser-makers implement the standard (that’s the point at which my head started to hurt, so i quit digging), but i did find this little table (pictured below) that helped me to understand essential similarities & diffs.

Bottom-line: There’s enough deep voodoo about this stuff that- to avoid sliding into even deeper doodoo! -i think it will be wise to store any files referenced in the JupyterLite NB in an online archive, and link them explicitly in the document.
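A minimal sketch of that idea in Python, assuming the CSV has been published at some stable URL. The address and the sample rows below are hypothetical placeholders, and the fetch step is shown only as a comment because it needs a JupyterLite (Pyodide) kernel to run:

```python
import csv
import io

# Hypothetical address of the archived file; replace with the real link.
DATA_URL = "https://example.org/farm-data/CROPS.csv"


def parse_csv_text(text):
    """Turn CSV text into a list of row dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


# In a JupyterLite (Pyodide) kernel, the text could probably be fetched with:
#   from pyodide.http import pyfetch
#   text = await (await pyfetch(DATA_URL)).string()
# Here a small made-up sample stands in for the downloaded text.
sample_text = "crop,area_ha\nKale,0.5\n"

rows = parse_csv_text(sample_text)
print(rows)
```

Because the notebook carries the link rather than the file, anyone opening it in a fresh browser pulls the same data instead of depending on what happens to be in their local storage.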

Screenshot 2022-02-20 at 11.01.30


Hey @walt this article is a little better because it’s more up to date and includes IndexedDB, which is the storage API that JupyterLite will use in most browsers.

A more up-to-date table from that article.

Otherwise, I think you’re making some great points - most of which don’t have concrete answers.

I will say though that many of those issues are mitigated just by changing how we think of the “storage” in JupyterLite. If we consider the storage in JupyterLite like a sandbox environment or temporary work area, then we can treat those things as advantages.

I would argue that it is better for both the tech support and replicable data science scenarios.

  • I was able to start with a clean slate and bring in just the files I needed to try and reproduce the problem you were having.
  • I was able to modify the files without any risk of losing data or breaking things for anyone else.
  • It forces me to be intentional about sharing just the versions/changes which are important.
    • Conversely, it helps ensure the collective workspace isn’t increasingly littered with semi-relevant experiments.

I would also argue that one of the foundations of truly replicable data science is going to be consistent and disciplined use of version control technologies like Git to manage the source code - and in some cases sample data. Perhaps in the future JupyterLite could help with that part, but in a way it’s kind of beautiful that it doesn’t. Its job is just to be a place that’s reproducible (but not replicated) between users to run some scripts in a little more interactive environment than a text editor and a command line.


Thanks @Symbioquine for providing a more nuanced perspective. I can see how what i was inclined to view as bugs might be considered features… Just so long as we (a) treat it as a “sandbox,” and (b) employ that “more consistent and disciplined use of version control tech” for both sources and sample data.

Also: since you had me open Developer Tools yesterday (in search of that CROPS.csv file), and as that article you linked explains in more detail (illustrated by screenshot below), i can now easily navigate to where these files are stored (IndexedDB indeed, in both Chrome and Firefox), confirm the keys and drill down into values… But only by navigating the JSON tree, in which form my nice tidy table has been rendered.

One day i hope to get over my allergy to these so-deeply-nested JSON trees; still, being more of a rows&columns kinda guy, i have to ask: is there any easy way to translate this JSON tree back into tabular .CSV form?


I wouldn’t recommend using the developer tools like that except to troubleshoot things when you absolutely need to look behind the curtain.

For day-to-day stuff the best thing is probably to let JupyterLite do that work for you.

You can just right-click on the file and choose “Download”.

Or, for viewing purposes, you can click “Open” to view the file in JupyterLite directly.



Behind the curtain

The data is stored as a JSON blob, which you’d need to parse to extract the value of the “content” field.

If you saved that value to a text file with a .csv extension, you’d have your original file back.
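A rough sketch of that extraction in Python, assuming the record has been copied out of the dev tools as JSON text. Only the “content” field is confirmed in this thread; the “name” field and the sample record are assumptions for illustration, and a real record may carry other fields or encode its content differently:

```python
import json
import tempfile
from pathlib import Path


def restore_file(record_json: str, out_dir: str) -> Path:
    """Write the "content" of a stored file record back to disk."""
    record = json.loads(record_json)
    # "content" is the field named in this thread; "name" is an assumed field.
    # Path(...).name drops any directory part so the file stays inside out_dir.
    file_name = Path(record.get("name", "restored.csv")).name
    out_path = Path(out_dir) / file_name
    out_path.write_text(record["content"], encoding="utf-8")
    return out_path


# Hypothetical record, standing in for a value copied from the dev tools.
sample_record = json.dumps(
    {"name": "CROPS.csv", "content": "crop,area_ha\nKale,0.5\nLeek,0.25\n"}
)

with tempfile.TemporaryDirectory() as tmp:
    restored = restore_file(sample_record, tmp)
    restored_text = restored.read_text(encoding="utf-8")

print(restored.name)
print(restored_text)
```

For everyday use the “download” option is still the simpler route; this is only for the rare case where the stored record is all you have.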

Unfortunately, it looks like the dev tools viewer in Firefox truncates the values before parsing the JSON, so it breaks on files as large as your CROPS.csv.

That part does work in Chrome, though.


Following this post of yesterday on Jupyter Blog, i was able to deploy that template to a new GitHub repo just by clicking a single button, and then deploy the JupyterLite runtime as a new GitHub Pages site in about 5 minutes.

It’s an empty sandbox for the moment, as i don’t have more than this 1/4hr to play right now, but- thinking this may enable one of you dev dudes to create something wonderful :grin: -just thought i should nail it up here straightaway.

Let a thousand flowers bloom, in form of yourname.github.io/farmOS notebooks!


PS: “If it sounds too good to be true,” as the saying goes, “it probably is” -certainly the case here. :frowning:

After a whole lot of diddling around in Github, i must admit defeat (for today at least): can’t find any way to upload files into the above-linked repo, in such a way that they will actually BE there for somebody coming in with another browser.

So all this affords me is another place to access a JupyterLite runtime -not a bad thing, but then it’s not what i was hoping it might provide. Is it just me, i wonder, or is the facility indeed this limited?

@walt I think that if you upload files to this directory then they will appear in the JupyterLite environment: farmOS/content at main · ludwa6/farmOS · GitHub

Although I found that JupyterLite does not automatically pick up changes to that directory - and I needed to completely clear the browser cache for the JupyterLite site for files to appear.

FWIW there are some ideas for making “file management” in general easier to use in the Drupal JupyterLite module here: [META] File management [#3264048] | Drupal.org

Dooh! and here i was, trying everything i could think of in the “gh-pages” branch, thinking that’s the one that is made for publishing to username.github.io !

Yes! After many cache-flushes, eventual switch to a different browser, plus significant time elapsed, i now see that my files uploaded to ‘main’ branch are indeed there (yay!).
NB: On return to my default browser (Firefox), it took me some time to make yesterday’s version of the .ipynb go away; it’s not as simple as just flushing the browser cache: you have to go into the “Clear Recent History” dialog and tick ALL the boxes, including Data: Site Settings and Offline Website Data. Bit of a pain… Which is why i am glad to hear this:

:+1:

Oh good point @walt ! You had the right idea, actually, and I had to double-check but I see how it works now… it looks like the repo has a GitHub Actions workflow that automatically publishes any changes from the main branch to the gh-pages branch for you!

That’s handy! :slight_smile:

it took me some time to make yesterday’s version of the .ipynb go away; it’s not as simple as just flushing the browser cache: you have to go into the “Clear Recent History” dialog and tick ALL the boxes, including Data: Site Settings and Offline Website Data. Bit of a pain…

Yea agreed - it would be great if we could make this less painful by connecting directly to Drupal’s file storage mechanisms. It will take some thought though - especially on the browser storage clearing question. It seems that JupyterLite has some hard assumptions there… although this upstream thread @Symbioquine found gives me some hope that they are thinking about it: Normalize and make Content frontends and backends extensible · Issue #315 · jupyterlite/jupyterlite · GitHub


I tested the “Animal CSV import” example, but I got an error in block [9] in line:
resp = await pyfetch(location.origin + '/api/taxonomy_term/animal_type', method='POST',

OSError: Request for https://localhost/api/taxonomy_term/animal_type failed with status 500: 500 Service unavailable (with message)

Did I overlook an important detail?

An HTTP 500 error is supposed to mean something went wrong on the server side (farmOS in this case). Check the farmOS logs at /admin/reports/dblog on your farmOS instance. Hopefully that will give us a clue what went wrong.


This is the farmOS log:

Location https://localhost/api/taxonomy_term/animal_type
Referrer https://localhost/jupyterlite/lab/index.html
Message Symfony\Component\Routing\Exception\MethodNotAllowedException: in Drupal\Core\Routing\Router->matchRequest() (line 134 of /opt/drupal/web/core/lib/Drupal/Core/Routing/Router.php).
Severity Error
Hostname 172.28.0.1

Does the user you’re testing with have permission to create new animal_type taxonomy terms?

Yes, I should have permission to create animal_type according to the JSON:API setting: “Accept all JSON:API create, read, update, and delete operations.”

I have added the Hostname, which I forgot to include, to the farmOS log above. Shouldn’t it be localhost?

If you’re comfortable doing so, can you open the network tab of your browser’s dev tools, right-click on the request that is failing with the 500 error, and then click “Copy as cURL”? The resulting clipboard contents can be pasted into a terminal (macOS or Linux), and “-i” can be added after “curl” (e.g. “curl -i …”) to see the response headers along with the response body. That may give us a clue why it is failing…

It will probably look something like this;

curl -i 'https://v2.farmos.test/api/taxonomy_term/animal_type' -X POST -H 'User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:98.0) Gecko/20100101 Firefox/98.0' -H 'Accept: */*' -H 'Accept-Language: en-US,en;q=0.5' -H 'Accept-Encoding: gzip, deflate, br' -H 'Referer: https://v2.farmos.test/jupyterlite/lab/index.html' -H 'Content-type: application/vnd.api+json' -H 'X-CSRF-Token: ----OMITTED----' -H 'Origin: https://v2.farmos.test' -H 'DNT: 1' -H 'Connection: keep-alive' -H 'Cookie: ----OMITTED----; has_js=1; jupyterliteDrupalBasePath=/; assetLinkDrupalBasePath=/; ----OMITTED----' -H 'Sec-Fetch-Dest: empty' -H 'Sec-Fetch-Mode: cors' -H 'Sec-Fetch-Site: same-origin' --data-raw '{"data": {"type": "taxonomy_term--animal_type", "attributes": {"name": "Sheep2"}}}'
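For reference, the request body in that command follows the JSON:API document shape. A small Python sketch of building the same body, using only the values visible in the command above (the helper name is made up for illustration, and the CSRF token and cookies are deliberately left out):

```python
import json


def animal_type_payload(name: str) -> str:
    """Build the JSON:API request body for creating an animal_type term."""
    body = {
        "data": {
            "type": "taxonomy_term--animal_type",
            "attributes": {"name": name},
        }
    }
    return json.dumps(body)


# Content type taken from the curl command; other headers are omitted here.
headers = {"Content-type": "application/vnd.api+json"}

payload = animal_type_payload("Sheep2")
print(payload)
```

Comparing a payload built this way against the “--data-raw” part of your own copied command is a quick way to rule out a malformed body.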

This is the response:
curl -k -i 'https://localhost/api/taxonomy_term/animal_type' -X POST -H 'User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:98.0) Gecko/20100101 Firefox/98.0' -H 'Accept: */*' -H 'Accept-Language: de,en-US;q=0.7,en;q=0.3' -H 'Accept-Encoding: gzip, deflate, br' -H 'Referer: https://localhost/jupyterlite/lab/index.html' -H 'Content-type: application/vnd.api+json' -H 'X-CSRF-Token: ---DEL---' -H 'Origin: https://localhost' -H 'Connection: keep-alive' -H 'Cookie: Drupal.tableDrag.showWeight=0; ---DEL---; jupyterliteDrupalBasePath=/' -H 'Sec-Fetch-Dest: empty' -H 'Sec-Fetch-Mode: cors' -H 'Sec-Fetch-Site: same-origin' --data-raw '{"data": {"type": "taxonomy_term--animal_type", "attributes": {"name": "Sheep"}}}'

HTTP/1.1 500 500 Service unavailable (with message)
Server: nginx/1.21.6
Date: Mon, 21 Mar 2022 22:33:31 GMT
Content-Type: text/html; charset=UTF-8
Content-Length: 2194
Connection: keep-alive
Cache-Control: must-revalidate, no-cache, private
X-UA-Compatible: IE=edge
Content-language: en
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
Permissions-Policy: interest-cohort=()
expires: -1
X-Generator: Drupal 9 (https://www.drupal.org)
pragma: no-cache

The website encountered an unexpected error. Please try again later.<br><br><em class="placeholder">Symfony\Component\Routing\Exception\MethodNotAllowedException</em>:  in <em class="placeholder">Drupal\Core\Routing\Router-&gt;matchRequest()</em> (line <em class="placeholder">134</em> of <em class="placeholder">core/lib/Drupal/Core/Routing/Router.php</em>). <pre class="backtrace">Drupal\Core\Routing\AccessAwareRouter-&gt;matchRequest(Object) (Line: 151)
Drupal\Core\Routing\AccessAwareRouter-&gt;match(&#039;/api/taxonomy_term/animal_type&#039;) (Line: 138)
Drupal\user\Plugin\LanguageNegotiation\LanguageNegotiationUserAdmin-&gt;isAdminPath(Object) (Line: 104)
Drupal\user\Plugin\LanguageNegotiation\LanguageNegotiationUserAdmin-&gt;getLangcode(Object) (Line: 188)
Drupal\language\LanguageNegotiator-&gt;negotiateLanguage(&#039;language_interface&#039;, &#039;language-user-admin&#039;) (Line: 133)
Drupal\language\LanguageNegotiator-&gt;initializeType(&#039;language_interface&#039;) (Line: 218)
Drupal\language\ConfigurableLanguageManager-&gt;getCurrentLanguage() (Line: 92)
Drupal\language\EventSubscriber\LanguageRequestSubscriber-&gt;setLanguageOverrides() (Line: 74)
Drupal\language\EventSubscriber\LanguageRequestSubscriber-&gt;onKernelRequestLanguage(Object, &#039;kernel.request&#039;, Object)
call_user_func(Array, Object, &#039;kernel.request&#039;, Object) (Line: 142)
Drupal\Component\EventDispatcher\ContainerAwareEventDispatcher-&gt;dispatch(Object, &#039;kernel.request&#039;) (Line: 134)
Symfony\Component\HttpKernel\HttpKernel-&gt;handleRaw(Object, 1) (Line: 80)
Symfony\Component\HttpKernel\HttpKernel-&gt;handle(Object, 1, 1) (Line: 67)
Drupal\simple_oauth\HttpMiddleware\BasicAuthSwap-&gt;handle(Object, 1, 1) (Line: 58)
Drupal\Core\StackMiddleware\Session-&gt;handle(Object, 1, 1) (Line: 48)
Drupal\Core\StackMiddleware\KernelPreHandle-&gt;handle(Object, 1, 1) (Line: 48)
Drupal\Core\StackMiddleware\ReverseProxyMiddleware-&gt;handle(Object, 1, 1) (Line: 51)
Drupal\Core\StackMiddleware\NegotiationMiddleware-&gt;handle(Object, 1, 1) (Line: 23)
Stack\StackedHttpKernel-&gt;handle(Object, 1, 1) (Line: 708)
Drupal\Core\DrupalKernel-&gt;handle(Object) (Line: 19)
</pre>
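Since that body is raw HTML with escaped entities, here is a small Python sketch for making such an error page easier to read. The helper name is hypothetical, and the sample is just the first part of the message above:

```python
import html
import re


def readable_error(body: str) -> str:
    """Strip HTML tags and unescape entities in an HTML error body."""
    text = re.sub(r"<br\s*/?>", "\n", body)  # turn <br> into line breaks
    text = re.sub(r"<[^>]+>", "", text)      # drop the remaining tags
    return html.unescape(text).strip()       # &gt; becomes >, &#039; becomes '


# First part of the error message from the response above.
sample = (
    r'<em class="placeholder">Symfony\Component\Routing\Exception'
    r'\MethodNotAllowedException</em>: in '
    r'<em class="placeholder">Drupal\Core\Routing\Router-&gt;matchRequest()</em>'
)

print(readable_error(sample))
```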

@Farmy can you please open a new topic to debug your instance? This thread is for sharing examples.

Edit: Although looking back it veered off topic long ago. So never mind, it probably doesn’t matter… ¯_(ツ)_/¯

I haven’t been able to reproduce this problem, but I’m happy to help troubleshoot further if you feel like opening another thread @Farmy. I’m guessing this doesn’t really have anything to do with JupyterLite per se.

Yeah, it’s hard to tell how far afield to let the topic get. I think it can be useful to engage a bit on issues here, but probably not to go super deep on them…


Thank you @Symbioquine and @mstenta. I created a new topic here: Troubleshooting JSON:API / JupyterLite


Some generally useful JupyterLite stuff I stumbled over: GitHub - innovationOUtside/ouseful_jupyterlite_utils: Utilities for working with JupyterLite
