Pyodide wheels bypass pip, Datasette needs APSW, water rules fall to states
Today's three AI-tech ships each route around the default — alternate installer, alternate library, alternate jurisdiction — to deliver the headline.
Pyodide wheels bypass pip, Datasette needs APSW, water rules fall to states
TL;DR
- Pyodide 314 lands WASM wheels on PyPI via PEP 783, installable only through
micropip. - APSW alone exposes SQLite column-origin metadata, since most distro
libsqlite3builds disable the flag. - US data centers drew 211B gallons of indirect grid water in 2023, 12× the on-site cooling figure.
- Verge writer one-shotted a backyard-gardening app with Gemini in about 5 minutes.
Today’s three AI-tech stories share a shape: the capability lands, but the default path won’t deliver it. Pyodide 314 finally lets maintainers publish WASM wheels straight to PyPI through PEP 783 — and pip explicitly won’t install them, leaving micropip as the only on-ramp. Simon Willison’s dive into Datasette column provenance finds SQLite’s C API does expose the source table.column for every result, but only APSW surfaces it ergonomically, because most Linux distros ship libsqlite3 with the metadata flag compiled off. And a fresh look at data-center water finds the 17B-gallon on-site cooling figure everyone reports is dwarfed 12× by indirect grid water, which only California, Michigan, and Iowa have moved to require hyperscalers to disclose.
The pattern: alternate installer, alternate library, alternate jurisdiction. The standard exists in each case; you just can’t get to it through the obvious door.
Pyodide 314 lets PyPI host WASM wheels directly
Source: simon-willison · published 2026-06-13
TL;DR
- Pyodide 314 + PEP 783 let maintainers publish WASM wheels straight to PyPI, retiring a 300-package manual recipe queue.
- 28 packages already ship pyemscripten wheels, including
pydantic_core,onnx, and Willison’sluau-wasmproof of concept. - pip is sitting this out — install still routes through
micropip, not the standard installer, by design. - Year-stamped tag is an “escape valve” for Emscripten ABI churn, breaking manylinux-style self-contained stability.
The unblock
For years, getting a C or Rust extension into Pyodide meant landing a recipe in the Pyodide repo and waiting for the maintainers to build and host it. Over 300 packages accumulated that way, and the queue was the bottleneck. Pyodide 314.0, paired with PEP 783’s PyEmscripten platform tag, ends that arrangement: PyPI’s Warehouse merged support in April, cibuildwheel 4.1+ exposes a CIBW_PLATFORM: pyodide target 1, and maintainers can now ship WASM wheels the same way they ship Linux ones.
Willison’s celebration package, luau-wasm, is a clean demonstration — Roblox’s Luau scripting language cross-compiled from C++ into a 276KB cp314-cp314-pyemscripten_2026_0_wasm32.whl and installed in-browser with one micropip.install() call. Twenty-eight packages have published with the new tag so far, including the load-bearing pydantic_core and onnx.
Build side works. Install side is forked.
The architecture splits cleanly along build/install lines, and the split is the story:
flowchart LR
A[Maintainer repo] -->|cibuildwheel 4.1+| B[pyemscripten wheel]
B -->|standard upload| C[(PyPI)]
C -->|micropip.install| D[Browser / Pyodide runtime]
C -. pip will not install .-> E[Local CPython]
On the PEP 783 discussion, pip maintainer Paul Moore pushed back on both the annual pyemscripten_2026_0 versioning and the runtime-patching tricks Pyodide uses, and declined to commit pip to supporting the workflow 2. The “publish to PyPI like any other wheel” framing is true for upload; for install, the standard tool stays out and micropip carries the load.
Why the tag has a year in it
The pyemscripten_YYYY_N scheme looks weird next to manylinux2014, and that’s deliberate. Pyodide maintainers describe it as an “emergency escape valve” so they can ship ABI-breaking toolchain fixes without waiting for a new CPython release, with the actual ABI defined by living documentation on pyodide.org rather than the PEP 3. Emscripten offers no ABI stability between versions, which is why scikit-learn — itself migrating its build into CI to publish wheels directly — flagged that every Pyodide bump may demand a full rebuild matrix 4. cibuildwheel also notably ships no default repair-wheel-command for the Pyodide target; maintainers must explicitly run pyodide auditwheel repair to bundle shared WASM libs 1. Symmetry with native is not there yet.
What still can’t ride this rail
The 28-package list is missing the one entry most ML-curious readers would want: onnxruntime. Despite onnx shipping a pyemscripten wheel, the runtime’s maintainers have concluded that cross-compiling the full C++ engine is impractical under Emscripten’s pthreads and Asyncify constraints, and are instead exploring a proxy to the JavaScript onnxruntime-web 5. Third-party recaps of the HN thread captured the broader skepticism — “runtime on a runtime,” with comparisons to Java Applets, and reports of GC-related OOMs during the 3.14 transition 6.
The honest read: PyPI hosting is a real unblock for pure-Rust and pure-C extensions, and the long tail of Pyodide packaging just got dramatically cheaper. The deepest scientific stack still doesn’t fit, and the install half of “just use PyPI” has a pip-shaped hole in it.
Further reading
- luau-wasm 0.1a0 — simon-willison
Datasette needs SQLite column provenance; only APSW delivers
Source: simon-willison · published 2026-06-13
TL;DR
- Simon Willison used Claude Code to map every SQLite result column back to its source
table.column. - APSW’s
description_fullis the only ergonomic path — a 5-tuple per column from SQLite’s column-metadata C API. - The ctypes shortcut breaks on most Linux distros, which ship libsqlite3 without the
SQLITE_ENABLE_COLUMN_METADATAflag APSW bundles itself. - CTEs and expressions return
NULLorigin from the C API, capping the UX ceiling for non-trivial queries.
Why Datasette wants this
Datasette renders table views with rich affordances — typed columns, foreign-key links, facets — because it knows the schema. Arbitrary SQL results get none of that, because Python’s stdlib sqlite3 cursor tells you a column is named name and nothing else. The DB-API cursor.description leaves the schema/table/origin slots permanently None 7, so a query like SELECT users.name FROM users JOIN orders ON ... arrives at the renderer as anonymous strings. Practitioners building notebook-style SQLite UIs currently paper over this by re-querying sqlite_schema and PRAGMA table_info and matching on column names — a workaround that collapses the moment a query uses aliases or joins same-named columns from two tables 8.
Willison’s research surfaced three ways out, with very different ergonomics.
Three paths, one clean winner
| Approach | What you get | What it costs |
|---|---|---|
APSW description_full | 5-tuple (name, declared_type, database, table, origin) per column, straight from the C API 9 | Third-party driver; ships its own SQLite amalgamation |
ctypes to sqlite3_column_table_name() | Same data, no extra dependency | Symbol may not exist in system libsqlite3 10 |
EXPLAIN bytecode parsing | Can recover lineage through CTEs and subqueries | Depends on undocumented VDBE opcodes 11 |
APSW is the least surprising answer. Its cursor.c implementation simply iterates the prepared statement and calls sqlite3_column_{database,table,origin}_name once per output column 12, which makes per-query overhead negligible. The real cost is the compile-time dependency: APSW ships its own SQLite amalgamation precisely because distro builds frequently omit SQLITE_ENABLE_COLUMN_METADATA 10. That’s a meaningful footnote to Willison’s “ctypes alternative” — on a stock Debian or Ubuntu Python, the bridge will load, but the symbol it’s trying to call may not be there.
The CTE ceiling
The same C-level walk that powers description_full returns NULL for any column that isn’t a direct read from a base table 12. SELECT users.name resolves cleanly. SELECT upper(users.name) does not. Anything routed through WITH x AS (...) produces a column with no origin. For Datasette’s use case that’s a hard ceiling: a user who writes a CTE — the natural way to express anything non-trivial in SQLite — loses every UI affordance the provenance machinery was supposed to provide.
That’s why the EXPLAIN-parsing path is more than a curiosity. SQLite’s VDBE trace exposes OpenRead (cursor↔table binding), Column (cursor,col→register), and ResultRow (register range→output) opcodes; thread those three and you have register-level lineage the C API will never give you 11. The catch is that VDBE bytecode is an internal contract. Wiring Datasette to it means owning a fragility budget that grows with every SQLite release.
Takeaway
The deliverable here isn’t a library — it’s a sharpened decision. For Datasette to ship column provenance, Willison has to choose: depend on APSW and accept a non-stdlib database driver, or depend on EXPLAIN output and accept a moving target. The “just use ctypes” middle path is the one that doesn’t actually work in production. Worth noting that an AI agent walked the entire design space — APSW discovery, ctypes implementation, EXPLAIN interrogation — in a single session, which is closer to what “AI-assisted research” looks like in practice than most demos.
AI data centers’ grid water dwarfs on-site cooling 12×
Source: ars-technica-ai · published 2026-06-12
TL;DR
- US data centers’ indirect grid water hit ~211B gallons in 2023 — 12× the 17B gallons drawn for on-site cooling.
- Aggregate withdrawals are still only 0.12–0.4% of US daily total, below golf courses and far below agriculture.
- Local incidents — Arizona’s killed Project Blue, a Georgia facility quietly pulling 29M gallons through unmetered taps — drive the politics.
- California, Michigan, and Iowa have bills requiring site-level water reporting hyperscalers long resisted.
The national number really is small
Ars’s framing is correct as far as it goes: all US data centers combined account for between 0.12% and 0.4% of daily national water withdrawals, less than golf courses and a rounding error next to agriculture 13. If you only look at the aggregate, “AI is draining the planet” headlines don’t survive contact with the spreadsheet. Most facilities also recirculate cooling water rather than consume it outright, so the gross-withdrawal number overstates net loss.
That’s the defense. It’s load-bearing, and it’s the part the discourse keeps getting wrong. It’s also incomplete.
The bucket has a second tap
The piece focuses on on-site cooling. Berkeley Lab’s 2024 DOE accounting found US data centers used roughly 17.4 billion gallons directly for cooling in 2023, and 211 billion gallons indirectly via the power grid — over 12× the on-site figure 14. Thermoelectric plants need water to make steam and reject heat; every kilowatt-hour a data center pulls drags a few liters of evaporation with it somewhere upstream.
flowchart LR
A[Data center] -->|~17B gal/yr| B[On-site cooling]
A -->|electricity demand| C[Thermoelectric power plants]
C -->|~211B gal/yr| D[Grid water withdrawals]
This matters for how you read vendor claims. A facility running fully closed-loop or air-cooled can still drive enormous off-site water consumption. Any “drop in the bucket” calculation that excludes Scope 2 water is undercounting by an order of magnitude.
Local failures are doing the political work
National averages aren’t what’s killing projects — specific local catastrophes are. In Arizona, “Project Blue” was abandoned after residents calculated what hyperscale evaporative cooling would do to their desert aquifer 15. In Morgan County, Georgia, a data center was found drawing 29 million gallons through unmetered municipal connections, discovered only because neighbors reported pressure drops 16. Episodes like these — not national averages — are what land in legislators’ inboxes.
That pressure is why states are moving. California, Michigan, and Iowa have introduced bills requiring site-specific, regular water-consumption reporting from data centers 17 — exactly the granularity hyperscalers have historically pushed back against, and exactly what makes “we use less than a golf course nationally” rhetorically useless to the family next door to a 200MW build.
What closed-loop cooling can and can’t fix
Microsoft’s pitch is the industry’s main rebuttal. Satya Nadella says revamped facilities using closed-loop liquid cooling can run on water consumption “comparable to a single restaurant” 18. The number is real for greenfield builds. The asterisks are larger than the headline.
First, dry and closed-loop cooling typically trades water for electricity — and electricity, per 14, is where the bigger water footprint already lives. Second, the technology applies to new construction; hundreds of legacy facilities continue to evaporate millions of gallons a day. Third, even a “restaurant-sized” cooling draw matters when it lands on a stressed aquifer rather than a municipal main with spare capacity.
The honest version of the Ars argument: nationally, data center water is small and getting more efficient per FLOP. Locally, the worst projects can drain a town, the indirect grid footprint is the bigger number, and the disclosure bills now advancing exist because the industry’s voluntary numbers stopped being trusted.
Round-ups
Verge writer vibecodes a backyard-gardening app with Gemini
Source: the-verge-ai
Gemini built a functional gardening tracker from a single lengthy prompt in about five minutes, surfacing a self-fix button when the preview threw an ‘unrecoverably broken channel’ error. The piece reads as a field test of how far one-shot vibecoding has come for non-developers tackling household problems.
Footnotes
-
cibuildwheel documentation (pypa.io) — https://cibuildwheel.pypa.io/en/stable/options/
↩ ↩2With CIBW_PLATFORM: pyodide, cibuildwheel v4.1.0+ builds pyemscripten wheels, but unlike Linux (auditwheel) or Windows (delvewheel) it sets no default repair-wheel-command — users must explicitly invoke
pyodide auditwheel repairto bundle shared WASM libraries. -
Python Discuss — PEP 783 thread — https://discuss.python.org/t/pep-783-emscripten-packaging/86862
↩Paul Moore questioned why separate platform tags were needed if they were strictly tied to annual Python releases, and explicitly stated that pip does not plan to support the internal mechanisms Pyodide uses to handle these wheels, calling some of the proposed patching methods ‘crazy’.
-
Pyodide GitHub issue #6084 — https://github.com/pyodide/pyodide/issues/6084
↩The pyemscripten tag versioning (e.g. pyemscripten_2026_0) is intentionally an ‘emergency escape valve’ so the Pyodide team can ship ABI-breaking toolchain fixes without waiting for a new CPython release — the ABI itself is defined by living documentation on pyodide.org rather than the PEP.
-
scikit-learn GitHub issue #34278 — https://github.com/scikit-learn/scikit-learn/issues/34278
↩scikit-learn proposed moving its WASM build into CI to publish pyemscripten wheels directly to PyPI, replacing the manual Pyodide recipe pipeline — but maintainers flagged that Emscripten provides no ABI stability between toolchain versions, forcing rebuilds for each Pyodide release.
-
onnxruntime GitHub issue #26904 — https://github.com/microsoft/onnxruntime/issues/26904
↩Despite onnx itself shipping pyemscripten wheels, the onnxruntime Python API is still a feature request for Pyodide; maintainers are exploring proxying calls to the existing JavaScript onnxruntime-web rather than cross-compiling the full C++ engine, due to pthreads and Asyncify overhead.
-
deepintellica summary of HN discussion — https://deepintellica.com/ai-work/pyodide-314-0-python-packages-can-now-publish-webassembly-wheels-to-pypi/
↩Commenters characterized the architecture as a ‘runtime on a runtime,’ comparing the trajectory to Java Applets and Flash, and noted a ‘bumpy’ transition to the 3.14 core due to ‘garbage collection rigamarole’ producing OOM issues in test suites.
-
Devart Python SQLite cursor docs — https://docs.devart.com/python/sqlite/cursor-class.htm
↩The standard DB-API cursor.description returns only (name, type_code, …) with the trailing schema/table/origin slots left as None — confirming the stdlib gap Willison’s ctypes bridge fills.
-
Audrey Roy Greenfeld — Constructing SQLite Tables (feldroy.com) — https://audrey.feldroy.com/articles/2025-01-14-Constructing-SQLite-Tables-for-Notebooks-and-Search
↩Practitioners building notebook-style SQLite UIs already work around the missing provenance by re-querying sqlite_schema and PRAGMA table_info, rather than trusting cursor.description.
-
APSW Cursor documentation (rogerbinns.github.io) — https://rogerbinns.github.io/apsw/cursor.html
↩description_full returns a 5-tuple per column: (name, declared_type, database_name, table_name, origin_name) — the latter three only populated when SQLite was built with SQLITE_ENABLE_COLUMN_METADATA.
-
APSW Tips (rogerbinns.github.io) — https://rogerbinns.github.io/apsw/tips.html
↩ ↩2Many platform SQLite builds omit SQLITE_ENABLE_COLUMN_METADATA; APSW ships its own SQLite amalgamation precisely so column-origin APIs are reliably available.
-
Jason Wyatt — ‘Squeezing performance from SQLite: explaining the virtual machine’ (Medium) — https://medium.com/@JasonWyatt/squeezing-performance-from-sqlite-explaining-the-virtual-machine-2550ef6c5db
↩ ↩2EXPLAIN emits VDBE opcodes like OpenRead (cursor↔table), Column (cursor,col→register) and ResultRow (register range→output) — the exact trio you’d need to thread to reconstruct provenance from bytecode.
-
APSW cursor.c source (github.com/rogerbinns/apsw) — https://github.com/rogerbinns/apsw/blob/master/src/cursor.c
↩ ↩2Cursor_getdescription_full walks the prepared statement calling sqlite3_column_database_name/table_name/origin_name for each output column — returning NULL fields for expressions, subqueries and CTE-derived results.
-
Andy Masley blog — ‘The AI Water Issue Is Fake’ — https://blog.andymasley.com/p/the-ai-water-issue-is-fake
↩all U.S. data centers combined account for only 0.12% to 0.4% of total daily water withdrawals, a fraction of what is required by agriculture or even golf courses
-
FWPCOA / Berkeley Lab figures — https://vertexaisearch.cloud.google.com/grounding-api-redirect/AUZIYQEnfK_aixqB4poiQauSpGq4Dv4eXRNdrbROj4u7sOYapp5o6OXllqc425gJFVgq15l_T4AxP2dvuHDa_UeqSJiIWkJrHQLcK8DPYUZJ8QaeRyb7THInKtsTkK8z9cYzR06Tm1ShhUlGSxhYuFG774p2kOZeGb4-gdzhcqp5taJsPds=
↩ ↩2U.S. data centers were estimated to use roughly 17.4 billion gallons of water directly for cooling, whereas their indirect footprint via the power grid was approximately 211 billion gallons—over 12 times the on-site amount
-
The Guardian — datacenter AI drought water — https://www.theguardian.com/us-news/2026/jun/08/datacenter-ai-drought-water
↩‘Project Blue’ in Arizona, which was abandoned after intense public outcry over its plan to extract millions of gallons of desert groundwater
-
CBS News Atlanta — https://www.cbsnews.com/atlanta/news/aoc-presses-epa-over-morgan-county-drinking-water-concerns-tied-to-georgia-data-center-development/
↩a data center secretly drawing 29 million gallons through unmetered connections, which was only discovered after neighbors complained of low pressure
-
MultiState — state water legislation tracker — https://www.multistate.us/insider/2026/3/3/state-data-center-water-usage-legislation-gains-momentum
↩States such as California, Michigan, and Iowa introduced bills requiring data centers to submit regular, site-specific reports on water consumption
-
Windows Central — Nadella on closed-loop cooling — https://www.windowscentral.com/microsoft/satya-nadella-microsofts-ai-data-center-water-cooling-a-single-restaurant
↩some revamped data centers using ‘closed-loop’ liquid cooling can operate with water consumption comparable to a single restaurant