Planet Igalia

April 16, 2024

Philippe Normand

From WebKit/GStreamer to rust-av, a journey on our stack’s layers

In this post I’ll try to document the journey starting from a WebKit issue and ending up improving third-party projects that WebKitGTK and WPEWebKit depend on.

I’ve been working on WebKit’s GStreamer backends for a while. Usually some new feature needed on the WebKit side will trigger work on GStreamer. That’s quite common and healthy, actually: by improving GStreamer (fixing bugs or implementing new features) we make the whole stack stronger (hopefully). It’s not hard to imagine other web engines, such as Servo for instance, leveraging fixes made in GStreamer in the context of WebKit use-cases.

Sometimes though we have to go deeper and this is what this post is about!

Since version 2.44, WebKitGTK and WPEWebKit ship with a WebCodecs backend. That backend leverages the wide range of GStreamer audio and video decoders/encoders to give low-level access to encoded (or decoded) audio/video frames to Web developers. I delivered a lightning talk at gst-conf 2023 about this topic.

There are still some issues to fix regarding performance, and some W3C web platform tests are still failing. The AV1 decoding tests were flagged early on while I was working on WebCodecs; I didn’t have time back then to investigate the failures further, but a couple of weeks ago I went back to those specific issues.

The WebKit layout tests harness is executed by various post-commit bots, on various platforms. The WebKitGTK and WPEWebKit bots run on Linux. The WebCodecs tests for AV1 currently make use of the GStreamer av1enc and dav1ddec elements. We currently don’t run the tests using the modern and hardware-accelerated vaav1enc and vaav1dec elements because the bots don’t have compatible GPUs.

The decoding tests were failing; this one, for instance (the ?av1 variant). In that test both encoding and decoding are exercised, but decoding was failing, for a couple of reasons. The rabbit hole starts here. After debugging this for a while, it was clear that the colorspace information was lost between the encoded chunks and the decoded frames. The decoded video frames didn’t have the expected colorimetry values.

The VideoDecoderGStreamer class basically takes encoded chunks and delivers decoded VideoFrameGStreamer objects to the upper layers (JS) in WebCore. A video frame is basically a GstSample (buffer and caps), and we have code in place to interpret the colorimetry parameters exposed in the sample caps and translate those to the various WebCore equivalents. So far so good, but the caps set on the dav1ddec element didn’t have that information! I thought the dav1ddec element could be fixed; “shouldn’t be that hard”, and I knew that code because I wrote it in 2018 :)
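
For reference, colorimetry shows up as a field in the caps of decoded video buffers; a typical caps string looks like this (the values here are just an example):

video/x-raw, format=(string)I420, width=(int)320, height=(int)240,
             framerate=(fraction)30/1, colorimetry=(string)bt709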

So let’s fix the GStreamer dav1ddec element. It’s a video decoder written in Rust, relying on the dav1d-rs bindings of the popular C libdav1d library. The dav1ddec element basically feeds encoded chunks of data to dav1d using the dav1d-rs bindings. In return, the bindings provide the decoded frames using a Dav1dPicture Rust structure, and the dav1ddec GStreamer element basically makes buffers and caps out of this decoded picture. The dav1d-rs bindings are quite minimal; we have implemented API on an as-needed basis so far, so it wasn’t very surprising that… colorimetry information for decoded pictures was not exposed! The rabbit hole goes one level deeper.

So let’s add colorimetry API in dav1d-rs. When working on (Rust) bindings of a C library, if you need to expose additional API the answer is quite often in the C headers of the library. Every Dav1dPicture has a Dav1dSequenceHeader, in which we can see a few interesting fields:

typedef struct Dav1dSequenceHeader {
    ...
    enum Dav1dColorPrimaries pri; ///< color primaries (av1)
    enum Dav1dTransferCharacteristics trc; ///< transfer characteristics (av1)
    enum Dav1dMatrixCoefficients mtrx; ///< matrix coefficients (av1)
    enum Dav1dChromaSamplePosition chr; ///< chroma sample position (av1)
    ...
    uint8_t color_range;
    ...
} Dav1dSequenceHeader;

After sharing a naive branch with rust-av co-maintainers Luca Barbato and Sebastian Dröge, I came up with a couple of pull requests that eventually shipped in version 0.10.3 of dav1d-rs. I won’t deny that matching the primaries, transfer, matrix and chroma-site enum values to rust-av’s Pixel enum was a bit challenging :P Anyway, with dav1d-rs fixed up, the rabbit hole goes up one level :)
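
To give a flavor of the kind of matching involved, here is a self-contained illustrative sketch; both enums and the function are invented for this example and are not the actual dav1d-rs or gstreamer-rs APIs:

// Illustrative only: map AV1 color primaries, as a binding might expose
// them, to a GStreamer-style primaries value.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ColorPrimaries {
    Bt709,
    Bt601,
    Unspecified,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum GstPrimaries {
    Bt709,
    Smpte170m,
    Unknown,
}

fn to_gst(p: ColorPrimaries) -> GstPrimaries {
    match p {
        ColorPrimaries::Bt709 => GstPrimaries::Bt709,
        // BT.601 content maps to SMPTE 170M primaries on the GStreamer side
        ColorPrimaries::Bt601 => GstPrimaries::Smpte170m,
        ColorPrimaries::Unspecified => GstPrimaries::Unknown,
    }
}

fn main() {
    assert_eq!(to_gst(ColorPrimaries::Bt709), GstPrimaries::Bt709);
}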

Now, with the needed dav1d-rs API, the GStreamer dav1ddec element could be fixed. Again, matching the various enum values to their GStreamer equivalents was an interesting exercise. The merge request was merged, but to this date it’s not shipped in a stable gst-plugins-rs release yet. There’s one more complication here: the ABI broke between dav1d versions 1.2 and 1.4, and the dav1d-rs 0.10.3 release expects the latter. I’m not sure how we will cope with that in terms of gst-plugins-rs release versioning…

Anyway, WebKit’s runtime environment can be adapted to ship dav1d 1.4 and a development version of the dav1ddec element, which is what was done in this pull request. The rabbit is getting out of its hole.

The WebCodecs AV1 tests were finally fixed in WebKit by this pull request. Beyond colorimetry handling, a few more fixes were needed, but luckily those didn’t require any changes outside of WebKit.

Wrapping up, if you’re still reading this post, I thank you for your patience. Working on inter-connected projects can look a bit daunting at times, but eventually the whole ecosystem benefits from cross-project collaborations like this one. Thanks to Luca and Sebastian for the help and reviews in dav1d-rs and the dav1ddec element. Thanks to my fellow Igalia colleagues for the WebKit reviews.

by Philippe Normand at April 16, 2024 08:14 PM

April 04, 2024

Jesse Alama

The decimals around us: Cataloging support for decimal numbers

A catalog of support for decimal numbers in various programming languages

Decimals are a data type that aims to exactly represent decimal numbers. Some programmers may not know, or fully realize, that, in most programming languages, the numbers that you enter look like decimal numbers but internally are represented as binary—that is, base-2—floating-point numbers. Things that are totally simple for us, such as 0.1, simply cannot be represented exactly in binary. The decimal data type—whatever its stripe or flavor—aims to remedy this by giving us a way of representing and working with decimal numbers, not binary approximations thereof. (Wikipedia has more.)
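
You can see the problem in any language that uses binary floating point; in JavaScript, for example:

0.1 + 0.2
// 0.30000000000000004
0.1 + 0.2 === 0.3
// false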

To help with my work on adding decimals to JavaScript, I've gone through a list of popular programming languages, taken from the 2022 StackOverflow developer survey. What follows is a brief summary of where these languages stand regarding decimals. The intention is to keep things simple. The purpose is:

  1. If a language does have decimals, say so;
  2. If a language does not have decimals, but at least one third-party library exists, mention it and link to it. If a discussion is underway to add decimals to the language, link to that discussion.

There is no intention to filter out any language in particular; I'm just working with a slice of languages found in the StackOverflow list linked to earlier. If a language does not have decimals, there may well be multiple third-party decimal libraries. I'm not aware of all libraries, so if I have linked to a minor library and neglected to link to a more high-profile one, please let me know. More importantly, if I have misrepresented the basic fact of whether decimals exist at all in a language, send mail.

C

C does not have decimals. But they're working on it! The C23 (as in, 2023) standard proposes to add new fixed bit-width data types (32, 64, and 128 bits) for these numbers.

C#

C# has decimals in its underlying .NET subsystem. (For the same reason, decimals also exist in Visual Basic.)

C++

C++ does not have decimals. But—like C—they're working on it!

Dart

Dart does not have decimals. But a third-party library exists.

Go

Go does not have decimals, but a third-party library exists.

Java

Java has decimals.

JavaScript

JavaScript does not have decimals. We're working on it!

Kotlin

Kotlin does not have decimals. But, in a way, it does: since Kotlin runs on the JVM, one can get decimals by using Java's built-in support.

PHP

PHP does not have decimals. An extension exists, and at least one third-party library exists.

Python

Python has decimals.
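
The built-in decimal module gives exact decimal arithmetic. For instance:

from decimal import Decimal

print(Decimal("0.1") + Decimal("0.2"))  # 0.3, exactly
print(0.1 + 0.2)                        # 0.30000000000000004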

Ruby

Ruby has decimals. Despite that, there is some third-party work to improve the built-in support.

Rust

Rust does not have decimals, but a crate exists.

SQL

SQL has decimals (it is the DECIMAL data type). (Here is the documentation for, e.g., PostgreSQL, and here is the documentation for MySQL.)
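
For example, a column declared with the standard DECIMAL type keeps exact values:

CREATE TABLE prices (amount DECIMAL(10, 2));
INSERT INTO prices VALUES (0.10);
SELECT amount + 0.20 FROM prices;  -- exactly 0.30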

Swift

Swift has decimals.

TypeScript

TypeScript does not have decimals. However, if decimals get added to JavaScript (see above), TypeScript will probably inherit decimals, eventually.

April 04, 2024 12:52 PM

Getting started with Lean 4, your next programming language

I had the pleasure of learning about Lean 4 with David Christiansen and Joachim Breitner at their tutorial at BOBKonf 2024. I'm planning on doing a couple of formalizations with Lean and would love to share what I learn as a total newbie, working on macOS.

Needed tools

I'm on macOS and use Homebrew extensively. My simple go-to approach to finding new software is to do brew search lean. This revealed lean and also surfaced elan. Running brew info lean showed me that that package (at the time I write this) installs Lean 3. But I know, out-of-band, that Lean 4 is what I want to work with. Running brew info elan looked better, but the output reminds me that (1) the information is for the elan-init package, not the elan cask, and (2) elan-init conflicts with both elan and the aforementioned lean. Yikes! This strikes me as a potential problem for the community, because I think Lean 3, though it still works, is presumably not where new Lean development should be taking place. Perhaps the Homebrew formula for Lean should be renamed to lean3, and a new lean4 package should be made available. I'm not sure. The situation seems less than ideal, but in short, I have been successful with the elan-init package.

After installing elan-init, you'll have the elan tool available in your shell. elan is the tool used for maintaining different versions of Lean, similar to nvm in the Node.js world or pyenv for Python.

Setting up a blank package

When I did the Lean 4 tutorial at BOB, I worked entirely within VS Code (…) and created a new standalone package using some in-editor functionality. At the command line, I use lake init to manually create a new Lean package. At first, I made the mistake of running this command, assuming it would create a new directory for me and set up any configuration and boilerplate code there. I was surprised to find, instead, that lake init sets things up in the current directory, in addition to creating a subdirectory and populating it. Using lake --help, I read about the lake new command, which does what I had in mind. So I might suggest using lake new rather than lake init.
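
So, assuming a fresh setup, the whole flow looks like this:

$ lake new foobar   # creates and populates the foobar directory
$ cd foobar
$ lake build        # compiles the package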

What's in the new directory? Doing tree foobar reveals

foobar
├── Foobar
│   └── Basic.lean
├── Foobar.lean
├── Main.lean
├── lakefile.lean
└── lean-toolchain

Taking a look there, I see four .lean files. Here's what they contain:

Main.lean

import «Foobar»

def main : IO Unit :=
  IO.println s!"Hello, {hello}!"

Foobar.lean

-- This module serves as the root of the `Foobar` library.
-- Import modules here that should be built as part of the library.
import «Foobar».Basic

Foobar/Basic.lean

def hello := "world"

lakefile.lean

import Lake
open Lake DSL

package «foobar» where
  -- add package configuration options here

lean_lib «Foobar» where
  -- add library configuration options here

@[default_target]
lean_exe «foobar» where
  root := `Main

It looks like there's a little module structure here, and a reference to the identifier hello, defined in Foobar/Basic.lean and made available via Foobar.lean. I'm not going to touch lakefile.lean for now; as a newbie, it looks scary enough that I think I'll just stick to things like Basic.lean.
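
As a quick experiment of my own (not part of the generated files), extending Basic.lean shows how the pieces connect; since Main.lean imports «Foobar», which imports Foobar.Basic, anything defined here is usable from main:

-- Foobar/Basic.lean
def hello := "world"

-- a small addition for experimenting:
def greet (name : String) : String :=
  s!"Hello, {name}!"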

There's also an automatically created .git there, not shown in the directory output above.

Now what?

Now that you've got Lean 4 installed and set up a package, you're ready to dive in to one of the official tutorials. The one I'm working through is David's Functional Programming in Lean. There are all sorts of additional things to learn, such as all the different lake commands. Enjoy!

April 04, 2024 01:48 AM

April 03, 2024

Brian Kardell

The Blessing of the Strings


Trusted Types has been a proposal from Google for quite some time at this point, but it's currently getting a lot of attention and work in all browsers (Igalia is working on implementations in WebKit and Gecko, sponsored by Salesforce and Google, respectively). I've been looking at it a lot and thought it's probably something worth writing about.

The Trusted Types proposal is about preventing Cross-site scripting (XSS), and rides atop Content Security Policy (CSP) and allows website maintainers to say "require trusted-types". Once required, lots of the Web Platform's dangerous API surfaces ("sinks") which currently require a string will now require... well, a different type.

myElement.innerHTML (and a whole lot of other APIs) for example, would now require a TrustedHTML object instead of just a string.

You can think of TrustedHTML as an interface indicating that a string has been somehow specially "blessed" as safe... Sanitized.

the Holy Hand grenade scene from Monty Python's Holy Grail
And Saint Attila raised the string up on high, saying, 'O Lord, bless this thy string, that with it we may trust that it is free of XSS...' [ref].

Granting Blessings

The interesting thing about this is how one goes about blessing strings, and how this changes the dynamics of development and safety to protect from XSS.

To start with, there is a new global trustedTypes object (available in both window and workers) with a method called .createPolicy which can be used to create "policies" for blessing various kinds of input (createHTML, createScript, and createScriptURL). Trusted Types comes with the concept of a default policy, and the ability for you to register a specially named "default"...

//returns a policy, but you 
// don't really need to do anything 
// with the default one
trustedTypes.createPolicy(
    "default", 
    {
      createHTML: s => { 
          return DOMPurify.sanitize(s) 
      } 
    }
);

And now, the practical upshot is that all attempts to set HTML will be sanitized... So if there's some code that tries to do:

// if str contains
// `<img src="no" onerror="dangerous code">`;
target.innerHTML = str;

Then the onerror attribute will be automatically stripped (sanitized) before .innerHTML gets it.

Hey that's pretty cool!

one of the scenes where the castle guard is mocking arthur and his men
It's almost like you just put defenses around all that stuff and can just peer over the wall at would be attackers and make faces at them....

But wait... can't someone come along then and just create a more lenient policy called default?

No! That will throw an exception!

Also, you don't have to create a default. If you don't, and someone tries to use one of those methods to assign a string, it will throw.
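
That is, roughly (sanitizerPolicy here stands for a policy like the one created further down):

// with enforcement on and no default policy registered,
// assigning a plain string throws:
target.innerHTML = "<b>hi</b>";

// a string "blessed" by a policy is fine:
target.innerHTML = sanitizerPolicy.createHTML("<b>hi</b>");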

The only thing this enforcement cares about is that it is one of these "blessed" types. Website administrators can also provide (in the header) the names of one or more policies which are allowed to be created.

Any attempts to define a policy not in that list will throw (it's a bit more complicated than that, see Name your Policy below). Let's imagine that in the header we specified that a policy named "sanitize" is allowed to be created.

Maybe you can see some of why that starts to get really interesting. In order to use any of those APIs (at all), you'd need access to a policy in order to bless the string. But because the policy which can do that blessing is a handle, it's up to you what code you give it to...

{
  const sanitizerPolicy =
      trustedTypes.createPolicy(
        "sanitize",
        {
          createHTML: s => {
            return DOMPurify.sanitize(s)
          }
        }
      );


    // give someOtherModule access to a sanitization policy
    someOtherModule.init(sanitizerPolicy)

    // yetAnotherModule can't even sanitize, any use of those
    // APIs will throw
    yetAnotherModule.foo()
}

// Anything out here also doesn't have 
// access to a sanitization policy

What's interesting about this is that the thing doing the trusting on the client, is actually on the client as well - but the pattern ensures that this becomes a considerably more finite problem. It is much easier to audit whether the "trust" is warranted. That is, we can look at the above to see that there is only one policy and it only supports creating HTML. We can see that the trust there is placed in DOMPurify, and even that amount of trust is only provided to select modules.

Finally, most importantly: It is a pattern that is machine enforceable. Anything that tries to use any of those APIs without a blessed string (a Trusted Type) will fail... Unless you ask it not to.

Don't Throw, Just Help?

Shutting down all of those APIs after the fact is hard because all of those dangerous APIs are also really useful, and therefore widely used. As I said earlier, auditing to find and understand all uses of them is pretty difficult. Chances are pretty good that there might just be a lot more unsafe stuff floating around in your site than you expected.

Instead of the Content-Security-Policy header, you can send Content-Security-Policy-Report-Only and include a directive report-to /csp-violation-report-endpoint/, where /csp-violation-report-endpoint/ is an endpoint path (on the same origin). If set, whenever violations occur, browsers should send a request reporting the violation to that endpoint (JSON formatted, with lots of data).

The general idea is that it is then pretty easy to turn this on and monitor your site to discover where you might have some problems, and begin to work through them. This should be especially good for your QA environment. Just keep in mind that the report doesn't actually prevent the potentially bad things from happening, it just lets you know they exist.

Shouldn't there just be a standard sanitizer too?

Yes!! That is also a thing that is being worked on.

Name Your Policy

I'm not going to lie, I found CSP headers a little confusing, both to read and to figure out how the directives relate to each other. You might see a header set up to report only…

Content-Security-Policy-Report-Only: report-uri /csp-violation-report-endpoint; default-src 'self'; require-trusted-types-for 'script'; trusted-types one two;

Believe it or not, that's a fairly simple one. Basically though, you split it up on semicolons, and each of those is a directive. A directive has a name like "report-uri", followed by whitespace and then a whitespace-separated list of values (potentially containing only one). There are also keyword values, which are quoted.

So, the last two parts of this are about Trusted Types. The first, require-trusted-types-for, is about what gets some kind of enforcement; really the only thing you can put there currently is the keyword 'script'. The second, trusted-types, is about which policies can be created.

Note that I said "some kind of enforcement" because the above is "report only", which means those things will report, but not actually throw; if we just change the name of the header from Content-Security-Policy-Report-Only to Content-Security-Policy, lots of things might start throwing - which didn't greatly help my exploration. So, here's a little breakdown that might help…

If the directives are... then...

  • (missing): You can create whatever policies you want (except duplicates), but they aren't enforced in any way.
  • require-trusted-types-for 'script';: You can create whatever policies you want (except duplicates), and they are enforced. All attempts to assign strings to those sinks will throw. This means if you create a policy named default, it will 'bless' strings through that automatically, but it also means anyone can create any policy to 'bless' strings too.
  • trusted-types: You cannot create any policies whatsoever. Attempts to will throw.
  • trusted-types 'none': Same as trusted-types with no value.
  • trusted-types a b: You can call createPolicy with the names 'a' and 'b', each exactly once. Attempts to call it with other names (including 'default'), or repeatedly, will throw.
  • trusted-types default: You can call createPolicy with the name 'default', exactly once. Attempts to call it with other names, or repeatedly, will throw.
  • require-trusted-types-for 'script'; trusted-types a: You can call createPolicy with the name 'a', exactly once. Attempts to call it with other names (including 'default'), or repeatedly, will throw. All attempts to assign strings to those sinks will throw unless they are 'blessed' by a function in a policy named 'a'.

April 03, 2024 04:00 AM

April 02, 2024

Maíra Canal

Linux 6.8: AMD HDR and Raspberry Pi 5

The Linux kernel 6.8 came out on March 10th, 2024, bringing brand-new features and plenty of performance improvements across different subsystems. As part of Igalia, I’m happy to have been an active part of many features released in this version, and today I’m going to review some of them.

Linux 6.8 is packed with a lot of great features, performance optimizations, and new hardware support. This release brings experimental support for the Intel Xe DRM driver, further support for AMD Zen 5 and other upcoming AMD hardware, initial support for the Qualcomm Snapdragon 8 Gen 3 SoC, the Imagination PowerVR DRM kernel driver, support for the Nintendo NSO controllers, and much more.

Igalia is widely known for its contributions to Web Platforms, Chromium, and Mesa. But, we also make significant contributions to the Linux kernel. This release shows some of the great work that Igalia is putting into the kernel and strengthens our desire to keep working with this great community.

Let’s take a deep dive into Igalia’s major contributions to the 6.8 release:

AMD HDR & Color Management

You may have seen the release of a new Steam Deck last year, the Steam Deck OLED. What you may not know is that Igalia helped bring this product to life by putting some effort into the AMD driver-specific color management properties implementation. Melissa Wen, together with Joshua Ashton (Valve), and Harry Wentland (AMD), implemented several driver-specific properties to allow Gamescope to manage color features provided by the AMD hardware to fit HDR content and improve gamers’ experience.

She has explained all the features implemented in the AMD display kernel driver in two blog posts and a 2023 XDC talk.

Async Flip

André Almeida worked together with Simon Ser (SourceHut) to provide support for asynchronous page flips in the atomic API. This feature targets users who want to present a new frame immediately, even after missing a V-blank. It is particularly useful for applications with high frame rates, such as gaming.

Raspberry Pi 5

Raspberry Pi 5 was officially released in October 2023, and Igalia was ready to bring top-notch graphics support to it. Although we still can’t use the RPi 5 with the mainline kernel, it is superb to see some pieces landing upstream. Iago Toral worked on implementing all the kernel support needed for the V3D 7.1.x driver.

Thanks to those kernel patches, by the time the RPi 5 was released it already included a fully compliant OpenGL ES 3.1 and Vulkan 1.2 driver implemented by Igalia.

GPU stats and CPU jobs for the Raspberry Pi 4/5

Apart from the release of the Raspberry Pi 5, Igalia is still working on improving the whole Raspberry Pi environment. Together with José Maria “Chema” Casanova, I worked on implementing support for GPU stats in the V3D driver. This means that RPi 4/5 users can now access the GPU usage percentage, and they can access the statistics per process or globally.

I also worked, together with Melissa, on implementing CPU jobs for the V3D driver. As the Broadcom GPU isn’t capable of performing some operations, the Vulkan driver uses the CPU to compensate. In order to avoid stalls in job submission, CPU jobs are now part of the kernel and can be easily synchronized with synchronization objects.

If you are curious about the CPU job implementation, you can check this blog post.

Other Contributions & Fixes

Sometimes we don’t contribute a major feature to the release, but we can still help by improving documentation and sending fixes. André also contributed to this release by documenting the different AMD GPU reset methods, making them easier for future users to understand.

During Igalia’s efforts to improve the general users’ experience on the Steam Deck, Guilherme G. Piccoli noticed a message in the kernel log and readily provided a fix for this PCI issue.

Outside of the Steam Deck world, we can check some of Igalia’s work on the Qualcomm Adreno GPUs. Although most of our Adreno-related work happens in user-space, Danylo Piliaiev sent a couple of kernel fixes to the msm driver, fixing some hangs and some failing CTS tests.

We also had contributions from our 2023 Igalia CE student, Nia Espera. Nia’s project was related to mobile Linux and she managed to write a couple of patches to the kernel in order to add support for the OnePlus 9 and OnePlus 9 Pro devices.

If you are a student interested in open source and would like to have a first exposure to the professional world, check if we have openings for the Igalia Coding Experience. I was a CE student myself, and being mentored by an Igalian was an incredible experience.

Check the complete list of Igalia’s contributions for the 6.8 release

Authored (57):

André Almeida (2)

Danylo Piliaiev (2)

Guilherme G. Piccoli (1)

Iago Toral Quiroga (4)

Maíra Canal (17)

Melissa Wen (27)

Nia Espera (4)

Signed-off-by (88):

André Almeida (4)

Danylo Piliaiev (2)

Guilherme G. Piccoli (1)

Iago Toral Quiroga (4)

Jose Maria Casanova Crespo (2)

Maíra Canal (28)

Melissa Wen (43)

Nia Espera (4)

Acked-by (4):

Jose Maria Casanova Crespo (2)

Maíra Canal (1)

Melissa Wen (1)

Reviewed-by (30):

André Almeida (1)

Christian Gmeiner (1)

Iago Toral Quiroga (20)

Maíra Canal (4)

Melissa Wen (4)

Tested-by (1):

Guilherme G. Piccoli (1)

April 02, 2024 11:00 AM

March 20, 2024

Jani Hautakangas

Bringing WebKit back to Android.

It’s been quite a while since the last blog post about WPE-Android, but that doesn’t mean WPE-Android hasn’t been in development. The focus has been on stabilizing the runtime and implementing the most crucial core features to make it easier to integrate new APIs into WPEView.

Main building blocks #

WPE-Android has three main building blocks:

  • Cerbero
    • Cross-platform build aggregator that is used to build WPE WebKit and all of its dependencies to Android.
  • WPEBackend-Android
    • Implements Android specific graphics buffer support and buffer sharing between WebKit UIProcess and WebProcess.
  • WPEView
    • Allows displaying web content in activity layout using WPEWebKit.

WPE-Android high-level design

What’s new #

The list of all work completed so far would be quite long, as there have been no official releases or public announcements, with the exception of the last blog post. Since that update, most of the efforts have been focused on ‘under the hood’ improvements, including enhancing stability, adding support for some core WebKit features, and making general improvements to the development infrastructure.

Here is a list of new features worth mentioning:

  • Based on WPE WebKit 2.42.1, the project was branched for the 2.42.x series. Future work will continue in the main branch for the next release.
  • Dropped support for 32-bit platforms (x86 and armv7). Only arm64-v8a and x86_64 are supported.
  • Integration with the Android main loop, so that the WPE WebKit GLib main loop is driven by it
  • Process-Swap On Navigation aka PSON
  • Added ASharedMemory support to WebKit SharedMemory
  • Hardware-accelerated multimedia playback
  • Fullscreen support
  • Cookies management
  • ANGLE based WebGL
  • Cross process fence insertion for composition synchronization with Surface Flinger
  • WebDriver support
  • GitHub Actions build bots
  • GitHub Actions WebDriver test bots

Demos #

WPEWebkit powered web view (WPEView) #

The demo uses the WPE-Android MiniBrowser sample application to show basic web page loading and touch-based scrolling, the usage of a simple cookie manager to clear the page’s data usage, and finally loads the popular “Aquarium sample” to show a smooth (60 FPS) WebGL animation running thanks to HW acceleration support.

WebDriver #

The demo shows how to run a WebDriver test with the emulator. Detailed instructions on how to run WebDriver tests can be found in the README.md.

The test requests a website through the Selenium remote WebDriver. It then replaces the default behavior of window.alert on the requested page by injecting and executing a JavaScript snippet. After loading the page, it performs a click() action on the element that calls the alert. This results in the text ‘cheese’ being displayed right below the ‘click me’ link.
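
A rough Python sketch of such a test (the addresses, page, and details are illustrative; the real script lives in the repository):

from selenium import webdriver
from selenium.webdriver.common.by import By

# Connect to the WPE WebDriver exposed by the emulator.
options = webdriver.WPEWebKitOptions()
driver = webdriver.Remote("http://localhost:4444", options=options)

driver.get("http://localhost:8000/test.html")

# Replace window.alert so the message is written into the page instead.
driver.execute_script(
    "window.alert = (msg) =>"
    " document.body.appendChild(document.createTextNode(msg));")

driver.find_element(By.LINK_TEXT, "click me").click()  # displays 'cheese'
driver.quit()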

The test consists of three building blocks:

  • WPE WebDriver running on emulator
  • HTTP server serving test.html web page
  • Selenium python test script executing the test

What’s next #

We have ambitious plans for the coming months regarding the development of WPE Android, focusing mainly on implementing additional features, stabilization, and performance improvements.

As for implementing new features, now that the integration with the WPE WebKit runtime has reached a more robust state, it’s time to start adding more of the APIs that are still missing in WPEView compared to other webviews on Android and to enable other web-facing features supported by WebKit. This effort, along with adding support for features like HTTP/2 and the remote Web Inspector, will be a major focus.

As for stabilization and performance, having WebDriver support will be very helpful as it will enable us to identify and fix issues promptly and thus help make WPE Android more stable and feature-complete. We would also like to focus on conformance testing compared to other web views on Android, which should help us prioritize our efforts.

The broader goal for this year is to develop WPE-Android into an easy-to-use platform for third-party projects, offering a compelling alternative to other webviews on the Android platform. We extend many thanks to the NLNet Foundation, whose support for WPE Android through a grant from the NGI Zero Core fund will be instrumental in helping us achieve this goal!

Try it yourself #

WPE-Android is still considered a prototype, and it’s a bit difficult to build. However, if you want to try it out, you can follow the instructions in the README.md file. Additionally, you can use the project’s issue tracker to report problems or suggest new features.

March 20, 2024 12:00 AM

March 17, 2024

Qiuyi Zhang (Joyee)

Memory leak regression testing with V8/Node.js, part 3 - heap iteration-based testing

In the previous blog post, I described the heap snapshot trick as an “abuse” of the heap snapshot API, because heap snapshots are not designed to interact with the finalizers run in the heap

March 17, 2024 07:28 PM

Memory leak regression testing with V8/Node.js, part 1 - memory usage-based testing

Like many other relatively big pieces of software, Node.js is no stranger to memory leaks, and with them, fixes and regression tests

March 17, 2024 07:24 PM

Memory leak regression testing with V8/Node.js, part 2 - finalizer-based testing

In the previous blog post, I talked about how Node.js used memory usage measurement to test against memory leaks. Sometimes that’s good enough to provide valid tests

March 17, 2024 07:20 PM

March 13, 2024

Brian Kardell

How We Fund the Web Ecosystem


On Tuesday (March 12th, 2024), Robin Berjon, Eric Meyer, and I organized, led, and scribed a session during the W3C breakouts day about how we fund the web ecosystem…

Every year the W3C has a week-long giant set of in-person meetings called TPAC. A nice feature of those meetings has always been “Breakouts Day”, a day where people can propose sessions about pretty much anything, and we try to organize a schedule around the ones that seem interesting to enough people.

This year, the W3C decided to try a second Breakouts Day, separate from TPAC and purely online.

Over the last several years, I’ve written several pieces about different aspects of the health of the web ecosystem and led a podcast series with quite a few episodes about it. In those pieces I’ve argued that while the web ecosystem has become the infrastructure for nearly everything, our models for funding and prioritization over the last 20 years have proven not only inadequate and problematic, but ultimately fragile; they cannot last. The only questions, I’ve argued, are how soon and what happens next. Are we ready for it? (hint: no).

So, I talked to a few people and we proposed this as a topic. It was well attended. We began with a short presentation (we made a very detailed outline together, but credit goes to Robin for the great slides).

We organized the presentation into roughly two parts. First, I presented, explaining the problems and why we believe this requires our attention and action. We have to admit that we have a problem, right? And that this is a problem that we should be concerned with… If not us, who? If not now, when?

Then I outlined that there are many possible solutions and elements of solutions that we can discuss (or try), but all of them share some common elements:

  1. We need a way to take in common money, and a way to actually encourage money into the pot.
  2. We need a way to efficiently and fairly prioritize the money in the pot toward actual work.

I highlighted that there are existing things we can already try (and are trying), and that we should really start trying more.

After this, Robin presented a bigger possible vision we tried to lay out, with lots of still-fuzzy areas and questions, but effectively: we create an institution which is (through one of a few possibilities) able to compel participation in a system that enforces more sustainable (and fairer) characteristics, guaranteeing support for the infrastructure of the web.

You can get a very good idea of what was actually presented from the detailed outline that we shared too.

But all of this was only the initial short presentation which I think was only maybe 5-10 minutes. The rest was the point of the breakout: Actual discussion.

I think it was very positive, actually. The main thing that impressed me is that there was seemingly no pushback or questioning of the premise at all. We agree with the fundamentals: as I explained in Webrise, it’s fragile from this perspective, and we need to care about it.

Rick Byers (from Google, but not speaking for Google) mentioned what they observed in Chromium contributions and that they also had concerns about diversity, both in terms of contributions to a single engine and in terms of multiple engines. He also mentioned that Chromium was spinning up a new collective idea (not yet announced).

Just this morning, we helped launch the Servo collective. The timing is purely coincidental. I’d also note that in part of my presentation I mentioned exploring ways that governments can incentivize this, and I forgot to mention that there have been interesting developments in some open source funding happening this way recently, and that the White House recently made a statement that Future Software Should Be Memory Safe. If anyone has a good ‘in’ at the White House, please make the case that if you want to know a good place to invest to be sure a lot of future stuff is memory safe, it’s probably browsers - and especially the one written in Rust :). The collective would be happy to accept the White House’s check.

There were also some interesting questions about whether Web Monetization could be related to this, or is just a wholly separate problem, about how the advertising model is exceptionally progressive, and where other investment comes from currently.

Happily, minutes are available if you’re interested, and we’ll be trying to organize some immediate discussions on where we go from here through this repo, which also has a rough outline of how one solution might work.

March 13, 2024 04:00 AM

March 11, 2024

Brian Kardell

The Darkening


Some additions to my half-light library for styling Shadow DOM.

In case you haven't followed along, a number of my recent posts have been thinking about how Shadow DOM falls short, and how we can use the tools we have today to explore potential solutions and improvements. This has led to good feedback and quick iteration on a tiny library called half-light.

This library lets page authors selectively "push" styles down into shadow roots as adopted @layers. It plugs into CSS's media queries, and allows selectivity on both ends: Which styles and which specific shadow roots. I'm not going to recap the whole concept and interface here because it would just be regurgitating the stuff that's already in the link above. What I want to write about is the latest set of improvements to half-light.

no-light

An important "feature" of half-light is that it doesn't require web components to be built in a special way in order to achieve this. Alternative/previous ideas required elements to subclass a special element, for example, to say "Yes, I am styleable from above". However, I don't believe that's practical, for reasons I explained in Lovely Trees. Effectively, it is very difficult to grow iterations on that approach naturally, and there are already so many wonderful components out there about which a number of people say "I wish I could use that, but alas I cannot provide some basic styling". This library just lets them do exactly that: provide styling from the outside (with some caveats below), to see what it's like to actually live with that possibility. Do they love it? Ultimately regret it?

I don't believe that this is some kind of breach of contract to allow that kind of styling from the outside in open shadow roots. The simple fact that the library has to do hardly anything shows just how easy it is, technically, for any page author to do it already today. No "new powers" have been added.

It's harder to make this case with closed shadow roots. While it is entirely possible for page authors to take control of closed shadow roots too, they have to achieve this by changing their nature and, effectively, saying "sorry, no: your closed shadow roots will be open roots in my page". However much I am not a fan of closed roots, I do think that someone who made their root closed gave a pretty strong opinion that you're not supposed to touch the inside, even if you think you want to.

What I hadn't considered is that open roots could exist inside of closed roots, and with my pattern you could still have styled those the same way. That felt wrong, so I fixed it by making the whole subtree "darkened". That is, half-light can't get any light past it.

I also made a check for an attribute (or property) called darkened which can achieve this for a light DOM as well. You set it on the shadow host. Thus, if you write myElement.darkened = true or <my-element darkened> it will prevent half-light from applying to the whole subtree. That trick can be used by both custom elements and page authors directly if they find it helpful. Similarly, it's benign otherwise, so component authors can start adding it if they want to and whether page authors happen to be using half-light or not doesn't matter.

Optimizations

The very first edition of half-light was a little greedy in its processing and probably did a little too much work. I didn't hear anyone say that they actually experienced a problem, but it was clearly not as efficient as it could be, so I improved that generally.

But I also built half-light to be a little resilient to where in the head it was loaded, and to work conveniently with dev tools. That means you can go ahead and live-change the CSS; it's aware of that and will propagate changes down into your components. This is achieved via a MutationObserver.

Now, in practice I don't think this is likely to be much of a problem in most cases. Once the document has settled down enough to start rendering, I'm not sure how heavy head mutations are these days - but it seems pretty reasonable to be able to say "you don't have to keep observing", so I added that ability too. If you include the disable-live-half-light attribute on the script tag that you use to include half-light, it will stop monitoring.
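
For example (the src path here is just illustrative):

<script src="./half-light.js" disable-live-half-light></script>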

There's a practical upshot to that as well: this means it can also disengage some bookkeeping which technically leaks memory (this won't practically cause you issues on your blog or something; it's really mainly if you are doing a lot of dynamic stuff in a long-lived application).

Feedback and evolution

What I've appreciated most about this effort is the feedback and iteration. It's kind of amazing to look over such a brief period of time all of the evolution and improvements toward solving a problem. I hope that more and more people find it valuable to explore and let us know how it goes. Real world experimentation and feedback is so valuable toward ultimately developing a standard solution. Thanks to everyone who has reached out, filed issues, written a blog post, or discussed it on a podcast.

If you haven't already, please leave an emoji or a comment on this github issue to help me collect sentiment toward a solution like this in a central place.

March 11, 2024 04:00 AM

March 05, 2024

José Dapena

Maintaining downstreams of Chromium: why downstream?

Chromium, the open source web browser project Google Chrome is based on, can nowadays be considered the reference implementation of the web platform. As such, it is the first choice when implementing the web platform in a software platform or product.

Why is it like this? In this blog post I am going to introduce the topic, and then review the different reasons why a downstream Chromium is used in different projects.

A series of blog posts

This is the first of a series of blog posts, where I am going through several aspects and challenges of maintaining a downstream project using Chromium as its upstream project.

They will be mostly based on the discussions in two events: first, the Web Engines Hackfest 2023 breakout session with the same title that I chaired in A Coruña; then, my BlinkOn 18 talk in November 2023, at Sunnyvale.

Some definitions

Before starting the discussion of the different aspects, let’s clarify how I will use several terms.

Repository vs. project

I am going to refer to a repository strictly as version-controlled storage of code. Then, a project (specifically, a software project) is the community of people that share goals and some kind of organization in order to maintain one or several software products.

So, a project may use several repositories for its goals.

In this discussion I will talk about Chromium, an open source project that targets the implementation of the web platform user agent, a web browser, for different platforms. As such, it uses a number of repositories (src, v8 and more).

Downstream and upstream

I will use the terms downstream and upstream to refer to the relationship between the version control repositories of different software projects.

If there is a software project repository (typically open source), and a new repository is created that contains all or part of the original repository, then:

  • Upstream project will be the original repository.
  • Downstream project is the new repository.

It is important to highlight that different things can happen to the downstream repository:

  • This copy could be a one-time event, where the downstream repository becomes an independent fork and there may be no interest in tracking the upstream evolution. This happens often for abandoned repositories, where a different set of people start an independent project. But there could be other reasons.
  • There is the intent to track the upstream repository changes, so the downstream repository evolves as the upstream repository does, but with some specific differences maintained on top of the original repository.

Why using Chromium?

Nowadays, the web platform is a solid alternative for providing content to users. It allows modern user interfaces, based on well-known standards, that integrate well with local and remote services. The gap between native applications and web content has been reduced, so it is quite often a good alternative.

But, when integrating web content, product integrators need an implementation of the web platform. It is no surprise that Chromium is the most used, for a number of reasons:

  • It is open source, with a license that allows adapting it to new product needs.
  • It is well maintained and up to date, even pushing standardization forward to improve it continuously.
  • It is secure, both from the architecture and the maintenance-model points of view.
  • It provides integration points to tailor the implementation to one's needs.
  • It supports the most popular software platforms (Windows, Android, Linux, …) for integrating new products.
  • On top of the web platform itself, it provides an implementation for many of the components required to build a modern web browser.

Still, there are other good alternative choices for integrating the web, such as WebKit (especially WPE for embedded use cases), or using the system-provided web components (Android or iOS web view, …).

Though, in this blog post I will focus on the Chromium case.

Why downstreaming Chromium?

But, why do different projects need to use downstream Chromium repositories?

The simplest reason a project needs a downstream repository is to add changes that are not upstream. This can be for a variety of reasons:

  • Downstream changes that are not allowed by upstream, e.g. because they would make the upstream project harder to maintain, or would not be tested often.
  • Downstream changes that the downstream project does not want to add to upstream.

Let’s see some examples of changes of both types.

Hardware and OS adaptation

This is when downstream adds support for a hardware target or OS that is not originally supported by the upstream Chromium project.

Chromium upstream provides an abstraction layer for that purpose, named Ozone, which allows adapting it to the OS, desktop environment, and system graphics compositor. There are also other abstraction layers, for media acceleration, accessibility, or input methods.

The Wayland protocol adaptation started as a downstream effort, as upstream Chromium did not intend to support Wayland at that time. Eventually it evolved into an official upstream Ozone backend.

An example? LGE webOS Chromium port.

Differentiation

The previous case mostly forces a project to have a downstream repository. But there are also some cases where this is intended: there is the will to have some features in the downstream repository and not in upstream, an intended differentiation.

Why would anybody want that? Some typical examples:

  • A set of features that the downstream project owners consider to make the project better in some way, and want to keep downstream. This can happen when a new browser is shipped that contains features which make the product offering different, and in some ways better, than upstream Chrome. That can be a different user experience, some security features, better privacy…
  • Adaptation to a different product brand. Each browser or browser-based product will want to have its specific brand instead of upstream Chromium brand.

Examples of this:

  • Brave browser, with completely different privacy and security choices.
  • ARC browser, with an innovative user experience.
  • Microsoft Edge, with tight Windows OS integration and corporate features.

Hybrid application runtimes

And one last interesting case: integrating the web platform for developing hybrid applications: those that mix parts of the user interface implemented in a native toolkit, and parts implemented using the web platform.

Though Chromium includes official support for hybrid applications on Android, with the Android WebView, other toolkits also provide web application support, and in Chromium's case the integration of those belongs to downstream projects.

Examples?

What’s next?

In this blog post I presented different reasons why projects end up maintaining a downstream fork of Chromium.

In the next blog post I will present one of the main challenges when maintaining a downstream of Chromium: the different rebase and upgrade strategies.

by José Dapena Paz at March 05, 2024 03:54 PM

February 26, 2024

Andy Wingo

on the impossibility of composing finalizers and ffi

While poking the other day at making a Guile binding for Harfbuzz, I remembered why I don’t much do this any more: it is impossible to compose GC with explicit ownership.

Allow me to illustrate with an example. Harfbuzz has a concept of blobs, which are refcounted sequences of bytes. It uses these in a number of places, for example when loading OpenType fonts. You can get a peek at the blob’s contents back with hb_blob_get_data, which gives you a pointer and a length.

Say you are in LuaJIT. (To think that for a couple years, I wrote LuaJIT all day long; now I can hardly remember.) You get a blob from somewhere and want to get its data. You define a wrapper for hb_blob_get_data:

local hb = ffi.load("harfbuzz")
ffi.cdef [[
typedef struct hb_blob_t hb_blob_t;

const char *
hb_blob_get_data (hb_blob_t *blob, unsigned int *length);
]]

Presumably you then arrange to release LuaJIT’s reference on the blob when GC collects a Lua wrapper for a blob:

ffi.cdef [[
void hb_blob_destroy (hb_blob_t *blob);
]]

function adopt_blob(ptr)
  return ffi.gc(ptr, hb.hb_blob_destroy)
end

OK, so let’s say we get a blob from somewhere, and want to copy out its contents as a byte string.

function blob_contents(blob)
   local len_out = ffi.new('unsigned int[1]')
   local contents = hb.hb_blob_get_data(blob, len_out)
   local len = len_out[0];
   return ffi.string(contents, len)
end

The thing is, this code is as correct as you can get it, but it’s not correct enough. In between the call to hb_blob_get_data and, well, anything else, GC could run, and if blob is not used in the future of the program execution (the continuation), then it could be collected, causing the hb_blob_destroy finalizer to release the last reference on the blob, freeing contents: we would then be accessing invalid memory.

Among GC implementors, it is a truth universally acknowledged that a program containing finalizers must be in want of a segfault. The semantics of LuaJIT do not prescribe when GC can happen and what values will be live, so the GC and the compiler are not constrained to extend the liveness of blob to, say, the entirety of its lexical scope. It is perfectly valid to collect blob after its last use, and so at some point a GC will evolve to do just that.

I chose LuaJIT not to pick on it, but rather because its FFI is very straightforward. All other languages with GC that I am aware of have this same issue. There are but two work-arounds, and neither are satisfactory: either develop a deep and correct knowledge of what the compiler and run-time will do for a given piece of code, and then pray that knowledge does not go out of date, or attempt to manually extend the lifetime of a finalizable object, and then pray the compiler and GC don’t learn new tricks to invalidate your trick.

This latter strategy takes the form of “remember-this” procedures that are designed to outsmart the compiler. They have mostly worked for the last few decades, but I wouldn’t bet on them in the future.
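
In the LuaJIT example above, that might look like the following sketch, with no guarantee that a future compiler won't see through it:

-- an opaque call that we hope the compiler will not elide
local function keep_alive(obj) return obj end

function blob_contents(blob)
   local len_out = ffi.new('unsigned int[1]')
   local contents = hb.hb_blob_get_data(blob, len_out)
   local str = ffi.string(contents, len_out[0])
   keep_alive(blob) -- try to keep blob (and thus contents) live until after the copy
   return str
end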

Another way to look at the problem is that once you have a system working—though, how would you know it’s correct?—then you either never update the compiler and run-time, or you become fast friends with whoever maintains your GC, and probably your compiler too.

For more on this topic, as always Hans Boehm has the first and last word; see for example the 2002 Destructors, finalizers, and synchronization. These considerations don’t really apply to destructors, which are used in languages with ownership and generally run synchronously.

Happy hacking, and be safe out there!

by Andy Wingo at February 26, 2024 10:05 AM

February 23, 2024

Manuel Rego

Servo at FOSDEM 2024

Following the trend started in my last blog post, this is a blog post about Servo’s presence at a new event, this time FOSDEM 2024. This was my first time at FOSDEM, which is a special and different conference with lots of people and talks; it was quite an experience!

Picture of Rakhi during her Servo talk at FOSDEM with Tauri experiment slide in the background.
Servo talk by Rakhi Sharma at FOSDEM 2024

Embedding Servo in Rust projects #

My colleague Rakhi Sharma gave a great talk about Servo in the Rust devroom. The talk went deep into how Servo can be embedded by other Rust projects, showing the work on Servo’s Minibrowser, and the latest experiments related to the integration with Tauri through their cross-platform WebView library called wry. It’s worth highlighting that the latter has been sponsored by NLnet Foundation; big thanks for your support! Attending FOSDEM was nice as we also met Daniel Thompson-Yvetot from Tauri, whom we have been working with as part of this collaboration, and discussed improvements and plans for the future.

Slides and video of the talk are already available online. Don’t miss it if you’re interested in the most recent news around Servo development and how to embed it in other projects.

Servo at Linux Foundation Europe stand #

Servo is a Linux Foundation Europe project, and they have a stand at FOSDEM. This allowed us to show Servo to people passing by and explain the ongoing development efforts on the project. There was also some nice Servo swag that people really liked. The stand gave us the chance to talk to different people about Servo. Most of them were very interested in the project’s revival and hopeful about a promising future for the project.

Picture of Linux Foundation Europe stand at FOSDEM 2024 with different Servo swag (stickers, flyers, buttons, cups, caps, teddy bears). Also in the picture there is a screen showing a video of Servo rendering WebGPU content.
Servo swag at Linux Foundation Europe stand at FOSDEM 2024

As part of our attendance at FOSDEM we had the chance to meet some people we have been working with recently. We met Taym Haddadi, an active Servo contributor; it was nice to talk to people contributing to the project and meet them in real life. In addition, we also talked briefly with NLnet folks, who have one of the most popular stands, with stickers of all the projects they fund. Furthermore, we also had the chance to talk to Outreachy folks, as we’re working on getting Servo back into the Outreachy internship program this year.

Other Igalia talks at FOSDEM 2024 #

Finally, I’d like to highlight that Igalia’s presence at FOSDEM was much bigger than only Servo-related stuff. There was a large group of Igalians participating in the event, and we had a bunch of talks there.

February 23, 2024 12:00 AM

February 20, 2024

Carlos García Campos

A Clarification About WebKit Switching to Skia

In the previous post I talked about the plans of the WebKit ports currently using Cairo to switch to Skia for 2D rendering. Apple ports don’t use Cairo, so they won’t be switching to Skia. I understand the post title was confusing, I’m sorry about that. The original post has been updated for clarity.

by carlos garcia campos at February 20, 2024 06:11 PM

Alex Bradbury

Clarifying instruction semantics with P-Code

I've recently had a need to step through quite a bit of disassembly for different architectures, and although some architectures have well-written ISA manuals it can be a bit jarring switching between very different assembly syntaxes (like "source, destination" for AT&T vs "destination, source" for just about everything else) or tedious looking up different ISA manuals to clarify the precise semantics. I've been using a very simple script to help convert an encoded instruction to a target-independent description of its semantics, and thought I may as well share it as well as some thoughts on its limitations.

instruction_to_pcode

The script is simplicity itself, thanks to the pypcode bindings to Ghidra's SLEIGH library which provides an interface to convert an input to the P-Code representation. Articles like this one provide an introduction and there's the reference manual in the Ghidra repo but it's probably easiest to just look at a few examples. P-Code is used as the basis of Ghidra's decompiler and provides a consistent human-readable description of the semantics of instructions for supported targets.

Here's an example aarch64 instruction:

$ ./instruction_to_pcode aarch64 b874c925
-- 0x0: ldr w5, [x9, w20, SXTW #0x0]
0) unique[0x5f80:8] = sext(w20)
1) unique[0x7200:8] = unique[0x5f80:8]
2) unique[0x7200:8] = unique[0x7200:8] << 0x0
3) unique[0x7580:8] = x9 + unique[0x7200:8]
4) unique[0x28b80:4] = *[ram]unique[0x7580:8]
5) x5 = zext(unique[0x28b80:4])

In the above you can see that the disassembly for the instruction is dumped, and then 5 P-Code instructions are printed showing the semantics. These P-Code instructions directly use the register names for architectural registers (as a reminder, AArch64 has 64-bit GPRs X0-X30, with the bottom halves accessible through W0-W30). Intermediate state is stored in unique[addr:width] locations. So the above instruction sign-extends w20, adds it to x9, reads a 32-bit value from the resulting address, and then zero-extends it to 64 bits when storing into x5.

The output is somewhat more verbose for architectures with flag registers, e.g. cmpb $0x2f,-0x1(%r11) produces:

$ ./instruction_to_pcode x86-64 --no-reverse-input "41 80 7b ff 2f"
-- 0x0: CMP byte ptr [R11 + -0x1],0x2f
0) unique[0x3100:8] = R11 + 0xffffffffffffffff
1) unique[0xbd80:1] = *[ram]unique[0x3100:8]
2) CF = unique[0xbd80:1] < 0x2f
3) unique[0xbd80:1] = *[ram]unique[0x3100:8]
4) OF = sborrow(unique[0xbd80:1], 0x2f)
5) unique[0xbd80:1] = *[ram]unique[0x3100:8]
6) unique[0x28e00:1] = unique[0xbd80:1] - 0x2f
7) SF = unique[0x28e00:1] s< 0x0
8) ZF = unique[0x28e00:1] == 0x0
9) unique[0x13180:1] = unique[0x28e00:1] & 0xff
10) unique[0x13200:1] = popcount(unique[0x13180:1])
11) unique[0x13280:1] = unique[0x13200:1] & 0x1
12) PF = unique[0x13280:1] == 0x0

But simple instructions that don't set flags do produce concise P-Code:

$ ./instruction_to_pcode riscv64 "9d2d"
-- 0x0: c.addw a0,a1
0) unique[0x15880:4] = a0 + a1
1) a0 = sext(unique[0x15880:4])
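
For the curious, a minimal script along these lines can be put together on top of pypcode. This is just a sketch rather than the actual instruction_to_pcode: the arch-name to Ghidra language-ID table is illustrative, the real script also handles reversing the input byte order (which is what the --no-reverse-input flag above controls), and the numbering will differ slightly because pypcode's translation also includes IMARK ops.

#!/usr/bin/env python3
# Sketch of an instruction_to_pcode-style tool using pypcode.
import sys

import pypcode

# Illustrative mapping from short arch names to Ghidra language IDs;
# adjust for the languages shipped with your Ghidra/pypcode version.
LANGS = {
    "aarch64": "AARCH64:LE:64:v8A",
    "x86-64": "x86:LE:64:default",
    "riscv64": "RISCV:LE:64:default",
}

arch, hex_bytes = sys.argv[1], sys.argv[2]
code = bytes.fromhex(hex_bytes.replace(" ", ""))
ctx = pypcode.Context(LANGS[arch])

# Dump the disassembly for the input bytes...
for ins in ctx.disassemble(code).instructions:
    print(f"-- {ins.addr.offset:#x}: {ins.mnem} {ins.body}")

# ...then the P-Code ops giving the target-independent semantics.
for i, op in enumerate(ctx.translate(code).ops):
    print(f"{i}) {pypcode.PcodePrettyPrinter.fmt_op(op)}")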

Other approaches

P-Code was an intermediate language I'd encountered before, and it of course benefits from having an easy-to-use Python wrapper and fairly good support for a range of ISAs in Ghidra. But there are lots of other options: angr (which uses Vex, taken from Valgrind) compares some of them, and there's more in this paper. Radare2 has ESIL, but while I'm sure you'd get used to it, it doesn't pass the readability test for me. The rev.ng project uses QEMU's TCG. This is an attractive approach because you benefit from more testing and ISA extension support for some targets vs P-Code (Ghidra support is lacking for RVV, bitmanip, and crypto extensions).

Another route would be to pull out the semantic definitions from a formal spec (like Sail) or even an easy-to-read simulator (e.g. Spike for RISC-V). But in both cases, definitions are written to minimise repetition to some degree, while when expanding the semantics we prefer explicitness, so we would want to expand to a form that differs a bit from the Sail/Spike code as written.

February 20, 2024 12:00 PM

February 19, 2024

Carlos García Campos

WebKitGTK and WPEWebKit Switching to Skia for 2D Graphics Rendering

In recent years we have had an ongoing effort to improve graphics performance of the WebKit GTK and WPE ports. As a result of this we shipped features like threaded rendering, the DMA-BUF renderer, or proper vertical retrace synchronization (VSync). While these improvements have helped keep WebKit competitive, and even perform better than other engines in some scenarios, it has been clear for a while that we were reaching the limits of what can be achieved with a CPU based 2D renderer.

There was an attempt at making Cairo support GPU rendering, which did not work particularly well due to the library being designed around stateful operation based upon the PostScript model—resulting in a convenient and familiar API, great output quality, but hard to retarget and with some particularly slow corner cases. Meanwhile, other web engines have moved more work to the GPU, including 2D rendering, where many operations are considerably faster.

We checked all the available 2D rendering libraries we could find, but none of them met all our requirements, so we decided to try writing our own library. At the beginning it worked really well, with impressive results in performance even compared to other GPU based alternatives. However, it proved challenging to find the right balance between performance and rendering quality, so we decided to try other alternatives before continuing with its development. Our next option had always been Skia. The main reason why we didn’t choose Skia from the beginning was that it didn’t provide a public library with API stability that distros can package and we can use like most of our dependencies. It still wasn’t what we wanted, but now we have more experience in WebKit maintaining third party dependencies inside the source tree like ANGLE and libwebrtc, so it was no longer a blocker either.

In December 2023 we made the decision to give Skia a try internally and see if it would be worth the effort of maintaining the project as a third-party module inside WebKit. In just one month we had implemented enough features to be able to run all MotionMark tests. The results on the desktop were quite impressive, doubling the global MotionMark score. We still had to do more tests on embedded devices, which are the actual target of WPE, but it was clear that, at least on the desktop, we got much better results even with this very initial implementation that was not even optimized (we kept our current architecture, which is optimized for CPU rendering). We decided that Skia was the option, so we continued working on it and doing more tests on embedded devices. On the boards that we tried we also got better results than with CPU rendering, but the difference was not as big; with less powerful GPUs and with our current architecture designed for CPU rendering, GPU rendering was not that far ahead. That’s the reason why we managed to keep WPE competitive on embedded devices with CPU rendering. But Skia will not only bring performance improvements: it will also simplify the code and allow us to implement new features. So, we had enough data already to make the final decision of going with Skia.

In February 2024 we reached a point where our Skia internal branch was in an “upstreamable” state, so there was no reason to continue working privately. We met with several teams from Google, Sony, Apple and Red Hat to discuss our intention to switch from Cairo to Skia, upstreaming what we had as soon as possible. We got really positive feedback from all of them, so we sent an email to the WebKit developers mailing list to make it public. And again we only got positive feedback, so we started to prepare the patches to import Skia into WebKit, add the CMake integration, and add the initial Skia implementation for the WPE port; those have already landed in main.

We will continue working on the Skia implementation in upstream WebKit, and we also have plans to change our architecture to better support the GPU rendering case in a more efficient way. We don’t have a deadline; it will be ready when we have implemented everything currently supported by Cairo, as we don’t plan to switch with regressions. We are focused on the WPE port for now, but at some point we will start working on GTK too, and other ports using Cairo will eventually start getting Skia support as well.

by carlos garcia campos at February 19, 2024 01:27 PM

February 16, 2024

Melissa Wen

Keep an eye out: We are preparing the 2024 Linux Display Next Hackfest!

Igalia is preparing the 2024 Linux Display Next Hackfest and we are thrilled to announce that this year’s hackfest will take place from May 14th to 16th at our HQ in A Coruña, Spain.

This unconference-style event aims to bring together the most relevant players in the Linux display community to tackle current challenges and chart the future of the display stack.

Key goals for the hackfest include:

  • Releasing the power of collaboration: We’ll work to remove bottlenecks and pave the way for smoother, more performant displays.
  • Problem-solving powerhouse: Brainstorming sessions and collaborative coding will target issues like HDR, color management, variable refresh rates, and more.
  • Building on past commitments: Let’s solidify the progress made in recent years and push the boundaries even further.

The hackfest fosters an intimate and focused environment to brainstorm, hack, and design solutions alongside fellow display experts. Participants will dive into discussions, tinker with code, and contribute to shaping the future of the Linux display stack.

More details are available on the official website.

Stay tuned! Keep an eye out for more information, mark your calendars and start prepping your hacking gear.

February 16, 2024 05:25 PM

February 14, 2024

Lucas Fryzek

A Dive into Vulkanised 2024

Vulkanised sign at Google’s office

Last week I had an exciting opportunity to attend the Vulkanised 2024 conference. For those of you not familiar with the event, it is “The Premier Vulkan Developer Conference”, hosted by the Vulkan working group from Khronos. With the excitement out of the way, I decided to write about some of the interesting information that came out of the conference.

A Few Presentations

My colleagues Iago, Stéphane, and Hyunjun each had the opportunity to present some of their work to the wider Vulkan ecosystem.

Stéphane and Hyunjun presenting

Stéphane & Hyunjun presented “Implementing a Vulkan Video Encoder From Mesa to GStreamer”. They jointly talked about the work they performed to implement the Vulkan video extensions in Intel’s ANV Mesa driver as well as in GStreamer. This was an interesting presentation because you got to see how the new Vulkan video extensions affect both driver developers implementing the extensions and application developers making use of the extensions for real-time video decoding and encoding. Their presentation is available on vulkan.org.

Iago presenting

Later my colleague Iago presented jointly with Faith Ekstrand (a well-known Linux graphics stack contributor from Collabora) on “8 Years of Open Drivers, including the State of Vulkan in Mesa”. They both talked about the current state of Vulkan in the open source driver ecosystem, and some of the benefits open source drivers have been able to take advantage of, like the common Vulkan runtime code and a shared compiler stack. You can check out their presentation for all the details.

Besides Igalia’s presentations, there were several more which I found interesting, with topics such as Vulkan developer tools, experiences of using Vulkan in real-world applications, and even how to teach Vulkan to new developers. Here are some highlights.

Using Vulkan Synchronization Validation Effectively

John Zulauf gave a presentation on the Vulkan synchronization validation layers that he has been working on. If you are not familiar with these, you should really check them out. They work by tracking how resources are used inside Vulkan and providing error messages with some hints if you use a resource in a way where it is not synchronized properly. They can’t catch every error, but they’re a great tool in the toolbelt of Vulkan developers to make their lives easier when it comes to debugging synchronization issues. As John said in the presentation, synchronization in Vulkan is hard, and nearly every application he tested the layers on revealed a synchronization issue, no matter how simple it was. He can proudly say he is a vkQuake contributor now because of these layers.
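
If you want to try them on your own application, one possible way, assuming a recent Khronos validation layer is installed (the exact settings mechanism varies between layer versions, so treat this as a sketch), is to turn on the synchronization checks via environment variables:

$ VK_INSTANCE_LAYERS=VK_LAYER_KHRONOS_validation \
  VK_LAYER_ENABLES=VK_VALIDATION_FEATURE_ENABLE_SYNCHRONIZATION_VALIDATION_EXT \
  ./your-vulkan-app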

6 Years of Teaching Vulkan with Example for Video Extensions

This was an interesting presentation from a professor at the University of Vienna about his experience teaching graphics as well as game development to students who may have little real programming experience. He covered the techniques he uses to make learning easier as well as the resources he relies on. This would be a great presentation to check out if you’re trying to teach Vulkan to others.

Vulkan Synchronization Made Easy

Another presentation focused on Vulkan sync, but instead of debugging it, Grigory showed how his graphics library abstracts sync away from the user without implementing a render graph. He presented an interesting technique that is similar to how the sync validation layers work when it comes to ensuring that resources are always synchronized before use. If you’re building your own engine in Vulkan, this is definitely something worth checking out.

Vulkan Video Encode API: A Deep Dive

Tony at Nvidia did a deep dive into the new Vulkan Video extensions, explaining a bit about how video codecs work, and also including a roadmap for future codec support in the video extensions. Especially interesting for us was that he made a nice call-out to Igalia and our work on Vulkan Video CTS and open source driver support on slide 6 :)

Thoughts on Vulkanised

Vulkanised is an interesting conference that gives you the intersection of people working on Vulkan drivers, game developers using Vulkan for their graphics backend, visual FX tool developers using Vulkan-based tools in their pipeline, industrial application developers using Vulkan for some embedded commercial systems, and general hobbyists who are just interested in Vulkan. As an example of some of these interesting audience members, I got to talk with a member of the Blender foundation about his work on the Vulkan backend to Blender.

Lastly, the event was held at Google’s offices in Sunnyvale, which I’m always happy to travel to: not just for the better weather (coming from Canada), but also for the amazing restaurants and food in the Bay Area!

Great Bay Area food

February 14, 2024 05:00 AM

February 13, 2024

Víctor Jáquez

GstVA library in GStreamer 1.22 and some new features in 1.24

I know, it’s old news, but still, I had been meaning to write about the GstVA library to clarify its purpose and scope. I didn’t want to have a library for GstVA, because maintaining a library is a tough duty; I learnt that the hard way with GStreamer-VAAPI. Therefore, I wanted GstVA to be as simple as possible, direct in its libva usage, and self-contained.

As far as I know, the main usage of the GStreamer-VAAPI library was to get access to the VASurface from the GstBuffer, for example with appsink. Since the beginning of GstVA, a mechanism to access the surface ID has been provided, just as GStreamer OpenGL offers access to the texture ID: through a flag when mapping a GstGL memory backed buffer. You can see how this mechanism is used in one of the GstVA sample apps.

Nevertheless, later another use case appeared which demanded a library for GstVA: other GStreamer elements needed to produce or consume VA surface backed buffers. Or, to say it more concretely, they needed to use the GstVA allocator and buffer pool, and the mechanism to share the GstVA context along the pipeline too. The plugins with those VA-related elements are msdk and qsv, when they operate on Linux. Both use Intel oneVPL, so they are basically competitors. The main difference is that the former is maintained by Intel and has more features, especially for Linux, whilst the latter is maintained by Seungha Yang and appears to be better tested on Windows.

These are the objects exposed by the GstVA library API:

GstVaDisplay #

GstVaDisplay represents a VADisplay, the interface between the application and the hardware accelerator. This class is abstract, and it’s not supposed to be instantiated directly. Instantiation has to go through its derived classes, such as

This class is shared among all the elements in the pipeline via GstContext, so all the plugged elements share the same connection to the hardware accelerator, unless a plugged element has a device-specific name, as in the case of multi-GPU systems.

Let’s talk a bit about multi-GPU systems. This is the gst-inspect-1.0 output of a system with an Intel and an AMD GPU, both with VA support:

$ gst-inspect-1.0 va
Plugin Details:
  Name                     va
  Description              VA-API codecs plugin
  Filename                 /home/igalia/vjaquez/gstreamer/build/subprojects/gst-plugins-bad/sys/va/libgstva.so
  Version                  1.23.1.1
  License                  LGPL
  Source module            gst-plugins-bad
  Documentation            https://gstreamer.freedesktop.org/documentation/va/
  Binary package           GStreamer Bad Plug-ins git
  Origin URL               Unknown package origin

vaav1dec: VA-API AV1 Decoder in Intel(R) Gen Graphics
vacompositor: VA-API Video Compositor in Intel(R) Gen Graphics
vadeinterlace: VA-API Deinterlacer in Intel(R) Gen Graphics
vah264dec: VA-API H.264 Decoder in Intel(R) Gen Graphics
vah264lpenc: VA-API H.264 Low Power Encoder in Intel(R) Gen Graphics
vah265dec: VA-API H.265 Decoder in Intel(R) Gen Graphics
vah265lpenc: VA-API H.265 Low Power Encoder in Intel(R) Gen Graphics
vajpegdec: VA-API JPEG Decoder in Intel(R) Gen Graphics
vampeg2dec: VA-API Mpeg2 Decoder in Intel(R) Gen Graphics
vapostproc: VA-API Video Postprocessor in Intel(R) Gen Graphics
varenderD129av1dec: VA-API AV1 Decoder in AMD Radeon Graphics in renderD129
varenderD129av1enc: VA-API AV1 Encoder in AMD Radeon Graphics in renderD129
varenderD129compositor: VA-API Video Compositor in AMD Radeon Graphics in renderD129
varenderD129deinterlace: VA-API Deinterlacer in AMD Radeon Graphics in renderD129
varenderD129h264dec: VA-API H.264 Decoder in AMD Radeon Graphics in renderD129
varenderD129h264enc: VA-API H.264 Encoder in AMD Radeon Graphics in renderD129
varenderD129h265dec: VA-API H.265 Decoder in AMD Radeon Graphics in renderD129
varenderD129h265enc: VA-API H.265 Encoder in AMD Radeon Graphics in renderD129
varenderD129jpegdec: VA-API JPEG Decoder in AMD Radeon Graphics in renderD129
varenderD129postproc: VA-API Video Postprocessor in AMD Radeon Graphics in renderD129
varenderD129vp9dec: VA-API VP9 Decoder in AMD Radeon Graphics in renderD129
vavp9dec: VA-API VP9 Decoder in Intel(R) Gen Graphics

22 features:
+-- 22 elements

As you can observe, each card registers its supported elements. The first GPU, the Intel one, doesn’t insert renderD129 after the va prefix, while the AMD Radeon does. As you can imagine, renderD129 is the device name in /dev/dri:

$ ls /dev/dri
by-path card0 card1 renderD128 renderD129

The appended device name expresses the DRM device the elements use, and it is only appended from the second GPU onwards. This allows using the untagged elements either with the first GPU, or with a wrapped display injected by the user application, such as VA X11 or Wayland connections.
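
For example, assuming a raw H.264 bitstream in a hypothetical video.h264 file and the element listing above, decoding on the AMD GPU instead of the Intel one is just a matter of picking the prefixed element:

$ gst-launch-1.0 filesrc location=video.h264 ! h264parse ! varenderD129h264dec ! fakesink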

Notice that sharing DMABuf-based buffers between different GPUs is theoretically possible, but not assured.

Keep in mind that, currently, nvidia-vaapi-driver is not supported by GstVA, and the driver is ignored at plugin registration since 1.24.

VA allocators #

There are two types of memory allocators in GstVA:

  • VA surface allocator. It allocates a GstMemory that wraps a VASurfaceID, which represents a complete frame directly consumable by VA-based elements. Thus, a GstBuffer will hold one and only one GstMemory of this type. These buffers are the very same as system memory buffers, since the surfaces can generally map their content to CPU memory (unless they are encrypted, but GstVA hasn’t been tested for that use case).
  • VA-DMABuf allocator. It descends from GstDmaBufAllocator, though it’s not an allocator in the strict sense, since it only imports DMABufs to VA surfaces and exports VA surfaces to DMABufs. Notice that a single VA surface can be exported as multiple DMABuf-backed GstMemories and, conversely, a VA surface can be imported from multiple DMABufs.

Also, the VA allocators offer a couple of methods that can be useful for applications that use appsink, for example gst_va_buffer_get_surface and gst_va_buffer_peek_display.

One particularity of both allocators is that they keep an internal pool of the allocated VA surfaces, since the mapping of buffers/memories to surfaces is not always 1:1, as in the case of DMABuf. This pool allows surfaces to be reused even if the buffer pool ditches them, and avoids surface leaks by keeping track of them all the time.

GstVaPool #

This class is only useful for other GStreamer elements capable of consuming VA surfaces, so they can instantiate, configure and propose a VA buffer pool; it is not meant for applications.

And that’s all for now. Thank you for bearing with me up to here. But remember, the API of the library is unstable, and it can change at any time. So, no promises are made :)

February 13, 2024 12:00 AM

February 12, 2024

Brian Kardell

StyleSheet Parfait

In this post I'll talk about some interesting things (some people might pronounce this 'footguns') around adoptedStyleSheets, conversations and thoughts around open styling problems and @layer.

If you're not familiar with adopted stylesheets, they're a way that your shadow roots can share literal stylesheet instances by reference. That's a cool idea, right? If you have 10 fancy-inputs on the same page, it makes no sense for each of them to have their own copy of the whole stylesheet.

It's fairly early days for this still (in standards and support terms, at least) and we'll put improvements on top of this, but for now it is a pretty basic JavaScript API: Every shadow root now has an .adoptedStyleSheets property, which is an array. You can push stylesheets onto it or just assign an array of stylesheets. Currently those stylesheets have to be instances created via the recently introduced constructor new CSSStyleSheet().

Cool.

Steve Orvell opened an issue suggesting that I make half-light use adopted stylesheets. Sure, why not. In practice what this really saves is mainly the parse time, since browsers are pretty good at optimizing this otherwise, but that's still important and, in fact, it made the code more concise as well.

Whoopsie

However, there is an important bit about adopted stylesheets that I forgot about when I implemented this initially (which is strange because I am on the record discussing it in CSSWG): adopted stylesheets are treated as if they come after any stylesheets in the (shadow) root.

Previously, half-light (and earlier experiments) took great care to put stylesheets from the outer page before any that the component itself provides. That seems right to me, and what half-light was now doing with adopted stylesheets seemed wrong...

Enter: Layers

A fairly easy solution to this newly created problem is to wrap the adopted rules with @layer. Then, if your component has a style element, the rules in there will, by default, win. And that's true even if the rules that the page author pushed in had higher specificity! That's a pretty nice improvement. If some code helps you understand, here's a pen that illustrates it all:

See the Pen adoptedstylesheets and layers by вкαя∂εℓℓ (@briankardell) on CodePen.

Layers... Like an ogre.

Eric and I recently did a podcast on the Open Styleable shadow roots topic with Mia. Some of the thoughts she shared, and later conversations that followed with Westbrook and Nolan, convinced me to try to explore how we could use layers in half-light.

It had me thinking that the way that adopted stylesheets and layers both work seems to allow us to develop some kind of 'shadow styling protocol' here. Maybe the simplest way to do this is to just give the layer a well-known name: I called it --crossroot.

Now, when a page author uses half-light to set a style like:

@media --crossroot { 
  h1 { ... }
}

This is adopted into shadow roots as:

@layer --crossroot { 
  h1 { ... }
}

That is a lower layer than anything in the default layer of a component's shadow root (whether from style elements or adopted stylesheets).

The maybe interesting part this adds is that it means the component itself can consciously manage its layers if it chooses to do so! For example...

this.shadowRoot.innerHTML = `
  <style>
    @layer base, --crossroot, main;
    ... 
    /* add rules to those layers  */
    ... 
  </style>
  ${some_html}
`

If that sounds a little confusing, it's really not too bad - what it means is that, going from least to most specific, rules would evaluate roughly like:

  1. User Agent styles.
  2. Page authored rules that inherit into the Shadow DOM.
  3. @layers in the shadow (including the --crossroot layer that half-light provides).
  4. Rules in style elements in the shadow that aren't in a layer.

Combining ideas...

There was also some unrelated feedback from Nolan that this approach was a non-starter for some use cases - like pushing Bootstrap down to all of the shadow roots. The previous shadow-boxing library would have supported that better. Luckily, though, we've built this on Media Queries, and CSS has that pretty well figured out - all we have to do is add support to half-light to make Media Queries work in link and style tags (in the head) as well. Easy enough, and I agree that's a good improvement. So, now you can also write markup like this:

<link rel="stylesheet" href="../prism.css" media="screen, --crossroot">

<!-- or to target shadows of specific elements, add a selector... -->
<link rel="stylesheet" href="../prism.css" media="screen, (--crossroot x-foo)">

So... That's it, this is all in half-light now... And guess what? It's still only 95 lines of code. Thanks for all the feedback so far! So, wdyt? Don't forget to leave me an indication of how you're feeling about it with an emoji (and/or comment) on the Emoji sentiment or short comment issue.

Very special thanks to everyone who has commented and shared thoughts constructively along the way, even when they might not agree. If you actually voted in the emoji sentiment poll: ❤. Thanks especially to Mia who has been great to discuss/review and improve ideas with.

February 12, 2024 05:00 AM

February 05, 2024

Eric Meyer

Bookmarklet: Load All GitHub Comments

What happened was, Brian and I were chatting about W3C GitHub issues and Brian mentioned how really long issues are annoying to search and read, because GitHub has this thing where if there are too many comments on an issue, it snips out the middle with a “Load more…” button that’s very tastefully designed and pretty easy to miss if you’re quick-scrolling to try to catch up.  The squiggle-line would be a good marker, if it weren’t so tasteful as to blend into the background in a way that makes the Baby WCAG cry.

And what’s worse, from this perspective, is that if the issue has been discussed to a very particular kind of death, the “Load more…” button can have more “Load more…” buttons hiding within.  So even if you know there was an interesting comment, and you remember a word or two of it, page-searching in your browser will do no good if the comment in question is buried one or more XMLHTTPRequest calls deep.

“I really wish GitHub had an ‘expand all comments’ button at the top or something,” Brian said (or words to that effect).

Well, it was a Friday afternoon and I was feeling code-hacky, so I wrote a bookmarklet.  Here it is in easy-to-save hyperlink form:

GitHub issue loader, which looks roughly like this:

javascript:function start(){let b=[...document.querySelectorAll("button")].filter(e=>e.textContent.trim()==="Load more…");b.forEach(e=>e.dispatchEvent(new MouseEvent("click",{bubbles:true})));if(b.length>0){setTimeout(start,5000)}}setTimeout(start,500);void(20240130);

It waits half a second after you activate it to find all the buttons on the page (in my test runs, usually six hundred of them).  Then it looks through all the buttons to find the ones that have a textContent of “Load more…” and dispatches a click event to each one.  With that done, it waits five seconds and does it all again, waits five seconds to do it again, and so on.  Once it finds there are zero buttons with the “Load more…” textContent, it exits.  And, if five seconds is too quick due to slow loading times, you can always invoke the bookmarklet again should you come across a “Load more…” button.

If you want this ability for yourself, just drag the link above into your bookmark toolbar or bookmarks menu, and whenever you load up a mega-thread GitHub issue, fire the bookmarklet to load all the comments.  I imagine there may be cleaner ways to do this, but I was able to codeslam this in about 15 minutes using ViolentMonkey on live GitHub pages, and it does the thing.

I did consider complexifying the ViolentMonkey script so that any GitHub page is scanned for the “Load more…” button, and if one is present, then a “Load all comments” button is plopped into the top of the page, but I knew that would take at least another 15 minutes and my codeslam window was closing.  Also, it would require anyone using it to run ViolentMonkey (or equivalent) all the time, whereas the bookmarklet has zero impact unless the user invokes it.  If you want to extend this into something more than it is and share your solution with the world, by all means feel free.

The point of all this being, if you too wish GitHub had an easy way to load all the comments without you having to search for the “Load more…” button yourself, now there’s a bookmarklet made just for you.  Enjoy!

Have something to say to all that? You can add a comment to the post, or email Eric directly.

by Eric Meyer at February 05, 2024 02:49 PM

January 29, 2024

Stephen Chenney

The CSS Highlight Inheritance Model

The CSS highlight inheritance model describes the process for inheriting the CSS properties of the various highlight pseudo elements:

  • ::selection controlling the appearance of selected content
  • ::spelling-error controlling the appearance of misspelled word markers
  • ::grammar-error controlling how grammar errors are marked
  • ::target-text controlling the appearance of the string matching a target-text URL
  • ::highlight defining the appearance of a named highlight, accessed via the CSS highlight API

The inheritance model described here was proposed and agreed upon in 2022, and is part of the CSS Pseudo-Elements Module Level 4 Working Draft Specification, but ::selection was implemented and widely used long before this, with browsers differing in how they implemented inheritance for selection styles. When this model is implemented and released to users, the inheritance of CSS properties for the existing pseudo elements, particularly ::selection, will change in a way that may break sites.

The model is implemented behind a flag in Chrome 118 and higher, and is expected to be enabled by default as early as Chrome 123 in February 2024. To see the effects right now, enable “Experimental Web Platform features” via chrome://flags.

::selection, the Original highlight Pseudo #

Historically, ::selection, or even -moz-selection, was the only pseudo element available to control the highlighting of text, in this case the appearance of selected content. Sites use ::selection when they want to modify the default browser selection painting, most often adjusting the color to improve contrast or convey additional meaning to users. Sites also use the feature to work around selection painting problems in browsers, with the style .broken::selection { background-color: transparent; } and a .broken class on elements with broken selection painting. We’ll use this last situation as the example as we demonstrate the changes to property inheritance.

The ::selection pseudo element as implemented in most browsers uses originating element inheritance, meaning that the selection inherits properties from the style of the nearest element to which the selection is being applied. The inheritance behavior of ::selection has never been fully compatible among browsers, and has failed to comply with the CSS specification even as the spec has been updated over time. As a result, workarounds are required when a page-wide selection style is inadequate, as demonstrated with this example (works when highlight inheritance is not enabled):

<style>
::selection /* = *::selection (universal) */ {
background-color: lightgreen;
}
.broken::selection {
background-color: transparent;
}
</style>
<p>Some <em>not broken</em> text</p>
<p class="broken">Some <em>broken</em> text</p>
<script>
range = new Range();
range.setStart(document.body, 0);
range.setEnd(document.body, 3)
document.getSelection().addRange(range);
</script>

The intent of the page author is probably to have everything inside .broken be invisible for selection, but that does not happen. The problem here is that ::selection applies to all elements, including the <em> element. While the .broken element uses its own ::selection, that only applies to the element itself when selected, not its descendants.

You could always add more specific ::selection selectors, such as .broken > em::selection, but that is brittle to DOM changes and leads to style bloat.

You could also use a CSS custom property for the page-wide selection color, and just make it transparent for the broken elements, like this (works when highlight inheritance is not enabled):

<style>
:root {
--selection-color: lightgrey;
}
::selection {
background-color: var(--selection-color);
}
.broken {
--selection-color: transparent;
}
</style>
<p>Some <em>not broken</em> text</p>
<p class="broken">Some <em>broken</em> text</p>
<script>
range = new Range();
range.setStart(document.body, 0);
range.setEnd(document.body, 3)
document.getSelection().addRange(range);
</script>

In this case, we define a page-wide selection color as a custom property referenced in a ::selection rule that matches all elements. The more specific .broken rule re-defines the custom property to make the selection transparent. With originating inheritance, the ::selection for the word “broken” inherits the custom property value from the <em> element (its originating element), which in turn inherits the custom property value from the .broken element. This custom property workaround is how sites such as GitHub have historically worked around the problems with originating inheritance for ::selection.

New highlighting Features Change the Trade-Offs #

The question facing the CSS Working Group, the people who set the standards for CSS, was whether to make the specification match the implementations using originating inheritance, or require browsers to change to use the specified highlight inheritance model. The question sat on the back-burner for a while because web developers were not complaining very loudly (there was a workaround once custom properties were implemented) and there was uncertainty around whether the change would break sites. Some new features in CSS changed the calculus.

First came a set of additional CSS highlight pseudo elements that pushed for a resolution to the question of how they should inherit. Given that the new pseudos were to be implemented according to the spec, while ::selection was not, there was the possibility for ::selection inheritance behavior to be inconsistent with that of the new highlight pseudos:

  • ::target-text for controlling how URL text-fragments targets are rendered
  • ::spelling-error for modifying the appearance of browser detected spelling errors
  • ::grammar-error for modifying browser detected grammar errors
  • ::highlight for defining persistent highlights on a page

In addition, the set of properties allowed in ::selection and other highlight pseudos was expanded to allow for text decorations and control of fill and stroke colors on text, and maybe text-shadow. More properties make the CSS custom property workaround for originating inheritance unwieldy in practice because it expands the set of variables needed.

The decision was made to change browsers to match the spec, thus changing the way that ::selection inheritance works in browsers. ::selection was the only widely used highlight pseudo and authors expected it to use originating inheritance, so any change to the inheritance behavior was likely to cause problems for some sites.

What is highlight inheritance? #

The CSS highlight inheritance model starts with defining a highlight inheritance tree for each type of highlight pseudo. These trees are parallel to the DOM element tree with the same branching structure, except the nodes in the tree now represent the highlight styles for each element. The highlight pseudos inherit their styles through the ancestor chain in their highlight inheritance tree, with body::selection inheriting from html::selection and so on.

Under this model, the original example behaves the same because the ::selection rule still matches all elements:

<style>
::selection {
background-color: lightgreen;
}
.broken::selection {
background-color: transparent;
}
</style>
<p>Some <em>not broken</em> text</p>
<p class="broken">Some <em>broken</em> text</p>

However, with highlight inheritance it is recommended that page-wide selection styles be set in :root::selection or body::selection:

<style>
:root::selection {
background-color: lightgreen;
}
.broken::selection {
background-color: transparent;
}
</style>
<p>Some <em>not broken</em> text</p>
<p class="broken">Some <em>broken</em> text</p>
<script>
range = new Range();
range.setStart(document.body, 0);
range.setEnd(document.body, 3)
document.getSelection().addRange(range);
</script>

The :root::selection is at the root of the highlight inheritance tree, because it is the selection pseudo for the document root element. All selection pseudos in the document now inherit from their parent in the highlight tree, and do not need to match a universal ::selection rule. Furthermore, a change in ::selection matching a particular element will also be inherited by all its descendants.

In this example, there are no rules matching the <em> elements, so they look at their parent selection pseudo. The “not broken” text will be highlighted in lightgreen, inherited from the root via its parent chain. The “broken” text receives a transparent background inherited from its parent <p> that matches its own rule.

The Custom Property Workaround Now Fails #

The most significant breakage caused by the change to highlight inheritance is due to sites employing the custom property workaround for originating inheritance. Consider our former example with that workaround:

<style>
:root {
--selection-color: lightgrey;
}
:root::selection {
background-color: var(--selection-color);
}
.broken {
--selection-color: transparent;
}
</style>
<p>Some <em>not broken</em> text</p>
<p class="broken">Some <em>broken</em> text</p>

According to the original specification, and the initial implementation in chromium, the default selection color is used everywhere in this example because the custom --selection-color: lightgrey; property is defined on :root which is the root of the DOM tree but not the highlight tree. The latter is rooted at :root::selection which does not define the property.

Many sites use :root as the location for CSS custom properties. Upvoted answers on Stack Overflow explicitly recommend doing so for selection colors. Unsurprisingly, many sites broke when chromium initially enabled highlight inheritance, including GitHub. To fix this, the specification was changed to require that :root::selection inherit custom properties from :root.

But all is not well. The <em> element’s ::selection has its background set to the custom property value, which is now inherited from :root via :root::selection. But because the inheritance chain does not involve .broken at all, the custom property is evaluated against the :root value. To make this work, the custom property must be overridden in a .broken::selection rule:

  :root {
--selection-color: lightgrey;
}
:root::selection {
background-color: var(--selection-color);
}
.broken::selection {
--selection-color: transparent;
background-color: var(--selection-color);
}

Note that there’s really no point using custom property overrides for highlights, unless you desire different ::selection styles in different parts of the document and want them to use a common set of property values that you would define on the root.

The Current Situation (at the time of writing) #

While writing this, highlight inheritance is used for all highlight pseudos except ::selection and ::target-text in chromium-based browsers (other browsers are still implementing the additional pseudo elements). Highlight inheritance is also enabled for ::selection and ::target-text for users that have “Experimental Web Platform features” enabled via chrome://flags in Chrome.

Chrome is trying to enable highlight inheritance for ::selection. One attempt to enable for all users was rolled back, and another attempt will be made in an upcoming release. The current plan is to iterate until the change sticks. That is, until site authors notice that things have changed and make suitable site changes.

If inertia is too strong, additional changes to the spec may be needed to support, in particular, the continued use of the existing custom property workaround.

Thanks #

The implementation of highlight inheritance in chromium was undertaken by Igalia S.L. funded by Bloomberg L.P. Delan Azabani did most of the work, with contributions from myself.

January 29, 2024 12:00 AM

January 26, 2024

Stephen Chenney

CSS Spelling and Grammar Styling

CSS Spelling and Grammar features enable sites to customize the appearance of the markers that browsers render when a spelling or grammar error is detected, and to control the display of spelling and grammar markers that the site itself defines.

  • The text decoration spelling-error and grammar-error line types render native spelling or grammar markers controlled by CSS. The primary use case is for site-specific spell-check or grammar systems, allowing sites to render native looking error markers for the errors the site itself detects, rather than errors the browser detects. They are part of the CSS Text Decoration Module Level 4 Working Draft Specification.
  • The ::spelling-error and ::grammar-error pseudo elements allow sites to apply custom styling to the browser rendered spelling and grammar error markers. The target use case enables sites to override the native markers should custom markers offer a better user experience on the site. They are part of the CSS Pseudo-Elements Module Level 4 Working Draft Specification.

These features are available for all users in Chrome 121 and upward, and corresponding Edge versions. You can also enable “Experimental Web Platform features” with chrome://flags in earlier versions. The features should be coming soon to Safari and Firefox.

CSS Text Decorations for Spelling and Grammar Errors #

Imagine a site that uses a specialized dictionary for detecting spelling errors in user content. Maybe users are composing highly domain specific content, like botany plant names, or working in a language not well represented by existing spelling checkers. To provide an experience familiar to users, the site might wish to use the native spelling and grammar error markers with errors detected by the site itself. Before the error line types for CSS Text Decoration were available, only the browser could decide what received native markers.

Here’s an example of how you might control the rendering of spelling markers.

<style>
.spelling-error {
text-decoration-line: spelling-error;
}
</style>
<div id="content">Some content that has a misspeled word in it</div>
<script>
const range = document.createRange();
range.setStart(content.firstChild, 24);
range.setEnd(content.firstChild, 33);
const newSpan = document.createElement("span");
newSpan.classList.add("spelling-error");
range.surroundContents(newSpan);
</script>

How it works:

  • Create a style for the spelling errors. In this case a spelling error will have a text decoration added with the line type for native spelling errors.
  • As the user edits contents, JS can track the text, process it for spelling errors, and then trigger a method to mark the errors.
  • The script above is one way to apply the style to the error, assuming you have the start and end offset of the error within its Text node. A range is created with the given offsets, and then the contents of the range are pulled out of the Text and put within a span, which is then re-inserted where the range once was. The span gets the spelling-error style.

The result will vary across browser and platform, depending on the native conventions for marking errors.

Using other text-decoration properties or, rather, not #

All other text decoration properties are ignored when text-decoration-line: spelling-error or grammar-error are used. The native markers rendered for these decorations are not, in general, styleable, so there would be no clear way to apply additional properties (such as text-decoration-color or text-decoration-thickness).

Spelling and Grammar Highlight Pseudos #

The ::spelling-error and ::grammar-error CSS pseudo elements allow sites to customize the markers rendered by the browser for browser-detected errors. The native markers are not rendered, being replaced by whatever styling is defined by the pseudo element applied to the text content for the error.

For the first example, let’s define a red background tint to replace the native spelling markers:

<style>
::spelling-error {
background-color: rgba(255,0,0,0.5);
text-decoration-line: none;
}
</style>
<textarea id="content">Some content that has a misspeled word in it</textarea>
<script>
content.focus();
</script>

Two things to note:

  • The example defines the root spelling error style, applying to all elements. In general, sites should use a single style for spelling errors to avoid user confusion.
  • The text-decoration-line property is set to none to suppress the native spelling error marker. Without this, the style would inherit the text-decoration-line property from the user agent, which defines text-decoration-line: spelling-error.

A limited range of properties may be used in ::spelling-error and ::grammar-error; see the spec for details. The limits address two potential concerns:

  • The spelling markers must not affect layout of text in any way, otherwise the page contents would shift when a spelling error was detected.
  • The spelling markers must not be too expensive to render; see Privacy Concerns below.

The supported properties include text decorations. Here’s another example:

<style>
::spelling-error {
text-decoration: 2px wavy blue underline;
}
textarea {
width: 200px;
height: 200px;
}
</style>
<textarea id="content">Some content that has a misspeled word in it</textarea>
<script>
content.focus();
</script>

One consequence of user agent styling is that you must provide a line type for any custom text decoration. Without one, the highlight will inherit the native spelling error line type. As discussed above, all other properties are ignored with that line type. For example, this renders the native spelling error, not the defined one:

<style>
::spelling-error {
text-decoration-style: solid;
text-decoration-color: green;
}
</style>
<textarea id="content">Some content that has a misspeled word in it</textarea>
<script>
content.focus();
</script>

Accessibility Concerns #

Modifying the appearance of user agent features, such as spelling and grammar errors, has significant potential to hurt users through poor contrast, small details, or other accessibility problems. Always maintain good contrast with all the backgrounds present on the page. Ensure that error markers are unambiguous. Include spelling and grammar errors in your accessibility testing.

CSS Spelling and Grammar styling is your friend when the native error markers pose accessibility problems, such as poor contrast. It’s arguably the strongest motivation for supplying the features in the first place.

Privacy Concerns #

User dictionaries contain names and other personal information, so steps have been taken to ensure that sites cannot observe the presence or absence of spelling or grammar errors. Computed style queries in JavaScript will report the presence of custom spelling and grammar styles regardless of whether or not they are currently rendered for an element.

There is some potential for timing attacks that render potential spelling errors and determine paint timing differences. This is mitigated in two ways: browsers do not report accurate timing data, and the time to render error markers is negligible in the overall workload of rendering the page. The set of allowed properties for spelling and grammar pseudos is also limited to avoid the possibility of very expensive rendering operations.

Thanks #

The implementation of CSS Spelling and Grammar Features was undertaken by Igalia S.L. funded by Bloomberg L.P.

January 26, 2024 12:00 AM

January 25, 2024

Brian Kardell

half-light

Evolving ideas on "open stylable" issues, a new proposal.

Recently I wrote a piece called Lovely Trees in which I described how the Shadow DOM puts a lot of people off because the use cases it's designed around thus far aren't the ones most authors feel like they have. That's a real shame because there is a lot of usefulness there that is just lost. People seem to want something just a little less shadowy.

However, there are several slightly different visions for what it is we want, and how to achieve it. Each offers significantly different details and implications.

In that piece I also noted that we can make any of these possible through script in order to explore the space, gain practical experience with them, and iterate until we at least have a pretty good idea of what will work really well.

To this end I offered a library that I called "shadow-boxing" with several possible "modes" authors could explore.

"Feels like the wrong place"

Many people seemed to think this would be way better expressed in CSS itself somehow, rather than metadata in your HTML.

There is a long history here. Originally there were combinators for crossing the shadow boundary, but these were problematic and removed.

However, the fact that this kept coming up in different ways made me continue to discuss and bounce possible ideas around. I made a few rough takes, shared them with some people, and thanks to Mia and Dave Rupert for some good comments and discussion, today I'm adding a separate library which I think will make people much happier.

Take 2: half-light.js

half-light.js is a very tiny (~100 LoC) library that lets you express styles for shadow DOMs in your page in your CSS (it should be inline CSS or linked in the head). You can specify whether those rules apply to both your page and shadow roots, or just shadow roots, or just certain shadow roots, etc. All of these make use of CSS @media rules containing a custom --crossroot, which can be functional. The easiest way to understand it is with code, so let's have a look...

Rules for shadows, not the page...

This applies to <h1>'s in all shadow roots, but not in the light DOM of the page itself.

@media --crossroot { 
  h1 { ... }
}

Authors can also provide a selector filter to specify which elements should have their shadow roots affected. This can be, for example, a tag list. In the example below, it will style the <h2>'s in the shadows of <x-foo> or <x-bar> elements, and not those in the light DOM of your page itself...

@media --crossroot(x-foo, x-bar) { 
  h2 { ... }
}

Most selectors should work there, so you could also exclude things if you prefer. The example below will style the <h3>'s in the shadows of all elements except those of <x-bat> elements...

@media --crossroot(:not(x-bat)) {
  h3 { ... } 
}

Rules for shadows, and the page...

It's really just a trick of @media that we're tapping into: begin any of the examples above with screen, and put the whole --crossroot in parentheses. The example below styles all the <h1>'s in both your light DOM and all shadows...

@media screen, (--crossroot) { 
  h1 { ... }
}

Or, to use the exclusion route from above, but apply to all <h3>'s in the page, or in the shadows of all elements except those of <x-bat> elements...

@media screen, (--crossroot(:not(x-bat))) {
  h3 { ... } 
}

Play with it... There's a pen below. Once you've got an impression, give me your feedback, even with a simple emoji sentiment or short comment here.

See the Pen halflight by вкαя∂εℓℓ (@briankardell) on CodePen.

Is this a proposal?

No, not yet, it's just a library that should allow us to experiment with something "close enough" to what "features" a real proposal might need to support, and very vaguely what it might look like.

A real proposal, if it came from this, would certainly not use this syntax, which is simply trying to strike a balance between being totally valid and easy to process, and "close enough" for us to get an idea if it's got the right moving parts.

January 25, 2024 05:00 AM

January 22, 2024

Samuel Iglesias

XDC 2023: Behind the curtains

Time flies! Back in October, Igalia organized X.Org Developers Conference 2023 in A Coruña, Spain.

In case you don’t know it, X.Org Developers Conference, despite the X.Org in the name, is a conference for all developers working in the open-source graphics stack: anything related to DRM/KMS, Mesa, X11 and Wayland compositors, etc.

A Coruña's Orzán beach

This year, I participated in the organization of XDC in A Coruña, Spain (again!) by taking care of different aspects: from logistics in the venue (Palexco) to running it in person. It was a very tiring but fulfilling experience.

Sponsors

First of all, I would like to thank all the sponsors for their support, as without them, this conference wouldn’t happen:

XDC 2023 sponsors

They didn’t only give financial support to the conference: Igalia sponsored the welcome event and lunches; the X.Org Foundation sponsored the coffee breaks; the Tourism Office of A Coruña sponsored the guided tour of the city center; and Raspberry Pi sent Raspberry Pi 5 boards to all speakers!

XDC 2023 Stats

XDC 2023 was a success in attendance and talk submissions. Here are some stats:

  • 📈 160 registered attendees.
  • 👬 120 attendees picked their badge in person.
  • 💻 25 attendees registered as virtual.
  • 📺 More than 6,000 views on live stream.
  • 📝 55 talks/workshops/demos distributed across three days of conference.
  • 🧗‍♀️ There were 3 social events: the welcome event, a guided tour of the city center, and one unofficial climbing activity!

XDC 2023 welcome event

Was XDC 2023 perfect organization-wise? Of course… no! Like in any event, we had some issues here and there: one with the Wi-Fi network that was quickly detected and fixed; some issues with the meals and coffee breaks (food allergies mainly); some seconds of lost audio in the live stream of one talk; and other minor things. Not bad for a community-run event!

Nevertheless, I would like to thank all the staff at Palexco for their quick response and their understanding.

Talk recordings & slides

XDC 2023 talk by André Almeida

Want to watch some talks again? All conference recordings were uploaded to the X.Org Foundation YouTube channel.

Slides are available to download in each talk description.

Enjoy!

XDC 2024

XDC 2024 will be in North America

We cannot tell yet where XDC 2024 is going to happen, other than that it will be in North America… but I can tell you that this will be announced soon. Stay tuned!

Want to organize XDC 2025 or XDC 2026?

If we continue with the current cadence, the 2025 event would again be in Europe, and the 2026 event would be in North America.

There is a list of requirements here. Nevertheless, feel free to contact me, or the X.Org Board of Directors, in order to get first-hand experience and knowledge about what organizing XDC entails.

XDC 2023 audience

Thanks

Thanks to all volunteers, collaborators, Palexco staff, GPUL, X.Org Foundation and many other people for their hard work. Special thanks to my Igalia colleague Chema, who did an outstanding job organizing the event together with me.

Thanks to the sponsors for their extraordinary support of this conference.

Thanks to Igalia not only for sponsoring the event, but also for all the support I got during the past year. I am glad to be part of this company, and I am always surprised by how great my colleagues are.

And last, but not least, thanks to all the speakers and attendees. Without you, the conference wouldn’t exist.

See you at XDC 2024!

January 22, 2024 09:06 AM

January 11, 2024

Andy Wingo

micro macro story time

Today, a tiny tale: about 15 years ago I was working on Guile’s macro expander. Guile inherited this code from an early version of Kent Dybvig’s portable syntax expander. It was... not easy to work with.

Some difficulties were essential. Scope is tricky, after all.

Some difficulties were incidental, but deep. The expander is ultimately a function that translates Scheme-with-macros to Scheme-without-macros. However, it is itself written in Scheme-with-macros, so to load it on a substrate without macros requires a pre-expanded copy of itself, whose data representations need to be compatible with any incremental change, so that you will be able to use the new expander to produce a fresh pre-expansion. This difficulty could have been avoided by incrementally bootstrapping the library. It works once you are used to it, but it’s gnarly.

But then, some difficulties were just superfluously egregious. Dybvig is a totemic developer and researcher, but a generation or two removed from me, and when I was younger, it never occurred to me to just email him to ask why things were this way. (A tip to the reader: if someone is doing work you are interested in, you can just email them. Probably they write you back! If they don’t respond, it’s not you, they’re probably just busy and their inbox leaks.) Anyway, in my totally speculative reconstruction of events, when Dybvig goes to submit his algorithm for publication, he gets annoyed that “expand” doesn’t sound fancy enough. In a way it’s similar to the original SSA developers thinking that “phony functions” wouldn’t get published.

So Dybvig calls the expansion function “χ”, because the Greek chi looks like the X in “expand”. Fine for the paper, whatever paper that might be, but then in psyntax, there are all these functions named chi and chi-lambda and all sorts of nonsense.

In the early years I was often confused by these names; I wasn’t in on the pun, and I didn’t feel I had enough responsibility for this code to think about what the name should be. I finally broke down and changed all instances of “chi” to “expand” back in 2011, and never looked back.

Anyway, this is a story with a very specific moral: don’t name your functions chi.

by Andy Wingo at January 11, 2024 02:10 PM

Maíra Canal

Introducing CPU jobs to the Raspberry Pi

Igalia is always working hard to improve the 3D rendering drivers for the Broadcom VideoCore GPU found in Raspberry Pi devices. One of our most recent efforts was moving the CPU job implementation from the Vulkan driver to the V3D kernel driver.

What are CPU jobs and why do we need them?

In the V3DV driver, there are some Vulkan commands that cannot be performed by the GPU alone, so we implement those as CPU jobs in Mesa. A CPU job is a job that requires CPU intervention to be performed. For example, the Broadcom VideoCore GPUs have no way to calculate a timestamp, but we need timestamps for Vulkan timestamp queries, so we calculate them on the CPU.

A CPU job in userspace also implies CPU stalling. Sometimes we need to hold part of the command submission flow in order to correctly synchronize job execution. This waiting period causes the CPU to stall, preventing the continuous submission of jobs to the GPU. To mitigate this issue, we decided to move the CPU job mechanisms from the V3DV driver to the V3D kernel driver.

In the V3D kernel driver, we have different kinds of jobs: RENDER jobs, BIN jobs, CSD jobs, TFU jobs, and CLEAN CACHE jobs. For each of those jobs, we have a DRM scheduler instance that helps us to synchronize the jobs.

If you want to know more about the different kinds of V3D jobs, check out this November Update: Exploring V3D blogpost, where I explain more about all the V3D IOCTLs and jobs.

Jobs of the same kind are dispatched and processed in the order they are submitted, using a standard first-in-first-out (FIFO) queue. We can synchronize different jobs across different queues using DRM syncobjs. More about the V3D synchronization framework and its user extensions can be learned in this two-part blog post from Melissa Wen.

From the kernel documentation: DRM syncobjs (synchronization objects) are containers for stuff that helps sync up GPU commands. They’re super handy because you can use them in your own programs, share them with other programs, and even use them across different DRM drivers. Mostly, they’re used for making Vulkan fences and semaphores work.
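
As a rough illustration of the userspace side, here is a minimal sketch using libdrm’s syncobj helpers. The submit function is a hypothetical stand-in for a driver-specific submission IOCTL that attaches a fence to the syncobj; error handling is omitted:

#include <stdint.h>
#include <xf86drm.h>

/* Hypothetical stand-in for a driver-specific submit IOCTL that
 * signals out_syncobj when the submitted job completes. */
extern void submit_job_with_out_sync(int drm_fd, uint32_t out_syncobj);

void run_job_and_wait(int drm_fd)
{
        uint32_t syncobj;

        /* Create a syncobj, hand it to a job submission as an "out"
         * sync, then block until the kernel signals its fence. */
        drmSyncobjCreate(drm_fd, 0, &syncobj);
        submit_job_with_out_sync(drm_fd, syncobj);
        drmSyncobjWait(drm_fd, &syncobj, 1, INT64_MAX,
                       DRM_SYNCOBJ_WAIT_FLAGS_WAIT_FOR_SUBMIT, NULL);
        drmSyncobjDestroy(drm_fd, syncobj);
}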

By moving the CPU job from userspace to the kernel, we can make use of the DRM scheduler queues and all the advantages they bring. For this, we created a new type of job in the V3D kernel driver, the CPU job, which also means creating a new DRM scheduler instance and a CPU job queue. Now, instead of stalling the submission thread while waiting for the GPU to go idle, we can use DRM syncobjs to synchronize both CPU and GPU jobs in a submission, providing more efficient usage of the GPU.

How did we implement the CPU jobs in the kernel driver?

After we decided to have a CPU job implementation in kernel space, we considered two possible implementations for this job: creating an IOCTL for each type of CPU job, or using a user extension to provide polymorphic behavior through a single CPU job IOCTL.

We have different types of CPU jobs (indirect CSD jobs, timestamp query jobs, copy query results jobs…), and each of them shares a common infrastructure for allocation and synchronization but performs a different operation. Therefore, we decided to go with user extensions.

In her blog post, Melissa digs deep into the implementation of generic IOCTL extensions in the V3D kernel driver. To put it simply, instead of expanding the data struct for each IOCTL every time we need to add a new feature, we define a user extension chain. As we add new optional interfaces to control the IOCTL, we define a new extension struct that can be linked into the IOCTL data only when required by the user.
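
For reference, every extension starts with a small base struct carrying the chain pointer and the extension ID. A sketch of its shape, modelled on the upstream v3d_drm.h UAPI (consult the header for the authoritative definition):

struct drm_v3d_extension {
        __u64 next;  /* User pointer to the next extension in the chain, or 0. */
        __u32 id;    /* Which extension this is, e.g. DRM_V3D_EXT_ID_CPU_TIMESTAMP_QUERY. */
        __u32 flags; /* Currently unused; must be zero. */
};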

Therefore, we created a new IOCTL, drm_v3d_submit_cpu, which is used to submit any type of CPU job. This single IOCTL can be extended by a user extension, which lets us reuse the common infrastructure (avoiding code repetition) while using the user extension ID to identify the type of job and perform the corresponding operation.

struct drm_v3d_submit_cpu {
        /* Pointer to a u32 array of the BOs that are referenced by the job.
         *
         * For DRM_V3D_EXT_ID_CPU_INDIRECT_CSD, it must contain only one BO,
         * that contains the workgroup counts.
         *
         * For DRM_V3D_EXT_ID_TIMESTAMP_QUERY, it must contain only one BO,
         * that will contain the timestamp.
         *
         * For DRM_V3D_EXT_ID_CPU_RESET_TIMESTAMP_QUERY, it must contain only
         * one BO, that contains the timestamp.
         *
         * For DRM_V3D_EXT_ID_CPU_COPY_TIMESTAMP_QUERY, it must contain two
         * BOs. The first is the BO where the timestamp queries will be written
         * to. The second is the BO that contains the timestamp.
         *
         * For DRM_V3D_EXT_ID_CPU_RESET_PERFORMANCE_QUERY, it must contain no
         * BOs.
         *
         * For DRM_V3D_EXT_ID_CPU_COPY_PERFORMANCE_QUERY, it must contain one
         * BO, where the performance queries will be written.
         */
        __u64 bo_handles;

        /* Number of BO handles passed in (size is that times 4). */
        __u32 bo_handle_count;

        __u32 flags;

        /* Pointer to an array of ioctl extensions. */
        __u64 extensions;
};

Now, we can create a CPU job and submit it with a CPU job user extension.

And which extensions are available?

  1. DRM_V3D_EXT_ID_CPU_INDIRECT_CSD: this CPU job allows us to submit an indirect CSD job. An indirect CSD job is a job that, when executed in the queue, will map an indirect buffer, read the dispatch parameters, and submit a regular dispatch. This CPU job is used in Vulkan calls like vkCmdDispatchIndirect().
  2. DRM_V3D_EXT_ID_CPU_TIMESTAMP_QUERY: this CPU job calculates the query timestamp and updates the query availability by signaling a syncobj. This CPU job is used in Vulkan calls like vkCmdWriteTimestamp().
  3. DRM_V3D_EXT_ID_CPU_RESET_TIMESTAMP_QUERY: this CPU job resets the timestamp queries based on the value offset of the first query. This CPU job is used in Vulkan calls like vkCmdResetQueryPool() for timestamp queries.
  4. DRM_V3D_EXT_ID_CPU_COPY_TIMESTAMP_QUERY: this CPU job copies the complete or partial result of a query to a buffer. This CPU job is used in Vulkan calls like vkCmdCopyQueryPoolResults() for timestamp queries.
  5. DRM_V3D_EXT_ID_CPU_RESET_PERFORMANCE_QUERY: this CPU job resets the performance queries by resetting the values of the perfmons. This CPU job is used in Vulkan calls like vkCmdResetQueryPool() for performance queries.
  6. DRM_V3D_EXT_ID_CPU_COPY_PERFORMANCE_QUERY: similar to DRM_V3D_EXT_ID_CPU_COPY_TIMESTAMP_QUERY, this CPU job copies the complete or partial result of a query to a buffer. This CPU job is used in Vulkan calls like vkCmdCopyQueryPoolResults() for performance queries.

The CPU job IOCTL flow is similar to that of any other V3D job. We allocate the job struct, parse all the extensions, init the job, look up the BOs and lock their reservations, add the proper dependencies, and push the job to the DRM scheduler entity.
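
To make that flow concrete, here is a hedged userspace sketch of submitting a timestamp query CPU job through the new IOCTL. Only drm_v3d_submit_cpu and the extension IDs above come from this post; the payload struct, its field names, and the DRM_IOCTL_V3D_SUBMIT_CPU macro follow the usual UAPI conventions but should be checked against v3d_drm.h:

#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <drm/v3d_drm.h>

/* Illustrative extension payload; the real struct lives in v3d_drm.h. */
struct v3d_timestamp_query_ext {
        struct drm_v3d_extension base;
        uint64_t offsets; /* User pointer to per-query buffer offsets. */
        uint64_t syncs;   /* User pointer to per-query syncobj handles. */
        uint32_t count;
        uint32_t pad;
};

static int submit_timestamp_query(int drm_fd, uint32_t bo_handle,
                                  uint32_t query_offset, uint32_t syncobj)
{
        struct v3d_timestamp_query_ext ext;
        struct drm_v3d_submit_cpu submit;

        memset(&ext, 0, sizeof(ext));
        memset(&submit, 0, sizeof(submit));

        ext.base.id = DRM_V3D_EXT_ID_CPU_TIMESTAMP_QUERY;
        ext.offsets = (uintptr_t)&query_offset;
        ext.syncs = (uintptr_t)&syncobj;
        ext.count = 1;

        /* One BO: the buffer the timestamp will be written to. */
        submit.bo_handles = (uintptr_t)&bo_handle;
        submit.bo_handle_count = 1;
        submit.extensions = (uintptr_t)&ext;

        return ioctl(drm_fd, DRM_IOCTL_V3D_SUBMIT_CPU, &submit);
}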

When running a CPU job, we execute the following code:

static const v3d_cpu_job_fn cpu_job_function[] = {
        [V3D_CPU_JOB_TYPE_INDIRECT_CSD] = v3d_rewrite_csd_job_wg_counts_from_indirect,
        [V3D_CPU_JOB_TYPE_TIMESTAMP_QUERY] = v3d_timestamp_query,
        [V3D_CPU_JOB_TYPE_RESET_TIMESTAMP_QUERY] = v3d_reset_timestamp_queries,
        [V3D_CPU_JOB_TYPE_COPY_TIMESTAMP_QUERY] = v3d_copy_query_results,
        [V3D_CPU_JOB_TYPE_RESET_PERFORMANCE_QUERY] = v3d_reset_performance_queries,
        [V3D_CPU_JOB_TYPE_COPY_PERFORMANCE_QUERY] = v3d_copy_performance_query,
};

static struct dma_fence *
v3d_cpu_job_run(struct drm_sched_job *sched_job)
{
        struct v3d_cpu_job *job = to_cpu_job(sched_job);
        struct v3d_dev *v3d = job->base.v3d;

        v3d->cpu_job = job;

        if (job->job_type >= ARRAY_SIZE(cpu_job_function)) {
                DRM_DEBUG_DRIVER("Unknown CPU job: %d\n", job->job_type);
                return NULL;
        }

        trace_v3d_cpu_job_begin(&v3d->drm, job->job_type);

        cpu_job_function[job->job_type](job);

        trace_v3d_cpu_job_end(&v3d->drm, job->job_type);

        return NULL;
}

The interesting thing is that each CPU job type executes a completely different operation.

The complete kernel implementation has already landed in drm-misc-next and can be seen right here.

What did we change in Mesa-V3DV to use the new kernel-V3D CPU job?

After landing the kernel implementation, I needed to accommodate the new CPU job approach in userspace.

A fundamental rule is not to cause regressions, i.e., to keep backwards compatibility between userspace and old versions of the Linux kernel. This means we cannot break new versions of Mesa running on old kernels. Therefore, we needed to create two paths: one preserving the old way to perform CPU jobs, and the other using the kernel to perform them.

So, for example, the indirect CSD job used to add two different jobs to the queue: a CPU job and a CSD job. Now, if we have the CPU job capability in the kernel, we only add a CPU job, and the CSD job is dispatched from within the kernel.

-   list_addtail(&csd_job->list_link, &cmd_buffer->jobs);
+
+   /* If we have a CPU queue we submit the CPU job directly to the
+    * queue and the CSD job will be dispatched from within the kernel
+    * queue, otherwise we will have to dispatch the CSD job manually
+    * right after the CPU job by adding it to the list of jobs in the
+    * command buffer.
+    */
+   if (!cmd_buffer->device->pdevice->caps.cpu_queue)
+      list_addtail(&csd_job->list_link, &cmd_buffer->jobs);

Furthermore, we can now use syncobjs to synchronize the CPU jobs. For example, for the timestamp query CPU job, we used to stall the submission thread and wait for the completion of all work queued before the timestamp query. Now we can just add a barrier to the CPU job, and it will be properly synchronized through the syncobjs without stalling the submission thread.

   /* The CPU job should be serialized so it only executes after all previously
    * submitted work has completed
    */
   job->serialize = V3DV_BARRIER_ALL;

We were able to test the implementation using multiple CTS tests, such as dEQP-VK.compute.pipeline.indirect_dispatch.*, dEQP-VK.pipeline.monolithic.timestamp.*, dEQP-VK.synchronization.*, dEQP-VK.query_pool.* and dEQP-VK.multiview.*.

The userspace implementation has already landed in Mesa and the full implementation can be checked in this MR.


You can learn more about the ongoing challenges in the Raspberry Pi driver stack from this XDC 2023 talk presented by Iago Toral, Juan Suárez and myself, in which Iago mentions the CPU job work we have been doing.

Also, I cannot finish this post without thanking Melissa Wen and Iago Toral for all their help while I was developing the CPU jobs for the V3D kernel driver.

January 11, 2024 01:30 PM

January 08, 2024

Andy Wingo

missing the point of webassembly

I find most descriptions of WebAssembly to be uninspiring: if you start with a phrase like “assembly-like language” or a “virtual machine”, we have already lost the plot. That’s not to say that these descriptions are incorrect, but it’s like explaining what a dog is by starting with its circulatory system. You’re not wrong, but you should probably lead with the bark.

I have a different preferred starting point which is less descriptive but more operational: WebAssembly is a new fundamental abstraction boundary. WebAssembly is a new way of dividing computing systems into pieces and of composing systems from parts.

This all may sound high-falutin’, but it’s for real: this is the actually interesting thing about Wasm.

fundamental & abstract

It’s probably easiest to explain what I mean by example. Consider the Linux ABI: Linux doesn’t care what code it’s running; Linux just handles system calls and schedules process time. Programs that run against the x86-64 Linux ABI don’t care whether they are in a container or a virtual machine or “bare metal” or whether the processor is AMD or Intel or even a Mac M3 with Docker and Rosetta 2. The Linux ABI interface is fundamental in the sense that either side can implement any logic, subject to the restrictions of the interface, and abstract in the sense that the universe of possible behaviors has been simplified to a limited language, in this case that of system calls.

Or take HTTP: when you visit wingolog.org, you don’t have to know (but surely would be delighted to learn) that it’s Scheme code that handles the request. I don’t have to care if the other side of the line is curl or Firefox or Wolvic. HTTP is such a successful fundamental abstraction boundary that at this point it is the default for network endpoints; whether you are a database or a golang microservice, if you don’t know that you need a custom protocol, you use HTTP.

Or, to rotate our metaphorical compound microscope to high-power magnification, consider the SYS-V amd64 C ABI: almost every programming language supports some form of extern C {} to access external libraries, and the best language implementations can produce artifacts that implement the C ABI as well. The standard C ABI splits programs into parts, and allows works from separate teams to be composed into a whole. Indeed, one litmus test of a fundamental abstraction boundary is, could I reasonably define an interface and have an implementation of it be in Scheme or OCaml or what-not: if the answer is yes, we are in business.

It is in this sense that WebAssembly is a new fundamental abstraction boundary.

WebAssembly shares many of the concrete characteristics of other abstractions. Like the Linux syscall interface, WebAssembly defines an interface language in which programs rely on host capabilities to access system features. Like the C ABI, calling into WebAssembly code has a predictable low cost. Like HTTP, you can arrange for WebAssembly code to have no shared state with its host, by construction.

But WebAssembly is a new point in this space. Unlike the Linux ABI, there is no fixed set of syscalls: WebAssembly imports are named, typed, and without pre-defined meaning, more like the C ABI. Unlike the C ABI, WebAssembly modules have only the shared state that they are given; neither side has a license to access all of the memory in the “process”. And unlike HTTP, WebAssembly modules are “in the room” with their hosts: close enough that hosts can allow themselves the luxury of synchronous function calls, and to allow WebAssembly modules to synchronously call back into their hosts.

applied teleology

At this point, you are probably nodding along, but also asking yourself, what is it for? If you arrive at this question from the “WebAssembly is a virtual machine” perspective, I don’t think you’re well-equipped to answer. But starting as we did by the interface, I think we are better positioned to appreciate how WebAssembly fits into the computing landscape: the narrative is generative, in that you can explore potential niches by identifying existing abstraction boundaries.

Again, let’s take a few examples. Say you ship some “smart cities” IoT device, consisting of a microcontroller that runs some non-Linux operating system. The system doesn’t have an MMU, so you don’t have hardware memory protections, but you would like to be able to enforce some invariants on the software that this device runs; and you would also like to be able to update that software over the air. WebAssembly is getting used in these environments; I wish I had a list of deployments at hand, but perhaps we can at least take this article last year from a WebAssembly IoT vendor as proof of commercial interest.

Or, say you run a function-as-a-service cloud, meaning that you run customer code in response to individual API requests. You need to limit the allowable set of behaviors from the guest code, so you choose some abstraction boundary. You could use virtual machines, but that would be quite expensive in terms of memory. You could use containers, but you would like more control over the guest code. You could have these functions written in JavaScript, but that means that your abstraction is no longer fundamental; you limit your applicability. WebAssembly fills an interesting niche here, and there are a number of products in this space, for example Fastly Compute or Fermyon Spin.

Or to go smaller, consider extensible software, like the GIMP image editor or VS Code: in the past you would use loadable plug-in modules via the C ABI, which can be quite gnarly, or you lean into a particular scripting language, which can be slow, inexpressive, and limit the set of developers that can write extensions. It’s not a silver bullet, but WebAssembly can have a role here. For example, the Harfbuzz text shaping library supports fonts with an embedded (em-behdad?) WebAssembly extension to control how strings of characters are mapped to positioned glyphs.

aside: what boundaries do

They say that good fences make good neighbors, and though I am not quite sure it is true—since my neighbor put up a fence a few months ago, our kids don’t play together any more—boundaries certainly facilitate separation of functionality. Conway’s law is sometimes applied as a descriptive observation—ha-ha, isn’t that funny, they just shipped their org chart—but this again misses the point, in that boundaries facilitate separation, but also composition: if I know that I can fearlessly allow a font to run code because I have an appropriate abstraction boundary between host application and extension, I have gained in power. I no longer need to be responsible for every part of the product, and my software can scale up to solve harder problems by composing work from multiple teams.

There is little point in using WebAssembly if you control both sides of a boundary, just as (unless you have chickens) there is little point in putting up a fence that runs through the middle of your garden. But where you want to compose work from separate teams, the boundaries imposed by WebAssembly can be a useful tool.

narrative generation

WebAssembly is enjoying a tail-wind of hype, so I think it’s fair to say that wherever you find a fundamental abstraction boundary, someone is going to try to implement it with WebAssembly.

Again, some examples: back in 2022 I speculated that someone would “compile” Docker containers to WebAssembly modules, and now that is a thing.

I think at some point someone will attempt to replace eBPF with Wasm in the Linux kernel; eBPF is just not as good a language as Wasm, and the toolchains that produce it are worse. eBPF has clunky calling-conventions about what registers are saved and spilled at call sites, a decision that can be made more efficiently for the program and architecture at hand when register-allocating WebAssembly locals. (Sometimes people lean on the provably-terminating aspect of eBPF as its virtue, but that could apply just as well to Wasm if you prohibit the loop opcode (and the tail-call instructions) at verification-time.) And why don’t people write whole device drivers in eBPF? Or rather, targeting eBPF from C or what-have-you. It’s because eBPF is just not good enough. WebAssembly is, though! Anyway I think Linux people are too chauvinistic to pick this idea up but I bet Microsoft could do it.

I was thinking today, you know, it actually makes sense to run a WebAssembly operating system, one which runs WebAssembly binaries. If the operating system includes the Wasm run-time, it can interpose itself at syscall boundaries, sometimes allowing it to avoid context switches. You could start with something like the Linux ABI, perhaps via WALI, but for a subset of guest processes that conform to particular conventions, you could build purpose-built composition that can allocate multiple WebAssembly modules to a single process, eliding inter-process context switches and data copies for streaming computations. Or, focussing on more restricted use-cases, you could make a microkernel; googling around I found this article from a couple days ago where someone is giving this a go.

wwwhat about the wwweb

But let’s go back to the web, where you are reading this. In one sense, WebAssembly is a massive web success, being deployed to literally billions of user agents. In another, it is marginal: people do not write web front-ends in WebAssembly. Partly this is because the kind of abstraction supported by linear-memory WebAssembly 1.0 isn’t a good match for the garbage-collected DOM API exposed by web browsers. As a corollary, languages that are most happy targeting this linear-memory model (C, Rust, and so on) aren’t good for writing DOM applications either. WebAssembly is used in auxiliary modules where you want to run legacy C++ code on user devices, or to speed up a hot leaf function, but isn’t a huge success.

This will change with the recent addition of managed data types to WebAssembly, but not necessarily in the way that you might think. Like, now that it will be cheaper and more natural to pass data back and forth with JavaScript, are we likely to see Wasm/GC progressively occupying more space in web applications? For me, I doubt that progressive is the word. In the same way that you wouldn’t run a fence through the middle of your front lawn, you wouldn’t want to divide your front-end team into JavaScript and WebAssembly sub-teams. Instead I think that we will see more phase transitions, in which whole web applications switch from JavaScript to Wasm/GC, compiled from Dart or Elm or what have you. The natural fundamental abstraction boundary in a web browser is between the user agent and the site’s code, not within the site’s code itself.

conclusion

So, friends, if you are talking to a compiler engineer, by all means: keep describing WebAssembly as a virtual machine. It will keep them interested. But for everyone else, the value of WebAssembly is what it does, which is to be a different way of breaking a system into pieces. Armed with this observation, we can look at current WebAssembly uses to understand the nature of those boundaries, and at new boundaries to see if WebAssembly can have a niche there. Happy hacking, and may your components always compose!

by Andy Wingo at January 08, 2024 11:45 AM

January 05, 2024

Andy Wingo

scheme modules vs whole-program compilation: fight

In a recent dispatch, I explained the whole-program compilation strategy used in Whiffle and Hoot. Today’s note explores what a correct solution might look like.

being explicit

Consider a module that exports an increment-this-integer procedure. We’ll use syntax from the R6RS standard:

(library (inc)
  (export inc)
  (import (rnrs))
  (define (inc n) (+ n 1)))

If we then have a program:

(import (rnrs) (inc))
(inc 42)

Then the meaning of this program is clear: it reduces to (+ 42 1), then to 43. Fine enough. But how do we get there? How does the compiler compose the program with the modules that it uses (transitively), to produce a single output?

In Whiffle (and Hoot), the answer is, sloppily. There is a standard prelude that initially has a number of bindings from the host compiler, Guile. One of these is +, exposed under the name %+, where the % in this case is just a warning to the reader that this is a weird primitive binding. Using this primitive, the prelude defines a wrapper:

...
(define (+ x y) (%+ x y))
...

At compilation-time, Guile’s compiler recognizes %+ as special, and therefore compiles the body of + as consisting of a primitive call (primcall), in this case to the addition primitive. The Whiffle (and Hoot, and native Guile) back-ends then avoid referencing an imported binding when compiling %+, and instead produce backend-specific code: %+ disappears. Most uses of the + wrapper get inlined so %+ ends up generating code all over the program.

The prelude is lexically splatted into the compilation unit via a pre-expansion phase, so you end up with something like:

(let () ; establish lexical binding contour
  ...
  (define (+ x y) (%+ x y))
  ...
  (let () ; new nested contour
    (define (inc n) (+ n 1))
    (inc 42)))

This program will probably optimize (via partial evaluation) to just 43. (What about let and define? Well. Perhaps we’ll get to that.)

But, again here I have taken a short-cut, which is about modules. Hoot and Whiffle don’t really do modules, yet anyway. I keep telling Spritely colleagues that it’s complicated, and rightfully they keep asking why, so this article gets into it.

is it really a big letrec?

Firstly you have to ask, what is the compilation unit anyway? I mean, given a set of modules A, B, C and so on, you could choose to compile them separately, relying on the dynamic linker to compose them at run-time, or all together, letting the compiler gnaw on them all at once. Or, just A and B, and so on. One good-enough answer to this problem is library-group form, which explicitly defines a set of topologically-sorted modules that should be compiled together. In our case, to treat the (inc) module together with our example program as one compilation unit, we would have:

(library-group
  ;; start with sequence of libraries
  ;; to include in compilation unit...
  (library (inc) ...)

  ;; then the tail is the program that
  ;; might use the libraries
  (import (rnrs) (inc))
  (inc 42))

In this example, the (rnrs) base library is not part of the compilation unit. Presumably it will be linked in, either as a build step or dynamically at run-time. For Hoot we would want the whole prelude to be included, because we don’t want any run-time dependencies. Anyway hopefully this would expand out to something like the set of nested define forms inside nested let lexical contours.

And that was my instinct: somehow we are going to smash all these modules together into a big nested letrec, and the compiler will go to town. And this would work, for a “normal” programming language.

But with Scheme, there is a problem: macros. Scheme is a “programmable programming language” that allows users to extend its syntax as well as its semantics. R6RS defines a procedural syntax transformer (“macro”) facility, in which the user can define functions that run on code at compile-time (specifically, during syntax expansion). Scheme macros manage to compose lexical scope from the macro definition with the scope at the macro instantiation site, by annotating these expressions with source location and scope information, and making syntax transformers mostly preserve those annotations.

“Macros are great!”, you say: well yes, of course. But they are a problem too. Consider this incomplete library:

(library (ctinc)
  (import (rnrs) (inc))
  (export ctinc)
  (define-syntax ctinc
    (lambda (stx)
      ...)) ;; ***

The idea is to define a version of inc, but at compile-time: a (ctinc 42) form should expand directly to 43, not a call to inc (or even +, or %+). We define syntax transformers with define-syntax instead of define. The right-hand-side of the definition ((lambda (stx) ...)) should be a procedure of one argument, which returns one value: so far so good. Or is it? How do we actually evaluate what (lambda (stx) ...) means? What should we fill in for ...? When evaluating the transformer value, what definitions are in scope? What does lambda even mean in this context?

Well... here we butt up against the phasing wars of the mid-2000s. R6RS defines a whole system to explicitly declare what bindings are available when, then carves out a huge exception to allow for so-called implicit phasing, in which the compiler figures it out on its own. In this example we imported (rnrs) for the default phase, and this is the module that defines lambda (and indeed define and define-syntax). The standard defines that (rnrs) makes its bindings available both at run-time and expansion-time (compilation-time), so lambda means what we expect that it does. Whew! Let’s just assume implicit phasing, going forward.

The operand to the syntax transformer is a syntax object: an expression annotated with source and scope information. To pick it apart, R6RS defines a pattern-matching helper, syntax-case. In our case ctinc is unary, so we can begin to flesh out the syntax transformer:

(library (ctinc)
  (import (rnrs) (inc))
  (export ctinc)
  (define-syntax ctinc
    (lambda (stx)
      (syntax-case stx ()
        ((ctinc n)
         (inc n)))))) ;; ***

But here there’s a detail, which is that when syntax-case destructures stx to its parts, those parts themselves are syntax objects which carry the scope and source location annotations. To strip those annotations, we call the syntax->datum procedure, exported by (rnrs).

(library (ctinc)
  (import (rnrs) (inc))
  (export ctinc)
  (define-syntax ctinc
    (lambda (stx)
      (syntax-case stx ()
        ((ctinc n)
         (inc (syntax->datum #'n)))))))

And with this, voilà our program:

(library-group
  (library (inc) ...)
  (library (ctinc) ...)
  (import (rnrs) (ctinc))
  (ctinc 42))

This program should pre-expand to something like:

(let ()
  (define (inc n) (+ n 1))
  (let ()
    (define-syntax ctinc
      (lambda (stx)
        (syntax-case stx ()
          ((ctinc n)
           (inc (syntax->datum #'n))))))
    (ctinc 42)))

And then expansion should transform (ctinc 42) to 43. However, our naïve pre-expansion is not good enough for this to be possible. If you ran this in Guile you would get an error:

Syntax error:
unknown file:8:12: reference to identifier outside its scope in form inc

Which is to say, inc is not available as a value within the definition of ctinc. ctinc could residualize an expression that refers to inc, but it can’t use it to produce the output.

modules are not expressible with local lexical binding

This brings us to the heart of the issue: with procedural macros, modules impose a phasing discipline on the expansion process. Definitions from any given module must be available both at expand-time and at run-time. In our example, ctinc needs inc at expand-time, which is an early part of the compiler that is unrelated to any later partial evaluation by the optimizer. We can’t make inc available at expand-time just using let / letrec bindings.

This is an annoying result! What do other languages do? Well, mostly they aren’t programmable, in the sense that they don’t have macros. There are some ways to get programmability using e.g. eval in JavaScript, but these systems are not very amenable to “offline” analysis of the kind needed by an ahead-of-time compiler.

For those declarative languages with macros, Scheme included, I understand the state of the art is to expand module-by-module and then stitch together the results of expansion later, using a kind of link-time optimization. You visit a module’s definitions twice: once to evaluate them while expanding, resulting in live definitions that can be used by further syntax expanders, and once to residualize an abstract syntax tree, which will eventually be spliced into the compilation unit.

Note that in general the expansion-time and the residual definitions don’t need to be the same, and indeed during cross-compilation they are often different. If you are compiling with Guile as host and Hoot as target, you might implement cons one way in Guile and another way in Hoot, choosing between them with cond-expand.

lexical scope regained?

What is to be done? Glad you asked, Vladimir. But, I don’t really know. The compiler wants a big blob of letrec, but the expander wants a pearl-string of modules. Perhaps we try to satisfy them both? The library-group paper suggests that modules should be expanded one by one, then stitched into a letrec by AST transformations. It’s not that lexical scope is incompatible with modules and whole-program compilation; the problems arise when you add in macros. So by expanding first, in units of modules, we reduce high-level Scheme to a lower-level language without syntax transformers, but still on the level of letrec.

I was unreasonably pleased by the effectiveness of the “just splat in a prelude” approach, and I will miss it. I even pled for a kind of stop-gap fat-fingered solution to sloppily parse module forms and keep on splatting things together, but colleagues helpfully talked me away from the edge. So good-bye, sloppy: I repent my ways and will make amends, with 40 hail-maries and an alpha renaming thrice daily and more often if in moral distress. Further bulletins as events warrant. Until then, happy scheming!

by Andy Wingo at January 05, 2024 08:43 PM

v8's precise field-logging remembered set

A remembered set is used by a garbage collector to identify graph edges between partitioned sub-spaces of a heap. The canonical example is in generational collection, where you allocate new objects in newspace, and eventually promote survivor objects to oldspace. If most objects die young, we can focus GC effort on newspace, to avoid traversing all of oldspace all the time.

Collecting a subspace instead of the whole heap is sound if and only if we can identify all live objects in the subspace. We start with some set of roots that point into the subspace from outside, and then traverse all links in those objects, but only to other objects within the subspace.

The roots are, like, global variables, and the stack, and registers; and in the case of a partial collection in which we identify live objects only within newspace, also any link into newspace from other spaces (oldspace, in our case). This set of inbound links is a remembered set.

There are a few strategies for maintaining a remembered set. Generally speaking, you start by implementing a write barrier that intercepts all stores in a program. Instead of:

obj[slot] := val;

You might abstract this away:

write_slot(obj, sizeof obj, &obj[slot], val);

As you can see, it’s quite an annoying transformation to do by hand; typically you will want some sort of language-level abstraction that lets you keep the more natural syntax. C++ can do this pretty well, or if you are implementing a compiler, you just add this logic to the code generator.

Then the actual write barrier... well, its implementation is twingled up with the implementation of the remembered set. The simplest variant is a card-marking scheme, whereby the heap is divided into equal, power-of-two-sized cards, and each card has a bit. If the heap is also divided into blocks (say, 2 MB in size), then you might divide those blocks into 256-byte cards, yielding 8192 cards per block. A barrier might look like this:

void write_slot(ObjRef obj, size_t size,
                SlotAddr slot, ObjRef val) {
  *slot = val; // Start with the store.

  uintptr_t block_size = 1<<21;
  uintptr_t card_size = 1<<8;
  uintptr_t cards_per_block = block_size / card_size;

  uintptr_t obj_addr = (uintptr_t) obj;
  uintptr_t card_idx = (obj_addr / card_size) % cards_per_block;

  // Assume remset allocated at block start.
  uintptr_t block_start = obj_addr & ~(block_size - 1);
  uint32_t *cards = (uint32_t *) block_start;

  // Set the bit.
  cards[card_idx / 32] |= 1u << (card_idx % 32);
}

Then when marking the new generation, you visit all cards, and for all marked cards, trace all outbound links in all live objects that begin on the card.
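
A rough sketch of that visiting loop, reusing the block layout from write_slot above; first_object_on_card, next_object_on_card, and trace_object are hypothetical collector helpers, not part of any real API:

#include <stdint.h>

typedef struct Object Object;

/* Hypothetical collector helpers. */
extern Object *first_object_on_card(uintptr_t card_addr);
extern Object *next_object_on_card(Object *obj, uintptr_t card_addr);
extern void trace_object(Object *obj);

void scan_block_cards(uintptr_t block_start) {
  uintptr_t block_size = 1 << 21;
  uintptr_t card_size = 1 << 8;
  uintptr_t cards_per_block = block_size / card_size;
  uint32_t *cards = (uint32_t *) block_start;

  for (uintptr_t word_idx = 0; word_idx < cards_per_block / 32; word_idx++) {
    uint32_t word = cards[word_idx];
    if (!word) continue; // Fast path: skip 32 clean cards at a time.
    for (int bit = 0; bit < 32; bit++) {
      if (!(word & (1u << bit))) continue;
      uintptr_t card_addr = block_start + (word_idx * 32 + bit) * card_size;
      // Trace outbound links of live objects that begin on this card.
      for (Object *obj = first_object_on_card(card_addr); obj;
           obj = next_object_on_card(obj, card_addr))
        trace_object(obj);
    }
  }
}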

Card-marking is simple to implement and simple to statically allocate as part of the heap. Finding marked cards takes time proportional to the size of the heap, but you hope that the constant factors and SIMD minimize this cost. However, iterating over objects within a card can be costly. You hope that there are few old-to-new links, but what do you know?

In Whippet I have been struggling a bit with sticky-mark-bit generational marking, in which new and old objects are not spatially partitioned. Sometimes generational collection is a win, but in benchmarking I find that often it isn’t, and I think Whippet’s card-marking barrier is at fault: it is simply too imprecise. Consider firstly that our write barrier applies to stores to slots in all objects, not just those in oldspace; a store to a new object will mark a card, but that card may contain old objects which would then be re-scanned. Or consider a store to an old object in a more dense part of oldspace; scanning the card may incur more work than needed. It could also be that Whippet is being too aggressive at re-using blocks for new allocations, where it should be limiting itself to blocks that are very sparsely populated with old objects.

what v8 does

There is a tradeoff in write barriers between the overhead imposed on stores, the size of the remembered set, and the precision of the remembered set. Card-marking is relatively low-overhead and usually small as a fraction of the heap, but not very precise. It would be better if a remembered set recorded objects, not cards. And it would be even better if it recorded slots in objects, not just objects.

V8 takes this latter strategy: it has per-block remembered sets which record slots containing “interesting” links. All of the above words were to get here, to take a brief look at its remembered set.

The main operation is RememberedSet::Insert. It takes the MemoryChunk (a block, in our language from above) and the address of a slot in the block. Each block has a remembered set; in fact, six remembered sets for some reason. The remembered set itself is a SlotSet, whose interesting operations come from BasicSlotSet.

The structure of a slot set is a bitvector partitioned into equal-sized, possibly-empty buckets. There is one bit per slot in the block, so in the limit the size overhead for the remembered set may be 3% (1/32, assuming compressed pointers). Currently each bucket is 1024 bits (128 bytes), plus the 4 bytes for the bucket pointer itself.

Inserting into the slot set will first allocate a bucket (using C++ new) if needed, then load the “cell” (32-bit integer) containing the slot. There is a template parameter declaring whether this is an atomic or normal load. Finally, if the slot bit in the cell is not yet set, V8 will set the bit, possibly using atomic compare-and-swap.
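
A sketch of that insertion path in C (V8’s actual implementation is C++; the bucket and cell sizes follow the description above, and everything else here is illustrative):

#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

#define CELLS_PER_BUCKET 32  // 32 cells x 32 bits = 1024 slot bits per bucket.
#define SLOTS_PER_BUCKET (CELLS_PER_BUCKET * 32)

typedef struct {
  _Atomic uint32_t cells[CELLS_PER_BUCKET];
} Bucket;

typedef struct {
  _Atomic(Bucket *) *buckets; // Array of possibly-NULL bucket pointers.
  size_t bucket_count;
} SlotSet;

// Record that the slot at slot_index (within the block) holds an
// interesting pointer: allocate the bucket if needed, load the cell,
// then set the bit, using compare-and-swap for thread safety.
void slot_set_insert(SlotSet *set, size_t slot_index) {
  size_t bucket_idx = slot_index / SLOTS_PER_BUCKET;
  size_t cell_idx = (slot_index % SLOTS_PER_BUCKET) / 32;
  uint32_t bit = 1u << (slot_index % 32);

  Bucket *bucket = atomic_load(&set->buckets[bucket_idx]);
  if (!bucket) {
    Bucket *fresh = calloc(1, sizeof(*fresh));
    Bucket *expected = NULL;
    if (atomic_compare_exchange_strong(&set->buckets[bucket_idx],
                                       &expected, fresh)) {
      bucket = fresh;
    } else {
      free(fresh);        // Another thread won the race;
      bucket = expected;  // use its bucket instead.
    }
  }

  uint32_t cell = atomic_load(&bucket->cells[cell_idx]);
  while (!(cell & bit)) {
    if (atomic_compare_exchange_weak(&bucket->cells[cell_idx],
                                     &cell, cell | bit))
      break; // Bit set; done.
  }
}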

In the language of Blackburn’s Design and analysis of field-logging write barriers, I believe this is a field-logging barrier, rather than the bit-stealing slot barrier described by Yang et al in the 2012 Barriers Reconsidered, Friendlier Still!. Unlike Blackburn’s field-logging barrier, however, this remembered set is implemented completely on the side: there is no in-object remembered bit, nor remembered bits for the fields.

On the one hand, V8’s remembered sets are precise. There are some tradeoffs, though: they require off-managed-heap dynamic allocation for the buckets, and traversing the remembered sets takes time proportional to the whole heap size. And, should V8 ever switch its minor mark-sweep generational collector to use sticky mark bits, the lack of a spatial partition could lead to similar problems as I am seeing in Whippet. I will be interested to see what they come up with in this regard.

Well, that’s all for today. Happy hacking in the new year!

by Andy Wingo at January 05, 2024 09:44 AM

January 03, 2024

Brian Kardell

What's Good?

I think this is a question worth asking, let me explain why…

A few times a year, every developer advocate will ask developers which in-development features they're interested in or which pain points they experience most. That is a good thing. We should keep doing that.

While we don't often present it in this light, one thing this does is inform prioritization. The simple truth is that resources are way too finite, so we have to look for good signals about what should be prioritized.

I think it's similarly important to have feedback on the other end too: What's your satisfaction like with the web features you've gotten in the last few years? Can you name any that you use all the time? Can you name some whose usefulness you over-estimated - something that you thought you needed or wanted but then ultimately didn't end up using much? Something that you had high hopes for, but that failed you?

But why ask that?

Well, it seems quite probable that there are things we can learn from that.

Maybe we can look at where we should have listened more, or pushed more. Maybe we can learn things about the processes that successful things took that unsuccessful things didn't (did they go through WICG? Were there polyfills? Origin trials? Did they stay in experimental builds behind a flag for a long time? Were they done at roughly the same time in all browsers?). Can we compare the amount of resources and time required between them? Maybe those things could also inform prioritization somehow?

Normally, around this time, I'd have wrapped up working on the latest Web Almanac, and there would be lots of data flying at me that scratches a little bit of the kind of itch I've got - but this year we didn't do one, so I find myself wondering: how are people getting along with all of those things we've been delivering since 2019 or so?

So, let me know!

January 03, 2024 05:00 AM

January 02, 2024

Qiuyi Zhang (Joyee)

Fixing snapshot support of class fields in V8

Up until V8 10.0, the class field initializers had been broken in

January 02, 2024 09:20 PM

Building V8 on an M1 MacBook

I’ve recently got an M1 MacBook and played around with it a bit. It seems many open source projects still haven’t added MacOS with ARM64 into their support matrix, requiring a few extra steps to work properly, and V8 is no exception

January 02, 2024 09:20 PM

On deps/v8 in Node.js

I recently ran into a V8 test failure that only showed up in the V8 fork of Node.js but not in the upstream. Here I’ll write down my workflow used to debug the failure in case anyone (or myself) need to do this again and don’t know where to start

January 02, 2024 09:20 PM

Tips and Tricks for Node.js Core Development and Debugging

I thought about writing some guides on this topic in the nodejs/node repo, but it’s easier to throw whatever tricks I personally use on the Internet first - I am also going to heavily use the pronouns “I”, “We” and “You” in this post, and talk about my personal preference here, both of which we a

January 02, 2024 09:20 PM

Fixing Node.js vm APIs, part 4 - hitting the compilation cache again

In the last post I wrote about how I finally managed to fix the memory problems in the vm APIs, and it turned out that there was another issue blocking users from upgrading away from the End-of-Life Node

January 02, 2024 09:20 PM

Fixing Node.js vm APIs, part 3 - verifying the fixes

In the last post I wrote about how a new memory management model in Node.js vm APIs was designed to fix the long-standing leaks and use-after-free issues, and we finally put together a PR that the Node

January 02, 2024 09:20 PM

Fixing Node.js vm APIs, part 2 - reworking the memory management

In the last post, I wrote about how I came to work on a memory leak in the vm compilation APIs in Node

January 02, 2024 09:20 PM

Fixing Node.js vm APIs, part 1 - memory leaks and segmentation faults

This year I spent some time fixing a few long-standing issues in the Node.js vm APIs that had been blocking users from upgrading away from Node.js v16 (End-of-Life). It had been an interesting journey, so I decided to write a few blog posts about it. Hopefully they can be helpful for posterity

January 02, 2024 09:20 PM

Uncaught exceptions in Node.js

In this post, I’ll jot down some notes that I took when refactoring the uncaught exception handling routines in Node.js. Hopefully it could be useful for other people who are interested in this part of the code base, or for code archaeologists in the future

January 02, 2024 07:17 PM

New Blog

I’ve been thinking about starting a new blog for a while now. So here it is.

Not sure if I am going to write about tech here. Probably just going to ramble about my life.
(OK but these days my life is like 50% work, 30% work outside my job i.e

January 02, 2024 07:17 PM

Eric Meyer

Once Upon a Browser

Once upon a time, there was a movie called Once Upon a Forest.  I’ve never seen it.  In fact, the only reason I know it exists is because a few years after it was released, Joshua Davis created a site called Once Upon a Forest, which I was doing searches to find again.  The movie came up in my search results; the site, long dead, did not.  Instead, I found its original URL on Joshua’s Wikipedia page, and the Wayback Machine coughed up snapshots of it, such as this one.  You can also find static shots of it on Joshua’s personal web site, if you scroll far enough.

That site has long stayed with me, not so much for its artistic expression (which is pleasant enough) as for how the pieces were produced.  Joshua explained in a talk that he wrote code to create generative art, where it took visual elements and arranged them randomly, then waited for him to either save the result or hit a key to try again.  He created the elements that were used, and put constraints on how they might be arranged, but allowed randomness to determine the outcome.

That appealed to me deeply.  I eventually came to realize that the appeal was rooted in my love of the web, where we create content elements and visual styles and scripted behavior, and then we send our work into a medium that definitely has constraints, but something very much like the random component of generative art: viewport size, device capabilities, browser, and personal preference settings can combine in essentially infinite ways.  The user is the seed in the RNG of our work’s output.

Normally, we try very hard to minimize the variation our work can express.  Even when crossing from one experiential stratum to another  —  that is to say, when changing media breakpoints  —  we try to keep things visually consistent, orderly, and understandable.  That drive to be boring for the sake of user comprehension and convenience is often at war with our desire to be visually striking for the sake of expression and enticement.

There is a lot, and I mean a lot, of room for variability in web technologies.  We work very hard to tame it, to deny it, to shun it.  Too much, if you ask me.

About twelve and half years ago, I took a first stab at pushing back on that denial with a series posted to Flickr called “Spinning the Web”, where I used CSS rotation transforms to take consistent, orderly, understandable web sites and shake them up hard.  I enjoyed the process, and a number of people enjoyed the results.

google.com, late November 2023

In the past few months, I’ve come back to the concept for no truly clear reason and have been exploring new approaches and visual styles.  The first collection launched a few days ago: Spinning the Web 2023, a collection of 26 web sites remixed with a combination of CSS and JS.

I’m announcing them now in part because this month has been dubbed “Genuary”, a month for experimenting with generative art, with daily prompts to get people generating.  I don’t know if I’ll be following any of the prompts, but we’ll see.  And now I have a place to do it.

You see, back in 2011, I mentioned that my working title for the “Spinning the Web” series was “Once Upon a Browser”.  That title has never left me, so I’ve decided to claim it and created an umbrella site with that name.  At launch, it’s sporting a design that owes quite a bit to Once Upon a Forest  —  albeit with its own SVG-based generative background, one I plan to mess around with whenever the mood strikes.  New works will go up there from time to time, and I plan to migrate the 2011 efforts there as well.  For now, there are pointers to the Flickr albums for the old works.

I said this back in 2011, and I mean it just as much in 2023: I hope you enjoy these works even half as much as I enjoyed creating them.

by Eric Meyer at January 02, 2024 04:22 PM

January 01, 2024

Alex Bradbury

Reflections on ten years of LLVM Weekly

Today, with Issue #522 I'm marking ten years of authoring LLVM Weekly, a newsletter summarising developments on projects under the LLVM umbrella (LLVM, Clang, MLIR, Flang, libcxx, compiler-rt, lld, LLDB, ...). Somehow I've managed to keep up an unbroken streak, publishing every single Monday since the first issue back on Jan 6th 2014 (the first Monday of 2014 - you can also see the format hasn't changed much!). With a milestone like that, now is the perfect moment to jot down some reflections on the newsletter and thoughts for the future.

Motivation and purpose

Way back when I started LLVM Weekly, I'd been working with LLVM for a few years as part of developing and supporting a downstream compiler for a novel research architecture. This was a very educational yet somewhat lonely experience, and I sought to more closely follow upstream LLVM development to keep better abreast of changes that might impact or help my work, to learn more about parts of the compiler I wasn't actively using, and also to feel more of a connection to the wider LLVM community given my compiler work was a solo effort. The calculus for kicking off an LLVM development newsletter was dead simple: I found value in tracking development anyway, the incremental effort to write up and share with others wasn't too great, and I felt quite sure others would benefit as well.

Looking back at my notes (I have a huge Markdown file with daily notes going back to 2011 - a file of this rough size and format is also a good stress test for text editors!) it seems I thought seriously about the idea of starting something up at the beginning of December 2013. I brainstormed the format, looked at other newsletters I might want to emulate, and went ahead and just did it starting in the new year. It really was as simple as that. I figured it was better to give it a try and stop it if it got no traction, rather than waste lots of time putting out feelers on the level of interest and the format. As a sidenote, I was delighted to see many of the newsletters I studied at the time are still going: This Week in Rust, Perl Weekly (I'll admit this surprised me!), Ubuntu Weekly News, OCaml Weekly News, and Haskell Weekly News.

Readership and content

The basic format of LLVM Weekly is incredibly simple - highlight relevant news articles and blog posts, pick out some forum/mailing-list discussions (sometimes trying to summarise complex debates - but this is very challenging and time intensive), and highlight some noteworthy commits from across the project. More recently I've taken to advertising the scheduled online sync-ups and office hours for the week. Notably absent are ads or paid content of any kind. I respect that others have made successful businesses in this kind of space, but although I've always written LLVM Weekly on my own personal time, I've never felt comfortable trying to monetise other people's attention or my relationship with the community in this way.

The target audience is really anyone with an interest in keeping track of LLVM development, though I don't tend to expand every acronym or give a from-basics explanation of every term, so some familiarity with the project is assumed if you want to understand every line. The newsletter is posted to LLVM's Discourse, to llvmweekly.org, and delivered directly to people's inboxes. I additionally post on Twitter and on Mastodon linking to each issue. I don't attempt to track open rates or have functioning analytics, so I only have a rough idea of readership. There are ~3.5k active subscribers directly to the mailing list, ~7.5k Twitter followers, ~180 followers on Mastodon (introduced much more recently), and an unknown number of people reading via llvmweekly.org or RSS. I'm pretty confident that I'm not just shouting into the void, at least.

There are some gaps or blind spots of course. I make no attempt to link to patches that are under review, even though many might have interesting review discussions, because it would simply be too much work to sort through them; and if a discussion is particularly contentious or requires input from a wider cross-section of the LLVM community, you'd expect an RFC to be posted anyway. Although I do try to highlight MLIR threads or commits, it's not an area of LLVM I'm working in right now, so I probably miss some things. Thankfully Javed Absar has taken up writing an MLIR newsletter that helps plug those gaps. I'm also not currently trawling through repos under the LLVM GitHub organisation other than the main llvm-project monorepo, though perhaps I should...

I've shied away from reposting job posts as the overhead is just too high. I found dealing with requests to re-advertise (and considering if this is useful to the community) or determining if ads are sufficiently LLVM related just wasn't a good use of time when there's a good alternative. People can check the job post category on LLVM Discourse or search for LLVM on their favourite jobs site.

How it works

There are really two questions to be answered here: how I go about writing it each week, and what tools and services are used. In terms of writing:

  • I have a checklist I follow just to ensure nothing gets missed and help dive back in quickly if splitting work across multiple days.
  • tig --since=$LAST_WEEK_DATE $DIR to step through commits in the past week for each sub-project within the monorepo. Tig is a fantastic text interface for git, and I of course have an ugly script that I bind to a key that generates the [shorthash](github_link) style links I insert for each chosen commit.
  • I make a judgement call as to whether I think a commit might be of interest to others. This is bound to be somewhat flawed, but hopefully better than random selection! I always really appreciate feedback if you think I missed something important, or tips on things you think I should include next week.
    • There's a cheat that practically guarantees a mention in LLVM Weekly without even needing to drop me a note though - write documentation! It's very rare I see a commit that adds more docs and fail to highlight it.
  • Similarly, I scan through LLVM Discourse posts over the past week and pick out discussions I think readers may be interested in. Most RFCs will be picked up as part of this. In some cases if there's a lengthy discussion I might attempt to summarise or point to key messages, but honestly this is rarer than I'd like as it can be incredibly time consuming. I try very hard to remain a neutral voice and not to insert personal views on technical discussions.
  • Many ask how long it takes to write, and the answer is of course that it varies. It's easy to spend a lot of time trying to figure out the importance of commits or discussions in parts of the compiler I don't work with much, or to better summarise content. The amount of activity can also vary a lot week to week (especially on Discourse). It's mostly in the 2.5-3.5h range (very rarely any more than 4 hours) to write, copyedit, and send.
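
For the curious, here's a minimal sketch of what that link-generating helper could look like. To be clear, this is an illustrative stand-in rather than my actual script, and the clipboard tool and abbreviated hash length are assumptions:

#!/bin/sh
# gh-link.sh: turn a commit-ish into a [shorthash](github_link) style
# markdown link for the llvm-project monorepo, copied to the clipboard.
# Usage: gh-link.sh <commit-hash>
hash=$(git rev-parse "$1")              # full hash for the URL
short=$(git rev-parse --short=10 "$1")  # abbreviated hash for the link text
printf '[%s](https://github.com/llvm/llvm-project/commit/%s)' "$short" "$hash" \
  | xclip -selection clipboard          # or pbcopy on macOS

Wiring it to a key in tig is then a single ~/.tigrc line along the lines of bind generic L @sh -c 'gh-link.sh %(commit)', using tig's %(commit) placeholder for the commit under the cursor.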

There's not much to be said on the tooling side, except that I could probably benefit from refreshing my helper scripts. Mail sending is handled by Mailgun, which has changed ownership three times since I started. I handle double opt-in via a simple Python script on the server side, and mail sending costs me $3-5 a month. Otherwise, I generate the static HTML with some scripts that could do with a bit more love. The only other running costs are the domain name fees and a VPS that hosts some other things as well, so quite insignificant compared to the time commitment.
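
To give a flavour of how little is involved (a hedged sketch rather than my exact setup - the addresses, issue number, and file name here are made up for illustration), sending out an issue via Mailgun's HTTP API is essentially one authenticated POST:

# Send the pre-generated HTML for an issue to the mailing list.
curl -s --user "api:$MAILGUN_API_KEY" \
  https://api.mailgun.net/v3/llvmweekly.org/messages \
  -F from='LLVM Weekly <newsletter@llvmweekly.org>' \
  -F to='subscribers@llvmweekly.org' \
  -F subject='LLVM Weekly - #NNN' \
  --form-string html="$(cat issue.html)"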

How you can help

I cannot emphasise enough that I'm not an expert on all parts of LLVM, and I'm also only human and can easily miss things. If you did something you think people may be interested in and I failed to cover it, I almost certainly didn't explicitly review it and deem it not worthy. Please do continue to help me out by dropping links and suggestions. Writing commit messages that make it clear if a change has wider impact also helps increase the chance I'll pick it up.

I noted above that it is particularly time consuming to summarise the back and forth in lengthy RFC threads. Sometimes people step up and do this, and I always try to link to it when that happens. The person who initiated a thread or proposal is best placed to write such a summary, and it's also a useful tool to check that you interpreted people's suggestions/feedback correctly, but it can still be helpful if others provide a similar service.

Many people have fed back they find LLVM Weekly useful to stay on top of LLVM developments. This is gratifying, but also a pretty huge responsibility. If you have thoughts on things I could be doing differently to serve the community even better without a big difference in time commitment, I'm always keen to hear ideas and suggestions.

Miscellaneous thoughts

To state the obvious, ten years is kind of a long time. A lot has happened with me in that time - I've got married, we had a son, I co-founded and helped grow a company, and then moved on from that, kicked off the upstream RISC-V LLVM backend, and much more. One of the things I love about working with compilers is that there are always new things to learn, and writing LLVM Weekly helps me learn at least a little more each week in areas outside of where I'm currently working. There have been a lot of changes in LLVM as well. Off the top of my head: the move from SVN to Git, moving the Git repo to GitHub, moving from Phabricator to GitHub PRs, Bugzilla to GitHub issues, mailing lists to Discourse, relicensing to Apache 2.0 with LLVM exception, the wider adoption of office hours and area-specific sync-up calls, and more. I think even the LLVM Foundation was set up a little bit after LLVM Weekly started. It's comforting to see the llvm.org website design remains unchanged though!

It's also been a time period where I've become increasingly involved in LLVM. Upstream work - most notably initiating the RISC-V LLVM backend, organising an LLVM conference, many talks, serving on various program committees for LLVM conferences, etc etc. When I started I felt a lot like someone outside the community looking in and documenting what I saw. That was probably accurate too, given the majority of my work was downstream. While I don't feel like an LLVM "insider" (if such a thing exists?!), I certainly feel a lot more part of the community than I did way back then.

An obvious question is whether there are other ways of pulling together the newsletter that are worth pursuing. My experience with large language models so far has been that they haven't been very helpful in reducing the effort for the more time-consuming aspects of producing LLVM Weekly, but perhaps that will change in the future. If I could be automated away then that's great - perhaps I'm misjudging how much of my editorial input is signal rather than just noise, but I don't think we're there yet for AI. More collaborative approaches to producing content would be another avenue to explore. For the current format, the risk is that the communication overhead, and the stress of seeing whether various contributions actually materialise before the intended publication date, is quite high. If I did want to spread the load or hand it over, then a rotating editorship would probably be most interesting to me. Even if multiple people contribute, each week a single person would act as a backstop to make sure something goes out.

The unbroken streak of LLVM Weekly editions each Monday has become a bit totemic. It's certainly not always convenient having this fixed commitment, but it can also be nice to have this rhythm to the week. Even if it's a bad week, at least there's something in the bag that people seem to appreciate. Falling into bad habits and frequently missing weeks would be good for nobody, but I suspect that a schedule that allowed the odd break now and then would be just fine. Either way, I feel a sense of relief having hit the 10 year unbroken streak. I don't intend to start skipping weeks, but should life get in the way and break the streak, I'll feel rather more relaxed about it having hit that arbitrary milestone.

Looking forwards and thanks

So what's next? LLVM Weekly continues, much as before. I don't know if I'll still be writing it in another 10 years' time, but I'm not planning to stop soon. If it ceases to be a good use of my time, ceases to have value for others, or I find there's a better way of generating similar impact, then it would only be logical to move on. But for now, onwards and upwards.

Many thanks are due. Thank you to the people who make LLVM what it is - both technically and in terms of its community that I've learned so much from. Thank you to Igalia where I work for creating an environment where I'm privileged enough to be paid to contribute upstream to LLVM (get in touch if you have LLVM needs!). Thanks to my family for ongoing support and of course putting up with the times my LLVM Weekly commitment is inconvenient. Thank you to everyone who has been reading LLVM Weekly and especially those sending in feedback or tips or suggestions for future issues.

On a final note, if you've got this far you should make sure you are subscribed to LLVM Weekly and follow on Mastodon or on Twitter.

January 01, 2024 12:00 PM

December 26, 2023

Brian Kardell

Lovely Trees

You've read lots of Web Components posts lately, I think this one is a little different.

I'm thrilled that so many people are suddenly learning about, and falling in love with Custom Elements separately from Shadow DOM. It is, in my mind, best to learn about Custom Elements first anyways. And, if your goals are to use it in some simple, static sites, or blogs - then, well, that might be all you need. It's just a better version of what we used to do with jQuery, really.

But…

Here is where I want to say something about Shadow DOM, and I expect it will go something like this:

It's not so bad.

But, stick with me. It won't be that bad, I promise.

The simple light DOM way is, yes, good. But it creates a poor illusion if the component manipulates the DOM. That illusion is easily shattered, as neither component authors nor page authors can reason well about the tree in potentially important ways: the page author's tree gets transformed into some tree of the component author's making, with no designed coordination. And then everything starts breaking down quickly - not just CSS; script, too, uses selectors and tree relationships. Where there is any kind of real complexity, it's cases like that that Shadow DOM should serve us very well.

But it falls short

Despite this, it seems that for plenty of people who have tried, Shadow DOM is falling short. All of the posts I'm reading show that people are so put off by the current state that, until now, they'd flipped the bit on Custom Elements too. When one steps back and looks at all of the calls (from people who have been trying to use Shadow DOM for a while) for a few variations of "open style-able roots", or "slots in the light DOM", or "ability to use IDREFs across shadow roots" - or even the fact that we're bringing back scoped styles - it really starts to seem like maybe we've missed (or at least not recognized or prioritized) multiple important use cases along the way.

I think this is because we've focused mainly on giving developers a capability to build and share something that is pretty similar to native widgets - and that's not what most people think they need. Indeed, I think we've failed to wrestle with differences that seem sharp.

For example: Browsers are extremely careful to not expose their "Shadow DOM" internals because the consequences of doing so could be dire. If they didn't, then when browsers try to push an update that makes some otherwise innocuous, even welcome change, everything goes wrong. Users suddenly experience problems in tons of apps. Maybe they suddenly can't activate a control. Perhaps that prevents them from getting the information they need from their bank, or their insurance. It can be a very big deal. People start filing bugs on those websites and writing hate-filled blog posts. Devs from those websites do the same in kind. And so on. No one wins.

However, code libraries (of custom elements or anything else) are different. It's sites themselves, not the browser or the library, that are in charge of deploying upgrades to libraries, which involves testing and avoids the worst surprises. Neither the site author nor the library author, it seems, generally requires the kind of extreme upgrade guarantees that current Shadow DOM is built to grant. Largely, it seems they would prefer other trade-offs instead.

The design of Shadow DOM also hasn't focused enough on collaboration. I believe (as I have since the beginning) that most uses of Shadow DOM are about some kind of collaboration -- more about preventing friendly fire. But what we've created is perhaps more like a programming language with only private — no protected or "friendly" concepts.

If you think that all of this sounds kind of damning of standards, it's more complicated than it seems. There are no cow paths to pave here. But, what if there were? Because, at this point, it sure seems like there could be.

I'm not saying I'd like to build a summer home there, but the Shadow Trees are actually quite lovely.

Treading Some Cow Paths

Lots of coordination is totally possible; it simply requires jumping through hoops and isn't standard. The community can, and probably should, spend some time proving out and living with a few different ideas. That would make standardizing one of them much easier (standards are at their best, in my take, when they are mostly writing down the slang that developed and was tested naturally, in the wild). Better still, it taps into the creative power of the commons to get us functional solutions now, rather than making us wait forever for solutions that might not arrive for years - or even ever! This sort of approach is how we got things like .querySelector(), .matches(), and .closest().

Today, if you create a Shadow DOM, style rules inside don't leak out to the rest of the page, and don't "leak in" from the page. There are a lot of people who dislike that second part. Tricky thing is, they don't all seem to dislike it the same way, or want the same kind of solution(s). What we need here, I think, is practical experience and, luckily, we have the raw materials to try solutions to some of this in the wild ourselves and see what pains it soothes (and probably, also realize some that it causes).

For example, here are a few major potential philosophies:

  • Let components decide: authors extend a new base class which then automatically pulls down a copy of some, or all, of the styles provided by the page.
  • Let page authors decide: the page says "these are the base, simple styles for all components" regardless of what they extend. I think this is kind of key, because one of the really nice things about custom elements is that many of us might like to share and find and mix and match, which is pretty hard to do while also basing a solution on extending a particular base class.

But which one is "right"? All of them feel more natural for some use cases/scenarios. All of them are probably just terrible for others. Maybe there are more variants! Maybe what we need is a "pick one, that's how your page will work" idea. Or, maybe we need all of them to work! I think we can only learn through use and experimentation, so...

Here's a tiny library to let you try each of those things!

And a little glitch you can poke around, inspect, remix, play with, and tweak.

Go on... Pick one. Try it. Remix the glitch, make a pen, try it on your site. Love it or hate it. Let it inspire better ideas. But, most importantly - share your thoughts - regardless! Did it do good things for you? Was it tricky? I want to know!

December 26, 2023 05:00 AM

December 24, 2023

Alex Bradbury

Let the (terminal) bells ring out

I just wanted to take a few minutes to argue that the venerable terminal bell is a helpful and perhaps overlooked tool for anyone who does a lot of their work out of a terminal window. First, an important clarification. Bells ringing, chiming, or (as is appropriate for the season) jingling all sounds very noisy - but although you can configure your terminal emulator to emit a sound for the terminal bell, I'm actually advocating for configuring a non-intrusive but persistent visual notification.

BEL

Our goal is to generate a visual indicator on demand (e.g. when a long-running task has finished) and to do so with minimal fuss. This should work over ssh and without worrying about forwarding connections to some notification daemon. The ASCII BEL control character (alternatively written as BELL by those willing to spend characters extravagantly) meets these requirements. You'll just need co-operation from your terminal emulator and window manager to convert the bell to an appropriate notification.

BEL is 7 in ASCII, but can be printed using \a in printf (including the /usr/bin/printf you likely use from your shell, defined in POSIX). There's even a Rosetta Code page on ringing the terminal bell from various languages. Personally, I like to define a shell alias such as:

alias bell="printf '\aBELL!\n'"

Printing some text alongside the bell is helpful for confirming the bell was triggered as expected, even after it has been dismissed. Then, when kicking off a long operation like an LLVM compile and test, use something like:

cmake --build . && ./bin/llvm-lit -s test; bell

The ; ensures the bell is produced regardless of the exit code of the previous commands. All being well, this sets the urgent hint on the X11 window used by your terminal, and your window manager produces a subtle but persistent visual indicator that is dismissed after you next give focus to the source of the bell. Here's how it looks for me in DWM:

Screenshot of DWM showing a notification from a bell

The above example shows 9 workspaces (some of them named), where the llvm workspace has been highlighted because a bell was produced there. You'll also spot that I have a timers workspace, which I tend to use for miscellaneous timers, e.g. a reminder before a meeting is due to start, or when I'm planning to switch tasks. I have a small tool for this I might share in a future post.

A limitation versus triggering freedesktop.org Desktop Notifications is that there's no payload / associated message. For me this isn't a big deal: such messages are distracting, and it's easy enough to see the full context when switching workspaces. It's possible it's a problem for your preferred workflow, of course.

You could put \a in your terminal prompt ($PS1), meaning a bell is triggered after every command finishes. For me this would lead to too many notifications for commands I didn't want to carefully monitor the output for, but your mileage may vary. After publishing this article, my Igalia colleague Adrian Perez pointed me to a slight variant on this that he uses: in Zsh $TTYIDLE makes it easy to configure behaviour based on the duration of a command and he configures zsh so a bell is produced for commands that take longer than 30 seconds to complete.
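
I haven't adopted this myself, but a minimal version of the idea might look like the following (a sketch based on the behaviour described above rather than Adrian's actual configuration; the 30-second threshold is arbitrary):

# In ~/.zshrc: ring the terminal bell when returning to the prompt after
# the tty has been idle for 30+ seconds - a reasonable approximation of
# "the command that just finished ran a long time without any input".
precmd() {
  (( ${TTYIDLE:-0} >= 30 )) && printf '\a'
}

Because precmd runs just before each prompt is drawn, the bell fires once when a long-running command completes, and the urgent-hint machinery described above does the rest.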

Terminal emulator support

Unfortunately, setting the urgent hint upon a bell is not supported by gnome-terminal, with a 15-year-old issue left unresolved. It is however supported by the otherwise very similar xfce4-terminal (just enable the visual bell in preferences), and I switched solely due to this issue.

From what I can tell, this is the status of visual bell support via setting the X11 urgent hint:

  • xfce4-terminal: Supported. In Preferences -> Advanced ensure "Visual bell" is ticked.
  • xterm: Set XTerm.vt100.bellIsUrgent: true in your .Xresources file.
  • rxvt-unicode (urxvt): Set URxvt.urgentOnBell: true in your .Xresources file.
  • alacritty: Supported. Works out of the box with no additional configuration needed.
  • gnome-terminal: Not supported.
  • konsole: As far as I can tell it isn't supported. Creating a new profile and setting the "Terminal bell mode" to "Visual Bell" doesn't seem to result in the urgent hint being set.

Article changelog
  • 2023-12-24: Add note about configuring a bell for commands taking longer than a certain threshold duration in Zsh.
  • 2023-12-24: Initial publication date.

December 24, 2023 12:00 PM

December 22, 2023

Ricardo García

Vulkan extensions Igalia helped ship in 2023

Last year I wrote a recap of the Vulkan extensions Igalia helped ship in 2022, and in this post I’ll do the exact same for 2023.

Igalia Logo next to the Vulkan Logo

For context and quoting the previous recap:

The ongoing collaboration between Valve and Igalia lets me and some of my colleagues work on improving the open-source Vulkan and OpenGL Conformance Test Suite. This work is essential to ship quality Vulkan drivers and, from the Khronos side, to improve the Vulkan standard further by, among other things, adding new functionality through API extensions. When creating a new extension, apart from reaching consensus among vendors about the scope and shape of the new APIs, CTS tests are developed in order to check the specification text is clear and vendors provide a uniform implementation of the basic functionality, corner cases and, sometimes, interactions with other extensions.

In addition to our CTS work, many times we review the Vulkan specification text from those extensions we develop tests for. We also do the same for other extensions and changes, and we also submit fixes and improvements of our own.

So, without further ado, this is the list of extensions we helped ship in 2023.

VK_EXT_attachment_feedback_loop_dynamic_state

This extension builds on last year’s VK_EXT_attachment_feedback_loop_layout, which is used by DXVK 2.0+ to more efficiently support D3D9 games that read from active render targets. The new extension shipped this year adds support for setting attachment feedback loops dynamically on command buffers. As with all extensions that add more dynamic state, the goal here is to reduce the number of pipeline objects applications need to create, which makes using the API more flexible. It was created by our beloved super good coder and Valve contractor Mike Blumenkrantz. We reviewed the spec and are listed as contributors, and we wrote dynamic variants of the existing CTS tests.

VK_EXT_depth_bias_control

A new extension proposed by Joshua Ashton that also helps with layering D3D9 on top of Vulkan. The original problem is quite specific. In D3D9 and other APIs, applications can specify what is called a “depth bias” for geometry using an offset that is to be added directly as an exact value to the original depth of each fragment. In Vulkan, however, the depth bias is expressed as a factor of “r”, where “r” is a number that depends on the depth buffer format and, furthermore, may not have a specific fixed value. Implementations can use different values of “r” in an acceptable range. The mechanism provided by Vulkan without this extension is useful to apply small offsets and solve some problems, but it’s not useful to apply large offsets and/or emulate D3D9 by applying a fixed-value bias. The new extension solves these problems by giving apps the chance to control depth bias in a precise way. We reviewed the spec and are listed as contributors, and wrote CTS tests for this extension to help ship it.

VK_EXT_dynamic_rendering_unused_attachments

This extension was proposed by Piers Daniell from NVIDIA to lift some restrictions in the original VK_KHR_dynamic_rendering extension, which is used in Vulkan to avoid having to create render passes and framebuffer objects. Dynamic rendering is very interesting because it makes the API much easier to use and, in many cases and especially on desktop platforms, it can be shipped without any associated performance loss. The new extension relaxes some restrictions that made pipelines more tightly coupled with render pass instances. Again, the goal here is to be able to reuse the same pipeline object with multiple render pass instances and remove some combinatorial explosions that may occur in some apps. We reviewed the spec and are listed as contributors, and wrote CTS tests for the new extension.

VK_EXT_image_sliced_view_of_3d

Shipped at the beginning of the year by Mike Blumenkrantz, this extension again helps with emulating other APIs on top of Vulkan. Specifically, it allows creating 3D views of 3D images such that the views contain a subset of the slices in the image, using a Z offset and range, in the same way D3D12 allows. We reviewed the spec, we’re listed as contributors, and we wrote CTS tests for it.

VK_EXT_pipeline_library_group_handles

This one comes from Valve contractor Hans-Kristian Arntzen, who is mostly known for working on Proton projects like VKD3D-Proton. The extension is related to ray tracing and adds more flexibility when creating ray tracing pipelines. Ray tracing pipelines can hold thousands of different shaders and are sometimes built incrementally by combining so-called pipeline libraries that contain subsets of those shaders. However, to properly use those pipelines we need to create a structure called a shader binding table, which is full of shader group handles that have to be retrieved from pipelines. Prior to this extension, shader group handles from pipeline libraries had to be requeried once the final pipeline was linked, as they were not guaranteed to be constant throughout the whole process. With this extension, an implementation can tell apps it will not modify shader group handles in subsequent link steps, which makes it easier for apps to build shader binding tables. More importantly, this also more closely matches functionality in DXR 1.1, making it easier to emulate DirectX Raytracing on top of Vulkan raytracing. We reviewed the spec, we’re listed as contributors and we wrote CTS tests for it.

VK_EXT_shader_object

Shader objects is probably the most noteworthy extension shipped this year, and we contributed small bits to it. This extension makes every piece of state dynamic and removes the need to use pipelines. It’s always used in combination with dynamic rendering, which also removes render passes and framebuffers as explained above. This results in great flexibility from the application point of view. The extension was created by Daniel Story from Nintendo, and its vast set of CTS tests was created by Žiga Markuš, but we added our grain of sand by reviewing the spec and proposing some changes (which is why we’re listed as contributors), as well as fixing some shader object tests and providing some improvements here and there once they had been merged. A good part of this work was done in coordination with Mesa developers who were working on implementing this extension for different drivers.

VK_KHR_video_encode_h264 and VK_KHR_video_encode_h265

Fresh out of the oven, these Vulkan Video extensions allow leveraging the hardware to efficiently encode H.264 and H.265 streams. This year we’ve been doing a ton of work related to Vulkan Video in drivers, in libraries like GStreamer, and in CTS and the spec, including the two extensions mentioned above. Although we’re not listed as contributors to the spec for those two Vulkan extensions, our work played a major role in advancing the state of Vulkan Video and getting them shipped.

Epilogue

That’s it for this year! I’m looking forward to helping ship more extension work next year and to doing my part in making Vulkan drivers on Linux (and other platforms!) more stable and feature-rich. My Vulkan Video colleagues at Igalia have already started work on future Vulkan Video extensions for AV1 and VP9. Hopefully some of that work is ratified next year. Fingers crossed!

December 22, 2023 02:50 PM

December 21, 2023

Eric Meyer

Pixelating Live with SVG

For reasons I’m not going to get into here, I want to be able to pixelate web pages, or even parts of web pages, entirely from the client side.  I’m using ViolentMonkey to inject scripts into pages, since it lets me easily open the ViolentMonkey browser-toolbar menu and toggle scripts on or off at will.

I’m aware I could take raster screenshots of pages and then manipulate them in an image editor.  I don’t want to do that, though  —  I want to pixelate live.  For reasons.

So far as I’m aware, my only option here is to apply SVG filters by way of CSS.  The problem I’m running into is that I can’t figure out how to construct an SVG filter that will exactly:

  • Divide the element into cells; for example, a grid of 4×4 cells
  • Find the average color of the pixels in each cell
  • Flood-fill each cell with the average color of its pixels

As a way of understanding the intended result, see the following screenshot of Wikipedia’s home page, and then the corresponding pixelated version, which I generated using the Pixelate filter in Acorn.

Wikipedia in the raw, and blockified.

See how the text is rendered out?  That’s key here.

I found a couple of SVG pixelators in a StackOverflow post, but what they both appear to do is sample pixels at regularly-spaced intervals, then dilate them.  This works pretty okay for things like photographs, but it falls down hard when it comes to text, or even images of diagrams.  The text almost entirely vanishes, as shown here.

The text was there a minute ago, I swear it.

I tried Gaussian blurring at the beginning of my filters in an attempt to overcome this, but that mostly washed the colors out, and didn’t make the text more obviously text, so it was a net loss.  I messed around with dilation radii, and there was no joy there.  I did find some interesting effects along the way, but none of them were what I was after.

I’ve been reading through various tutorials and MDN pages about SVG filters, and I’m unable to figure this out.  Though I may be wrong, I feel like the color-averaging step is the sticking point here, since it seems like <feTile> and <feFlood> should be able to handle the first and last steps.  I’ve wondered if there’s a way to get a convolve matrix to do the color-averaging part, but I have no idea  —  I never learned matrix math, and later-life attempts to figure it out have only gotten me as far as grasping the most general of principles.  I’ve also tried to work out if a displacement map could be of help here, but so far as I can tell, no.  But maybe I just don’t understand them well enough to tell?

It also occurred to me, as I was prepared to publish this, that maybe a solution would be to use some kind of operation (a matrix, maybe?) to downsize the image and then use another operation to upsize it to the original size.  So to pixelfy a 1200×1200 image into 10×10 blocks, smoothly downsize it to 120×120 and then nearest-neighbor it back up to 1200×1200.  That feels like it would make sense as a technique, but once again, even if it does make sense I can’t figure out how to do it.  I searched for terms like image scale transform matrix but I either didn’t get good results, or didn’t understand them when I did.  Probably the latter, if we’re being honest.

So, if you have any ideas for how to make this work, I’m all ears  —  either here in the comments, on your own site, or as forks of the Codepen I set up for exactly that purpose.  My thanks for any help!



by Eric Meyer at December 21, 2023 03:35 PM