GNOME Shell 4
Introduction
This page contains software architecture ideas for how a future GNOME Shell should work. It is referred to as GNOME Shell 4 because it would potentially not be backward compatible (with regard to extensions) with GNOME Shell 3.*.
Motivation
GNOME Shell 3 was designed to be an X11 compositing manager, meaning it relied on X11 for a lot of heavy lifting, such as interacting with GPUs and input devices. With this in mind, it has not been crucial for GNOME Shell itself to provide low latency input handling and drawing, because whenever low latency and high throughput mattered, the X server was the one responsible. For example, input goes directly from the X server to the X clients, and when high performance is really important (e.g. fullscreen games), applications have been able to bypass GNOME Shell and rely entirely on the X server for passing content from the client to the GPU. The X server has also been completely responsible for the visual feedback that relies on low latency, namely pointer cursor movement.
It has also been possible to implement things related to input handling (such as text input including input methods) using X11 and existing implementations in things like GTK+.
With Wayland, this landscape has changed drastically. There is no longer an X server between clients and GNOME Shell, nor between GNOME Shell and the GPU, meaning GNOME Shell itself must provide low latency forwarding of input from input devices to clients and low latency forwarding of output from clients to the GPU.
There is also the issue of certain features that in the past have relied on X11 and should not continue to do so, for example input methods.
Problem areas
To sum up, there are a number of problem areas that need new solutions.
- Low latency input forwarding
- Low latency visual input event feedback (pointer cursor movement)
- Low latency & zero copy client content forwarding (scan-out of client buffer)
- Input methods in the shell UI
- Stalls on the main thread stall compositor frame redraws
Potential solutions
Option A
A potential architecture for solving all of these problems is to split up what today is GNOME Shell into two separate processes:
- A UI process
- A compositor process
The UI process
The UI of GNOME Shell would have to be largely rewritten, using GTK+ instead of St. This has the extra benefit of not having multiple GUI toolkits in GNOME, as St would no longer be used anywhere. It would also mean the shell UI could benefit from all the progress made in GTK+ 4. The shell UI would use the GDK Wayland backend with a special purpose private Wayland protocol for integrating with the GNOME Shell compositor process. It would use the Wayland backend in both the X session and the Wayland session, and the X server would not be involved with the shell UI in any way.
The UI process could also be implemented in such a way that it can be restarted.
The compositor process
The compositor process would consist of the following different parts:
- A generic compositor library (libmutter)
- GNOME Shell specific parts (the compositor side of the private GNOME Shell Wayland protocol)
The compositor process side of GNOME Shell would be much smaller than it is today, with the UI moved out. What is left is mostly positioning related logic and related animations.
Libmutter would need to be adapted to support low latency, which would be done by splitting different parts into different threads, namely:
- A KMS thread
- An input thread
- A Wayland thread
- A main thread (compositing / window management etc)
KMS thread
The KMS thread would be able to receive updates from other units and would, on its own schedule, commit KMS "transactions" (atomic transactions when supported, otherwise hardware cursor movement + CRTC flips). It would have an API through which users request changes to be applied on the next flip.
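As a rough illustration of the "request now, apply on the next flip" API, here is a minimal sketch. The class and method names (`KmsThread`, `request_change`, `on_flip_deadline`) are hypothetical and not part of libmutter, and the actual scheduling and KMS calls are left out; the sketch only shows how requests from other threads could be coalesced into one transaction per flip.

```python
import threading


class KmsThread:
    """Hypothetical sketch: collects requested changes, applies them on the next flip."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pending = {}    # property name -> latest requested value
        self.committed = []   # log of committed transactions (stand-in for KMS commits)

    def request_change(self, prop, value):
        # Callable from any thread. A later request for the same property
        # replaces an earlier one that has not been committed yet.
        with self._lock:
            self._pending[prop] = value

    def on_flip_deadline(self):
        # Would be called on the KMS thread's own schedule. Everything
        # pending is committed together as a single transaction.
        with self._lock:
            transaction, self._pending = self._pending, {}
        if transaction:
            self.committed.append(transaction)
        return transaction


kms = KmsThread()
kms.request_change("cursor_position", (10, 10))
kms.request_change("cursor_position", (12, 11))  # supersedes the previous request
kms.request_change("primary_fb", "fb-1")
kms.on_flip_deadline()
```

With atomic KMS the whole dictionary would map onto one atomic commit; without it, the same pending set would have to be split into a cursor move and a CRTC flip.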
Input thread
The input thread would process input from libinput directly and, under normal circumstances, be able to request hardware cursor updates straight from the KMS thread. It should also be able to forward input events to Wayland clients by talking directly to the Wayland thread.
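The routing described here could look roughly like the sketch below. The event dictionaries, the two queues and `handle_input_event` are assumptions made for illustration; they do not mirror the libinput API or any existing libmutter interface. The point is only that neither path goes through the main (compositing) thread.

```python
import queue

kms_inbox = queue.Queue()      # stand-in for the KMS thread's request API
wayland_inbox = queue.Queue()  # stand-in for the Wayland thread's event queue


def handle_input_event(event, focused_client):
    """Route one (hypothetical) input event without involving the main thread."""
    if event["type"] == "pointer_motion":
        # Visual feedback: ask the KMS thread to move the hardware cursor.
        kms_inbox.put(("move_cursor", event["x"], event["y"]))
    if focused_client is not None:
        # Forward the event itself to the client via the Wayland thread.
        wayland_inbox.put((focused_client, event))


handle_input_event({"type": "pointer_motion", "x": 5, "y": 7}, "client-a")
handle_input_event({"type": "key_press", "key": "a"}, "client-a")
```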
Wayland thread
When possible, for example when no update to the primary plane is scheduled and a client's buffer can be scanned out directly onto a CRTC, the Wayland thread should be able to ask the KMS thread directly to present the new content.
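The decision the Wayland thread would have to make can be sketched as a simple predicate. The concrete conditions used here (matching size and a format flag) are assumptions chosen for illustration; the real set of requirements for direct scan-out depends on the hardware and driver. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ClientBuffer:
    width: int
    height: int
    format_supported_by_plane: bool  # hypothetical result of a format/modifier check


@dataclass
class Crtc:
    width: int
    height: int


def can_scan_out_directly(buffer, crtc, primary_plane_update_scheduled):
    """Return True if the buffer could be handed to the KMS thread as-is."""
    if primary_plane_update_scheduled:
        return False  # the compositing path already owns the next frame
    if not buffer.format_supported_by_plane:
        return False
    return (buffer.width, buffer.height) == (crtc.width, crtc.height)
```

When the predicate is false, the buffer would take the normal route through the main thread and be composited onto the primary plane.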
Main (compositing) thread
A main thread would handle compositing of the primary plane, as well as window management and everything related to it. It is assumed that the compositing thread may occasionally stall for various reasons, such as GPU synchronization. A major reason for splitting things up into different threads is simply to be able to bypass the compositing thread and avoid these stalls.
Extensions
All extensions would have to be rewritten, probably from scratch, as the architecture would change dramatically. For example, they'd have to be written against GTK+ instead of St, and they'd have to be done with the multi-process architecture in mind.
This, however, means we'd have the ability to limit what extensions can do in the compositor process (for stability reasons), and to reconsider whether monkey patching or well defined extension points are the way forward. It would probably be a good idea to be wary of introducing extensions written in a garbage collected language into the compositor process, but only allowing compiled compositor side extensions might be very problematic.
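To make the difference between the two extension styles concrete, here is a small sketch. `WorkspaceLayout` and its hook API are invented for this example and do not correspond to anything in GNOME Shell; Python is used only for brevity.

```python
class WorkspaceLayout:
    """Hypothetical shell component exposing one well defined extension point."""

    def __init__(self):
        self._placement_hooks = []

    def add_placement_hook(self, hook):
        # Extension point: a hook may adjust a proposed position, nothing else.
        self._placement_hooks.append(hook)

    def place_window(self, x, y):
        for hook in self._placement_hooks:
            x, y = hook(x, y)
        return x, y


# Extension point style: the extension only sees what the hook exposes.
layout = WorkspaceLayout()
layout.add_placement_hook(lambda x, y: (max(x, 0), max(y, 0)))

# Monkey patching style: the extension replaces the method wholesale and
# silently depends on internals staying the same between releases.
patched = WorkspaceLayout()
patched.place_window = lambda x, y: (0, 0)
```

The hook variant limits what an extension can break, at the cost of only supporting the customizations the shell authors anticipated.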
Implementation
Some things (such as the introduction of multiple compositor side threads) can be done early without much external impact.
On the other hand, the parts specific to the shell UI and UX are hard to implement without breaking extensions. One could move the UI piece by piece out of the compositor process into a new UI process, but each piece moved would break the extensions interacting with that particular piece. Thus, there are three options:
- Work completely on a separate branch until ready
- "Move" piece by piece while keeping the existing piece intact and the new piece turned off by default
- Enter a period of constant breakage and move piece by piece while breaking more and more extensions every release
All three have pros and cons. For example, the first would partially stop the development of new features until ready, the second would mean a lot of extra work, and the third would cause *a lot* of breakage for extension users.
Other things to consider
While implementing the shell UI as a Wayland client is entirely possible and would be just an implementation detail of GNOME Shell, it is not certain that all platforms that can currently run GNOME Shell (more specifically certain BSDs) can use Wayland, even if it doesn't involve libinput and KMS.
Option B
Another option, potentially less drastic but one that would only solve problem areas 1-3 (out of 5) listed above, is to introduce a proxy display server. A proxy display server would be somewhat similar to an X server: it would be the Wayland server that clients talk to, and it would be the process interacting with KMS and libinput. GNOME Shell would composite frames and hand them over to the proxy display server instead of directly to KMS.
Some things to note:
- The proxy display server would need to implement all Wayland protocols and keep track of all state
- The proxy display server would be able to bypass GNOME Shell and directly update KMS (similar to the input/Wayland/KMS thread of Option A)
- GNOME Shell would also need to implement all Wayland protocols
- When GNOME Shell crashes, the proxy display server would recreate the state it has managed, meaning client connections would survive
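The crash recovery point in the list above can be sketched as follows. `ProxyDisplayServer`, its methods and the use of a plain list as a stand-in for the shell process are all assumptions for illustration; a real proxy would have to track the full state of every Wayland protocol it implements.

```python
class ProxyDisplayServer:
    """Hypothetical sketch: tracks client state and replays it to a new shell."""

    def __init__(self):
        self.surfaces = {}  # (client, surface id) -> last known state
        self.shell = None   # stand-in for the connection to GNOME Shell

    def attach_shell(self, shell):
        # Called at startup and again after the shell has been restarted.
        self.shell = shell
        for key, state in self.surfaces.items():
            shell.append(("restore", key, state))

    def client_commit(self, client, surface_id, state):
        # Clients only ever talk to the proxy, so their connections are
        # unaffected by whatever happens to the shell process.
        self.surfaces[(client, surface_id)] = state
        if self.shell is not None:
            self.shell.append(("commit", (client, surface_id), state))


proxy = ProxyDisplayServer()
first_shell = []
proxy.attach_shell(first_shell)
proxy.client_commit("client-a", 1, {"title": "Terminal"})

# The shell crashes; a fresh instance attaches and receives the tracked state.
second_shell = []
proxy.attach_shell(second_shell)
```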
Pros/cons
- + GNOME Shell would largely stay the same meaning much less rewriting
- + Extensions would continue to work just as is
- - We'd still have two GUI toolkits (St and GTK+)
- - Stalls in painting UI and managing other shell related things would still stall compositing of clients
- - It wouldn't solve reliance on X11 for input methods
- - Each Wayland protocol would need to be implemented "twice"
Comments
If the architecture for extensions is re-examined for 4 then libWSM might be worth a look. It's not being actively developed anymore, but it shared goals with xdg-portals. The security story around extensions is something worth considering, especially with the improvements for app development in recent years. --JohnMcHugh