GNOME Shell under Wayland

History

In 2011 NeilRoberts and RobertBragg adapted Mutter as a proof of concept to be able to run the Clutter/MX based Dawati Shell (a shell aimed at netbooks) as a hybrid X and Wayland compositor. We presented this at FOSDEM 2012. As intermediate steps we also created two simple standalone compositors before making any changes to Mutter:
1. We wrote a trivial Cogl based compositor (Cogland) that let us test the Cogl integration we required, but without any support for input or high-level shell functionality.
2. We wrote a trivial Clutter based compositor, test-wayland-surface (now called Clayland), that let us test the Clutter integration we required without the added complexity of Mutter itself.
In the summer of 2012 we built on that work further so that we could also show GnomeShell running as a hybrid Wayland compositor, and we demonstrated that at GUADEC 2012. Some of the patches for this work are attached to bug #671741. Up until this point we were based on the 0.85 Wayland protocol.
We are now looking at updating our proof of concept work to work with the latest 1.x Wayland protocols to a level of quality that allows distributions to start shipping the software. By 3.10 we'd like users to be able to optionally choose to run GnomeShell as a hybrid Wayland compositor and by 3.12 we'd like this to be the only mode of operation.

Development Plan

There are currently two main tracks involved in enabling Wayland support in GnomeShell:

Bootstrapping Wayland client support: The first step here is to enable running GnomeShell as a nested X11 application with support for connecting Wayland clients, so that toolkit development can switch as soon as possible to testing against GnomeShell instead of Weston. The priority here is enabling others to make progress in parallel with the GnomeShell development. This should also help us discover functionality gaps that may, for example, require custom Wayland protocol extensions.
System integration: Solve the system integration issues that were ignored during the initial prototyping work. Most importantly this means that GnomeShell shouldn't require root privileges and it should be possible to launch GnomeShell as a Wayland compositor from GDM. This also implies controlling the display hardware via KMS.

Bootstrapping Wayland client support

Currently NeilRoberts is focusing on this track. The first step has been to update Cogland to the 1.x protocol (done) and now Clayland is also being updated. This code will then largely copy across directly to Mutter. The patches attached to bug #671741 will form the basis for this update.

In particular the following patches will likely be folded into a single 'Add Initial Wayland Client Support' patch:

a0d9c2b configure: Adds --with-xwayland-path option
5795d5d wayland: Adds basic hybrid X + Wayland support
23aebbb wayland: Add basic mouse input support
3107368 wayland: Add keyboard support
926f378 wayland: Install a device manager
257ac97 wayland: Add support for translating Clutter scroll events

While updating this work the plan is to avoid the use of #ifdefs to conditionally enable hybrid Wayland support and instead aim to have runtime control over what mode Mutter runs in. This should allow us to avoid shipping multiple versions of Mutter and GnomeShell for 3.10.

Details on testing this patchset can be found in the /Testing page.

System integration

Launcher

The first aim here is to avoid needing to run the compositor as root and the plan is to use a similar mechanism to weston-launch to enable us to isolate the code needing privileges from the complexity of Mutter + GnomeShell.

So far I've used some of the weston-launch code as a starting point and have created a tiny libvts (Lib Virtual Terminal Session) API that I aim to use to write a launcher for GnomeShell. There are a few differences compared to the current weston-launch program:

The code is in library form so it should be easy to re-use for different launchers with slightly different requirements. This also lets us tightly couple the client side code that needs to communicate with the launcher. (Currently the Weston project has some separate utility code that wraps up communication with the launcher, but I think it could be nice to avoid defining a standard launcher protocol and instead just have a tiny, shared client/server API that can encapsulate the socket protocol.)
The error handling aims to be a bit more thorough and also allows passing errors back up the call stack so we don't have to make assumptions about how to handle or report them.
It logs to the systemd journal.
Opening a PAM session is optional, since we will probably end up relying on GDM to open the session.
All tty handling is in the launcher so we don't expose tty devices opened as root to the compositors, and so they don't have to duplicate the fiddly code for handling VT switch signals. This should also offer more robust switching in the face of compositor bugs. For example, while VT switching we can run a watchdog to detect that the compositor has hung and kill the compositor instead of locking up the whole machine. We can also assert that drmDropMaster is called before completing a VT switch.
The interface for opening input devices was tightened up a bit. It stat()s the device before opening and accepts a narrower set of open(2) flags, rejecting ones such as O_CREAT. The plan is to also cross-reference with udev that the device belongs to the session's seat before opening.
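
The tightened device-opening rules described above can be sketched roughly as follows. This is a minimal Python illustration, not libvts code: the flag whitelist and the names `flags_allowed` and `open_input_device` are assumptions made for the example, and the check against major number 13 assumes Linux input (evdev) character devices. The udev seat check is not shown.

```python
import os
import stat

# Major number of Linux input character devices (/dev/input/event*).
INPUT_MAJOR = 13

# Hypothetical whitelist: read/write access plus two harmless modifiers.
# Anything else (O_CREAT, O_TRUNC, O_WRONLY, ...) is refused.
ALLOWED_FLAGS = os.O_RDWR | os.O_NONBLOCK | os.O_CLOEXEC


def flags_allowed(flags):
    """Return True if no flag outside the whitelist is requested."""
    return (flags & ~ALLOWED_FLAGS) == 0


def open_input_device(path, flags):
    """Open path only if it looks like an input device and the flags are safe."""
    if not flags_allowed(flags):
        raise PermissionError("open flags not permitted: %#o" % flags)

    # stat() the node before opening it, as the launcher is described as doing.
    st = os.stat(path)
    if not stat.S_ISCHR(st.st_mode) or os.major(st.st_rdev) != INPUT_MAJOR:
        raise PermissionError("%s is not an input device" % path)

    return os.open(path, flags)
```

A compositor-side request such as `open_input_device("/dev/input/event3", os.O_RDWR | os.O_NONBLOCK)` would succeed only when the launcher has permission to open the node, while a request that adds `os.O_CREAT` is refused before the filesystem is touched. A real launcher might also repeat the check with fstat() on the opened descriptor to close the window between the stat() and the open().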

Display handling

Cogl already has a libgbm + KMS based winsys which we used for our earlier prototyping but this will need further development.

One aspect of this work is that we plan to extend the CoglOutput API to allow Cogl applications to query more information about display outputs with a portable API. In addition to querying more information about outputs we may expose some display configuration and overlay support via this API too, but we will also maintain an escape hatch for applications (potentially including Mutter) to directly access KMS if useful.

GDM session launching

GDM is the GNOME login manager, which is responsible for authenticating users via the PAM API, opening a new PAM session once authenticated and launching a user-specified session program. The PAM "conversation" used to authenticate a user is done by talking to a login screen that is implemented using GnomeShell running under X11.

Currently GDM assumes that the user's session program wants an X server to run and it actually re-purposes the X server that's running for the login screen for the user's session program. For 3.10 we are assuming that we will continue to use an X11 based login screen for simplicity.

Since we now want to be able to launch programs that don't need an X server, we need to make some tweaks to how GDM sets up PAM before opening a session. To start with we can introduce a property for /usr/share/xsessions/xyz.desktop files that declares that the session doesn't require X, in which case GDM avoids setting the PAM_XDISPLAY and PAM_XAUTHDATA items. This should ensure that when PAM issues a CreateSession D-Bus request to logind the systemd session will have a type of "tty" instead of "x11". When GDM sees that new session we can then make sure it checks the type to determine if it should re-use the existing X server. Beyond that we may need to extend the PAM and CreateSession APIs to be able to associate Wayland-specific authentication information with the session.
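
As a rough illustration of the session-file property idea, the sketch below reads a desktop entry and decides which X-related PAM items would be set. It is only a Python sketch under stated assumptions: the key name `X-GDM-RequiresX` is hypothetical (no key name has been chosen in this plan), and `session_requires_x` and `pam_items_for_session` are illustrative helpers, not GDM functions.

```python
import configparser

# Hypothetical key name, chosen only for this example.
REQUIRES_X_KEY = "X-GDM-RequiresX"


def session_requires_x(desktop_text):
    """Return False only if the session file explicitly opts out of X."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # desktop entry keys are case sensitive
    parser.read_string(desktop_text)
    entry = parser["Desktop Entry"]
    # Existing session files have no such key, so default to needing X.
    return entry.getboolean(REQUIRES_X_KEY, fallback=True)


def pam_items_for_session(desktop_text, display, xauth_data):
    """Return the X-related PAM items that would be set for this session."""
    if session_requires_x(desktop_text):
        return {"PAM_XDISPLAY": display, "PAM_XAUTHDATA": xauth_data}
    # No X items are set, so logind should see a non-"x11" session type.
    return {}
```

Defaulting to "requires X" when the key is absent keeps every existing session file behaving exactly as it does today, so only sessions that explicitly opt out would take the new path.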

RayStrode has written a good overview of the issues related to GDM here

Open Issues

How can we make sure that when we switch between compositors other compositors can't snoop evdev input events? Currently the launcher has a request for opening an input device with root privileges and returns a file descriptor back to the compositor. The launcher doesn't have a way to explicitly revoke that file descriptor when switching VTs.
The current plan is to implement some form of "mute" ioctl that requires root privileges that the launcher can use to disable events being delivered to a compositor while it isn't active. This is similar to how the launcher already revokes KMS privileges by calling drmDropMaster. RFC patch for evdev mute here
Can one still restart the shell without crashing all clients? To do that, implement the reexec-and-inherit-state approach that systemd uses. Clients won't even notice the compositor changed.
- If you mean restarting the process which contains the Wayland compositor: not necessarily, and even if the core components let you do it, you'd need to find a way to make the EGL clients co-operate. Every time the clients allocate a buffer, the EGL stack (running in-process with the compositor) is the one which accepts the new buffer request, allocates a buffer wrapper structure, and associates a pointer to that structure with the client's buffer object ID. So you'd need to somehow save and restore all the state in the EGL stack too, which is really hard. If you don't actually restart the compositor and the process, but keep the compositor alive, then you're OK. -daniels
  - Actually.. maybe not that hard, if you have a way to serialize the compositor state (i.e. information about attached client buffers, etc) over a socket. If you could open a socket between the outgoing (crashing) compositor and the new incoming compositor and send the serialized state across the socket, you can pass the dmabuf/prime fd for a buffer across that socket and, on the incoming compositor side, re-import that into EGL. This nicely handles the refcounting on that buffer too so nothing gets lost when the outgoing compositor exits. -RobClark
- We can serialize the compositor state and recreate all protocol objects in the new compositor, but the server side EGL stack needs to do the same. Internally, EGL may have a lot of private protocol objects that the core compositor doesn't see and they all need to be recreated in the new server's EGL stack. It's possible, and the most straightforward approach (though that's not saying a lot) is to create a socketpair in the old compositor, pass one end over the re-exec socket to the new compositor which will pass it to an EGL extension entry point (say eglImportStateWL) and, in the old compositor, pass the other socketpair fd to eglDumpState. In general, any kind of wayland-server library implementing an interface needs to support similar functionality. As an example, wl_shm is implemented by wayland-server, and needs to be able to re-create all wl_shm, wl_shm_pool, and wl_buffer objects it created in the old compositor.
  - Oh, right, the core compositor doesn't see wl_drm, etc.. yeah, maybe there is some room for an EGL extension. Or perhaps just some helpers in wl that EGL can use to serialize its state. (You somehow kinda need a socket involved in the serialization to be able to pass buffer fds) -RobClark
- Alternatively, we can get the clients to help out. We can expose a new interface that lets clients create a, say, wl_restartable object. This object will emit an event when the compositor restarts, and the event provides the fd of the new connection to the client. The client then needs to recreate all its wl_* objects. The old server still has to dump state (such as window position, which buffer is currently attached to which surface etc) to the new compositor. The problem with this approach is that we need a way to match up the new objects with compositor state from the old objects in the old server so that window positions and other state can be maintained.
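
The socket-based handoff discussed above (serialize compositor state and pass buffer file descriptors from the outgoing to the incoming compositor) can be illustrated with a small sketch. This is not Wayland or EGL code: it is a Python 3.9+ sketch in which the JSON state layout and the helper names `send_state` and `recv_state` are assumptions, and an ordinary pipe stands in for a dmabuf/prime buffer fd.

```python
import json
import os
import socket


def send_state(sock, state, fds):
    """Outgoing compositor: send serialized state plus buffer fds in one message."""
    payload = json.dumps(state).encode()
    # The fds travel as SCM_RIGHTS ancillary data next to the payload.
    socket.send_fds(sock, [payload], fds)


def recv_state(sock, max_fds=16, bufsize=65536):
    """Incoming compositor: receive the state and its own copies of the fds."""
    payload, fds, _msg_flags, _addr = socket.recv_fds(sock, bufsize, max_fds)
    return json.loads(payload.decode()), fds


def demo():
    """Hand one 'buffer' from an old compositor end to a new one."""
    old_end, new_end = socket.socketpair()
    read_fd, write_fd = os.pipe()  # stand-in for a client buffer fd
    os.write(write_fd, b"pixels")

    # Hypothetical state layout: which fd backs which surface, and where it is.
    state = {"surfaces": [{"id": 3, "x": 10, "y": 20, "buffer_fd_index": 0}]}
    send_state(old_end, state, [read_fd])

    # The old compositor can now drop its reference; the in-flight fd keeps
    # the underlying buffer alive until the new compositor receives it.
    os.close(read_fd)
    old_end.close()

    received_state, fds = recv_state(new_end)
    data = os.read(fds[0], 6)
    os.close(fds[0])
    os.close(write_fd)
    new_end.close()
    return received_state, data
```

Because the kernel holds a reference to a descriptor while it is in flight, the old process can exit as soon as the message is sent, which is the refcounting property mentioned in the discussion. State private to the EGL stack would still need its own dump and import path, as noted above.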
What is the plan for supporting proprietary Nvidia/ATI drivers?
How will we support display configuration?
Wayland doesn't provide display configuration capabilities as part of the core protocol; it's up to each shell/compositor to define its own mechanisms. The last time we discussed this the plan was to provide a D-Bus service from GnomeShell that the GNOME Control Center display settings dialog would use. Using D-Bus here means we'd be able to share more code between X and Wayland.