Introduction
This page was based on discussions starting here: http://mail.gnome.org/archives/desktop-devel-list/2005-January/msg00124.html
In the current GNOME desktop any applications that need to perform a privileged operation must be invoked as root. For example, in order for procman to renice applications it must be invoked using a utility like gnomesu. This has several disadvantages: it is tied to an implementation detail of the system authentication which we may want to change at any time; it requires relaunching an application that was originally launched unprivileged in order to perform privileged operations, which is disruptive to the user; the entire application including the frontend must run as root, which means things like user themes don't work very well, and GTK+/libX11 are not necessarily secure pieces of code. (The developers of those libraries have explicitly stated such.)
A new framework should be developed that allows applications to perform privileged operations without the liabilities of simply wrapping su-like behavior. The ideal situation would utilize separate frontend and backend processes. The frontend would run unprivileged and provide the GUI and as much of the application behavior as possible. A backend process could optionally be launched when necessary with privileges. When this backend is launched the user should be challenged for authentication if the system administrator/builder has decided that it is necessary for the operation.
How and when this backend is launched is something that needs to be decided. Since the backend might require authentication it is best to only launch it when it is required. For example, procman doesn't need superuser privileges for most of the tasks it performs so asking the user to authenticate when procman starts wouldn't be very sensible. For the tasks procman does need privileges for, such as renicing applications, it should launch the backend when the user wishes to perform those tasks.
The UI model for when and how to authenticate must be decided upon. One possible method is the Mac OS X "lock icon" approach. In this approach, an icon of a lock is provided in the application's UI. When the user clicks on the icon the backend process is launched and (if the backend/configuration requires it) the user is authenticated with a password dialog or some other challenge. Another possible method would be to simply launch the backend whenever a menu option/action is initiated that requires it.
GNOME should provide an API/library to allow application authors to easily build applications with unprivileged frontends and privileged backends. The details of starting, authenticating, and communicating with a backend can be tricky and the authentication in particular needs to happen outside of the frontend application; no normal user application should ever ask for the user's password and especially never ask for a root password.
API Requirements
- Provide means for an application to launch and communicate with a privileged backend
- Do not require the application to know anything about whether authentication is required or how it is done
- Make it possible for individual backends to authenticate using different privileges
- Allow a friendly user interaction model to be built over the mechanism provided
Implementation Ideas
D-BUS springs to mind pretty quickly when thinking about communication between a frontend and backend process. D-BUS also offers service activation features which could possibly solve the problem of invoking the backend.
Throughout the rest of the document I will refer to an "authentication helper." This is the mechanism by which the privileged backends are given their privileges and the user is challenged for any required authentication. While this document is written in the context of the authentication helper being a setuid binary akin to su or consolehelper, please note that D-BUS itself or some D-BUS service could be the authentication helper. I am trying to avoid the low-level details and focus on the higher-level API and user interaction at this point in time.
The operations the API needs are, roughly:
- Invoke backend (the details of authenticating the user are hidden from the application)
- Check if backend is running (perhaps hold a connection object to a backend from the invoke operation)
- Communicate with backend (D-BUS messages)
- Shutdown backend (close connection - necessary for long-running apps that don't need to hold on to the privileged process for long)
While these could potentially all be defined as D-BUS operations a simple API would go a long way in assisting application developers to easily make use of the system. Namely, we could provide functions for starting and stopping a given backend. If D-BUS is not up to the authentication interaction needs we have then the library could wrap an external helper command that can authenticate the user and then invoke the backend. The advantage of having a separate helper utility is that the utility can be swapped with any implementation the system administrator/builder wants. Red Hat can use consolehelper, Ubuntu can use sudo, Home Linux can just popup a confirmation dialog, and so on. No modifications to application source or recompilation of binaries would be necessary to modify or upgrade the authentication logic.
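The four operations listed above can be sketched as a small frontend-side class that wraps an external helper command. This is only an illustration under stated assumptions: the class name `Backend`, its method names, and the convention that the helper takes the backend name as its last argument are all hypothetical, and a line-based pipe stands in for the D-BUS messages a real implementation would probably use.

```python
import subprocess


class Backend:
    """Frontend-side handle to a privileged backend (hypothetical API)."""

    def __init__(self, helper_argv, backend_name):
        # helper_argv is the authentication helper's command line. It is
        # chosen by the system builder (consolehelper, sudo, ...), so the
        # application never needs to know how authentication is done.
        self._argv = list(helper_argv) + [backend_name]
        self._proc = None

    def invoke(self):
        """Start the backend through the helper if it is not running."""
        if not self.is_running():
            self._proc = subprocess.Popen(
                self._argv,
                stdin=subprocess.PIPE,
                stdout=subprocess.PIPE,
                text=True,
            )

    def is_running(self):
        """Report whether a backend process exists and has not exited."""
        return self._proc is not None and self._proc.poll() is None

    def send(self, message):
        """Send a one-line message and return the backend's one-line reply."""
        self._proc.stdin.write(message + "\n")
        self._proc.stdin.flush()
        return self._proc.stdout.readline().rstrip("\n")

    def shutdown(self):
        """Close the connection and wait for the backend to exit."""
        if self._proc is not None:
            self._proc.stdin.close()
            self._proc.wait()
            self._proc.stdout.close()
            self._proc = None
```

Because the helper command line is passed in as data, swapping consolehelper for sudo or for a simple confirmation dialog would not require changing or recompiling the application.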
For communicating with the backend, simply exposing the raw D-BUS API is probably the correct choice. It might be useful to provide some very light and simple wrappers for the common case (i.e., send simple messages with no arguments or a list of strings as arguments). The simpler it is to use, the better and the less error-prone it will be.
The actual logic for launching backends should have several requirements placed on it. In the case that a backend like consolehelper is used all access control is already handled pretty well by consolehelper. For OSes that have nothing better than su-like behavior it would be beneficial to offer some very basic access control. For example, only backends that the utility knows about could be launched using the utility. Trojans could trick the user into running time-config-backend but not rm -fr /. Highly secure systems could combine this with the removal of normal su/xsu/gnomesu and other security enhancements to greatly reduce the risk (although it won't eliminate it!). More advanced access control is possible, although it should be left as an implementation detail of the individual authentication mechanisms as more advanced access control will usually require modifications to other parts of the system in order to be truly effective.
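The basic access control described above (only backends the utility knows about may be launched) could look roughly like the following. The registry contents, the path, and the names `KNOWN_BACKENDS`, `UnknownBackendError`, and `resolve_backend` are hypothetical and chosen only for illustration.

```python
# Hypothetical registry mapping a backend name to the command that the
# helper would run with privileges. The path is an illustrative placeholder.
KNOWN_BACKENDS = {
    "time-config-backend": ["/usr/libexec/time-config-backend"],
}


class UnknownBackendError(Exception):
    """Raised when a caller asks for a backend that is not registered."""


def resolve_backend(name, registry=KNOWN_BACKENDS):
    """Return the command for a registered backend and refuse anything else."""
    try:
        # Copy the list so callers cannot modify the registry entry.
        return list(registry[name])
    except KeyError:
        raise UnknownBackendError(name) from None
```

With this check a request for `time-config-backend` resolves to its registered command, while a request such as `rm -fr /` is rejected before anything is executed.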
Assuming this is implemented largely using D-BUS the specific methods and services should be precisely documented in order to allow non-GNOME applications/frameworks to make use of backends designed for this system. The same goes for any necessary authentication helper utilities - the command line values, stdout/stderr expected behavior, expected return codes, and expected stdin content should all be precisely documented in order to facilitate creation of compatible replacement implementations and to allow other API wrappers to reliably use the utility.
User Interface
Aside from the technical mechanism of handling privileged operations and the authentication policy, the GNOME desktop project must also work on a preferred user interface for performing privileged operations.
Ideally the system will not spam the user with authentication challenges. Many operations require special privileges in order to work although, from most users' and administrators' standpoints, they should not require any. One example is configuring the system time. There are certain systems that will horrendously break or suffer from security vulnerabilities when the time changes; these systems are atypical and, while their needs must be supported, the default system behavior is not required to reflect those needs. The default system behavior could allow normal users to alter system time, and administrators of systems in which such an operation is dangerous could simply disable the feature.
For systems that are intended for streamlined ("easy" ?) usage there very well may never be an authentication challenge. Such systems would be common in home-user scenarios, for example. Multi-user systems are likely to require authentication challenges for certain high-risk operations, however. It is imperative that a user interface guideline be specified for how systems should interact with the user when interaction is necessary.
There are two places in which the user interface must be specified for this proposal. The first is the frontend applications that utilize the backend. These applications must decide when to launch the backend. Because launching the backend can trigger user interaction the applications must only launch the backends at appropriate times.
For applications whose entire function is based on a privileged operation, such as a log viewer, it makes sense to launch the backend at application startup and leave it running until the application exits. If the backend cannot be launched (due to an authentication failure, for example) then the application should automatically exit. Having the application remain open with no backend serves little to no purpose in this scenario.
Most applications will not fall into this category. Applications will generally function mostly using unprivileged operations and offer/use only several privileged operations. One example would be procman and its renice functionality. Another example would be a software package manager that allows users to view what is installed or upgrade/remove/install packages; viewing them doesn't need privileges while modifying the package set does. Yet another example would be a hardware management application that allowed users to view installed hardware and to change parameters on certain devices. In all of these cases a user can load, use, and exit the application normally without ever needing the privileged backend. Starting the backend and possibly incurring the authentication challenge on the user is wasteful and adds unnecessary clutter to the user's task.
There are two ways in which the privileged backend can be started: the first is at explicit user request and the second is automatically on-demand. In the first case an icon similar to the Mac OS X lock icon would be provided in the applications' windows. If the user attempts an operation that requires the backend the user will be asked to authenticate. Clicking the lock icon would launch the backend and then possibly challenge the user for authentication. In the second case the backend would be launched when the user attempted the operation; if an authentication challenge is required it will then be displayed.
The second option would be far simpler for both the developer and the user. In the first case the user is required to deal with the act of initiating authentication. This seems wasteful when the application is quite capable of doing it automatically. Additionally it requires the user to think in terms of authentication and adds additional noise to the user model. Finally, in cases where no authentication challenge is necessary, requiring the user to click the lock icon before performing an action is absolutely unnecessary and quite confusing to even many technical users. Developers would also have a more difficult time using the lock icon as they must then code the logic for handling the lock icon in addition to the logic for asking users to authenticate when performing an operation that requires the backend.
By simply launching the backend when the application needs it the authentication will still be carried out (if it is necessary) and the entire task and user interaction model is greatly simplified.
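A minimal sketch of the on-demand approach, assuming a hypothetical `launch` callable that starts the backend (triggering any authentication challenge) and returns a connection that can be called with an operation name and arguments. The class name `OnDemandBackend` is likewise an assumption.

```python
class OnDemandBackend:
    """Launch the privileged backend only when it is first needed."""

    def __init__(self, launch):
        # launch is a hypothetical callable: it starts the backend, which
        # may challenge the user, and returns a callable connection.
        self._launch = launch
        self._conn = None

    def call(self, operation, *args):
        """Run a privileged operation, launching the backend if necessary."""
        if self._conn is None:
            self._conn = self._launch()
        return self._conn(operation, *args)
```

An application such as procman would construct this object at startup at no cost to the user, and only the first renice request would start the backend. This sketch keeps the connection open afterwards; whether it should be closed after each operation is a separate policy decision.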
The second area in which the user interface needs to be addressed is the authentication challenge itself. There are generally three kinds of challenges. The first requires that the user enter a secret of some sort, such as a passphrase. The second requires that the user perform some other activity using a peripheral device, such as a biometric device. In the third case the system requires that the user do some external task and then confirm that the task is complete. Note that the third case also covers the scenario where the system simply wants user confirmation.
All of these should be carried out using HIG-compliant dialogs. The passphrase entry case would simply ask for a particular passphrase (administrator passphrase, the user's own passphrase, or a passphrase tied to the specific operation being performed) and offer a [Cancel] button and a [Continue] button. When an external device is necessary and the machine automatically detects when it is successfully used, the dialog need only inform the user which device to use and provide a [Cancel] button. In the last case the dialog would explain the action and provide a [Cancel] and a [Confirm] button.
If authentication fails the dialog should reappear with an additional warning message. There should not be two dialogs with one detailing the error and then another for a second attempt. If authentication is cancelled the invoking application should not create any additional error or warning dialogs informing the user that they cancelled the operation - the user already knows that.
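The failure and cancellation rules above can be sketched as a small loop. The names `AuthResult`, `run_challenge`, `show_dialog`, and `verify` are hypothetical, and the `max_attempts` limit is an added assumption that the text above does not specify.

```python
from enum import Enum


class AuthResult(Enum):
    AUTHENTICATED = "authenticated"
    CANCELLED = "cancelled"
    FAILED = "failed"


def run_challenge(show_dialog, verify, max_attempts=3):
    """Run the challenge dialog until success, cancel, or too many failures.

    show_dialog(warning) displays the single challenge dialog, with an
    optional warning line, and returns the entered secret or None if the
    user pressed [Cancel]. verify(secret) returns True on success.
    """
    warning = None
    for _ in range(max_attempts):
        secret = show_dialog(warning)
        if secret is None:
            # The user cancelled: report it silently, with no extra dialog.
            return AuthResult.CANCELLED
        if verify(secret):
            return AuthResult.AUTHENTICATED
        # Reuse the same dialog on the next attempt, now with a warning.
        warning = "Authentication failed. Please try again."
    return AuthResult.FAILED
```

The invoking application would treat `CANCELLED` as a normal outcome and simply not perform the operation.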
A final question is whether or not the backend should remain running once the operation is complete. In the procman example, once the user has reniced a process, should the privileged backend remain running until the user closes procman again? In one case the design may decide that once a user is authenticated they shouldn't do redundant work. In another case the security design may decide that even a 10 second window is enough for the user to make a "quick trip" away from her desk and for a malicious user to approach and abuse the elevated privileges. The framework must be able to accommodate both valid security designs.
One possible way to do this is to mandate that backends be used and then closed. Applications do not keep the backend running for any longer than they need. This moves the problem into the authentication code itself. In a system that uses the lax security design the authentication helper might not even offer any challenges at all, so reinvoking the backend will not require any additional work on the part of the user. A system that reauthenticates every chance it can get will get all the chances it wants to do so. In a system that uses a timestamp the authentication helper can simply use the timestamp to avoid re-challenging the user. Therefore, as per the requirements, the application is not dependent on the authentication scheme or security model. All of the details are nicely abstracted away to the authentication helper.
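The timestamp variant might be sketched as follows from the authentication helper's point of view. The class name `TimestampPolicy`, its methods, and the `grace_seconds` parameter are hypothetical; a real helper would also need to store the timestamp somewhere that unprivileged code cannot tamper with, which this sketch does not address.

```python
import time


class TimestampPolicy:
    """Decide whether the helper must challenge the user again."""

    def __init__(self, grace_seconds, clock=time.monotonic):
        self._grace = grace_seconds
        self._clock = clock  # injectable so the policy can be tested
        self._last_success = None

    def needs_challenge(self):
        """Return True unless a success happened within the grace period."""
        if self._last_success is None:
            return True
        return self._clock() - self._last_success >= self._grace

    def record_success(self):
        """Remember the time of a successful authentication."""
        self._last_success = self._clock()
```

A grace period of zero yields a policy that challenges on every invocation, so both the timestamp design and the strict design are covered without any change to the application.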
CameronHarris: Is there any way in which the user could somehow know that this is a real, trustworthy dialog asking for a password and not an evil program that's faked the dialog and is going to do bad things with the password? Windows has the Ctrl+Alt+Delete combination for this reason. Alternatively, there could be some special phrase that the user has to remember.