Problem Reporting Architecture Proposal

Introduction

Here is a strawman proposal for how to design the problem reporting system.

Conceptual Components

  • System Logger
  • Problem Detector
  • Report Generator
  • User Problem Notifier
  • Reporting Mechanism
  • Problem Reporting and Review
  • Report Collection Server

System Logger

For the purposes of this design, the System Logger collects and stores structured data about anomalous system behavior, including crash dumps and SELinux/audit logs. It provides stable references to individual items within these data.
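A minimal in-memory sketch of the "stable references" idea follows. It assumes each entry is a dictionary of structured fields and that a reference is an opaque cursor string. The `SystemLogger` class, its methods, and the `s=N` cursor format are hypothetical; a real logger would persist entries to disk.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class LogEntry:
    cursor: str             # stable reference to this entry (hypothetical format)
    fields: Dict[str, Any]  # structured data, e.g. {"MESSAGE": ..., "PRIORITY": 2}


@dataclass
class SystemLogger:
    """Toy append-only store; entries are never rewritten, so cursors stay valid."""
    _entries: List[LogEntry] = field(default_factory=list)
    _next_seq: int = 0

    def append(self, **fields: Any) -> str:
        cursor = f"s={self._next_seq}"
        self._next_seq += 1
        self._entries.append(LogEntry(cursor, dict(fields)))
        return cursor

    def get(self, cursor: str) -> Optional[LogEntry]:
        """Look up a single entry by its stable reference."""
        for entry in self._entries:
            if entry.cursor == cursor:
                return entry
        return None

    def after(self, cursor: Optional[str]) -> List[LogEntry]:
        """Entries logged after `cursor` (all entries if cursor is None)."""
        if cursor is None:
            return list(self._entries)
        for i, entry in enumerate(self._entries):
            if entry.cursor == cursor:
                return self._entries[i + 1:]
        raise KeyError(cursor)
```

A consumer such as the Problem Detector can remember the last cursor it processed and call `after()` to resume without rereading old entries.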

Problem Detector

The Problem Detector watches for specific types of information logged by the System Logger. If a problem is one of the types of trouble defined in Problem Reporting (namely Crash, Misbehavior, Misconfiguration, or Failure), the Problem Detector asks the Report Generator to create a new Pending Problem Report. The Detector should distinguish between System Trouble and User Application Trouble.
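The sketch below shows one way the Detector could classify a logged entry and hand it to the Report Generator. The `KIND` and `UID` field names, the rule table, and the rule that UIDs below 1000 count as System Trouble are all assumptions for illustration, not part of this proposal.

```python
from enum import Enum
from typing import Any, Callable, Dict, Optional, Tuple


class Trouble(Enum):
    CRASH = "Crash"
    MISBEHAVIOR = "Misbehavior"
    MISCONFIGURATION = "Misconfiguration"
    FAILURE = "Failure"


class Scope(Enum):
    SYSTEM = "System Trouble"
    USER_APPLICATION = "User Application Trouble"


# Hypothetical rules keyed on a hypothetical "KIND" field set by the logger.
RULES: Dict[str, Trouble] = {
    "coredump": Trouble.CRASH,
    "watchdog": Trouble.MISBEHAVIOR,
    "config-error": Trouble.MISCONFIGURATION,
    "unit-failed": Trouble.FAILURE,
}


def classify(entry: Dict[str, Any]) -> Optional[Tuple[Trouble, Scope]]:
    """Return (trouble type, scope), or None if the entry is not reportable."""
    trouble = RULES.get(str(entry.get("KIND", "")))
    if trouble is None:
        return None
    # Assumption: system accounts (UID < 1000) mean System Trouble.
    uid = int(entry.get("UID", 0))
    scope = Scope.SYSTEM if uid < 1000 else Scope.USER_APPLICATION
    return trouble, scope


def detect(entry: Dict[str, Any],
           request_report: Callable[[Trouble, Scope, Dict[str, Any]], None]) -> bool:
    """Ask the Report Generator (request_report) for a Pending Problem Report."""
    result = classify(entry)
    if result is None:
        return False
    request_report(result[0], result[1], entry)
    return True
```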

Report Generator

The Report Generator may gather supplementary details for a specific trouble condition and create a Pending Problem Report. This report should be stored in a non-volatile location as quickly as possible in order to capture the conditions close to the event.
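Because the report should reach non-volatile storage quickly and intact, a common POSIX pattern is to write to a temporary file, `fsync` it, and atomically rename it into place. This is a sketch that assumes a JSON report with an `id` key and a hypothetical spool directory (for example `/var/spool/problem-reports`); it is POSIX-only because it also fsyncs the directory.

```python
import json
import os
import tempfile


def store_pending_report(report: dict, spool_dir: str) -> str:
    """Write `report` so that it survives a crash or power loss right after."""
    os.makedirs(spool_dir, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=spool_dir, prefix=".pending-", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(report, f)
            f.flush()
            os.fsync(f.fileno())  # force the file contents to stable storage
        final_path = os.path.join(spool_dir, f"{report['id']}.json")
        os.replace(tmp_path, final_path)  # atomic: readers never see half a report
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # fsync the directory so the rename itself is durable.
    dir_fd = os.open(spool_dir, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
    return final_path
```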

User Problem Notifier

In Normal, Developer, or Managed mode, the User Problem Notifier notifies the active user in order to:

  • Apologize for the disruption
  • Attempt to restore the previous working state
  • (except in Managed mode) Request the user's consent to report the problem

In Unattended mode the user is not notified.

The user should be notified as close to the time of the Problem as possible. In the case of trouble during boot or login, catastrophic failures during the user's session, or any other situation where a Pending Problem cannot be displayed, the User Problem Notifier should present the notification at the next login.
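The mode rules above can be captured as a small policy table. `Mode`, `notifier_actions`, and `schedule` are hypothetical names, and the action strings stand in for whatever the real notifier would display.

```python
from enum import Enum
from typing import List


class Mode(Enum):
    NORMAL = "Normal"
    DEVELOPER = "Developer"
    MANAGED = "Managed"
    UNATTENDED = "Unattended"


def notifier_actions(mode: Mode) -> List[str]:
    """What the notifier does for the active user in a given mode."""
    if mode is Mode.UNATTENDED:
        return []  # the user is not notified at all
    actions = ["apologize", "restore_previous_state"]
    if mode is not Mode.MANAGED:
        actions.append("request_consent")  # Managed mode never asks for consent
    return actions


def schedule(mode: Mode, can_display_now: bool) -> str:
    """When to notify: 'now', 'next_login', or 'never'."""
    if mode is Mode.UNATTENDED:
        return "never"
    # Boot/login trouble or a catastrophic session failure: defer to next login.
    return "now" if can_display_now else "next_login"
```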

Reporting Mechanism

The Reporting Mechanism is primarily responsible for delivering the Pending Report to the Collection Server. In Managed or Unattended mode, it must be able to operate without additional information.

It may, however, be configured to deliver reports to alternate locations including:

  • scp
  • ftp
  • email
  • shared (NFS-mounted) storage
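The proposal does not say how alternate destinations are configured. One possible approach, shown here purely as an assumption, is to express each destination as a URI and choose the transport from its scheme; the scheme-to-transport mapping and `pick_transport` are hypothetical.

```python
from typing import Dict, Tuple
from urllib.parse import urlsplit

# Hypothetical mapping from URI scheme to the transports listed above.
TRANSPORTS: Dict[str, str] = {
    "https": "collection-server",
    "scp": "scp",
    "ftp": "ftp",
    "mailto": "email",
    "file": "shared-storage",  # e.g. a directory on an NFS mount
}


def pick_transport(destination: str) -> Tuple[str, str]:
    """Return (transport, target) for a configured report destination."""
    parts = urlsplit(destination)
    scheme = parts.scheme.lower()
    if scheme not in TRANSPORTS:
        raise ValueError(f"unsupported report destination: {destination!r}")
    # mailto: and file: carry their target in the path; others use host + path.
    target = parts.path if scheme in ("mailto", "file") else parts.netloc + parts.path
    return TRANSPORTS[scheme], target
```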

Problem Reporting and Review

In Normal or Developer mode, the user may be asked to kindly submit the problem report in order to improve their system. In the user's mind, the tool for this is the same application that can be used to review past Reports, whether they were submitted or not. This tool is primarily an application with a Submit Report workflow.

The user experience goals include:

  • Helping the user to understand what happened
  • Apologizing to the user
  • Helping the user get back to what she was doing
  • Trying to make the user feel like she is in good hands
  • Appealing to the user's selfish desire to make their system better
  • Not wasting any more of the user's time than necessary (they are here because we interrupted them)
  • Acting in a way that allows them to trust us with their information
  • Explaining the privacy implications of submitting the report

Problem Report Collection Server

The Collection Server is where the Reporting Mechanism sends the report data. This server should:

  • Allow anonymous crash report submissions
  • Scrub sensitive user data from reports, removing:
    • User's real name
    • Usernames
    • Email addresses
    • Social Security numbers
    • Phone numbers
    • IP addresses
    • Document titles
    • User filenames (especially in $HOME)
    • URLs
  • Support filing reports in Bugzilla for developer review
  • Avoid duplicate report filing
  • Support linking crash reports to Bugzilla to allow Developer Mode direct access to bug reports
  • Perform coredump analysis and backtrace generation
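Two of the server duties above, scrubbing and duplicate avoidance, can be illustrated in a few lines. The regular expressions are deliberately simple (only North American phone numbers and IPv4 addresses, for example) and would miss real names and document titles entirely. The duplicate signature, a SHA-256 hash of the executable path plus the top few backtrace function names, is one common heuristic rather than a specified design. All names here are hypothetical.

```python
import hashlib
import re
from typing import Iterable, List

# Illustrative patterns only; applied in order (URLs before email addresses).
PATTERNS = [
    (re.compile(r"https?://\S+"), "<url>"),
    (re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"), "<email>"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "<ssn>"),
    (re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"), "<ip>"),
    (re.compile(r"\(?\b\d{3}\)?[ .-]\d{3}[ .-]\d{4}\b"), "<phone>"),
    (re.compile(r"/home/[^/\s]+(?:/\S*)?"), "<home-path>"),
]


def scrub(text: str, usernames: Iterable[str] = ()) -> str:
    """Replace sensitive substrings with placeholder tokens."""
    for pattern, token in PATTERNS:
        text = pattern.sub(token, text)
    for name in usernames:  # known account names, e.g. taken from the report
        text = re.sub(rf"\b{re.escape(name)}\b", "<user>", text)
    return text


def crash_signature(executable: str, frames: List[str], depth: int = 5) -> str:
    """Hash the executable and top stack frames so identical crashes collide."""
    key = "\n".join([executable, *frames[:depth]])
    return hashlib.sha256(key.encode("utf-8")).hexdigest()
```

The server would file a new Bugzilla report only when a signature has not been seen before, and otherwise attach the incoming report to the existing bug.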

Implementation Details

  • System Logger: systemd (git.freedesktop.org)
  • Problem Detector: problemd (git.freedesktop.org)
  • Report Generator: systemd/problemd? (git.freedesktop.org)
  • User Problem Notifier: gnome-settings-daemon (git.gnome.org)
  • Reporting Mechanism: ?
  • Problem Reporting and Review: Oops! application (git.gnome.org)
  • Report Collection Server: ?

System Logger

Need details of logging and cursors here.
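One possible answer to the cursor question, assuming the System Logger is the systemd journal: `journalctl --output=json` prints one JSON object per entry, each carrying a `__CURSOR` field, and `--after-cursor=` resumes after a saved cursor. The helper functions below are hypothetical; only the `journalctl` options are real.

```python
import json
import subprocess
from typing import Any, Dict, Iterable, Iterator, List, Optional, Tuple


def journal_command(after_cursor: Optional[str] = None) -> List[str]:
    """Build a journalctl invocation that prints one JSON object per line."""
    cmd = ["journalctl", "--output=json", "--no-pager"]
    if after_cursor:
        cmd.append(f"--after-cursor={after_cursor}")  # resume after a saved cursor
    return cmd


def parse_entries(lines: Iterable[str]) -> Iterator[Tuple[str, Dict[str, Any]]]:
    """Yield (cursor, entry) pairs; __CURSOR is the entry's stable reference."""
    for line in lines:
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip blank lines and any informational text
        entry = json.loads(line)
        yield entry["__CURSOR"], entry


def read_new_entries(last_cursor: Optional[str]) -> Tuple[List[Dict[str, Any]], Optional[str]]:
    """Fetch entries newer than last_cursor and return them with the new cursor."""
    result = subprocess.run(journal_command(last_cursor), capture_output=True,
                            text=True, check=True)
    entries = []
    for cursor, entry in parse_entries(result.stdout.splitlines()):
        entries.append(entry)
        last_cursor = cursor
    return entries, last_cursor
```

Persisting the returned cursor between runs lets the Problem Detector pick up exactly where it left off.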

Problem Detector

Need details of how we'll identify problem conditions.

Report Generator

What data do we need to include? Binary core? Logs? Metadata? Do we identify application vs system here or at detection time? How do we indicate whether the user has been notified about something yet?
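A hypothetical report layout that answers some of these questions one way: it stores cursors into the System Logger rather than copying logs, records the System/User Application scope decided at detection time, and uses an explicit `user_notified` flag. Every field name here is an assumption.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List, Optional


@dataclass
class PendingProblemReport:
    report_id: str
    trouble_type: str                # "Crash", "Misbehavior", ...
    scope: str                       # "system" or "user-application"
    log_cursors: List[str] = field(default_factory=list)  # references into the System Logger
    core_path: Optional[str] = None  # binary core, if one was captured
    metadata: Dict[str, str] = field(default_factory=dict)
    user_notified: bool = False      # set once the User Problem Notifier has shown it
    submitted: bool = False          # whether the user submitted it from the review tool

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "PendingProblemReport":
        return cls(**json.loads(text))
```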

User Problem Notifier

Need details of how gnome-settings-daemon will watch for events to notify about.
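A sketch of the watching side, assuming Pending Problem Reports are stored as JSON files in a spool directory with a `user_notified` flag (both assumptions). It polls for simplicity; a gnome-settings-daemon plugin written in C would more likely put a GIO file monitor (`GFileMonitor`) on the directory.

```python
import json
import os
from typing import List


def pending_notifications(spool_dir: str) -> List[str]:
    """Return paths of stored reports the user has not been told about yet."""
    result = []
    for name in sorted(os.listdir(spool_dir)):
        if not name.endswith(".json"):
            continue  # skips in-progress temporary files
        path = os.path.join(spool_dir, name)
        with open(path, encoding="utf-8") as f:
            report = json.load(f)
        if not report.get("user_notified", False):
            result.append(path)
    return result


def mark_notified(path: str) -> None:
    """Record that the user has seen this report (rewritten atomically)."""
    with open(path, encoding="utf-8") as f:
        report = json.load(f)
    report["user_notified"] = True
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump(report, f)
    os.replace(tmp, path)
```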

Reporting Mechanism

How do we send reports to the server? Do we need to use certificates to prevent man-in-the-middle attacks?
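If the Reporting Mechanism uses HTTPS, certificate and hostname verification is what defends against man-in-the-middle attacks, and shipping the Collection Server's CA certificate makes it possible to trust only that CA. A minimal sketch using Python's standard library; the URL, the JSON payload, and the helper names are assumptions.

```python
import ssl
import urllib.request
from typing import Optional


def make_tls_context(ca_file: Optional[str] = None) -> ssl.SSLContext:
    """Verify certificate and hostname; if ca_file is given, trust only that CA."""
    ctx = ssl.create_default_context(cafile=ca_file)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx


def build_submission(url: str, report_json: bytes) -> urllib.request.Request:
    if not url.startswith("https://"):
        raise ValueError("reports must only be sent over HTTPS")
    return urllib.request.Request(url, data=report_json, method="POST",
                                  headers={"Content-Type": "application/json"})


def submit(url: str, report_json: bytes, ca_file: Optional[str] = None) -> int:
    """POST the report; raises ssl.SSLError if the server cannot be verified."""
    req = build_submission(url, report_json)
    with urllib.request.urlopen(req, context=make_tls_context(ca_file), timeout=30) as resp:
        return resp.status
```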

Problem Reporting and Review

How do we handle third-party applications such as Firefox? Do they simply opt out of the process, leaving us only to offer to restart the app?

Report Collection Server

We have to choose one, adapt one, or write one. :)

Discussion


Design/OS/ProblemReporting/Proposal (last edited 2013-12-04 19:16:51 by WilliamJonMcCann)