Orca Regression Testing

Main Ideas

The main ideas behind the Orca test harness are as follows:

  • Orca provides the ability to log a textual description of what it is sending to speech and braille. This allows output to be recorded and compared against the results of future runs.
  • Orca provides a D-Bus service that allows an external process to tell it when to start logging output. This same service also allows the external process to obtain the logged output.
  • The harness merely consists of playing back pre-recorded keystrokes and comparing Orca's output to a previously recorded successful run (where 'success' means Orca output the correct information).
  • The harness also has components for determining code coverage and profiling information.
  • The tests are currently based upon the en_US.UTF-8 locale. No translation of the tests has been done, nor is any translation planned.

Harness Directory Layout

The Orca regression tests contained in the test directory are laid out as follows:

  • ./harness: test harness scripts

  • ./keystrokes: contains all the tests

  • ./html: test data for Firefox tests as well as a Firefox profile to make sure Firefox runs in a consistent state

  • ./text: simple text data

Prerequisites

The first prerequisite is you: among other things, you need the knowledge, skills, and permission to build and install modules from GNOME git, to add users, and to run things as root. Throughout the rest of this page are various examples of how to do these things on Ubuntu and Solaris, but it is expected that you already know what these commands are and how to run them. That is, this page is not intended to be a guide to system administration and application development.

Macaroon

To run any of the tests, you need to build/install Macaroon. Macaroon can be obtained, built, and installed by issuing the following commands:

git clone git://git.gnome.org/accerciser
cd accerciser/macaroon
./autogen.sh
make
sudo make install

gtk-demo

The tests also require various applications to be installed, including gtk-demo. For Ubuntu, you can obtain/install gtk-demo via the following command:

sudo apt-get install gtk2.0-examples

On Solaris, gtk-demo is available at /usr/demo/jds/bin/gtk-demo. To make things go more smoothly on Solaris, provide a symbolic link from /usr/bin/gtk-demo to /usr/demo/jds/bin/gtk-demo.
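One way to create that link (a sketch, assuming the standard Solaris path above):

sudo ln -s /usr/demo/jds/bin/gtk-demo /usr/bin/gtk-demo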

trace2html

To do code coverage analysis, you need to grab Olivier Grisel's trace2html 0.2.1 and apply the test/harness/trace2html-coverage-patch.txt. You can apply the patch and install trace2html via the following commands:

gunzip -c trace2html-0.2.1.tar.gz | tar xvf -
cd trace2html-0.2.1
<<<copy your trace2html-coverage-patch.txt to the current directory>>>
patch -p0 src/trace2html.py < trace2html-coverage-patch.txt
sudo python setup.py install
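To sanity-check the install, you can try importing the module. This assumes the package installs a top-level trace2html Python module, as its src/trace2html.py layout suggests:

python -c "import trace2html; print trace2html.__file__"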

Python cProfile

To do performance profiling, you need the Python profiler module ("import cProfile"), which can be obtained via the following command on Ubuntu (WDW: need directions to get this on Solaris as of SXDE 01/08. I think I remember it seemed to be there by default at one time in the past.):

sudo apt-get install python-profiler
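To confirm the profiler module is available before running the performance tests, a quick check:

python -c "import cProfile; print 'cProfile OK'"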

Main Files

  • ./harness/runall.sh: The ./harness directory contains two main scripts: runall.sh and runone.sh. The runall.sh script is the main script to use; it executes all the tests and places the results in a directory whose name is of the form YYYY-MM-DD_HH:MM:SS (e.g., 2006-11-29_20:21:41). Run ./runall.sh --help to get help on the various parameters you can use; running runall.sh with no command line parameters uses the defaults. See 'Run the Harness' below for more information.

  • ./harness/runone.sh: The runone.sh script allows a test creator to run just one test. Its parameters are a <*.py test file> containing Macaroon-based keystrokes (see below), an <app-name> naming the application to run with the given test file, and a 0 or 1 indicating whether or not Orca is currently running. See 'Running Just One Test' below for more information.

  • ./harness/user-settings.py.in: contains the default Orca configuration settings. The primary thing this file does is disable the use of real speech and braille and instead send a text form of the speech and braille output to a log file. Note that there are also facilities to specify a custom user-settings file for each keystroke file so as to allow the testing of Orca settings (e.g., echo by word). See the writing tests page for more information.

  • ./keystrokes/*: The ./keystrokes directory consists of a set of directories, where each directory name has significant meaning and is used to determine which app the test harness uses for testing. The directory name is typically the name of a binary on the path. For example, there is a ./keystrokes/gtk-demo directory, and the files under this directory are for testing Orca using the gtk-demo application. See the writing tests page for more information. You may also see directories whose name matches the output of uname. These directories are used to contain platform specific tests (e.g., Ctrl+Esc for Solaris versus Alt+F1 for Linux). (WDW NOTE: the platform specific tests really haven't been fleshed out well and I'm not sure the harness really works well with them.)

Writing Tests Using Macaroon

See the writing tests page.

Running the Regression Tests

Set up an 'orca' Test Account

It is best to run the regression tests from a different user account than the account you normally log into. This helps avoid conflicts with things such as personal theming preferences or the use of 'point to focus' versus the default 'click to focus'. The preferred username is orca, and this user should only use the default GNOME desktop settings. The main things of importance are:

  • Make sure you log in as the orca user at least once and enable GNOME Accessibility.

  • Use metacity instead of compiz (assuming the distribution enables compiz by default); see the example command after this list.

  • Download and install the latest Firefox 4 nightly build.
  • Download and install the latest OpenOffice nightly build. Make sure you run this at least once to get through the registration screens.

  • You also apparently need a JRE for the OpenOffice tests to work (see bug 521651).

  • (WDW: more to go here)
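If the desktop does default to compiz, one way to switch the current session back to metacity (a sketch, assuming a GNOME 2 era desktop with metacity installed) is:

metacity --replace &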

Run the Harness

TIP: Either use xterm or manually set your gnome-terminal profile so that the title is not changed.

The harness is designed to be run from the test/harness directory. Don't run it from anywhere else or bad things might happen. To run the harness, merely run the runall.sh script when sitting in the test/harness directory:

./runall.sh > runall.out 2>&1

To run the tests for just one application, pass the absolute path of its keystrokes directory via the -a parameter to runall.sh. For example:

./runall.sh -a `pwd`/../keystrokes/oowriter > runall.out 2>&1

If you want to specify a different PATH, you can do so quite easily. This makes testing different versions of an application easier. For example:

PATH=~/Desktop/firefox:$PATH ./runall.sh -a `pwd`/../keystrokes/firefox > runall.out 2>&1

The runall.sh script will run through all the keystrokes and output summary information for the tests to the console. Redirecting the output to runall.out (as shown above) is a useful way to save the output for later examination. As part of a run, you might see output such as the following:

Test 1 of 1 FAILED: /export/home/orca/orca/trunk-3743/test/harness/../keystrokes
/gtk-demo/debug_commands.py:Report script information
EXPECTED:
     "BRAILLE LINE:  'SCRIPT INFO: Script name='gtk-demo (module=orca.default)'
Application name='gtk-demo' Toolkit name='GAIL' Version='1.20.0''",
     "     VISIBLE:  'SCRIPT INFO: Script name='gtk-de', cursor=0",
     "SPEECH OUTPUT: 'SCRIPT INFO: Script name='gtk-demo (module=orca.default)'
Application name='gtk-demo' Toolkit name='GAIL' Version='1.20.0''",
ACTUAL:
     "BRAILLE LINE:  'SCRIPT INFO: Script name='gtk-demo (module=orca.default)'
Application name='gtk-demo' Toolkit name='GAIL' Version='1.21.5''",
     "     VISIBLE:  'SCRIPT INFO: Script name='gtk-de', cursor=0",
     "SPEECH OUTPUT: 'SCRIPT INFO: Script name='gtk-demo (module=orca.default)'
Application name='gtk-demo' Toolkit name='GAIL' Version='1.21.5''",
[FAILURE WAS UNEXPECTED]

Unexpected failures are not good. When you get one of these, you should compare the output from the 'EXPECTED' section to the output of the 'ACTUAL' section and then work to resolve the differences.

You might also see output with KNOWN ISSUE in it:

Test 3 of 5 FAILED: /export/home/orca/orca/trunk-3743/test/harness/../keystrokes
/gtk-demo/role_radio_button.py:Range radio button
EXPECTED:
     "KNOWN ISSUE - the radio button should be presented as selected.",
     "BRAILLE LINE:  'gtk-demo Application Print Dialog TabList General Page Pri
nt Pages Filler & y Range RadioButton'",
     "     VISIBLE:  '& y Range RadioButton', cursor=1",
     "SPEECH OUTPUT: ''",
     "SPEECH OUTPUT: 'Range not selected radio button'",
ACTUAL:
     "BRAILLE LINE:  'gtk-demo Application Print Dialog TabList General Page Pri
nt Pages Filler & y Range RadioButton'",
     "     VISIBLE:  '& y Range RadioButton', cursor=1",
     "SPEECH OUTPUT: ''",
     "SPEECH OUTPUT: 'Range not selected radio button'",
[FAILURE WAS EXPECTED - LOOK FOR KNOWN ISSUE IN EXPECTED RESULTS]
Test 5 of 5 FAILED: /export/home/orca/orca/trunk-3743/test/harness/../keystrokes
/gtk-demo/role_radio_button.py:All radio button
EXPECTED:
     "KNOWN ISSUE - the radio button should be presented as selected.",
     "BRAILLE LINE:  'gtk-demo Application Print Dialog TabList General Page Pri
nt Pages Filler & y All RadioButton'",
     "     VISIBLE:  '& y All RadioButton', cursor=1",
     "SPEECH OUTPUT: ''",
     "SPEECH OUTPUT: 'All not selected radio button'",
ACTUAL:
     "BRAILLE LINE:  'gtk-demo Application Print Dialog TabList General Page Pri
nt Pages Filler & y All RadioButton'",
     "     VISIBLE:  '& y All RadioButton', cursor=1",
     "SPEECH OUTPUT: ''",
     "SPEECH OUTPUT: 'All not selected radio button'",
[FAILURE WAS EXPECTED - LOOK FOR KNOWN ISSUE IN EXPECTED RESULTS]

The presence of KNOWN ISSUE in the expected results is a reminder of an issue that the team is aware of but cannot fix.

Finally, after each test file is run, you should see summary output similar to the following:

SUMMARY: 4 SUCCEEDED and 0 FAILED (0 UNEXPECTED) of 4 for /export/home/orca/orca
/trunk-3743/test/harness/../keystrokes/gtk-demo/role_radio_menu_item.py

A quick way to analyze a saved runall.out file is via this command:

egrep "SUMMARY|FAILED" runall.out | grep -v "0 FAILED"

If you observe unexpected failures as part of a run, you can examine the debug logs in more detail. The runall.sh script saves the results to a directory whose name is of the form YYYY-MM-DD_HH:MM:SS (e.g., 2006-11-29_20:21:41). The YYYY-MM-DD_HH:MM:SS directory contains a set of directories that matches those in the ./keystrokes directory. Under each of those directories are files containing the reference speech and braille output from a run of the associated *.py file. For each test, there are five files: *.speech.unfiltered, *.speech, *.braille.unfiltered, *.braille, and *.debug. The debug files contain Orca debug output obtained during the run and are likely to always differ between runs of the harness; they are useful, however, for analyzing regression differences if they occur. The *.unfiltered files contain the exact output of Orca, whereas the other files contain a filtered form that helps with repeatability of test results.
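When tracking down a regression, it can also help to diff a test's filtered output against the same file from an earlier run. A sketch, using example directory and test names:

diff -u 2006-11-29_20:21:41/gtk-demo/role_radio_button.braille \
        2006-11-30_19:15:02/gtk-demo/role_radio_button.braille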

Running Just One Test

As you are creating tests or debugging a particular problem, it is useful to be able to run just one test. You can use the runone.sh script for this:

./runone.sh <*.py test file> <app-name> [0|1]

With this command:

  • <*.py test file> is a specific *.py file to run

  • app-name is the name of the application you want to run

  • [0|1] indicates whether or not Orca is currently running

Here's an example:

./runone.sh ../keystrokes/gtk-demo/role_radio_button.py gtk-demo 0

Running Code Coverage Analysis

Remember that you need to have Olivier Grisel's trace2html 0.2.1 with the trace2html-coverage-patch.txt patch applied as described above.

Code coverage analysis is then obtained by running runall.sh with the -c parameter:

./runall.sh -c

The coverage results will be placed in ../coverage/<YYYY-MM-DD_HH:MM:SS>.
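The report itself is HTML; assuming trace2html writes its usual index page at the top of that directory, you can open it in a browser, for example:

firefox ../coverage/2006-11-29_20:21:41/index.html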

Running Performance Analysis

Performance analysis is obtained by running runall.sh with the -p parameter:

./runall.sh -p

Remember that you need to have the Python profiler module installed (sudo apt-get install python-profiler). The performance results will be placed in ../profile/<YYYY-MM-DD_HH:MM:SS>. The *.orcaprof file is a raw profile data file. The *.txt file is a processed version of the *.orcaprof file that is sorted by cumulative time spent in each method.
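If you want a different view, you can also reprocess a raw *.orcaprof file yourself with pstats. A sketch (the file name is an example), sorting by the time spent directly in each function and showing the top 25 entries:

python -c "import pstats; pstats.Stats('gtk-demo.orcaprof').sort_stats('time').print_stats(25)"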

Doing a Performance Analysis Manually

You might just want to do a quick check or test by running Orca manually, experimenting with an app or feature, and then analyzing the performance of that. You can do so by running test/harness/runprofiler.py to run Orca with profiling enabled. Do your manual experimentation and then quit Orca. The raw binary profile data will be saved in a file called "orcaprof". You can analyze the data using commands such as the following:

python -c "import pstats; pstats.Stats('orcaprof').sort_stats('cumulative').print_stats()"

Nightly Tests

WDW has been experimenting with nightly tests on OpenSolaris 2008.11 (installed using the accessible install instructions). Here's what he did:

  1. Created an 'orca' test user and set it up
  2. AS THE 'orca' USER: Ran vncserver -ac :1 to set up VNC. A VNC session will be started by the nightly test to give the test user an X server to use; it can run on a headless system and/or a machine where nobody is logged into the console. This also creates an xstartup file that you will edit in the following steps.

  3. Set up the 'orca' user's vnc server's xstartup file (see below)

  4. Set up a nightly script to be run via a cron job

Set Up VNC

  1. AS THE 'orca' USER, first run vncserver -ac :1 if you haven't already done so.

  2. Then, run vncserver -kill :1 and give ~orca/.vnc/xstartup these contents:

[ -r $HOME/.Xresources ] && xrdb $HOME/.Xresources
vncconfig -iconic &
gnome-session &
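The nightly script below restarts the VNC server itself on each run; to bring it up manually with the new xstartup, run this again as the 'orca' user:

vncserver -ac :1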

Create Nightly Script

Here's the nightly script for OpenSolaris. It lives in ~orca/bin/orca_nightly_test, and it does the following:

  1. Pulls sources from SVN trunk, builds them, and installs them under /tmp -- WDW - this needs updating for GIT!
  2. Runs pylint on the code

  3. Runs the gtk-demo regression tests

  4. Sends mail only on failure in pylint or the regression tests

Note that this script also determines the DBUS_SESSION_BUS_ADDRESS, which is needed for the tests to communicate with Orca.

# NOTE: This assumes TLS on the SMTP server.
#
# $1 = SMTP server
# $2 = SMTP server username
# $3 = SMTP server password
# $4 = Your e-mail address
#

. ~/.bash_profile ""

# Get the VNC server going and also make sure we can connect
# to the D-Bus session bus.
#
vncserver -kill :1
bonobo-slay -s
vncserver -ac :1
sleep 30
eval `~/bin/get_dbus`
export DBUS_SESSION_BUS_ADDRESS

env 

# Now, check out the sources, build them, and install them.
# (WDW: the clone now comes from git, but the versioning below still
# assumes an SVN working copy under orca/trunk and needs updating.)
#
cd
rm -rf orca/trunk
git clone git://git.gnome.org/orca
SVNVERSION=`svnversion orca/trunk`
mv orca/trunk orca/trunk-$SVNVERSION
cd orca/trunk-$SVNVERSION
./autogen.sh --prefix=/tmp/orca-$SVNVERSION
make
make install
export PATH=/tmp/orca-$SVNVERSION/bin:$PATH

# Run pylint and make a summary of the bad results.
#
./run_pylint.sh src/orca/*.py src/orca/scripts/*.py 
grep "Your code has been" *.pylint | grep -v "10[.]00" > pylint_summary.out
echo "PYLINT RESULTS:"
cat pylint_summary.out

# Run the gtk-demo tests and make a summary of the bad results.
#
export DISPLAY=:1
xmodmap -e "keycode 23 = Tab ISO_Left_Tab"
xmodmap -e "keycode  79 = KP_Home KP_7 F27 KP_7 F27"
xmodmap -e "keycode  80 = KP_Up KP_8 F28 KP_8 F28"
xmodmap -e "keycode  81 = KP_Prior KP_9 F29 KP_9 F29"
xmodmap -e "keycode  83 = KP_Left KP_4 F30 KP_4 F30"
xmodmap -e "keycode  84 = KP_Begin KP_5 F31 KP_5 F31"
xmodmap -e "keycode  85 = KP_Right KP_6 F32 KP_6 F32"
xmodmap -e "keycode  87 = KP_End KP_1 F33 KP_1 F33"
xmodmap -e "keycode  88 = KP_Down KP_2 F34 KP_2 F34"
xmodmap -e "keycode  89 = KP_Next KP_3 F35 KP_3 F35"
xmodmap -pke

cd test/harness
./runall.sh -a `pwd`/../keystrokes/gtk-demo > gtk-demo.out 2>&1
egrep "SUMMARY" gtk-demo.out | grep -v "0 FAILED" > gtk-demo_summary.out
echo "GTK-DEMO RESULTS:"
cat gtk-demo_summary.out

export GTK_MODULES=
./runall.sh -a `pwd`/../keystrokes/firefox > firefox.out 2>&1
egrep "SUMMARY" firefox.out | grep -v "0 FAILED" > firefox_summary.out
echo "FIREFOX RESULTS:"
cat firefox_summary.out

# Put the pylint and regression test summaries together.
#
cd ../..
cat pylint_summary.out test/harness/gtk-demo_summary.out test/harness/firefox_summary.out > full_summary.out

# Send an e-mail only on failure.
#
NUMLINES=`cat full_summary.out | wc -l`
if [ $NUMLINES -ne 0 ]
then
INFO=`uname -a`
MACHINE=`hostname`
ME=`whoami`
SUBJECT="URGENT: orca-$SVNVERSION test failures on $INFO"
python $HOME/bin/mailit.py << EOF
$1
$2
$3
$ME@$MACHINE
$4
$SUBJECT
full_summary.out
EOF
fi

The get_dbus script looks like this:

MYID=`id -u`
GNOME_SESSION_PID=`ps -u $MYID -f | grep gnome-session$ | grep -v dbus | awk '{ print $2 }'`
pargs -e $GNOME_SESSION_PID | grep DBUS_SESSION_BUS_ADDRESS | awk '{ print $2 }'

The .bash_profile and the .bashrc file it calls look like this:

if [ -f ~/.bashrc ]; then
source ~/.bashrc
fi

. /opt/dtbld/bin/env.sh
export PATH=$PATH:/usr/X11/bin:/usr/openwin/bin:/usr/demo/jds/bin
export MANPATH=/usr/gnu/share/man:/usr/share/man:/usr/X11/share/man
export PAGER="/usr/bin/less -ins"
PS1='${LOGNAME}@$(/usr/bin/hostname):$(
    [[ "${LOGNAME}" == "root" ]] && printf "%s" "${PWD/${HOME}/~}# " ||
    printf "%s" "${PWD/${HOME}/~}\$ ")'

This also requires a ~orca/bin/mailit.py file, which sends mail via an SMTP server:

import smtplib

def prompt(prompt):
    return raw_input(prompt).strip()

smtpserver = prompt("SMTP Server: ")
username = prompt("Username: ")
password = prompt("Password: ")
fromaddr = prompt("From: ")
toaddrs = prompt("To: ").split()
subject = prompt("Subject: ")
filename = prompt("File: ")

msg = ("From: %s\r\nTo: %s\r\nSubject: %s\r\n\r\n"
       % (fromaddr, ", ".join(toaddrs), subject))

infile = open(filename, "r")
msg += infile.read()
infile.close()

server = smtplib.SMTP(smtpserver)
#server.set_debuglevel(1)
server.ehlo()
server.starttls()
server.login(username,password)
server.sendmail(fromaddr, toaddrs, msg)
server.quit()

Set Up the cron Job

Use crontab -e to set up your cron job. Here's an example.

0 0 * * * $HOME/bin/orca_nightly_test 'my.smtp.server' 'myusername' 'mypassword' 'email@address'

  • my.smtp.server - your ISP's SMTP server

  • myusername - the username you give to your SMTP server

  • mypassword - your password on your SMTP server

  • email@address - the e-mail address to get test failure notifications

Wish List for Nightly Tests

Ideally, we could set up the nightly tests to allow us to determine not only if Orca changes caused regressions in Orca, but also if external components caused regressions. For example, these tests should allow us to at least:

  • Run with Orca from trunk.
  • Run with the AT-SPI infrastructure (atk/gail/at-spi) from trunk.
  • Run with a Firefox nightly.
  • Run with a Dojo/Dijit nightly.
  • Run with an OOo nightly.
  • Run with a Thunderbird nightly (we need Thunderbird tests).

Known Issues

  • Solaris and Linux use different keycodes. Try to avoid embedding keycodes in the test files at all costs. To do so, use Macaroon's Level 2 recording as described on the writing tests page. Note also that OpenSolaris adds Up, Down, Home, End, etc., to the keypad keymap, which confuses Macaroon. So, you might need to run these commands to remove those entries:

xmodmap -e "keycode  23 = Tab ISO_Left_Tab"
xmodmap -e "keycode  79 = KP_Home KP_7 F27 KP_7 F27"
xmodmap -e "keycode  80 = KP_Up KP_8 F28 KP_8 F28"
xmodmap -e "keycode  81 = KP_Prior KP_9 F29 KP_9 F29"
xmodmap -e "keycode  83 = KP_Left KP_4 F30 KP_4 F30"
xmodmap -e "keycode  84 = KP_Begin KP_5 F31 KP_5 F31"
xmodmap -e "keycode  85 = KP_Right KP_6 F32 KP_6 F32"
xmodmap -e "keycode  87 = KP_End KP_1 F33 KP_1 F33"
xmodmap -e "keycode  88 = KP_Down KP_2 F34 KP_2 F34"
xmodmap -e "keycode  89 = KP_Next KP_3 F35 KP_3 F35"
  • The OS-specific (e.g., 'uname') portions of the harness really do not work yet. Instead, the keystroke files are always played regardless of platform. We need to work this out.
  • You need to be somewhat careful about recording tests. There are cases where some keystrokes do not make it to the AT-SPI, such as when applications do keyboard grabs. In these cases, you may need to find a different way to accomplish what you're trying to do, or you may need to hand edit the test files to add the missing keystrokes. In addition, be aware that some applications embed time-based and username-specific content in what they present. They also sometimes modify their menus based upon past use (e.g., a list of recently used documents in the 'File' menu). In these cases, you should try to avoid navigating through these areas so as to avoid inconsistent output from run to run.
  • The test harness automatically starts and kills the application to be tested. As such, you usually do not need to record keystrokes to exit the application being tested, unless you happen to be writing a test for that, of course.
  • Some tests specify a particular font or fonts. If you are using OpenSolaris, please be sure SUNWgnome-fonts has been installed. Also please be sure that ttf-fonts-core (from the Extra repository) is NOT installed.

Note on StarOffice Tests

There are a few things you need to take into consideration with the StarOffice tests:

  • The tests expect to find swriter, scalc, etc., in your path. If you are using OpenOffice, you can just make symbolic links to point to the OpenOffice binaries (see the sketch after this list).

  • The OpenOffice.org folks change the window titles just about every release. As a result, the code that looks for proper presentation of window titles is a bit sensitive. test/harness/utils.py has code in it to help adjust for these window title changes. With new releases, you may need to edit its getOOoName and getOOoBrailleLine methods to return strings that match the new window titles.
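For example, if OpenOffice.org is installed under /opt/openoffice.org3 (an assumed install location), links such as the following would do:

sudo ln -s /opt/openoffice.org3/program/swriter /usr/bin/swriter
sudo ln -s /opt/openoffice.org3/program/scalc /usr/bin/scalc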

Test Plan

NOTE: IGNORE THESE. They are here for organizational and historical reference only.

The Orca Test Plan outlines the tests that we want to have for Orca. Ideally, there is a 1:1 mapping between the written tests you find here and the automated tests in the regression test suite. The tests come in two primary forms: one is to test Orca's functionality with the AT-SPI implementation of the toolkit or application in question; the other is to test the script-specific work done for an application.

AT-SPI Implementation Tests

Application Specific Tests


The information on this page and the other Orca-related pages on this site are distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
