This site has been retired. For up to date information, see handbook.gnome.org or gitlab.gnome.org.



BuildStream file import rules

Summary

This page describes the rules BuildStream (as of version 1.3, revision 8cea7b17a773230e37a84b1c0fccadb23f24a108) uses to import one directory into another. This is the operation performed by the import_files method of the virtual Directory class, and by the functions copy_files and link_files in utils.py.

Intention

This is meant to be a specification, reverse-engineered from the existing code, which can be used to build new import code. For example, the new CasBasedDirectory code needs to import directory structures from other CAS-based directories, which are not backed by a traditional filesystem.

Import procedure

Definitions: we are moving all files from the "source root" into the "destination root". This operation is called an import.

1. Produce a list of all the files, symlinks and directories in the source directory, relative to the source root. Include a directory only if it contains no regular files or symlinks: an empty directory, or one containing only other directories, is listed, but a directory containing any regular file or symlink is not (it is implicitly listed as part of the paths of those files and symlinks).

2. To produce a partial import, this file list may be reduced to a subset at this point; otherwise, we carry on as normal.

3. The list processor then performs an action for each entry in the list. If a specific list of files is supplied to copy_files or link_files, the list is considered unordered and is sorted alphabetically before proceeding. If no file list is provided, the list is just the output of list_relative_paths() called on the source directory, and is not sorted.

The loop variable is simply called the 'entry' here. Each iteration of the loop processes one entry from the file list; an entry is a path relative to the source directory.

4. Set permissions.
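The procedure above can be sketched in Python as follows. This is a hypothetical illustration, not BuildStream's code: list_entries and import_files are names invented here, every entry is simply copied (the detailed per-entry rules are not modelled), conflicts with paths that already exist in the destination are not handled, and applying directory modes at the very end is one plausible reading of step 4.

```python
# Hypothetical sketch of the import procedure; not BuildStream's code.
import os
import shutil
import stat


def list_entries(src):
    """Step 1: yield entries relative to src.

    Regular files and symlinks are always listed. A directory is listed
    only if it holds no regular files or symlinks; otherwise it is implied
    by the paths of its contents. The source root itself is not listed.
    """
    for dirpath, dirnames, filenames in os.walk(src):
        dirnames.sort()    # in-place sort also fixes os.walk's descent order
        filenames.sort()
        rel = os.path.relpath(dirpath, src)
        base = "" if rel == "." else rel
        # os.walk reports symlinks to directories in dirnames but does not
        # descend into them (followlinks=False), so list them here.
        dir_links = [d for d in dirnames
                     if os.path.islink(os.path.join(dirpath, d))]
        for name in dir_links + filenames:
            yield os.path.join(base, name)
        if base and not dir_links and not filenames:
            yield base


def import_files(src, dest, files=None):
    """Steps 3 and 4, with plain copying as the only per-entry action."""
    if files is None:
        entries = list(list_entries(src))  # listing order, not re-sorted
    else:
        entries = sorted(files)            # supplied list: sort first
    dir_modes = []
    for entry in entries:
        s = os.path.join(src, entry)
        d = os.path.join(dest, entry)
        os.makedirs(os.path.dirname(d), exist_ok=True)
        if os.path.islink(s):
            os.symlink(os.readlink(s), d)  # copy the link, not its target
        elif os.path.isdir(s):
            os.makedirs(d, exist_ok=True)
            dir_modes.append((d, stat.S_IMODE(os.lstat(s).st_mode)))
        else:
            shutil.copy2(s, d)
    # Step 4: apply directory modes only after every entry is written, so a
    # read-only directory cannot block writes into it. Reverse order handles
    # children before their parents. Only directories that appear as entries
    # have their modes recorded in this sketch.
    for d, mode in reversed(dir_modes):
        os.chmod(d, mode)
```

Calling import_files(src, dest) uses the listing order as-is, while import_files(src, dest, files=[...]) sorts the supplied list first, mirroring step 3.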

Potential problems

Resolving symbolic links directly appears to be incorrect. Any absolute symbolic link (that is, one starting with the path separator) is likely to be resolved wrongly, because the current code uses os.path.realpath and other functions that follow symlinks on the host, such as os.path.exists. In some cases even relative symlinks resolve incorrectly: for example, ../../../../../../../usr is likely to resolve to the host's /usr if resolved directly, but to the destination root's usr directory if resolved relative to that root.

There are other places in the code which account for this. _relative_symlink_target appears to do the job of resolving links relative to a base.
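To illustrate what resolving links relative to a base means, here is a hypothetical Python sketch. It is not the actual _relative_symlink_target; resolve_in_root and its max_links limit are invented for this example. It assumes "/"-separated paths and treats the destination root as "/", so absolute targets restart at the root and ".." can never climb above it.

```python
import os


def resolve_in_root(root, relpath, max_links=40):
    """Hypothetical resolver: follow symlinks in relpath as if root were "/".

    Returns a path relative to root ("" for the root itself). Components
    that do not exist are kept as written; only symlinks are followed.
    """
    parts = [p for p in relpath.split("/") if p not in ("", ".")]
    resolved = []          # physical components resolved so far
    links_followed = 0
    while parts:
        part = parts.pop(0)
        if part == "..":
            if resolved:
                resolved.pop()   # clamp at root instead of escaping it
            continue
        candidate = os.path.join(root, *resolved, part)
        if os.path.islink(candidate):
            links_followed += 1
            if links_followed > max_links:
                raise OSError("too many levels of symbolic links: " + relpath)
            target = os.readlink(candidate)
            if target.startswith("/"):
                resolved = []    # absolute target: restart at root, not host /
            # A relative target is read from the link's own directory,
            # which is exactly what `resolved` currently holds.
            parts = [p for p in target.split("/") if p not in ("", ".")] + parts
        else:
            resolved.append(part)
    return "/".join(resolved)
```

With sbin a link to /usr/sbin inside the root, resolve_in_root(root, "sbin/hello") gives "usr/sbin/hello" without consulting the host's /usr, and a link to ../../../../../../../usr placed at a/b/up resolves to "usr".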

Ordering of file list

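This is an example of three entries for which the import order affects the result. In the first order:

/usr/sbin
/sbin (a symbolic link to /usr/sbin)
/sbin/hello (a file)

In the second order:

/usr/sbin
/sbin/hello (a file)
/sbin (a symbolic link to /usr/sbin)

In the first order, /sbin is already a symlink when /sbin/hello is imported, so hello is written through the link (into /usr/sbin, if the link is resolved relative to the destination root). In the second order, importing /sbin/hello first requires /sbin to exist as a real directory, which then conflicts with the later symlink entry for /sbin.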

Note that "/sbin/hello" cannot be generated by our current list_relative_paths functions (it would show up as /usr/sbin/hello in both cases).
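The following Python snippet illustrates this note. It builds the tree in a temporary directory, using a relative link (usr/sbin) rather than /usr/sbin so that the host's /usr is not involved, and collects every path os.walk reports. walked_paths is a hypothetical helper, not BuildStream's list_relative_paths, but it relies on the same os.walk behaviour of not descending into symlinked directories.

```python
import os
import tempfile


def walked_paths(root):
    """Hypothetical helper: every path os.walk reports, relative to root."""
    seen = []
    for dirpath, dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        for name in sorted(dirnames + filenames):
            seen.append(os.path.normpath(os.path.join(rel, name)))
    return seen


with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "usr", "sbin"))
    open(os.path.join(root, "usr", "sbin", "hello"), "w").close()
    # Relative link so the example does not depend on the host's /usr/sbin.
    os.symlink("usr/sbin", os.path.join(root, "sbin"))
    paths = walked_paths(root)

print(paths)
```

The link sbin is reported only as a name in the root directory; the file appears as usr/sbin/hello, and no sbin/hello path is ever produced.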

copy_files respects the order of files passed in as an argument if a list is passed in. If not, the order comes from list_relative_paths, and curiously, although list_relative_paths is careful to put its output in a specific order, copy_files will then sort the list alphabetically.

Ordering is especially important in content-addressable storage systems, since the protocol-buffer-based format used by BuildStream produces different hashes if the elements of a directory are listed in a different order.
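A minimal Python sketch of why order matters for content hashes. directory_digest and its serialization are invented for illustration and are not BuildStream's protocol buffer encoding; the point is only that any format which writes entries in sequence hashes differently when the sequence changes, and that sorting entries first gives a single canonical form.

```python
import hashlib


def directory_digest(entries):
    """Hypothetical: hash (name, content_digest) pairs in the order given."""
    h = hashlib.sha256()
    for name, digest in entries:
        # Entries are serialized one after another, so order affects the bytes.
        h.update(name.encode() + b"\0" + digest.encode() + b"\n")
    return h.hexdigest()


a = [("bin", "d1"), ("lib", "d2")]
b = [("lib", "d2"), ("bin", "d1")]
print(directory_digest(a) == directory_digest(b))                  # False
# Sorting by name before serializing yields one canonical form.
print(directory_digest(sorted(a)) == directory_digest(sorted(b)))  # True
```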

The ordering returned by list_relative_paths in utils.py is not deterministic across hosts. This function uses os.walk, which divides its output into 'directories' and 'files', and list_relative_paths processes directories before files. However, a symlink can end up in either category depending on what it points to: symlinks to directories end up in 'directories', while symlinks to files and broken symlinks end up in 'files'. Because symlinks are resolved on the host filesystem, a symlink pointing to /lib64 will end up in one group or the other depending on whether the host's /lib64 exists.
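The classification can be observed directly with os.walk. This Python sketch creates links in a temporary directory; to keep the example predictable it uses "/" (which always exists) and /no/such/host/path (assumed not to exist on the host) in place of /lib64.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    os.mkdir(os.path.join(root, "realdir"))
    open(os.path.join(root, "realfile"), "w").close()
    os.symlink("realdir", os.path.join(root, "to_dir"))
    os.symlink("realfile", os.path.join(root, "to_file"))
    os.symlink("missing", os.path.join(root, "broken"))
    # Absolute targets are looked up on the host, not inside root.
    os.symlink("/", os.path.join(root, "to_host_root"))
    os.symlink("/no/such/host/path", os.path.join(root, "to_missing_host_path"))

    # Only the top level is needed, so take the first tuple os.walk yields.
    dirpath, dirnames, filenames = next(os.walk(root))

print("directories:", sorted(dirnames))
print("files:", sorted(filenames))
```

to_dir and to_host_root land in the directory list, while to_file, broken and to_missing_host_path land in the file list, even though all five are symlinks.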


2024-10-23 11:36