BuildStream file import rules

Summary

This is a description of the rules used by BuildStream (as of version 1.3, revision 8cea7b17a773230e37a84b1c0fccadb23f24a108) to import directories into other directories. This is the action performed by the virtual Directory class in the method import_files and also the functions copy_files and link_files in utils.py.

Intention

This is meant to be a specification, reverse engineered from the existing rules, which can be used to build new import code. For example, the new CasBasedDirectory code needs to import directory structures from other CAS-based directories, which have no traditional filesystem support.

Import procedure

Definitions: We are moving all files from the "source root" into the "destination root". This is also called an import.

1. Produce a list of all the files, symlinks and directories in the source directory, relative to the source root. A directory is listed in its own right only if it contains no regular files or symlinks, so empty directories and directories containing only other directories are listed. A directory that does contain regular files or symlinks is not listed separately, because it is implied by the paths of those files and symlinks.
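The listing rule in step 1 can be sketched roughly as follows. This is not BuildStream's list_relative_paths; the name list_entries_sketch is hypothetical, the source root itself is never listed, and names are sorted only to make the example reproducible.

```python
import os


def list_entries_sketch(root):
    """Yield paths relative to root, following the rule in step 1.

    Hypothetical helper for illustration; not the BuildStream implementation.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        # Sorting is an assumption made here for reproducible output.
        dirnames.sort()
        filenames.sort()
        rel_dir = os.path.relpath(dirpath, root)

        # os.walk reports symlinks to directories in dirnames but does not
        # descend into them by default, so they are treated as entries.
        dir_links = [d for d in dirnames
                     if os.path.islink(os.path.join(dirpath, d))]

        # A directory is listed only if it holds no files or symlinks.
        if rel_dir != "." and not filenames and not dir_links:
            yield rel_dir

        for name in dir_links + filenames:
            yield os.path.normpath(os.path.join(rel_dir, name))
```

With a tree containing a/b/ (empty), c/file.txt and a symlink named link, this yields a, a/b, c/file.txt and link, but not c.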

2. To produce a partial import, this file list may be reduced to a subset at this point; otherwise, we carry on as normal.

3. The list processor will now execute actions for each entry in the list. If a specific list of files is supplied to copy_files or link_files, the list is considered unordered and will be sorted alphabetically before proceeding. If no file list is provided, the list is just the output from list_relative_paths() called on the source directory and is not sorted.

The loop variable is simply called the 'entry' here. Each loop operation is called on one entry from the file list. An entry is a path relative to the source directory.

  • 3.1. Record keeping
    • 3.1.1. The array files_written is always updated with the entry name. At the end of the operation, files_written will be identical to the list of supplied files unless there is an exception during the copy.
    • 3.1.2. If the destination exists (after following absolute symlinks) and does not match os.path.isdir (i.e. is not a directory or a symlink to a directory) then it will be added to the 'overwritten' list.
  • 3.2. Directory creation
    • 3.2.1. _copy_directories. The destination parent directory is determined using the dirname of the path appended to the *target* root. If this path exists in any form in the target, _copy_directories does nothing.
      • If it does not exist, we check the corresponding path in the source: if it's a directory or a symbolic link, we proceed to make the same directory in the target.
      • If it's something else, we throw a UtilError. Note that if anything exists with the same name in the destination, the type of the source isn't checked.

    • 3.2.2. _ensure_real_directory will attempt to check the destination path for symlinks, but only checks absolute symlinks. If it resolves to somewhere outside the destination root, an exception is thrown.
      • If it does resolve inside the destination, and there is nothing with the same name present, the directory is created. If something already exists there, it will be ignored and left there.
  • 3.3. Check source type
    • If the source file is found to be missing, we either raise a UtilError or, if ignore_missing is set, just abort this entry and continue with the next one.

    • The remaining processing depends on the source type:
      • 3.3.1. Directory (matching S_ISDIR)
        • Use _ensure_real_directory again - this creates the actual directory, whereas the previous call just created all the necessary parent directories. After this, check that the destination really is a directory (S_ISDIR) and not a symbolic link. If it's anything other than a real directory, raise a UtilError.

      • 3.3.2. Symbolic link (matching S_ISLNK)
        • Use 'safe_remove' to remove anything existing at the target, then adjust the target using _relative_symlink_target, and create the link at the destination using os.symlink.
        • safe_remove will remove everything except a non-empty directory. If there's a non-empty directory, it returns false but otherwise doesn't complain. If there's anything else that it can't remove (for example due to a permission error) then it will raise an exception. If nothing exists at the target, nothing happens.
        • If safe_remove encounters a non-empty directory, the entry gets added to the ignored file list and we move on to the next entry.
      • 3.3.3. Normal file (matching S_ISREG)
        • Same as a symbolic link, except we use 'actionfunc' to copy or link the file.
      • 3.3.4. Character/Block device (S_ISCHR or S_ISBLK)
        • safe_remove as above, then os.mknod is used to create the device node at the destination.
      • 3.3.5. Fifo (matching S_ISFIFO)
        • No removal attempted. os.mkfifo is called to create the destination.
      • 3.3.6. Socket (matching S_ISSOCK)
        • Silently ignored.
      • 3.3.7. Anything else

4. Set permissions.

  • In the above operations, _copy_directories returns a list of (directory_name, source_permissions) tuples, which we keep in a list. This list is also appended to when we get a directory entry (case 3.3.1). After all entries are processed, we call os.chmod to set the same permissions on all directories. Only real directories get chmod applied; nothing else is touched.
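A simplified sketch of the per-entry dispatch in step 3.3 and the deferred permission pass in step 4 might look like the following. The names safe_remove_sketch and import_entries_sketch are hypothetical, parent directory creation is reduced to os.makedirs, the symlink target is copied without the _relative_symlink_target adjustment, a plain copy stands in for 'actionfunc', and device nodes, FIFOs and missing sources are not handled. It illustrates the rules above rather than reproducing the code in utils.py.

```python
import errno
import os
import shutil
import stat


def safe_remove_sketch(path):
    """Remove path unless it is a non-empty directory (then return False)."""
    try:
        if stat.S_ISDIR(os.lstat(path).st_mode):
            os.rmdir(path)
        else:
            os.unlink(path)
    except FileNotFoundError:
        return True  # nothing at the target: nothing to do
    except OSError as e:
        if e.errno in (errno.ENOTEMPTY, errno.EEXIST):
            return False  # non-empty directory: caller records it as ignored
        raise  # anything else (e.g. a permission error) propagates
    return True


def import_entries_sketch(src_root, dest_root, entries):
    """Process entries in order; return (files_written, ignored)."""
    files_written, ignored, dir_modes = [], [], []
    for entry in entries:
        files_written.append(entry)  # 3.1.1: always recorded
        src = os.path.join(src_root, entry)
        dest = os.path.join(dest_root, entry)
        mode = os.lstat(src).st_mode  # lstat: do not follow symlinks

        # Simplification: create parents without any symlink checks.
        os.makedirs(os.path.dirname(dest), exist_ok=True)

        if stat.S_ISDIR(mode):
            os.makedirs(dest, exist_ok=True)
            dir_modes.append((dest, stat.S_IMODE(mode)))
        elif stat.S_ISLNK(mode) or stat.S_ISREG(mode):
            if not safe_remove_sketch(dest):
                ignored.append(entry)
                continue
            if stat.S_ISLNK(mode):
                os.symlink(os.readlink(src), dest)
            else:
                shutil.copyfile(src, dest)  # stand-in for 'actionfunc'
        elif stat.S_ISSOCK(mode):
            continue  # sockets are silently ignored
        else:
            # Sketch only: devices and FIFOs (3.3.4, 3.3.5) not implemented.
            raise ValueError("unsupported entry type: " + entry)

    # Step 4: directory permissions are applied after all entries.
    for path, perms in dir_modes:
        os.chmod(path, perms)
    return files_written, ignored
```

Deferring os.chmod to the end means a source directory with restrictive permissions does not prevent files from being written into its copy during the loop.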

Potential problems

The use of direct symbolic link resolution appears to be incorrect. Resolving an absolute symbolic link (that is, one whose target starts with the path separator) would appear to give the wrong result, since the current code uses os.path.realpath and other functions which resolve symlinks directly against the host filesystem, such as os.path.exists. In some cases, even relative symlinks will resolve incorrectly; for example, ../../../../../../../usr is likely to resolve to the host's /usr if resolved directly, but to the destination root's 'usr' directory if resolved relative to that root.

There are other places in the code which account for this. _relative_symlink_target appears to do the job of resolving links relative to a base.
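To illustrate the difference, here is a minimal sketch of resolving a symlink target relative to an import root instead of the host filesystem. The function resolve_in_root_sketch is hypothetical and purely lexical: it does not follow intermediate symlinks and is not the logic of _relative_symlink_target.

```python
import posixpath


def resolve_in_root_sketch(link_path, target):
    """Return where a symlink points, as a path relative to the import root.

    link_path: location of the symlink relative to the root, e.g. 'a/b/link'
    target:    the symlink's target string, absolute or relative
    """
    if target.startswith("/"):
        # Absolute targets are interpreted from the import root,
        # not from the host's filesystem root.
        parts = []
    else:
        # Relative targets start from the directory holding the link.
        parts = [p for p in posixpath.dirname(link_path).split("/") if p]

    for component in target.split("/"):
        if component in ("", "."):
            continue
        if component == "..":
            if parts:
                parts.pop()  # '..' at the root stays at the root
        else:
            parts.append(component)
    return "/".join(parts) or "."
```

Under these assumptions, a link at a/b/link with target ../../../../../../../usr resolves to usr inside the root, and a link at sbin with target /usr/sbin resolves to usr/sbin.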

Ordering of file list

This is an example of three entries, shown in two different orders, in which the import order affects the result:

/usr/sbin
/sbin (a symbolic link to /usr/sbin)
/sbin/hello (a file)

/usr/sbin
/sbin/hello (a file)
/sbin (a symbolic link to /usr/sbin)

Note that "/sbin/hello" cannot be generated by our current list_relative_paths functions (it would show up as /usr/sbin/hello in both cases.)

copy_files respects the order of files passed in as an argument if a list is passed in. If not, the order comes from list_relative_paths, and curiously, although list_relative_paths is careful to put its output in a specific order, copy_files will then sort the list alphabetically.

Ordering is especially important in content-addressable storage systems, since the protocol buffer based system used by BuildStream generates different hashes if the elements in a directory are reordered.
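The effect of ordering on a digest can be shown with a toy example. This sketch does not use protocol buffers or any BuildStream code; directory_digest_sketch is a hypothetical stand-in that hashes a simple NUL-joined list of names, which is enough to show that the same set of entries in a different order gives a different hash.

```python
import hashlib


def directory_digest_sketch(names):
    """Hash a list of entry names in the order given.

    Stand-in serialisation (NUL-joined names); an assumption for
    illustration, not the real CAS directory encoding.
    """
    payload = "\0".join(names).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


same_entries_a = ["sbin", "usr"]
same_entries_b = ["usr", "sbin"]
digests_differ = (directory_digest_sketch(same_entries_a)
                  != directory_digest_sketch(same_entries_b))
```

Sorting the names before hashing would make the two digests equal, which is why a fixed ordering matters for such a scheme.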

The ordering returned by list_relative_paths in utils.py is not deterministic. This function uses os.walk, which divides its output into 'directories' and 'files'. list_relative_paths processes directories before files. However, a symlink can end up in either category depending on what it points to. Symlinks to directories end up in 'directories'. Symlinks to files and broken symlinks end up in 'files'. Symlink resolution uses the host filing system, so a symlink pointing to /lib64 will end up in one of the two groups depending on whether the host's /lib64 exists or not.
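The way os.walk sorts symlinks into its two groups can be checked with a small experiment. The function name and the file names below are made up for the demonstration; only os.walk's standard behaviour is relied on.

```python
import os
import tempfile


def walk_classification_demo():
    """Build a small tree and report how os.walk groups its top-level entries."""
    root = tempfile.mkdtemp()
    os.mkdir(os.path.join(root, "real_dir"))
    open(os.path.join(root, "real_file"), "w").close()

    # Three symlinks: to a directory, to a file, and to nothing at all.
    os.symlink("real_dir", os.path.join(root, "to_dir"))
    os.symlink("real_file", os.path.join(root, "to_file"))
    os.symlink("does_not_exist", os.path.join(root, "broken"))

    # Only the first (top-level) result of the walk is needed.
    _, dirnames, filenames = next(os.walk(root))
    return sorted(dirnames), sorted(filenames)
```

Here to_dir is reported among the directories, while to_file and broken are reported among the files. If the target of to_dir were removed, the same link would move to the files group, which is the host-dependent behaviour described above.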

Projects/BuildStream/ImportRules (last edited 2018-12-13 17:42:45 by JMacArthur)