boost/filesystem/path.hpp

Introduction
Grammar for portable generic path string
Canonical form
Header synopsis
Class path
Member functions
Non-member functions

Introduction

Many Filesystem Library functions traffic in objects of class path, provided by this header.  Non-member functions for error checking are also supplied.

For actual operations on files and directories, see boost/filesystem/operations.hpp documentation.

For file I/O stream operations, see boost/filesystem/fstream.hpp documentation.

As with all Filesystem Library components, errors may result in filesystem_error or std::bad_alloc exceptions being thrown. See Requirements.

Class path

Class path provides a portable mechanism for representing paths in C++ programs.  Class path is concerned with the lexical and syntactic aspects of a path, regardless of whether or not such a path currently exists in the operating system's filesystem.

Rationale: If filesystem functions trafficked in std::strings or C-style strings, the functions would provide only an illusion of portability since the function calls would be portable but the strings they operate on would not be portable.

Conceptual model of a path

An object of class path can be conceptualized as containing a sequence of strings, where each string contains the name of a directory, or, in the case of the string representing the element farthest from the root in the directory hierarchy, the name of a directory or file. Such a path representation is independent of any particular representation of the path as a single string.

There is no requirement that an implementation of class path actually contain a sequence of strings, but conceptualizing the contents that way provides a completely portable way to reason about paths.

So that programs can portably express paths as a single string, class path defines a grammar for a portable generic path string format, and supplies constructor and append operations taking such strings as arguments. Because user input or third-party library functions may supply path strings formatted according to operating system specific rules, an additional constructor is provided which takes a system-specific format as an argument.

Access functions are provided to retrieve the contents of an object of class path formatted as a portable path string, a directory path string using the operating system's format, and a file path string using the operating system's format.  Additional access functions retrieve specific portions of the contained path.
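
The conceptual model can be illustrated with a minimal sketch that turns a generic path string into a sequence of element strings. The function split_generic is hypothetical and is not part of the library; it only shows the "sequence of strings" view. Skipping empty elements is a choice made for this sketch, not documented behavior.

```cpp
#include <string>
#include <vector>

// Hypothetical helper, not part of the library: split a generic path
// string such as "foo/bar/data.txt" into its elements.
std::vector<std::string> split_generic(const std::string& src)
{
    std::vector<std::string> elements;
    std::string::size_type start = 0;
    while (start < src.size())
    {
        // Find the end of the current element.
        std::string::size_type slash = src.find('/', start);
        if (slash == std::string::npos) slash = src.size();
        // Assumption of this sketch: empty elements are silently skipped.
        if (slash > start) elements.push_back(src.substr(start, slash - start));
        start = slash + 1;
    }
    return elements;
}
```

Under these assumptions, split_generic("foo/bar/data.txt") yields the three elements foo, bar, and data.txt.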

Grammar for portable generic path string

The grammar is specified in extended BNF, with terminal symbols in quotes:

path ::= [system-specific-root] [relative-path] 
relative-path ::= element { "/" element } 
element ::= name | parent-directory 
parent-directory ::= ".." 
name ::= char { char }

system-specific-root grammar is implementation-defined. system-specific-root must not be present in generic input (the undecorated path constructors); it may be part of the strings returned by path member functions, and may be present in the argument to path constructors with the system_specific decorator.

Although implementation-defined, it is desirable that system-specific-root have a grammar which is distinguishable from other grammar elements, and follow the conventions of the operating system.

Whether or not a generic path string is actually portable to a particular operating system will depend on the names used.  See the Portability Guide.
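
As a rough illustration of the relative-path production, the following sketch checks whether a string matches it. The function matches_relative_path is hypothetical, and it assumes that "char" means any character other than "/", which the grammar above does not state explicitly. It does not check portability of the names themselves.

```cpp
#include <string>

// Hypothetical checker, not part of the library: true if s matches
//   relative-path ::= element { "/" element }
// assuming "char" means any character other than '/'.
bool matches_relative_path(const std::string& s)
{
    if (s.empty()) return false;             // at least one element is required
    bool element_open = false;               // has the current element any chars yet?
    for (std::string::size_type i = 0; i < s.size(); ++i)
    {
        if (s[i] == '/')
        {
            if (!element_open) return false; // empty element: leading "/" or "//"
            element_open = false;            // separator closes the element
        }
        else
        {
            element_open = true;             // ".." is accepted like any other element
        }
    }
    return element_open;                     // false if the string ends with "/"
}
```

With this reading of the grammar, "foo/bar" and "../foo" match, while "/foo", "foo/", and "foo//bar" do not.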

Canonical form

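A path is in canonical form when each adjacent pair of elements in m_name consisting of a name followed by a parent-directory has been removed, with the removal applied recursively until no such pair remains.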
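
One way to picture the reduction is the sketch below, which operates on a plain std::vector<std::string> standing in for m_name. The function to_canonical is hypothetical, and it assumes that a leading ".." with no preceding name is kept as-is, which the text above does not spell out.

```cpp
#include <string>
#include <vector>

// Hypothetical sketch, not part of the library: remove every
// name/parent-directory pair, repeating until none remains.
std::vector<std::string> to_canonical(const std::vector<std::string>& in)
{
    std::vector<std::string> out;
    for (const std::string& element : in)
    {
        // A ".." cancels the name immediately before it. Because out is
        // already reduced, this also handles nested pairs such as a/b/../..
        if (element == ".." && !out.empty() && out.back() != "..")
            out.pop_back();
        else
            out.push_back(element);  // names, and ".." with nothing to cancel
    }
    return out;
}
```

Under this reading, the elements foo, .., bar reduce to bar alone, while .., foo is left unchanged.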

Header boost/filesystem/path.hpp synopsis

namespace boost
{
  namespace filesystem
  {
    enum path_format { system_specific };

    class path
    {
    public:
      // compiler generates copy constructor,
      // copy assignment, and destructor

      // constructors:
      path();
      path( const std::string & src );
      path( const char * src );
      path( const std::string & src, path_format );
      path( const char * src, path_format );

      // append operations:
      path & operator<<=( const path & rhs );
      const path operator<<( const path & rhs ) const;

      // query functions: 
      bool is_null() const;
      const std::string & generic_path() const;
      const std::string & file_path() const;
      const std::string & directory_path() const;
      const std::string leaf() const;
      const path branch() const;

      // iteration:
      typedef implementation-defined iterator;
      const iterator begin() const;
      const iterator end() const;

    private:
      std::vector<std::string> m_name;  // for exposition only
    };

    const path operator<< ( const char * lhs, const path & rhs );
    const path operator<< ( const std::string & lhs, const path & rhs );

    // Also see Undocumented non-member functions below

  }
}

Rationale: The return type of several functions (operator<<, leaf, branch) is const path instead of path to disallow expressions like (p1<<p2) = p3.  See Scott Meyers, Effective C++, Item 21. Likewise, begin() and end() return const iterator rather than iterator. This detects non-portable code such as ++pth.begin(), which will not work if iterator is a non-class type. See next() and prior() in boost/utility.hpp.
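
The effect of a const return type can be checked at compile time. The sketch below uses two hypothetical toy classes, const_result and plain_result, which stand in for class path and are not part of the library. It relies only on std::is_assignable and std::declval from the C++11 standard library.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical toy type whose operator<< returns a const object,
// as in the synopsis above.
struct const_result
{
    const const_result operator<<(const const_result&) const { return *this; }
};

// Hypothetical toy type whose operator<< returns a non-const object.
struct plain_result
{
    plain_result operator<<(const plain_result&) const { return *this; }
};

// Is (a << b) = c well-formed for each type?
constexpr bool const_result_assignable = std::is_assignable<
    decltype(std::declval<const_result>() << std::declval<const_result>()),
    const_result>::value;

constexpr bool plain_result_assignable = std::is_assignable<
    decltype(std::declval<plain_result>() << std::declval<plain_result>()),
    plain_result>::value;

static_assert(!const_result_assignable, "assignment to a const temporary is rejected");
static_assert(plain_result_assignable, "assignment to a non-const temporary compiles");
```

With the const return type, an accidental (p1<<p2) = p3 is a compile-time error rather than a silent assignment to a temporary.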

Member functions

For the sake of exposition, class path member functions are described as if the class contains a private member std::vector<std::string> m_name. Actual implementations may differ.

Rationale: Return types of query functions have been chosen to match the types needed by important uses, and to be efficient in common implementations.

Note: There is no guarantee that a path object represents a path which is considered valid by the current operating system. A path might be invalid to the operating system because it contains invalid names (too long, invalid characters, and so on), or because it is a partial path not yet completed by the program. An invalid path will normally be detected at time of use, such as by one of the Filesystem Library's operations or fstream functions.

Portability Warning: There is no guarantee that a path object represents a path which would be portable to another operating system. A path might be non-portable because it contains names which another operating system considers too long, or names which contain characters that are invalid there. Validity checking functions are supplied to ensure names in paths are as portable as desired, but they must be explicitly called by the user.

Naming Rationale: Class path member function names and operations.hpp non-member function names are chosen to be distinct from one another. Otherwise, given a path foo, for example, both foo.empty() and empty( foo ) would be valid, but with completely different semantics. Avoiding this was considered more important than consistency with some C++ Standard Library naming conventions, which aren't followed uniformly anyhow, even in the standard.

System-specific Representation

Several path member functions return representations of m_name in formats specific to the operating system. These formats are implementation-defined. If an m_name element contains characters which are invalid under the operating system's rules, and there is an unambiguous translation between the invalid character and a valid character, the implementation is required to perform that translation. For example, if an operating system does not permit lowercase letters in file or directory names, these letters will be translated to uppercase if unambiguous. Such translation does not apply to generic path string format representations.

Representation example

The differences between the representations returned by generic_path(), directory_path(), and file_path() are illustrated by the following code:

path my_path( "foo/bar/data.txt" );
std::cout << "generic_path---: " << my_path.generic_path() << '\n'
          << "directory_path-: " << my_path.directory_path() << '\n'
          << "file_path------: " << my_path.file_path() << '\n';

On POSIX or Windows, the output representations would be identical:

generic_path---: foo/bar/data.txt
directory_path-: foo/bar/data.txt
file_path------: foo/bar/data.txt

But on a hypothetical operating system using OpenVMS format representations, they would each be different:

generic_path---: foo/bar/data.txt
directory_path-: [foo.bar.data.txt]
file_path------: [foo.bar]data.txt

Note that because this system uses period as both a directory separator character and as a separator between filename and extension, directory_path() in the example produces a useless result. On this operating system, the programmer should only use this path as a file path. (There is a portability recommendation to not use periods in directory names.)
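
The OpenVMS-style output above could be produced from the element sequence roughly as follows. This is only a sketch: the functions generic_format, vms_directory_format, and vms_file_format are hypothetical, operate on a plain std::vector<std::string>, and cover only what the example shows. The handling of a single-element file path is an assumption.

```cpp
#include <cstddef>
#include <string>
#include <vector>

typedef std::vector<std::string> elements;  // stand-in for m_name

// Join the first count elements of e, separated by sep.
std::string join(const elements& e, std::size_t count, char sep)
{
    std::string out;
    for (std::size_t i = 0; i < count; ++i)
    {
        if (i != 0) out += sep;
        out += e[i];
    }
    return out;
}

// Generic format: elements separated by "/".
std::string generic_format(const elements& e)
{
    return join(e, e.size(), '/');
}

// Directory format in the style of the example: every element is a directory.
std::string vms_directory_format(const elements& e)
{
    return "[" + join(e, e.size(), '.') + "]";
}

// File format in the style of the example: the last element is the file name.
std::string vms_file_format(const elements& e)
{
    if (e.empty()) return std::string();
    if (e.size() == 1) return e.back();  // assumption: no brackets without a directory
    return "[" + join(e, e.size() - 1, '.') + "]" + e.back();
}
```

For the elements foo, bar, data.txt these functions reproduce the three strings shown in the example output.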

constructors

path();

Effects: Default constructs an object of class path.

path( const std::string & src );
path( const char * src );

Precondition: src conforms to the generic path string grammar relative-path syntax, and contains no embedded '\0' characters.

Effects: For each src element, m_name.push_back( element ).

Postcondition: m_name has been reduced to canonical form.

Rationale: These constructors are not explicit because an intended use is automatic conversion of strings to paths.

path( const std::string & src, path_format );
path( const char * src, path_format );

Precondition: src conforms to the operating system's grammar for path strings, and contains no embedded '\0' characters.

Effects: For each src element (where an element represents a directory name, file name, or parent-directory indicator),  m_name.push_back( element ).

Postcondition: m_name has been reduced to canonical form.

operator <<=

path & operator<<=( const path & rhs );

Effects: Append rhs.m_name to m_name.

Returns: *this

Postcondition: m_name has been reduced to canonical form.

Rationale: It is not considered an error for rhs to include a system-specific-root because it might be relative, and thus valid.  For example, on Windows, the following must succeed:

path p( "c:", system_specific );
p <<= path( "/foo", system_specific );
assert( p.generic_path() == "c:/foo" );

operator <<

const path operator<< ( const path & rhs ) const;

Returns: path( *this ) <<= rhs

Rationale: Operator << is supplied, because it, together with operator <<=, provides a convenient way for users to supply paths with a variable number of elements.  For example, initial_directory() << "src" << test_name. Operator+, with operator+=, were considered as alternatives, but deemed too easy to confuse with those operators for std::string.

Note: Also see non-member operator<< functions.
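
The relationship between the two operators (operator<< defined as a copy followed by operator<<=) can be sketched with a hypothetical toy class, mini_path, which is not the library's implementation. As a simplification, its converting constructor stores the whole string as a single element, and the reduction to canonical form is omitted.

```cpp
#include <string>
#include <vector>

// Hypothetical toy class, not the library's path.
struct mini_path
{
    std::vector<std::string> m_name;

    mini_path() {}
    // Simplification: the argument is stored as one element, unparsed.
    mini_path(const char* element) : m_name(1, element) {}

    // Append rhs's elements to this object's elements.
    mini_path& operator<<=(const mini_path& rhs)
    {
        // Copy first so that p <<= p is also safe.
        const std::vector<std::string> tail(rhs.m_name);
        m_name.insert(m_name.end(), tail.begin(), tail.end());
        return *this;
    }

    // Same shape as the documented "Returns: path( *this ) <<= rhs".
    const mini_path operator<<(const mini_path& rhs) const
    {
        return mini_path(*this) <<= rhs;
    }
};
```

Because the constructor is not explicit, an expression such as mini_path("home") << "src" << "test" builds a three-element path, mirroring the initial_directory() << "src" << test_name idiom.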

is_null

bool is_null() const;

Returns: m_name.size() == 0

generic_path

const std::string & generic_path() const;

Returns: The contents of m_name, formatted according to the rules of the generic path string grammar.

Note: If any m_name elements originated from the system specific constructors, there is no guarantee that the returned string is unambiguous according to the grammar. A system-specific-root indistinguishable from a relative-path name, a name containing "/", a name "..", and a system-specific-root beyond the first element all could cause ambiguities. Such an ambiguous representation might still be useful for some purposes, such as display. If no m_name elements originated from the system specific constructors, the returned string is always unambiguous.

See: Representation example above.

file_path

const std::string & file_path() const;

Returns: The contents of m_name, formatted in the system-specific representation of a file path.

See: Representation example above.

Warning: This function is intended only for use in calls to operating system or third-party libraries. Use in other contexts is probably a programming error. The preferred way to obtain a std::string from a path is generic_path().

directory_path

const std::string & directory_path() const;

Returns: The contents of m_name, formatted in the system-specific representation of a directory path.

See: Representation example above.

Warning: This function is intended only for use in calls to operating system or third-party libraries. Use in other contexts is probably a programming error. The preferred way to obtain a std::string from a path is generic_path().

leaf

const std::string leaf() const;

Returns: is_null() ? std::string() : m_name.back()

Rationale: Return type is const string rather than const string & to give implementations freedom to avoid  maintaining the leaf as a separate string object.

branch

const path branch() const;

Returns: m_name.size() <= 1 ? path("") : x, where x is a path constructed from all the elements of m_name except the last.
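
The Returns clauses for leaf and branch translate directly into operations on the exposition-only vector. In the sketch below, leaf_of and branch_of are hypothetical free functions working on a plain std::vector<std::string>; they are not the library's member functions.

```cpp
#include <string>
#include <vector>

// Hypothetical sketch of leaf(): the last element, or an empty string.
std::string leaf_of(const std::vector<std::string>& m_name)
{
    return m_name.empty() ? std::string() : m_name.back();
}

// Hypothetical sketch of branch(): all elements except the last,
// or an empty sequence when there are fewer than two elements.
std::vector<std::string> branch_of(const std::vector<std::string>& m_name)
{
    if (m_name.size() <= 1) return std::vector<std::string>();
    return std::vector<std::string>(m_name.begin(), m_name.end() - 1);
}
```

For the elements foo, bar, data.txt, leaf_of gives data.txt and branch_of gives foo, bar.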

iterator

typedef implementation-defined iterator;

An iterator meeting the C++ Standard Library requirements for bidirectional iterators (24.1). The value, reference, and pointer types are std::string, const std::string &, and const std::string *, respectively.

begin

const iterator begin() const;

Returns: m_name.begin()

end

const iterator end() const;

Returns: m_name.end()
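
Because begin() returns a const iterator, code that needs the position after begin() should advance a copy rather than write ++pth.begin(). The sketch below shows the idea on a plain std::vector<std::string> standing in for a path's elements. It uses std::next from the C++11 standard library as a stand-in for the next() utility mentioned in the synopsis rationale. The function second_element is hypothetical.

```cpp
#include <iterator>
#include <string>
#include <vector>

typedef std::vector<std::string> elements;  // stand-in for a path's elements

// Hypothetical helper: return the second element, or an empty string
// if there are fewer than two elements.
std::string second_element(const elements& e)
{
    if (e.size() < 2) return std::string();
    // std::next advances a copy of the iterator, so it works even when
    // begin() returns a const object or the iterator is a raw pointer.
    elements::const_iterator it = std::next(e.begin());
    return *it;
}
```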

Non-member functions

Non-member operator<<

const path operator << ( const char * lhs, const path & rhs );
const path operator << ( const std::string & lhs, const path & rhs );

Returns: path( lhs ) <<= rhs

Undocumented non-member functions

The header boost/filesystem/path.hpp also supplies several non-member functions which can be used to verify that a path meets certain requirements. These subsidiary functions are undocumented pending more research and discussion, and should not be relied upon as they are likely to change.


© Copyright Beman Dawes, 2002

Revised 20 September, 2002