From: Pierre-Andre Galmes (galmes_at_[hidden])
Date: 2004-12-04 12:14:27
Beman Dawes wrote:
> At 07:40 PM 10/3/2004, David Abrahams wrote:
>  >Beman Dawes <bdawes_at_[hidden]> writes:
>  >
>  >> At 10:20 AM 10/3/2004, David Abrahams wrote:
>  >>
>  >>  >I've said it before, but I always found the checking to be much more
>  >>  >of a hindrance than a help.
>  >>
>  >> So presumably you would be in favor of changing the default to
>  >> "no_check"?
>  >
>  >I think so.
> 
> Unless strong objections arise, I'll make the change to the main trunk 
> after the 1.32 branch for release. That will give us plenty of time to 
> work out any kinks before release to the general public via 1.33.
> 
> --Beman
Greetings,
I am new to Boost Filesystem, and my involvement is related to one of my 
courses. As a project, we have to analyse a request for change, and we 
can then submit our "contribution" by posting it on the list. The first 
part of my work was to analyse a request. The second part is this post. 
The analysis can be found at the following link 
(http://perso.efrei.fr/~galmes/boost/name_check.html). It explains the 
problem in detail using examples.
The request is the following : the automatic name checking 
functionality in the Boost library is turned on by default. That means 
boost::filesystem will check for "portable" paths. The question is 
then : should the "name check" functionality be turned on by default ? 
Which of portable_name, native and no_check would be the best 
choice ?  This mail is divided into three parts. The first one presents 
the conclusions drawn from the analysis. The second part is the 
argumentation behind the results.
The last part is composed of some suggestions made after having spent 
some hours going through the library. I apologize for the length of this 
post and hope that it will help to improve the boost::filesystem library.
I - The results
Here is what I found out from that analysis (which should not be big 
news :-):
- by default, the "no_check" option seems to be the best.
- "native" would be the best once it supports checks for multiple file 
systems. It would then be a good choice if this kind of check works in 
Boost 1.33 !
Suggestions to improve the "usability" of the library :
- the name "native" is not very explicit and tends to confuse users.
- How the "native" option works could be explained in an
   "easy-to-understand" way in the documentation.
II - Argumentation
The automatic name checking functionality can behave in many different 
ways, and from those, only three could be used as default values :
     * portable_name : check if the path is "portable" (default in
       Boost 1.32).
     * native : check if the name is valid for the OS being used to
       execute the program.
     * no_check : does not perform any check.
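To make the three candidates concrete, here is a minimal sketch of how each policy would treat the same file name. It does not use the Boost API : NameCheck, looks_portable, looks_native_posix and accepts are hypothetical names of my own, and the rules are deliberately simplified (the native rule assumes a POSIX-like OS).

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical stand-ins for the three candidate defaults. These are NOT
// the Boost.Filesystem implementations, only simplified illustrations.
enum class NameCheck { portable_name, native, no_check };

// Simplified "portable" rule: only letters, digits, '.', '_' and '-'.
bool looks_portable(const std::string& name) {
    if (name.empty()) return false;
    for (unsigned char c : name) {
        if (!(std::isalnum(c) || c == '.' || c == '_' || c == '-')) {
            return false;
        }
    }
    return true;
}

// Simplified "native" rule for a POSIX-like OS: no '/' and no NUL.
bool looks_native_posix(const std::string& name) {
    return !name.empty() &&
           name.find('/') == std::string::npos &&
           name.find('\0') == std::string::npos;
}

// Applies the chosen policy to a single file name.
bool accepts(NameCheck policy, const std::string& name) {
    switch (policy) {
        case NameCheck::portable_name: return looks_portable(name);
        case NameCheck::native:        return looks_native_posix(name);
        case NameCheck::no_check:      return true;
    }
    return true;  // unreachable, keeps compilers quiet
}

// The same name is rejected by the strict policy and accepted by the others.
void self_test() {
    const std::string name = "my report (draft).txt";
    assert(!accepts(NameCheck::portable_name, name));
    assert(accepts(NameCheck::native, name));
    assert(accepts(NameCheck::no_check, name));
}
```

Calling self_test() from main() exercises the three cases; the point is only that the stricter the default, the more ordinary names it refuses.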
We will now try to show which of those is the best. For this, I attempt 
an "objective" analysis that quantifies the choices. This is one way to 
approach the problem but might not be the best one : it takes a lot of 
explanation and may just scare away most of the readers ;).
Here is a summary of the pros and cons of each choice, taken from the 
previous posts.
a - portable_name (default in Boost 1.32)
     * Pro 1 : It is the check that requires the strictest names to
     ensure portability. Programs using it should not have any path
     portability problems on the most common operating systems.
     * Con 1 : Enabling name checking by default gets in the way of users
     writing programs with no name constraints. Those non-portable
     programs might constitute the majority of programs written using
     boost::filesystem.  (http://tinyurl.com/5mb5r)
     * Con 2 : Checking implies a performance hit (see
     http://tinyurl.com/6esmw)
b - no_check
     * Pro 1 : Does not put any constraints on the users with no need of
     portability which should represent the majority of users. (See
     http://tinyurl.com/5mb5r)
     * Pro 2 : The use of the option "no_check" is explicit (compared to
     "native" or "portable_name").
c - native :
     * Pro 1 : Less restrictive than "portable_name" but still performs
     some checks for the native operating system.
     * Con 1 : The name "native" is confusing. It is not explicit that
     the native check validates names against the operating system, not
     against the file system (http://tinyurl.com/6esmw).
     * Con 2 : Gives an illusion of "security"/"portability" on the same
     operating system. This is false, as the check is done against the
     operating system, not against the file system. Thus, on the same
     operating system, portability depends on the file system used. An
     example of this problem was given in a post by Beman (See
     http://tinyurl.com/635yn).
d - Different matters and their importance
Now, here is a table summarizing which of those arguments are
important for users who want to use the library to write portable 
programs and which are important for the "common users" with no need for 
portability. This is my point of view, and I would be interested in 
knowing yours. The different terms used are explained below.
!                    ! portable users ! common users !
!----------------------------------------------------!
! explicit           !      +         !      ++      !
! portability check  !      +++       !      -       !
! OS check           !      -         !      +       !
! fs check           !      -         !      ++      !
! performance hit    !      ++        !      ++      !
! security illusion  !      ++        !      +       !
! ease of use        !      +++       !      +++     !
- explicit name :
The name of the option (native, no_check...) is self explanatory about 
the way the check is done.
common users (++) : this is important so that they are able to use the 
library without having to read the documentation in detail.
portable users (+) : They will have to read the documentation in more 
detail anyway in order to know how to produce portable paths. Hence, an 
explicit name is a less important criterion for choosing a default value.
- portability check :
Checks that the path will be valid on the most popular platforms (POSIX,
Windows). This is the check done by "portable_name".
common users (-) : any check related to portability is of no importance 
to common users, as they won't port their programs.
portable users (+++) : this is one of the most important criteria.
- OS check :
Checks that for the current OS, paths are valid.
common users (+) : if paths are not accepted, the program will not work. 
I only put (+) because often users know which characters are valid for 
their OS.
portable users (-) : When writing portable programs, you do not really 
care that it is valid for your OS : it should be valid for all OS. This 
is covered by the portability check.
- fs check :
Checks the validity of the path for the file system the path tries to 
access. I suppose that this check should be done at runtime when trying 
to work with a particular file or directory.
common users (++) : if paths are not accepted, the program will not 
work. I put (++) because fewer users know which characters are valid for 
the different file systems they manipulate.
portable users (-) : When writing portable programs, you do not really 
care that it is valid for your fs : it should be valid for all fs. This 
is covered by the portability check.
- ease of use :
Is the library easy to use given the user's aims ? Does the user have to 
write many lines of code in order to achieve what he wants ?
all users (+++) : This is the most important feature. A library is not 
easy to use if it does not provide a nice interface and the user has to 
reconfigure it all the time so that the checks succeed.
- performance hit :
Does using a check imply a performance hit ?
all users (++) : This is also an important feature. When coding a 
program, users always want it to run fast. This is an important criterion 
but less important than the ease of use, especially if the performance 
hit is mild.
- security illusion :
Does the program give the illusion that it will work correctly on 
different platforms ?
common user (+) : This is not a big deal, as portability is not the 
first concern when writing such a program.
portable user (++) : If a program written to be portable gives only an 
illusion of such behavior, this would be a real problem.
e - Different criteria and their availability
Here is a table showing, for each of the three options, their 
behavior for the different criteria listed above. The characters '+' 
and '-' represent the "points" given to the criterion if it is 
available for an option. The ease-of-use criterion is listed separately 
for the two kinds of users.
table of "appearance" :
!                     ! portable_name ! native ! no_check !
!---------------------------------------------------------!
! explicit name       !      +        !   -    !    ++    !
! portability check   !      ++       !   +    !    -     !
! OS check            !      ++       !   ++   !    -     !
! fs check            !      -        !   -    !    -     !
! no performance hit  !      -        !   -    !    ++    !
! no security ill.    !      -        !   -    !    ++    !
common users :
! ease of use         !      -        !   ++   !    ++    !
Portable users :
! ease of use         !      ++       !   -    !    -     !
f - The best option
We can now try to satisfy the most users by comparing the criteria in a
"mathematical" way. As a first thought, we could suppose that the 
majority of the users would use boost::filesystem for non-portable 
programs. Let us say 80% will write programs without any need for 
portability.
From this we can then deduce which option will fit best. For each
option, and for each kind of users, we calculate it in the following way :
option
  = explicit importance			* appearance
  + portability_check importance		* appearance
  + OS check importance			* appearance
  + fs check importance			* appearance
  + ease of use importance		* appearance
  + no performance hit importance	* appearance
  + no security ill. importance		* appearance
We use the tables and convert the symbols to points as follows :
each '+' = 1 point (so '++' = 2 and '+++' = 3)
'-'      = 0 points
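A minimal sketch of this calculation, for readers who prefer code to tables. The names Row, score and the constants are my own ; the numbers are the common-user importance column and the three appearance columns (with the common-user ease-of-use row) read from the two tables above.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Criteria, in table order: explicit, portability check, OS check, fs check,
// performance hit, security illusion, ease of use.
constexpr std::size_t kCriteria = 7;
using Row = std::array<int, kCriteria>;

// Sum over all criteria of importance * appearance ('+' = 1, '-' = 0).
int score(const Row& importance, const Row& appearance) {
    int total = 0;
    for (std::size_t i = 0; i < kCriteria; ++i) {
        total += importance[i] * appearance[i];
    }
    return total;
}

// Importance column for common users, read from the importance table.
const Row kCommonImportance = {2, 0, 1, 2, 2, 1, 3};

// Appearance columns as seen by common users, read from the appearance table.
const Row kPortableName = {1, 2, 2, 0, 0, 0, 0};
const Row kNative       = {0, 1, 2, 0, 0, 0, 2};
const Row kNoCheck      = {2, 0, 0, 0, 2, 2, 2};

// The three sums for common users.
void self_test() {
    assert(score(kCommonImportance, kPortableName) == 4);
    assert(score(kCommonImportance, kNative) == 8);
    assert(score(kCommonImportance, kNoCheck) == 16);
}
```

The same score function applies to portable users by swapping in their importance column and their ease-of-use row.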
Then we find :
common users :
!                    ! portable_name ! native  ! no_check !
!---------------------------------------------------------!
! explicit           !   2 * 1       !   2 * 0 !   2 * 2  !
! portability check  ! + 0 * 2       ! + 0 * 1 ! + 0 * 0  !
! OS check           ! + 1 * 2       ! + 1 * 2 ! + 1 * 0  !
! fs check           ! + 2 * 0       ! + 2 * 0 ! + 2 * 0  !
! performance hit    ! + 2 * 0       ! + 2 * 0 ! + 2 * 2  !
! security illusion  ! + 1 * 0       ! + 1 * 0 ! + 1 * 2  !
! ease of use        ! + 3 * 0       ! + 3 * 2 ! + 3 * 2  !
!---------------------------------------------------------!
! sum                !   4           !   8     !   16     !
portable users :
!                    ! portable_name ! native  ! no_check !
!---------------------------------------------------------!
! explicit           !   1 * 1       !   1 * 0 !   1 * 2  !
! portability check  ! + 3 * 2       ! + 3 * 1 ! + 3 * 0  !
! OS check           ! + 0 * 2       ! + 0 * 2 ! + 0 * 0  !
! fs check           ! + 0 * 0       ! + 0 * 0 ! + 0 * 0  !
! performance hit    ! + 2 * 0       ! + 2 * 0 ! + 2 * 2  !
! security illusion  ! + 2 * 0       ! + 2 * 0 ! + 2 * 2  !
! ease of use        ! + 3 * 2       ! + 3 * 0 ! + 3 * 0  !
!---------------------------------------------------------!
! sum                !   13          !   3     !   10     !
We can now use the assumption that 80% of the users are common users. We 
can then calculate which option serves the overall user base best :
portable_name   = 0.8 * 4 + 0.2 * 13
                = 3.2 + 2.6
                = 5.8
native          = 0.8 * 8 + 0.2 * 3
                = 6.4 + 0.6
                = 7
no_check        = 0.8 * 16 + 0.2 * 10
                = 12.8 + 2.0
                = 14.8
We can then deduce that the no_check option best suits most of the 
common needs !
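The 80/20 mix is itself a guess, so here is a small sketch that makes it a parameter. blend and close_to are hypothetical helper names of my own ; the inputs are the per-group sums for portable_name and native from the tables above.

```cpp
#include <cassert>
#include <cmath>

// Weighted average of the two per-group sums. common_share is the assumed
// fraction of users who do not need portability (0.8 in this post).
double blend(double common_sum, double portable_sum, double common_share) {
    return common_share * common_sum + (1.0 - common_share) * portable_sum;
}

// True when two doubles agree up to rounding noise.
bool close_to(double a, double b) {
    return std::fabs(a - b) < 1e-9;
}

// Reproduces the portable_name and native figures for an 80/20 mix.
void self_test() {
    assert(close_to(blend(4, 13, 0.8), 5.8));  // portable_name
    assert(close_to(blend(8, 3, 0.8), 7.0));   // native
}
```

Varying common_share shows how much the ranking depends on that guess : with an even split, for instance, portable_name (8.5) would score above native (5.5).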
g - The native problem
In the 1.32 version of Boost, the native option has a problem : the 
checks ensure that the path will be accepted by the operating system 
used to compile the program, not by the file system the application 
will access.
This is confusing, as by native, I would expect checks to work on the 
platform for which you compiled the program. I personally do not think 
about file systems. In my opinion, this is what confuses the user and 
gives a security illusion.
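To illustrate the gap, here is a small sketch assuming a POSIX-like OS with a FAT-style volume mounted. The two functions are hypothetical and their rules simplified ; they are not the checks Boost.Filesystem performs.

```cpp
#include <cassert>
#include <string>

// Simplified, hypothetical rules; not the checks Boost.Filesystem performs.

// What a POSIX-like operating system accepts: no '/' and no NUL.
bool valid_for_posix_os(const std::string& name) {
    return !name.empty() &&
           name.find('/') == std::string::npos &&
           name.find('\0') == std::string::npos;
}

// What a FAT-style volume accepts: the OS rule plus a few reserved characters.
bool valid_for_fat_volume(const std::string& name) {
    return valid_for_posix_os(name) &&
           name.find_first_of("\\:*?\"<>|") == std::string::npos;
}

// A name that passes the OS-level check but not the volume-level one.
void self_test() {
    const std::string name = "backup:2004-12-04.tar";
    assert(valid_for_posix_os(name));     // an OS-level check passes
    assert(!valid_for_fat_volume(name));  // the mounted volume would refuse it
}
```

A check that only knows about the operating system says "fine" here, while creating the file on that volume would still fail.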
I read that Beman was working to change the behavior of the native 
option, so that it would also check against the file system. If we 
recompute the results under that assumption, we find that the native 
option would then fit best.
!                     ! portable_name ! native ! no_check !
!---------------------------------------------------------!
! explicit name       !      +        !   ++   !    ++    !
! portability check   !      ++       !   +    !    -     !
! OS check            !      ++       !   ++   !    -     !
! fs check            !      -        !   ++   !    -     !
! no performance hit  !      -        !   -    !    ++    !
! no security ill.    !      -        !   +    !    ++    !
common users :
! ease of use         !      -        !   ++   !    ++    !
Portable users :
! ease of use         !      ++       !   -    !    -     !
common users :
!                    ! native  !
!------------------------------!
! explicit           !   2 * 2 !
! portability check  ! + 0 * 1 !
! OS check           ! + 1 * 2 !
! fs check           ! + 2 * 2 !
! performance hit    ! + 2 * 0 !
! security illusion  ! + 1 * 1 !
! ease of use        ! + 3 * 2 !
!------------------------------!
! sum                !   17    !
portable users :
!                    ! native  !
!------------------------------!
! explicit           !   1 * 2 !
! portability check  ! + 3 * 1 !
! OS check           ! + 0 * 2 !
! fs check           ! + 0 * 2 !
! performance hit    ! + 2 * 0 !
! security illusion  ! + 2 * 1 !
! ease of use        ! + 3 * 0 !
!------------------------------!
! sum                !   7     !
native  = 0.8 * 17 + 0.2 * 7
        = 13.6 + 1.4
        = 15.0
If the problem described above, with multiple file systems mounted, is 
solved, native would be the best choice. For now, in my opinion, it would 
lead to many mistakes caused by misunderstanding how native works.
h - Limitations
This approach is not really objective, as I was the one choosing the 
weights of the different criteria, and those are just arbitrary choices. 
This is especially true for the explicit name and security illusion 
criteria. On the other hand, the approach has the advantage of 
"measuring" the options and producing an answer.
III - Some suggestions about native
The native option is quite confusing. I had to read through the mailing 
list before being able to understand how it works. The documentation 
does not explain how it works in a really explicit way. And I don't 
think I am the only one, as Walter was confused too (See 
http://tinyurl.com/6esmw).
Ideas to solve that :
1 -> Give a more explicit name (OS_native ?).
2 -> Change the documentation so that this point is explained a bit 
more. Why not add an example ? The example given by Beman Dawes was 
quite explicit about how that works !
I hope I didn't bore too many of you ! If you are reading this sentence, 
that is close to a miracle ! :-)
Thank you for having taken the time to read through this post !
Cheers,
Pierre-Andre Galmes