From: Martin Wille (mw8329_at_[hidden])
Date: 2005-03-07 17:20:05
David Abrahams wrote:
> "Victor A. Wagner Jr." <vawjr_at_[hidden]> writes:
> 
> 
>>At Sunday 2005-03-06 18:52, you wrote:
>>
>>>Let's start revving up to release Boost 1.33.0. Personally, I'd like
>>>to get it out the door by mid-April at the latest, and I'm offering
>>>to manage this release.
>>
>>thank you for your offer, but if you don't get the damned regression 
> 
> 
> Please keep your language civil.
> 
> 
>>testing working FIRST (it's been non-responsive
> 
> 
> Can you please be more specific about what has been non-responsive?  I
> doubt anyone can fix anything without more information.
Whatever tone may or may not be appropriate ...
Several testers have raised issues and pleaded for better communication 
several (probably many) times. Most of the time, we seem to get ignored, 
unfortunately. I don't want to accuse anyone of deliberately neglecting 
our concerns. However, I think we suffer from a "testing is not well 
understood" problem at several levels.
The tool chain employed for testing is very complex (due to the 
diversity of compilers and operating systems involved) and too fragile.
Complexity leads to a lack of understanding (among both the testers and 
the library developers), to false assumptions, and to a lack of 
communication. It also causes long delays between changing code and 
running the tests, and between running the tests and the results being 
rendered. This in turn makes isolating bugs in the libraries more 
difficult. Fragility means the testing procedure breaks often, breaks 
without being noticed for some time, and breaks without anyone being 
able to recognize immediately exactly which part broke. This is a very 
unpleasant situation for everyone involved, and it causes a significant 
level of frustration, at least among those who run the tests (e.g. 
seeing one's own test results not rendered for several days, or seeing 
the test system abused as a change-announcement system, isn't exactly 
motivating).
Please understand that a lot of resources (human and machine) are 
wasted due to these problems. This waste is most apparent to those who 
run the tests. However, most of the time, issues raised by the testers 
seem to get ignored. Maybe that was just because we didn't yell loudly 
enough, or because we didn't know whom to address or how to fix the 
problems.
Personally, I don't have any problem with the words Victor chose. Other 
people might. If you're one of them, then please understand that we 
feel something is going very wrong with the testing procedure, and 
we're afraid it will continue that way and we'll lose a lot of the 
quality (and the reputation) Boost has.
The people involved in creating the test procedure have put a great 
deal of effort into it, and the resulting system does its job nicely 
when it happens to work correctly. However, the overall complexity of 
the testing procedure has apparently grown beyond our ability to manage 
it. This is one reason why release preparations take so long.
Maybe we should take a step back and collect all the issues we have, 
and all the knowledge we have about what is causing them.
I'll make a start; I hope others will contribute to the list.
Issues and causes, in no particular order (please excuse any duplicates):
- testing takes a huge amount of resources (HD, CPU, RAM, people 
operating the test systems, people operating the result rendering 
systems, people coding the test post processing tools, people finding 
the bugs in the testing system)
- the testing procedure is complex
- the testing procedure is fragile
- the code-change to result-rendering process takes too long
- bugs in the testing procedure take too long to get fixed
- changes to code that will affect the testing procedure aren't 
communicated well
- incremental testing doesn't work flawlessly
- deleting tests requires manual purging of old results in an 
incremental testing environment.
- the number of target systems for testing is rather low; this results 
in questionable portability.
- lousy performance of Sourceforge
- resource limitations at Sourceforge (e.g. the number of files there)
- between releases the testing system isn't as well maintained as during 
the release preparations.
- test results aren't easily reproducible. They depend heavily on the 
components installed on the respective testing systems (e.g. glibc 
version, system compiler version, Python version, kernel version, and 
on Linux even the processor used)
- library maintainers don't have access to the testing systems; this 
results in longer test-fix cycles.
- changes which will cause heavy load at the testing sites never get 
announced in advance. This is a problem when testing resources have to 
be shared with the normal workload (like in my case).
- changes that require old test results to be purged usually aren't 
announced.
- becoming a new contributor for testing resources is too difficult.
- we're supporting compilers that compile languages significantly 
different from C++.
- there's no common concept of which compilers to support and which not.
- post-release displaying of test results apparently takes too much 
effort. Otherwise, it would have been done.
- tests are run for compilers for which they are known to fail. 100% 
waste of resources here.
- known-to-fail tests are rerun although the dependencies didn't change.
- some tests are insanely big.
- some library maintainers feel the need to run their own tests 
regularly. Ideally, this shouldn't be necessary.
- test post processing has to work on output from different compilers. 
Naturally, that output is formatted differently.
- test post processing makes use of very recent XSLT features.
- several times the post processing broke due to problems with the XSLT 
processor.
- XSLT processing takes a long time (merging all the components that 
are input to the result rendering takes ~1 hour just for the tests I run)
- the number of tests is growing
- there's no way of testing experimental changes to core libraries 
without causing reruns of most tests (imagine someone wanting to test 
an experimental version of some part of MPL).
- switching between CVS branches during release preparations takes 
additional resources and requires manual intervention.
I'm sure testers and library developers are able to add a lot more to 
the list.
Regards,
m