[template perf[name value] [value]]
[template para[text] '''<para>'''[text]'''</para>''']

[mathpart perf Performance]

[section:perf_over2 Performance Overview]
[performance_overview]
[endsect]

[section:interp Interpreting these Results]

In all of the following tables, the best performing
result in each row is assigned a relative value of "1" and shown
in bold, so a score of "2" means ['"twice as slow as the best
performing result".]  Actual timings in nanoseconds per function call
are also shown in parentheses.  To make the results easier to read, they
are color-coded as follows: the best result and everything within 20% of
it is green, anything that's more than twice as slow as the best result is red,
and results in between are blue.
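
As a rough illustration of the scoring rule above, this sketch computes the relative scores for one table row and picks the colour for each. The function names `relative_scores` and `colour_for` are hypothetical (they are not part of the performance test suite), and treating "exactly 20% slower" as green is an assumption about how the boundary is handled.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper: given the raw timings (ns per call) for one table row,
// return each entry's relative score, where the fastest entry scores 1.0.
// Assumes all timings are positive.
std::vector<double> relative_scores(const std::vector<double>& timings_ns) {
    std::vector<double> scores;
    if (timings_ns.empty()) return scores;
    const double best = *std::min_element(timings_ns.begin(), timings_ns.end());
    for (double t : timings_ns)
        scores.push_back(t / best);
    return scores;
}

// Colour rule: within 20% of the best is green, more than twice as slow
// as the best is red, and anything in between is blue.
std::string colour_for(double score) {
    if (score <= 1.2) return "green";
    if (score > 2.0) return "red";
    return "blue";
}
```

For a row with timings of 10, 11, 15 and 25 ns, the scores are 1, 1.1, 1.5 and 2.5, giving green, green, blue and red respectively.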

Results were obtained on a system
with an Intel Core i7-4710MQ with 16GB RAM and running
either Windows 8.1 or Xubuntu Linux.

[caution As usual with performance results these should be taken with a large pinch
of salt: relative performance is known to shift quite a bit depending
upon the architecture of the particular test system used.  Furthermore,
our performance results were obtained using our own test data:
these test values are designed to provide good coverage of our code and test
all the appropriate corner cases.  They do not necessarily represent
"typical" usage: whatever that may be!
]

[endsect]

[section:getting_best Getting the Best Performance from this Library: Compiler and Compiler Options]

By far the most important thing you can do when using this library
is turn on your compiler's optimisation options.  As the following
table shows, the penalty for using the library in debug mode can be
quite large.  In addition, switching to 64-bit code gives a small but noticeable
improvement in performance, as does switching to a different compiler
(Intel C++ 15 in this example).
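
Because timings depend so heavily on the build configuration, it can help to record how a benchmark was compiled alongside its results. This is a minimal sketch, assuming GCC, Clang or MSVC: `build_description` is a hypothetical helper, and the predefined macros it inspects only hint at whether optimisation is enabled rather than report the exact options used.

```cpp
#include <string>

// Hypothetical helper: describe how this translation unit was built, so that
// benchmark results can be labelled with the configuration that produced them.
std::string build_description() {
    std::string s;
#if defined(__OPTIMIZE__)
    // GCC and Clang predefine __OPTIMIZE__ whenever optimisation is enabled.
    s += "optimised";
#elif defined(_MSC_VER) && !defined(_DEBUG)
    // MSVC defines _DEBUG for the debug runtimes (/MDd, /MTd); its absence
    // suggests, but does not prove, an optimised release build.
    s += "release runtime";
#else
    s += "possibly unoptimised";
#endif
    // Pointer width is a simple proxy for 32-bit versus 64-bit code.
    s += (sizeof(void*) == 8) ? ", 64-bit" : ", 32-bit";
    return s;
}
```

Printing this string at the top of a results file makes it much harder to accidentally compare a debug build against a release build.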

[table_Compiler_Option_Comparison_on_Windows_x64]

[endsect] [/section:getting_best Getting the Best Performance from this Library: Compiler and Compiler Options]

[section:tradoffs Trading Accuracy for Performance]

There are a number of [link policy Policies] that can be used to trade accuracy for performance:

* Internal promotion: by default, functions with `float` arguments are evaluated at `double` precision
internally to ensure full precision in the result.  Similarly, `double` precision functions are
evaluated at `long double` precision internally by default.  Changing these defaults can give a significant
speed advantage at the expense of accuracy.  Note also that evaluating using `float` internally may result in
numerical instability for some of the more complex algorithms, so we suggest you use this option with care.
* Target accuracy: just because you choose to evaluate at `double` precision doesn't mean you necessarily want
to target full 16-digit accuracy; if you wish, you can change the default (full machine precision) to whatever
is "good enough" for your particular use case.
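
The accuracy side of internal promotion shows up even in a toy calculation. This sketch is an illustration only, not how the library evaluates its functions: it sums the series `1 / (k * k)` with the accumulator held either in `float` or in `double`, and always returns a `float`. The names `sum_inverse_squares` and `abs_error` are hypothetical, and the sketch says nothing about speed, which has to be measured on the target system.

```cpp
#include <cmath>

// Sums 1 / (k * k) for k = 1..n with the accumulator held in type Internal,
// then returns the result as float.  Internal = double plays the role of
// "promoting" a float calculation; Internal = float does no promotion.
template <class Internal>
float sum_inverse_squares(int n) {
    Internal sum = 0;
    for (int k = 1; k <= n; ++k) {
        const Internal kk = static_cast<Internal>(k);
        sum += 1 / (kk * kk);
    }
    return static_cast<float>(sum);
}

// Absolute error of a float result against a long double reference value.
long double abs_error(float value, long double reference) {
    return std::fabs(static_cast<long double>(value) - reference);
}
```

With a `double` accumulator the final `float` is typically correct to within rounding. With a `float` accumulator the sum stops growing once the terms fall below half an ulp of the running total, so the tail of the series is simply lost.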

For example, suppose you want to evaluate `double` precision functions at `double` precision internally: you
can change the global default by passing `-DBOOST_MATH_PROMOTE_DOUBLE_POLICY=false` on the command line, or
at the point of call via something like this:

   double val = boost::math::erf(my_argument, boost::math::policies::make_policy(boost::math::policies::promote_double<false>()));

However, an easier option might be:

   #include <boost/math/special_functions.hpp> // Or any individual special function header

   namespace math{

   namespace precise{
   //
   // Define a Policy for accurate evaluation - this is the same as the default, unless
   // someone has changed the global defaults.
   //
   typedef boost::math::policies::policy<> accurate_policy;
   //
   // Invoke BOOST_MATH_DECLARE_SPECIAL_FUNCTIONS to declare
   // functions that use the above policy.  Note no trailing
   // ";" required on the macro call:
   //
   BOOST_MATH_DECLARE_SPECIAL_FUNCTIONS(accurate_policy)

   }

   namespace fast{
   //
   // Define a Policy for fast evaluation:
   //
   using namespace boost::math::policies;
   typedef policy<promote_double<false> > fast_policy;
   //
   // Invoke BOOST_MATH_DECLARE_SPECIAL_FUNCTIONS:
   //
   BOOST_MATH_DECLARE_SPECIAL_FUNCTIONS(fast_policy)

   }

   }

And now one can call:

   math::precise::tgamma(x);

For the "accurate" version of tgamma, and:

   math::fast::tgamma(x);

For the faster version.

Had we wished to change the target precision (to 9 decimal digits) as well as the evaluation type used, we might have done:

   namespace math{
   namespace fast{
   //
   // Define a Policy for fast evaluation:
   //
   using namespace boost::math::policies;
   typedef policy<promote_double<false>, digits10<9> > fast_policy;
   //
   // Invoke BOOST_MATH_DECLARE_SPECIAL_FUNCTIONS:
   //
   BOOST_MATH_DECLARE_SPECIAL_FUNCTIONS(fast_policy)

   }
   }
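
A policy such as `digits10<9>` lets an algorithm stop earlier than it would at full precision. As a toy analogue only (this is not the library's implementation, and `exp_series` and `series_result` are hypothetical names), the sketch below sums the Taylor series for `exp(x)`, assuming `x >= 0`, until the latest term is no larger than `10^-digits10` times the running sum, and reports how many terms were used.

```cpp
#include <cmath>

// Toy analogue of a target-precision setting (hypothetical names).
struct series_result {
    double value;  // approximation to exp(x)
    int terms;     // number of series terms summed
};

// Sums the Taylor series for exp(x), assuming x >= 0, stopping once the
// latest term is no larger than 10^-digits10 times the running sum.
series_result exp_series(double x, int digits10) {
    const double tolerance = std::pow(10.0, -digits10);
    double term = 1.0;  // x^0 / 0!
    double sum = term;
    int k = 0;
    while (term > tolerance * sum) {
        ++k;
        term *= x / k;  // now x^k / k!
        sum += term;
    }
    return {sum, k + 1};
}
```

For `x = 1`, asking for 9 digits uses 13 terms against 19 for 16 digits. That is the kind of saving a lower target precision buys, although real special-function code has many other costs besides series length.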

One can do a similar thing with the distribution classes:

   #include <boost/math/distributions.hpp> // or any individual distribution header

   namespace math{ namespace fast{
   //
   // Define a policy for fastest possible evaluation:
   //
   using namespace boost::math::policies;
   typedef policy<promote_float<false> > fast_float_policy;
   //
   // Invoke BOOST_MATH_DECLARE_DISTRIBUTIONS
   //
   BOOST_MATH_DECLARE_DISTRIBUTIONS(float, fast_float_policy)

   }} // namespaces

   //
   // And use:
   //
   float p_val = cdf(math::fast::normal(1.0f, 3.0f), 0.25f);

Here's how these options change the relative performance of the distributions on Linux:

[table_Distribution_performance_comparison_for_different_performance_options_with_GNU_C_version_5_1_0_on_linux]

[endsect] [/section:tradoffs Trading Accuracy for Performance]

[section:multiprecision Cost of High-Precision Non-built-in Floating-point]

Using user-defined floating-point types such as __multiprecision has a very high run-time cost.

To give some flavour of this:

[table:linpack_time Linpack Benchmark
[[floating-point type]                  [speed (Mflops)]]
[[double]                               [2727]]
[[__float128]                           [35]]
[[multiprecision::float128]             [35]]
[[multiprecision::cpp_bin_float_quad]   [6]]
]
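
To get a feel for such numbers on your own machine, you can write a generic timing harness once and instantiate it for each type. This is a minimal sketch, not the Linpack benchmark used for the table above. `axpy_mflops` is a hypothetical name. The function times a simple `y = a * x + y` loop, counted as two floating-point operations per element, and a user-defined type such as a multiprecision float could be substituted for `T`, provided it supports construction from `int` and the arithmetic operators used.

```cpp
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical micro-benchmark (not the Linpack code behind the table):
// times y = a * x + y over n elements, `repeats` times, for any arithmetic
// type T, and returns an estimate in Mflops (two flops per element).
// The final value of y is written to `checksum` to discourage the optimiser
// from discarding the loop.
template <class T>
double axpy_mflops(std::size_t n, int repeats, T& checksum) {
    std::vector<T> x(n, T(1)), y(n, T(2));
    const T a = T(3) / T(4);

    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; ++r)
        for (std::size_t i = 0; i < n; ++i)
            y[i] = a * x[i] + y[i];
    const auto stop = std::chrono::steady_clock::now();

    checksum = y.empty() ? T(0) : y.back();

    double seconds = std::chrono::duration<double>(stop - start).count();
    if (seconds <= 0) seconds = 1e-9;  // guard against a zero timer reading
    const double flops = 2.0 * static_cast<double>(n) * repeats;
    return flops / seconds / 1e6;
}
```

A single run is noisy, so in practice you would repeat the measurement and keep the best time before comparing, say, `axpy_mflops<double>` against the same call for a multiprecision type.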

[endsect] [/section:multiprecision Cost of High-Precision Non-built-in Floating-point]

[section:tuning Performance Tuning Macros]

There is a small number of performance tuning options
that are determined by configuration macros.  These should be set
in boost/math/tools/user.hpp, or else reported to the Boost-development
mailing list so that the appropriate option for a given compiler and
OS platform can be set automatically in our configuration setup.

[table
[[Macro][Meaning]]
[[BOOST_MATH_POLY_METHOD]
   [Determines how polynomials and most rational functions
   are evaluated.  Define to one
   of the values 0, 1, 2 or 3: see below for the meaning of these values.]]
[[BOOST_MATH_RATIONAL_METHOD]
   [Determines how symmetrical rational functions are evaluated: mostly
   this only affects how the Lanczos approximation is evaluated, and how
   the `evaluate_rational` function behaves.  Define to one
   of the values 0, 1, 2 or 3: see below for the meaning of these values.
   ]]
[[BOOST_MATH_MAX_POLY_ORDER]
   [The maximum order of polynomial or rational function that will
   be evaluated by a method other than 0 (a simple "for" loop).
   ]]
[[BOOST_MATH_INT_TABLE_TYPE(RT, IT)]
   [Many of the coefficients to the polynomials and rational functions
   used by this library are integers.  Normally these are stored as tables
   of integers, but if mixed integer / floating-point arithmetic is much
   slower than regular floating-point arithmetic then they can be stored
   as tables of floating-point values instead.  If mixed arithmetic is slow
   then add:

      #define BOOST_MATH_INT_TABLE_TYPE(RT, IT) RT

   to boost/math/tools/user.hpp, otherwise the default of:

      #define BOOST_MATH_INT_TABLE_TYPE(RT, IT) IT

   set in boost/math/config.hpp is fine, and may well result in smaller
   code.
   ]]
]
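
For example, a `user.hpp` that selects the second-order Horner method might contain something like the following fragment. The specific values are illustrative assumptions only. Use the report generated by the performance test suite for your own compiler to choose them.

```cpp
// Hypothetical fragment of boost/math/tools/user.hpp.  The values are
// illustrative only: choose them from the test suite's report for your
// own compiler and platform.
#define BOOST_MATH_POLY_METHOD 2
#define BOOST_MATH_RATIONAL_METHOD 2
#define BOOST_MATH_MAX_POLY_ORDER 20

// Only if mixed integer / floating-point arithmetic is slow on your platform:
// #define BOOST_MATH_INT_TABLE_TYPE(RT, IT) RT
```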

The values to which `BOOST_MATH_POLY_METHOD` and `BOOST_MATH_RATIONAL_METHOD`
may be set are as follows:

[table
[[Value][Effect]]
[[0][The polynomial or rational function is evaluated using Horner's
      method, and a simple for-loop.

      Note that if the order of the polynomial
      or rational function is a runtime parameter, or the order is
      greater than the value of `BOOST_MATH_MAX_POLY_ORDER`, then
      this method is always used, irrespective of the value
      of `BOOST_MATH_POLY_METHOD` or `BOOST_MATH_RATIONAL_METHOD`.]]
[[1][The polynomial or rational function is evaluated without
      the use of a loop, and using Horner's method.  This only occurs
      if the order of the polynomial is known at compile time and is less
      than or equal to `BOOST_MATH_MAX_POLY_ORDER`.]]
[[2][The polynomial or rational function is evaluated without
      the use of a loop, and using a second-order Horner's method.
      In theory this permits two operations to occur in parallel
      for polynomials, and four in parallel for rational functions.
      This only occurs
      if the order of the polynomial is known at compile time and is less
      than or equal to `BOOST_MATH_MAX_POLY_ORDER`.]]
[[3][The polynomial or rational function is evaluated without
      the use of a loop, and using a second-order Horner's method.
      In theory this permits two operations to occur in parallel
      for polynomials, and four in parallel for rational functions.
      This differs from method "2" in that the code is carefully ordered
      to make the parallelisation more obvious to the compiler, rather than
      relying on the compiler's optimiser to spot the parallelisation
      opportunities.
      This only occurs
      if the order of the polynomial is known at compile time and is less
      than or equal to `BOOST_MATH_MAX_POLY_ORDER`.]]
]
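
The difference between methods 0, 1 and 2 is easiest to see on a fixed polynomial. The sketch below illustrates the techniques and is not the library's own generated code. The function names are hypothetical, and the order-5 unrolled versions stand in for what a compile-time order allows. Method 3, which reorders the same operations, and the rational-function variants are not shown.

```cpp
#include <cstddef>

// Method 0 (sketch): Horner's rule with a simple loop.
// c[0] is the constant term; n >= 1 is the number of coefficients.
double horner_loop(const double* c, std::size_t n, double x) {
    double result = c[n - 1];
    for (std::size_t i = n - 1; i-- > 0;)
        result = result * x + c[i];
    return result;
}

// Method 1 (sketch): the same recurrence, fully unrolled for a polynomial
// whose order (5 here) is known at compile time.
double horner_unrolled5(const double (&c)[6], double x) {
    return ((((c[5] * x + c[4]) * x + c[3]) * x + c[2]) * x + c[1]) * x + c[0];
}

// Method 2 (sketch): second-order Horner.  The even and odd coefficients
// form two independent Horner chains in x^2, and p(x) = E(x^2) + x * O(x^2).
double horner2_unrolled5(const double (&c)[6], double x) {
    const double x2 = x * x;
    const double even = (c[4] * x2 + c[2]) * x2 + c[0];
    const double odd  = (c[5] * x2 + c[3]) * x2 + c[1];
    return even + x * odd;
}
```

In `horner2_unrolled5` the `even` and `odd` chains do not depend on each other, which is what allows two operations to proceed in parallel. A rational function would have four such chains, two each for the numerator and the denominator.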

The performance test suite generates a report for your particular compiler showing which method is likely to work best.
The following tables show the results for MSVC-14.0 and GCC-5.1.0 (Linux).  There's not much to choose between
the various methods, but generally loop-unrolled methods perform better.  Interestingly, ordering the code
to try to "second guess" possible optimizations does not seem to be such a good idea (method 3 below).

[table_Polynomial_Method_Comparison_with_Microsoft_Visual_C_version_14_0_on_Windows_x64]

[table_Rational_Method_Comparison_with_Microsoft_Visual_C_version_14_0_on_Windows_x64]

[table_Polynomial_Method_Comparison_with_GNU_C_version_5_1_0_on_linux]

[table_Rational_Method_Comparison_with_GNU_C_version_5_1_0_on_linux]

[endsect] [/section:tuning Performance Tuning Macros]

[section:comp_compilers Comparing Different Compilers]

By running our performance test suite multiple times, we can compare the effect of different compilers: as
might be expected, the differences are generally small compared to, say, disabling internal use of `long double`.
However, there are still gains to be made, particularly from some of the commercial offerings:

[table_Compiler_Comparison_on_Windows_x64]

[table_Compiler_Comparison_on_linux]

[endsect] [/section:comp_compilers Comparing Different Compilers]

[section:comparisons Comparisons to Other Open Source Libraries]

We've run our performance tests both for our own code, and against other
open source implementations of the same functions.  The results are
presented below to give you a rough idea of how they all compare.
In order to give a more-or-less level playing field, our test data
was screened against all the libraries being tested, and any
unsupported domains removed; likewise for any test cases that gave large errors
or unexpected non-finite values.

[caution
You should exercise extreme caution when interpreting
these results: relative performance may vary by platform, and the tests use
data that gives good code coverage of /our/ code, but which may skew the
results towards the corner cases.  Finally, remember that different
libraries make different choices with regard to performance versus
numerical stability.
]

The first results compare standard library functions to Boost equivalents with MSVC-14.0:

[table_Library_Comparison_with_Microsoft_Visual_C_version_14_0_on_Windows_x64]

On Linux with GCC, we can also compare to the TR1 functions, and to GSL and RMath:

[table_Library_Comparison_with_GNU_C_version_5_1_0_on_linux]

And finally we can compare the statistical distributions to GSL, RMath and DCDFLIB:

[table_Distribution_performance_comparison_with_GNU_C_version_5_1_0_on_linux]

[endsect] [/section:comparisons Comparisons to Other Open Source Libraries]

[section:perf_test_app The Performance Test Applications]

Under ['boost-path]\/libs\/math\/reporting\/performance you will find
some reasonably comprehensive performance test applications for this library.

In order to generate the tables you will have seen in this documentation (or others
for your specific compiler) you need to invoke `bjam` in this directory, using a C++11
capable compiler.  Note that the
results extend or overwrite whatever is already present in
['boost-path]\/libs\/math\/reporting\/performance\/doc\/performance_tables.qbk,
so you may want to delete this file before you begin, to make a fresh start for
your particular system.

The programs produce results in Boost's Quickbook format, which is not terribly
human readable.  If you configure your user-config.jam to be able to build Docbook
documentation, then you will also get a full summary of all the data in HTML format
in ['boost-path]\/libs\/math\/reporting\/performance\/html\/index.html.  Assuming
you're on a 'nix-like platform, the procedure is to first install the
`xsltproc`, `Docbook DTD`, and `Docbook XSL` packages.  Then:

* Copy ['boost-path]\/tools\/build\/example\/user-config.jam to your home directory.
* Add `using xsltproc ;` to the end of the file (note the spaces surrounding each token, including the final ";": this is important!)
This assumes that `xsltproc` is in your path.
* Add `using boostbook : path-to-xsl-stylesheets : path-to-dtd ;` to the end of the file.  The `path-to-dtd` should point
to version 4.2.x of the Docbook DTD, while `path-to-xsl-stylesheets` should point to the folder containing the latest XSLT stylesheets.
Both paths should use forward slashes, even on Windows.

At this point you should be able to run the tests and generate the HTML summary.  If GSL, RMath or libstdc++ are
present in the compiler's path, they will be tested automatically.  For DCDFLIB you will need to place the C
source in ['boost-path]\/libs\/math\/reporting\/performance\/third_party\/dcdflib.

If you want to compare multiple compilers, or multiple options for one compiler, then you will
need to invoke `bjam` multiple times, once for each configuration.  Note that in order to test
multiple configurations of the same compiler, each has to be given a unique name in the test
program, otherwise they all edit the same table cells.  Suppose you want to test GCC with
and without the -ffast-math option; in this case bjam would first be invoked as:

   bjam toolset=gcc -a cxxflags=-std=gnu++11

This runs the tests using the default optimization options (-O3).  We can then run again
using -ffast-math:

   bjam toolset=gcc -a cxxflags='-std=gnu++11 -ffast-math' define=COMPILER_NAME='"GCC with -ffast-math"'

In the command line above, the -a flag forces a full rebuild, and the preprocessor define COMPILER_NAME needs to be set
to a string literal describing the compiler configuration, hence the nested quotes: the outer single quotes are for the
command-line shell, the inner double quotes for the compiler.
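
The reason `COMPILER_NAME` must arrive as a string literal becomes clear when you look at how such a macro is typically consumed. This is a hypothetical sketch, not code from the performance test programs: `configuration_label` and the fallback value are assumptions. If the inner double quotes were lost, the macro would expand to bare tokens such as `GCC with -ffast-math`, and the `return` statement would not compile.

```cpp
#include <string>

// Hypothetical fallback so this file also builds when no name is supplied
// on the command line.
#ifndef COMPILER_NAME
#define COMPILER_NAME "default configuration"
#endif

// COMPILER_NAME must expand to a string literal: with the inner double
// quotes from the bjam command line it does; without them the macro would
// expand to bare tokens and this function would not compile.
std::string configuration_label() {
    return COMPILER_NAME;
}
```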

[endsect] [/section:perf_test_app The Performance Test Applications]

[endmathpart]

[/
  Copyright 2006 John Maddock and Paul A. Bristow.
  Distributed under the Boost Software License, Version 1.0.
  (See accompanying file LICENSE_1_0.txt or copy at
  http://www.boost.org/LICENSE_1_0.txt).
]