mirror of
				https://github.com/saitohirga/WSJT-X.git
				synced 2025-10-24 17:40:26 -04:00 
			
		
		
		
	
		
			
				
	
	
		
			377 lines
		
	
	
		
			16 KiB
		
	
	
	
		
			Plaintext
		
	
	
	
	
	
			
		
		
	
	
		
	
	
	
	
	
| [template perf[name value] [value]]
 | |
| [template para[text] '''<para>'''[text]'''</para>''']
 | |
| 
 | |
| [mathpart perf Performance]
 | |
| 
 | |
| [section:perf_over2 Performance Overview]
 | |
| [performance_overview]
 | |
| [endsect]
 | |
| 
 | |
| [section:interp Interpreting these Results]
 | |
| 
 | |
| In all of the following tables, the best performing
 | |
| result in each row is assigned a relative value of "1" and shown
 | |
| in bold, so a score of "2" means ['"twice as slow as the best
 | |
| performing result".]  Actual timings in nanoseconds per function call
 | |
| are also shown in parentheses.  To make the results easier to read, they
 | |
| are color-coded as follows: the best result and everything within 20% of
 | |
| it is green, anything that's more than twice as slow as the best result is red,
 | |
| and results in between are blue.
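
The scoring and color-coding rules above can be sketched as follows.  This is a minimal illustration only: the names `relative_score` and `classify` are hypothetical and are not part of the library or its test suite, and the boundaries are simply those stated in the text.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: relative score of one timing against the best
// (smallest) timing in the same table row.  Both values are in ns per call.
inline double relative_score(double time_ns, double best_ns)
{
   assert(best_ns > 0 && time_ns >= best_ns);
   return time_ns / best_ns;
}

// Hypothetical helper: color assigned to a relative score, following the
// rules described in the text.
inline std::string classify(double score)
{
   if (score <= 1.2) return "green";  // best result, or within 20% of it
   if (score > 2.0)  return "red";    // more than twice as slow as the best
   return "blue";                     // everything in between
}
```

For instance, a result of 50ns in a row whose best result is 20ns has a relative score of 2.5 and would be shown in red.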
 | |
| 
 | |
| Results were obtained on a system
 | |
| with an Intel Core i7 4710MQ with 16GB RAM and running
 | |
| either Windows 8.1 or Xubuntu Linux.
 | |
| 
 | |
| [caution As usual with performance results these should be taken with a large pinch
 | |
| of salt: relative performance is known to shift quite a bit depending
 | |
| upon the architecture of the particular test system used.  Furthermore,
 | |
| our performance results were obtained using our own test data:
 | |
| these test values are designed to provide good coverage of our code and test
 | |
| all the appropriate corner cases.  They do not necessarily represent
 | |
| "typical" usage: whatever that may be!
 | |
| ]
 | |
| 
 | |
| [endsect]
 | |
| 
 | |
| [section:getting_best Getting the Best Performance from this Library: Compiler and Compiler Options]
 | |
| 
 | |
| By far the most important thing you can do when using this library
 | |
| is turn on your compiler's optimisation options.  As the following
 | |
| table shows, the penalty for using the library in debug mode can be
 | |
| quite large.  In addition, switching to 64-bit code gives a small but noticeable
 | |
| improvement in performance, as does switching to a different compiler
 | |
| (Intel C++ 15 in this example).
 | |
| 
 | |
| [table_Compiler_Option_Comparison_on_Windows_x64]
 | |
| 
 | |
| [endsect] [/section:getting_best Getting the Best Performance from this Library: Compiler and Compiler Options]
 | |
| 
 | |
| [section:tradoffs Trading Accuracy for Performance]
 | |
| 
 | |
| There are a number of [link policy Policies] that can be used to trade accuracy for performance:
 | |
| 
 | |
| * Internal promotion: by default functions with `float` arguments are evaluated at `double` precision
 | |
| internally to ensure full precision in the result.  Similarly `double` precision functions are
 | |
| evaluated at `long double` precision internally by default.  Changing these defaults can have a significant
 | |
| speed advantage at the expense of accuracy.  Note also that evaluating using `float` internally may result in
 | |
| numerical instability for some of the more complex algorithms, so we suggest you use this option with care.
 | |
| * Target accuracy: just because you choose to evaluate at `double` precision doesn't mean you necessarily want
 | |
| to target full 16-digit accuracy: if you wish, you can change the default (full machine precision) to whatever
 | |
| is "good enough" for your particular use case.
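
The effect of internal promotion can be illustrated without the library at all.  The sketch below is an analogy under stated assumptions, not the library's implementation: `repeated_sum` is a hypothetical function whose interface is `float`, while the type used for the internal calculation is chosen by the template parameter `InternalT`.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical illustration of internal promotion: the caller passes and
// receives `float`, but the running sum is held in `InternalT`.
template <class InternalT>
float repeated_sum(float term, std::size_t count)
{
   assert(count > 0);
   InternalT sum = 0;
   for (std::size_t i = 0; i < count; ++i)
      sum += static_cast<InternalT>(term);   // accumulate at internal precision
   return static_cast<float>(sum);           // round once, at the very end
}
```

Calling `repeated_sum<double>(0.1f, 1000000)` gives a result very close to 100000, while `repeated_sum<float>(0.1f, 1000000)` accumulates visibly more rounding error: the same kind of accuracy that is given up when promotion is disabled in exchange for speed.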
 | |
| 
 | |
| For example, suppose you want to evaluate `double` precision functions at `double` precision internally: you
 | |
| can change the global default by passing `-DBOOST_MATH_PROMOTE_DOUBLE_POLICY=false` on the command line, or
 | |
| at the point of call via something like this:
 | |
| 
 | |
|    double val = boost::math::erf(my_argument, boost::math::policies::make_policy(boost::math::policies::promote_double<false>()));
 | |
| 
 | |
| However, an easier option might be:
 | |
| 
 | |
|    #include <boost/math/special_functions.hpp> // Or any individual special function header
 | |
| 
 | |
|    namespace math{
 | |
| 
 | |
|    namespace accurate{
 | |
|    //
 | |
|    // Define a Policy for accurate evaluation - this is the same as the default, unless
 | |
|    // someone has changed the global defaults.
 | |
|    //
 | |
|    typedef boost::math::policies::policy<> accurate_policy;
 | |
|    //
 | |
|    // Invoke BOOST_MATH_DECLARE_SPECIAL_FUNCTIONS to declare
 | |
|    // functions that use the above policy.  Note no trailing
 | |
|    // ";" required on the macro call:
 | |
|    //
 | |
|    BOOST_MATH_DECLARE_SPECIAL_FUNCTIONS(accurate_policy)
 | |
| 
 | |
| 
 | |
|    }
 | |
| 
 | |
|    namespace fast{
 | |
|    //
 | |
|    // Define a Policy for fast evaluation:
 | |
|    //
 | |
|    using namespace boost::math::policies;
 | |
|    typedef policy<promote_double<false> > fast_policy;
 | |
|    //
 | |
|    // Invoke BOOST_MATH_DECLARE_SPECIAL_FUNCTIONS:
 | |
|    //
 | |
|    BOOST_MATH_DECLARE_SPECIAL_FUNCTIONS(fast_policy)
 | |
| 
 | |
|    }
 | |
| 
 | |
|    }
 | |
| 
 | |
| And now one can call:
 | |
| 
 | |
|    math::accurate::tgamma(x);
 | |
| 
 | |
| For the "accurate" version of tgamma, and:
 | |
| 
 | |
|    math::fast::tgamma(x);
 | |
| 
 | |
| For the faster version.
 | |
| 
 | |
| Had we wished to change the target precision (to 9 decimal places) as well as the evaluation type used, we might have done:
 | |
| 
 | |
|    namespace math{
 | |
|    namespace fast{
 | |
|    //
 | |
|    // Define a Policy for fast evaluation:
 | |
|    //
 | |
|    using namespace boost::math::policies;
 | |
|    typedef policy<promote_double<false>, digits10<9> > fast_policy;
 | |
|    //
 | |
|    // Invoke BOOST_MATH_DECLARE_SPECIAL_FUNCTIONS:
 | |
|    //
 | |
|    BOOST_MATH_DECLARE_SPECIAL_FUNCTIONS(fast_policy)
 | |
| 
 | |
|    }
 | |
|    }
 | |
| 
 | |
| One can do a similar thing with the distribution classes:
 | |
| 
 | |
|    #include <boost/math/distributions.hpp> // or any individual distribution header
 | |
| 
 | |
|    namespace math{ namespace fast{
 | |
|    //
 | |
|    // Define a policy for fastest possible evaluation:
 | |
|    //
 | |
|    using namespace boost::math::policies;
 | |
|    typedef policy<promote_float<false> > fast_float_policy;
 | |
|    //
 | |
|    // Invoke BOOST_MATH_DECLARE_DISTRIBUTIONS
 | |
|    //
 | |
|    BOOST_MATH_DECLARE_DISTRIBUTIONS(float, fast_float_policy)
 | |
| 
 | |
|    }} // namespaces
 | |
| 
 | |
|    //
 | |
|    // And use:
 | |
|    //
 | |
|    float p_val = cdf(math::fast::normal(1.0f, 3.0f), 0.25f);
 | |
| 
 | |
| Here's how these options change the relative performance of the distributions on Linux:
 | |
| 
 | |
| [table_Distribution_performance_comparison_for_different_performance_options_with_GNU_C_version_5_1_0_on_linux]
 | |
| 
 | |
| [endsect] [/section:tradoffs Trading Accuracy for Performance]
 | |
| 
 | |
| [section:multiprecision Cost of High-Precision Non-built-in Floating-point]
 | |
| 
 | |
| Using user-defined floating-point types such as __multiprecision has a very high run-time cost.
 | |
| 
 | |
| To give some flavour of this:
 | |
| 
 | |
| [table:linpack_time Linpack Benchmark
 | |
| [[floating-point type]                            [speed Mflops]]
 | |
| [[double]                                                [2727]]
 | |
| [[__float128]                                          [35]]
 | |
| [[multiprecision::float128]                    [35]]
 | |
| [[multiprecision::cpp_bin_float_quad] [6]]
 | |
| ]
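
To put the figures in the table above into perspective, the slowdown relative to `double` is just the ratio of the two Mflops values.  The helper below is a hypothetical name used only for this arithmetic.

```cpp
#include <cassert>

// Slowdown of a type relative to `double`, computed from two Mflops figures.
inline double slowdown(double double_mflops, double other_mflops)
{
   assert(other_mflops > 0);
   return double_mflops / other_mflops;
}
```

With the values shown, `slowdown(2727, 35)` is a little under 78 for both `__float128` variants, and `slowdown(2727, 6)` is 454.5 for `cpp_bin_float_quad`.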
 | |
| 
 | |
| [endsect] [/section:multiprecision Cost of High-Precision Non-built-in Floating-point]
 | |
| 
 | |
| 
 | |
| [section:tuning Performance Tuning Macros]
 | |
| 
 | |
| There are a small number of performance tuning options
 | |
| that are determined by configuration macros.  These should be set
 | |
| in boost/math/tools/user.hpp; or else reported to the Boost-development
 | |
| mailing list so that the appropriate option for a given compiler and
 | |
| OS platform can be set automatically in our configuration setup.
 | |
| 
 | |
| [table
 | |
| [[Macro][Meaning]]
 | |
| [[BOOST_MATH_POLY_METHOD]
 | |
|    [Determines how polynomials and most rational functions
 | |
|    are evaluated.  Define to one
 | |
|    of the values 0, 1, 2 or 3: see below for the meaning of these values.]]
 | |
| [[BOOST_MATH_RATIONAL_METHOD]
 | |
|    [Determines how symmetrical rational functions are evaluated: mostly
 | |
|    this only affects how the Lanczos approximation is evaluated, and how
 | |
|    the `evaluate_rational` function behaves.  Define to one
 | |
|    of the values 0, 1, 2 or 3: see below for the meaning of these values.
 | |
|    ]]
 | |
| [[BOOST_MATH_MAX_POLY_ORDER]
 | |
|    [The maximum order of polynomial or rational function that will
 | |
|    be evaluated by a method other than 0 (a simple "for" loop).
 | |
|    ]]
 | |
| [[BOOST_MATH_INT_TABLE_TYPE(RT, IT)]
 | |
|    [Many of the coefficients to the polynomials and rational functions
 | |
|    used by this library are integers.  Normally these are stored in tables
 | |
|    as integers, but if mixed integer / floating point arithmetic is much
 | |
|    slower than regular floating point arithmetic then they can be stored
 | |
|    as tables of floating point values instead.  If mixed arithmetic is slow
 | |
|    then add:
 | |
| 
 | |
|       #define BOOST_MATH_INT_TABLE_TYPE(RT, IT) RT
 | |
| 
 | |
|    to boost/math/tools/user.hpp, otherwise the default of:
 | |
| 
 | |
|       #define BOOST_MATH_INT_TABLE_TYPE(RT, IT) IT
 | |
| 
 | |
|    set in boost/math/config.hpp is fine, and may well result in smaller
 | |
|    code.
 | |
|    ]]
 | |
| ]
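
The idea behind `BOOST_MATH_INT_TABLE_TYPE` can be sketched with a stand-in macro.  Everything named `DEMO_...` or `demo_...` below is hypothetical and exists only for this illustration; it shows a table of integer-valued coefficients whose storage type is selected by a two-argument macro, in the manner described in the table above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for BOOST_MATH_INT_TABLE_TYPE(RT, IT): select the
// integer type IT (the default described above) or the real type RT.
#ifdef DEMO_STORE_TABLES_AS_REAL
#  define DEMO_INT_TABLE_TYPE(RT, IT) RT
#else
#  define DEMO_INT_TABLE_TYPE(RT, IT) IT
#endif

// Evaluate 1 + 2x + 3x^2 from a table of integer-valued coefficients.
inline double demo_poly(double x)
{
   static const DEMO_INT_TABLE_TYPE(double, int) coeffs[] = { 1, 2, 3 };
   const std::size_t count = sizeof(coeffs) / sizeof(coeffs[0]);
   assert(count > 0);
   double result = coeffs[count - 1];
   for (std::size_t i = count - 1; i > 0; --i)
      result = result * x + coeffs[i - 1];   // mixed int/double arithmetic unless RT is selected
   return result;
}
```

Either choice of storage type gives the same numerical result here; only the arithmetic performed on each table access, and possibly the size of the table, differs.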
 | |
| 
 | |
| The values to which `BOOST_MATH_POLY_METHOD` and `BOOST_MATH_RATIONAL_METHOD`
 | |
| may be set are as follows:
 | |
| 
 | |
| [table
 | |
| [[Value][Effect]]
 | |
| [[0][The polynomial or rational function is evaluated using Horner's
 | |
|       method, and a simple for-loop.
 | |
| 
 | |
|       Note that if the order of the polynomial
 | |
|       or rational function is a runtime parameter, or the order is
 | |
|       greater than the value of `BOOST_MATH_MAX_POLY_ORDER`, then
 | |
|       this method is always used, irrespective of the value
 | |
|       of `BOOST_MATH_POLY_METHOD` or `BOOST_MATH_RATIONAL_METHOD`.]]
 | |
| [[1][The polynomial or rational function is evaluated without
 | |
|       the use of a loop, and using Horner's method.  This only occurs
 | |
|       if the order of the polynomial is known at compile time and is less
 | |
|       than or equal to `BOOST_MATH_MAX_POLY_ORDER`. ]]
 | |
| [[2][The polynomial or rational function is evaluated without
 | |
|       the use of a loop, and using a second order Horner's method.
 | |
|       In theory this permits two operations to occur in parallel
 | |
|       for polynomials, and four in parallel for rational functions.
 | |
|       This only occurs
 | |
|       if the order of the polynomial is known at compile time and is less
 | |
|       than or equal to `BOOST_MATH_MAX_POLY_ORDER`.]]
 | |
| [[3][The polynomial or rational function is evaluated without
 | |
|       the use of a loop, and using a second order Horner's method.
 | |
|       In theory this permits two operations to occur in parallel
 | |
|       for polynomials, and four in parallel for rational functions.
 | |
|       This differs from method "2" in that the code is carefully ordered
 | |
|       to make the parallelisation more obvious to the compiler: rather than
 | |
|       relying on the compiler's optimiser to spot the parallelisation
 | |
|       opportunities.
 | |
|       This only occurs
 | |
|       if the order of the polynomial is known at compile time and is less
 | |
|       than or equal to `BOOST_MATH_MAX_POLY_ORDER`.]]
 | |
| ]
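
The difference between methods 0, 1 and 2 can be sketched for a cubic polynomial as follows.  This is an illustration of the techniques named in the table, not the library's own code: the function names are hypothetical, and coefficients are assumed to be stored lowest order first.

```cpp
#include <cassert>
#include <cstddef>

// Method 0: Horner's method in a simple loop; works for any runtime order.
inline double horner_loop(const double* a, std::size_t count, double x)
{
   assert(count > 0);
   double result = a[count - 1];
   for (std::size_t i = count - 1; i > 0; --i)
      result = result * x + a[i - 1];
   return result;
}

// Method 1: Horner's method with the loop unrolled for 4 coefficients.
inline double horner_unrolled_cubic(const double* a, double x)
{
   return ((a[3] * x + a[2]) * x + a[1]) * x + a[0];
}

// Method 2: second order Horner for 4 coefficients.  The odd and even powers
// form two independent chains in x*x that can, in theory, run in parallel.
inline double horner_second_order_cubic(const double* a, double x)
{
   const double x2 = x * x;
   const double odd  = a[3] * x2 + a[1];   // a1 + a3 x^2
   const double even = a[2] * x2 + a[0];   // a0 + a2 x^2
   return odd * x + even;
}
```

All three functions compute the same polynomial; they differ only in how much instruction-level parallelism is exposed and in whether the order must be known at compile time.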
 | |
| 
 | |
| The performance test suite generates a report for your particular compiler showing which method is likely to work best.
 | |
| The following tables show the results for MSVC-14.0 and GCC-5.1.0 (Linux).  There's not much to choose between
 | |
| the various methods, but generally loop-unrolled methods perform better.  Interestingly, ordering the code
 | |
| to try and "second guess" possible optimizations seems not to be such a good idea (method 3 below).
 | |
| 
 | |
| [table_Polynomial_Method_Comparison_with_Microsoft_Visual_C_version_14_0_on_Windows_x64]
 | |
| 
 | |
| [table_Rational_Method_Comparison_with_Microsoft_Visual_C_version_14_0_on_Windows_x64]
 | |
| 
 | |
| [table_Polynomial_Method_Comparison_with_GNU_C_version_5_1_0_on_linux]
 | |
| 
 | |
| [table_Rational_Method_Comparison_with_GNU_C_version_5_1_0_on_linux]
 | |
| 
 | |
| [endsect] [/section:tuning Performance Tuning Macros]
 | |
| 
 | |
| [section:comp_compilers Comparing Different Compilers]
 | |
| 
 | |
| By running our performance test suite multiple times, we can compare the effect of different compilers: as
 | |
| might be expected, the differences are generally small compared to, say, disabling internal use of `long double`.
 | |
| However, there are still gains to be made, particularly from some of the commercial offerings:
 | |
| 
 | |
| [table_Compiler_Comparison_on_Windows_x64]
 | |
| 
 | |
| [table_Compiler_Comparison_on_linux]
 | |
| 
 | |
| [endsect] [/section:comp_compilers Comparing Different Compilers]
 | |
| 
 | |
| [section:comparisons Comparisons to Other Open Source Libraries]
 | |
| 
 | |
| We've run our performance tests both for our own code, and against other
 | |
| open source implementations of the same functions.  The results are
 | |
| presented below to give you a rough idea of how they all compare.
 | |
| In order to give a more-or-less level playing field our test data
 | |
| was screened against all the libraries being tested, and any
 | |
| unsupported domains removed, likewise for any test cases that gave large errors
 | |
| or unexpected non-finite values.
 | |
| 
 | |
| [caution
 | |
| You should exercise extreme caution when interpreting
 | |
| these results: relative performance may vary by platform, and the tests use
 | |
| data that gives good code coverage of /our/ code, but which may skew the
 | |
| results towards the corner cases.  Finally, remember that different
 | |
| libraries make different choices with regard to performance versus
 | |
| numerical stability.
 | |
| ]
 | |
| 
 | |
| The first results compare standard library functions to Boost equivalents with MSVC-14.0:
 | |
| 
 | |
| [table_Library_Comparison_with_Microsoft_Visual_C_version_14_0_on_Windows_x64]
 | |
| 
 | |
| On Linux with GCC, we can also compare to the TR1 functions, and to GSL and RMath:
 | |
| 
 | |
| [table_Library_Comparison_with_GNU_C_version_5_1_0_on_linux]
 | |
| 
 | |
| And finally we can compare the statistical distributions to GSL, RMath and DCDFLIB:
 | |
| 
 | |
| [table_Distribution_performance_comparison_with_GNU_C_version_5_1_0_on_linux]
 | |
| 
 | |
| [endsect] [/section:comparisons Comparisons to Other Open Source Libraries]
 | |
| 
 | |
| [section:perf_test_app The Performance Test Applications]
 | |
| 
 | |
| Under ['boost-path]\/libs\/math\/reporting\/performance you will find
 | |
| some reasonably comprehensive performance test applications for this library.
 | |
| 
 | |
| In order to generate the tables you will have seen in this documentation (or others
 | |
| for your specific compiler) you need to invoke `bjam` in this directory, using a C++11
 | |
| capable compiler.  Note that
 | |
| results extend/overwrite whatever is already present in
 | |
| ['boost-path]\/libs\/math\/reporting\/performance\/doc\/performance_tables.qbk;
 | |
| you may want to delete this file before you begin so as to make a fresh start for
 | |
| your particular system.
 | |
| 
 | |
| The programs produce results in Boost's Quickbook format which is not terribly
 | |
| human readable.  If you configure your user-config.jam to be able to build Docbook
 | |
| documentation, then you will also get a full summary of all the data in HTML format
 | |
| in ['boost-path]\/libs\/math\/reporting\/performance\/html\/index.html.  Assuming
 | |
| you're on a 'nix-like platform the procedure to do this is to first install the
 | |
| `xsltproc`, `Docbook DTD`, and `Docbook XSL` packages.  Then:
 | |
| 
 | |
| * Copy ['boost-path]\/tools\/build\/example\/user-config.jam to your home directory.
 | |
| * Add `using xsltproc ;` to the end of the file (note the space surrounding each token, including the final ";", this is important!)
 | |
| This assumes that `xsltproc` is in your path.
 | |
| * Add `using boostbook : path-to-xsl-stylesheets : path-to-dtd ;` to the end of the file.  The `path-to-dtd` should point
 | |
| to version 4.2.x of the Docbook DTD, while `path-to-xsl-stylesheets` should point to the folder containing the latest XSLT stylesheets.
 | |
| Both paths should use all forward slashes even on Windows.
 | |
| 
 | |
| At this point you should be able to run the tests and generate the HTML summary.  If GSL, RMath or libstdc++ are
 | |
| present in the compiler's path, they will be automatically tested.  For DCDFLIB you will need to place the C
 | |
| source in ['boost-path]\/libs\/math\/reporting\/performance\/third_party\/dcdflib.
 | |
| 
 | |
| If you want to compare multiple compilers, or multiple options for one compiler, then you will
 | |
| need to invoke `bjam` multiple times, once for each compiler.  Note that in order to test
 | |
| multiple configurations of the same compiler, each has to be given a unique name in the test
 | |
| program, otherwise they all edit the same table cells.  Suppose you want to test GCC with
 | |
| and without the -ffast-math option: in this case bjam would be invoked first as:
 | |
| 
 | |
|    bjam toolset=gcc -a cxxflags=-std=gnu++11
 | |
| 
 | |
| This runs the tests using the default optimization options (-O3).  We can then run again
 | |
| using -ffast-math:
 | |
| 
 | |
|    bjam toolset=gcc -a cxxflags='-std=gnu++11 -ffast-math' define=COMPILER_NAME='"GCC with -ffast-math"'
 | |
| 
 | |
| In the command line above, the -a flag forces a full rebuild, and the preprocessor define COMPILER_NAME needs to be set
 | |
| to a string literal describing the compiler configuration, hence the two sets of quotes: one for the command line, one for the
 | |
| compiler.
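
The need for the inner quotes is easier to see from the compiler's side.  The sketch below is hypothetical: `configuration_name` is not a function from the test programs, and the fallback definition exists only so that the fragment compiles on its own.  It assumes only what is stated above, namely that `COMPILER_NAME` must expand to a string literal.

```cpp
#include <cassert>
#include <string>

// Assumption: COMPILER_NAME is normally supplied on the command line; this
// fallback is here only so that the sketch compiles without it.
#ifndef COMPILER_NAME
#  define COMPILER_NAME "default configuration"
#endif

// Hypothetical helper: the macro must expand to a string literal for this
// initialisation to compile, so the inner quotes have to reach the compiler.
inline std::string configuration_name()
{
   const std::string name = COMPILER_NAME;
   assert(!name.empty());
   return name;
}
```

If the shell consumed the only set of quotes, the macro would expand to the bare tokens `GCC with -ffast-math`, which is not a valid initialiser for a string.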
 | |
| 
 | |
| [endsect] [/section:perf_test_app The Performance Test Applications]
 | |
| 
 | |
| [endmathpart]
 | |
| 
 | |
| [/
 | |
|   Copyright 2006 John Maddock and Paul A. Bristow.
 | |
|   Distributed under the Boost Software License, Version 1.0.
 | |
|   (See accompanying file LICENSE_1_0.txt or copy at
 | |
|   http://www.boost.org/LICENSE_1_0.txt).
 | |
| ]
 | |
| 
 | |
| 
 |