mirror of
				https://github.com/saitohirga/WSJT-X.git
				synced 2025-10-31 04:50:34 -04:00 
			
		
		
		
	
		
			
				
	
	
		
			267 lines
		
	
	
		
			9.9 KiB
		
	
	
	
		
			Plaintext
		
	
	
	
	
	
			
		
		
	
	
			267 lines
		
	
	
		
			9.9 KiB
		
	
	
	
		
			Plaintext
		
	
	
	
	
	
| [/============================================================================
 | |
|   Boost.odeint
 | |
| 
 | |
|   Copyright 2013 Karsten Ahnert
 | |
|   Copyright 2013 Pascal Germroth
 | |
|   Copyright 2013 Mario Mulansky
 | |
| 
 | |
|   Use, modification and distribution is subject to the Boost Software License,
 | |
|   Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at
 | |
|   http://www.boost.org/LICENSE_1_0.txt)
 | |
| =============================================================================/]
 | |
| 
 | |
| 
 | |
| [section Parallel computation with OpenMP and MPI]
 | |
| 
 | |
| Parallelization is a key feature for modern numerical libraries due to the vast
 | |
| availability of many cores nowadays, even on Laptops.
 | |
| odeint currently supports parallelization with OpenMP and MPI, as described in
 | |
| the following sections.
 | |
| However, it should be made clear from the beginning that the difficulty of
 | |
| efficiently distributing ODE integration on many cores/machines lies in the
 | |
| parallelization of the system function, which is still the user's
 | |
| responsibility.
 | |
| Simply using a parallel odeint backend without parallelizing the system function
 | |
| will bring you almost no performance gains.
 | |
| 
 | |
| [section OpenMP]
 | |
| 
 | |
| [import ../examples/openmp/phase_chain.cpp]
 | |
| 
 | |
| odeint's OpenMP support is implemented as an external backend, which needs to be
 | |
| manually included. Depending on the compiler some additional flags may be
 | |
| needed, i.e. [^-fopenmp] for GCC.
 | |
| [phase_chain_openmp_header]
 | |
| 
 | |
| In the easiest parallelization approach with OpenMP we use a standard `vector`
 | |
| as the state type:
 | |
| [phase_chain_vector_state]
 | |
| 
 | |
| We initialize the state with some random data:
 | |
| [phase_chain_init]
 | |
| 
 | |
| Now we have to configure the stepper to use the OpenMP backend.
 | |
| This is done by explicitly providing the `openmp_range_algebra` as a template
 | |
| parameter to the stepper.
 | |
| This algebra requires the state type to be a model of Random Access Range and
 | |
| will be used from multiple threads by the algebra.
 | |
| [phase_chain_stepper]
 | |
| 
 | |
| Additional to providing the stepper with OpenMP parallelization we also need
 | |
| a parallelized system function to exploit the available cores.
 | |
| Here this is shown for a simple one-dimensional chain of phase oscillators with
 | |
| nearest neighbor coupling:
 | |
| [phase_chain_rhs]
 | |
| 
 | |
| [note In the OpenMP backends the system function will always be called
 | |
| sequentially from the thread used to start the integration.]
 | |
| 
 | |
| Finally, we perform the integration by using one of the integrate functions from
 | |
| odeint.
 | |
| As you can see, the parallelization is completely hidden in the stepper and the
 | |
| system function.
 | |
| OpenMP will take care of distributing the work among the threads and join them
 | |
| automatically.
 | |
| [phase_chain_integrate]
 | |
| 
 | |
| After integrating, the data can be accessed immediately and be processed
 | |
| further.
 | |
| Note, that you can specify the OpenMP scheduling by calling `omp_set_schedule`
 | |
| in the beginning of your program:
 | |
| [phase_chain_scheduling]
 | |
| 
 | |
| See [github_link examples/openmp/phase_chain.cpp
 | |
| openmp/phase_chain.cpp] for the complete example.
 | |
| 
 | |
| [heading Split state]
 | |
| 
 | |
| [import ../examples/openmp/phase_chain_omp_state.cpp]
 | |
| 
 | |
| For advanced cases odeint offers another approach to use OpenMP that allows for
 | |
| a more exact control of the parallelization.
 | |
| For example, for odd-sized data where OpenMP's thread boundaries don't match
 | |
| cache lines and hurt performance it might be advisable to copy the data from the
 | |
| continuous `vector<T>` into separate, individually aligned, vectors.
 | |
| For this, odeint provides the `openmp_state<T>` type, essentially an alias for
 | |
| `vector<vector<T>>`.
 | |
| 
 | |
| Here, the initialization is done with a `vector<double>`, but then we use
 | |
| odeint's `split` function to fill an `openmp_state`.
 | |
| The splitting is done such that the sizes of the individual regions differ at
 | |
| most by 1 to make the computation as uniform as possible.
 | |
| [phase_chain_state_init]
 | |
| 
 | |
| Of course, the system function has to be changed to deal with the
 | |
| `openmp_state`.
 | |
| Note that each sub-region of the state is computed in a single task, but at the
 | |
| borders read access to the neighbouring regions is required.
 | |
| [phase_chain_state_rhs]
 | |
| 
 | |
| Using the `openmp_state<T>` state type automatically selects `openmp_algebra`
 | |
| which executes odeint's internal computations on parallel regions.
 | |
| Hence, no manual configuration of the stepper is necessary.
 | |
| At the end of the integration, we use `unsplit` to concatenate the sub-regions
 | |
| back together into a single vector.
 | |
| [phase_chain_state_integrate]
 | |
| 
 | |
| [note You don't actually need to use `openmp_state<T>` for advanced use cases,
 | |
| `openmp_algebra` is simply an alias for `openmp_nested_algebra<range_algebra>`
 | |
| and supports any model of Random Access Range as the outer, parallel state type,
 | |
| and will use the given algebra on its elements.]
 | |
| 
 | |
| See [github_link examples/openmp/phase_chain_omp_state.cpp
 | |
| openmp/phase_chain_omp_state.cpp] for the complete example.
 | |
| 
 | |
| [endsect]
 | |
| 
 | |
| [section MPI]
 | |
| 
 | |
| [import ../examples/mpi/phase_chain.cpp]
 | |
| 
 | |
| To expand the parallel computation across multiple machines we can use MPI.
 | |
| 
 | |
| The system function implementation is similar to the OpenMP variant with split
 | |
| data, the main difference being that while OpenMP uses a spawn/join model where
 | |
| everything not explicitly paralleled is only executed in the main thread, in
 | |
| MPI's model each node enters the `main()` method independently, diverging based
 | |
| on its rank and synchronizing through message-passing and explicit barriers.
 | |
| 
 | |
| odeint's MPI support is implemented as an external backend, too.
 | |
| Depending on the MPI implementation the code might need to be compiled with i.e.
 | |
| [^mpic++].
 | |
| [phase_chain_mpi_header]
 | |
| 
 | |
| Instead of reading another thread's data, we asynchronously send and receive the
 | |
| relevant data from neighbouring nodes, performing some computation in the interim
 | |
| to hide the latency.
 | |
| [phase_chain_mpi_rhs]
 | |
| 
 | |
| Analogous to `openmp_state<T>` we use `mpi_state< InnerState<T> >`, which
 | |
| automatically selects `mpi_nested_algebra` and the appropriate MPI-oblivious
 | |
| inner algebra (since our inner state is a `vector`, the inner algebra will be
 | |
| `range_algebra` as in the OpenMP example).
 | |
| [phase_chain_state]
 | |
| 
 | |
| In the main program we construct a `communicator` which tells us the `size` of
 | |
| the cluster and the current node's `rank` within that.
 | |
| We generate the input data on the master node only, avoiding unnecessary work on
 | |
| the other nodes.
 | |
| Instead of simply copying chunks, `split` acts as a MPI collective function here
 | |
| and sends/receives regions from master to each slave.
 | |
| The input argument is ignored on the slaves, but the master node receives
 | |
| a region in its output and will participate in the computation.
 | |
| [phase_chain_mpi_init]
 | |
| 
 | |
| Now that `x_split` contains (only) the local chunk for each node, we start the
 | |
| integration.
 | |
| 
 | |
| To print the result on the master node, we send the processed data back using
 | |
| `unsplit`.
 | |
| [phase_chain_mpi_integrate]
 | |
| 
 | |
| [note `mpi_nested_algebra::for_each`[~N] doesn't use any MPI constructs, it
 | |
| simply calls the inner algebra on the local chunk and the system function is not
 | |
| guarded by any barriers either, so if you don't manually place any (for example
 | |
| in parameter studies cases where the elements are completely independent) you
 | |
| might see the nodes diverging, returning from this call at different times.]
 | |
| 
 | |
| See [github_link examples/mpi/phase_chain.cpp
 | |
| mpi/phase_chain.cpp] for the complete example.
 | |
| 
 | |
| [endsect]
 | |
| 
 | |
| [section Concepts]
 | |
| 
 | |
| [section MPI State]
 | |
| As used by `mpi_nested_algebra`.
 | |
| [heading Notation]
 | |
| [variablelist
 | |
|     [[`InnerState`] [The inner state type]]
 | |
|     [[`State`] [The MPI-state type]]
 | |
|     [[`state`] [Object of type `State`]]
 | |
|     [[`world`] [Object of type `boost::mpi::communicator`]]
 | |
| ]
 | |
| [heading Valid Expressions]
 | |
| [table
 | |
|     [[Name] [Expression] [Type] [Semantics]]
 | |
|     [[Construct a state with a communicator]
 | |
|      [`State(world)`] [`State`] [Constructs the State.]]
 | |
|     [[Construct a state with the default communicator]
 | |
|      [`State()`] [`State`] [Constructs the State.]]
 | |
|     [[Get the current node's inner state]
 | |
|      [`state()`] [`InnerState`] [Returns a (const) reference.]]
 | |
|     [[Get the communicator]
 | |
|      [`state.world`] [`boost::mpi::communicator`] [See __boost_mpi.]]
 | |
| ]
 | |
| [heading Models]
 | |
| * `mpi_state<InnerState>`
 | |
| 
 | |
| [endsect]
 | |
| 
 | |
| [section OpenMP Split State]
 | |
| As used by `openmp_nested_algebra`, essentially a Random Access Container with
 | |
| `ValueType = InnerState`.
 | |
| [heading Notation]
 | |
| [variablelist
 | |
|     [[`InnerState`] [The inner state type]]
 | |
|     [[`State`] [The split state type]]
 | |
|     [[`state`] [Object of type `State`]]
 | |
| ]
 | |
| [heading Valid Expressions]
 | |
| [table
 | |
|     [[Name] [Expression] [Type] [Semantics]]
 | |
|     [[Construct a state for `n` chunks]
 | |
|      [`State(n)`] [`State`] [Constructs underlying `vector`.]]
 | |
|     [[Get a chunk]
 | |
|      [`state[i]`] [`InnerState`] [Accesses underlying `vector`.]]
 | |
|     [[Get the number of chunks]
 | |
|      [`state.size()`] [`size_type`] [Returns size of underlying `vector`.]]
 | |
| ]
 | |
| [heading Models]
 | |
| * `openmp_state<ValueType>` with `InnerState = vector<ValueType>`
 | |
| 
 | |
| [endsect]
 | |
| 
 | |
| [section Splitter]
 | |
| [heading Notation]
 | |
| [variablelist
 | |
|     [[`Container1`] [The continuous-data container type]]
 | |
|     [[`x`] [Object of type `Container1`]]
 | |
|     [[`Container2`] [The chunked-data container type]]
 | |
|     [[`y`] [Object of type `Container2`]]
 | |
| ]
 | |
| [heading Valid Expressions]
 | |
| [table
 | |
|     [[Name] [Expression] [Type] [Semantics]]
 | |
|     [[Copy chunks of input to output elements]
 | |
|      [`split(x, y)`] [`void`]
 | |
|      [Calls `split_impl<Container1, Container2>::split(x, y)`, splits `x` into
 | |
|      `y.size()` chunks.]]
 | |
|     [[Join chunks of input elements to output]
 | |
|      [`unsplit(y, x)`] [`void`]
 | |
|      [Calls `unsplit_impl<Container2, Container1>::unsplit(y, x)`, assumes `x`
 | |
|       is of the correct size ['__sigma `y[i].size()`], does not resize `x`.]]
 | |
| ]
 | |
| [heading Models]
 | |
| * defined for `Container1` = __boost_range and `Container2 = openmp_state`
 | |
| * and `Container2 = mpi_state`.
 | |
| 
 | |
| To implement splitters for containers incompatible with __boost_range,
 | |
| specialize the `split_impl` and `unsplit_impl` types:
 | |
| ```
 | |
| template< class Container1, class Container2 , class Enabler = void >
 | |
| struct split_impl {
 | |
|     static void split( const Container1 &from , Container2 &to );
 | |
| };
 | |
| 
 | |
| template< class Container2, class Container1 , class Enabler = void >
 | |
| struct unsplit_impl {
 | |
|     static void unsplit( const Container2 &from , Container1 &to );
 | |
| };
 | |
| ```
 | |
| [endsect]
 | |
| 
 | |
| [endsect]
 | |
| 
 | |
| [endsect]
 |