SIMD/GPU vectorization
Many Runko algorithms are vectorized so that they run faster on modern CPUs with SIMD support and on GPUs, which rely on massively parallel execution.
Vectorization control flags
On some platforms, however, vectorization can become a performance bottleneck. Typical reasons are either launching too many small micro-kernels (which would run more efficiently if grouped together) or memory-write blocking caused by atomic write operations (which can be slower than simply performing the writes serially).
In these cases, we can revert to non-vectorized operations by undefining special compile-time flags. These are:
VEC_FLD2D: use vectorization for 2D electromagnetic tile mesh copy operations.
VEC_FLD3D: use vectorization for 3D electromagnetic tile mesh copy operations.
VEC_CUR2D: use vectorization for 2D electromagnetic tile mesh addition operations.
VEC_CUR3D: use vectorization for 3D electromagnetic tile mesh addition operations.
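As a rough illustration of how such a flag can select between two code paths at compile time, here is a minimal C++ sketch. It assumes the flags are ordinary preprocessor macros tested with #ifdef (for example set with -DVEC_FLD2D on the compiler command line, or removed with -UVEC_FLD2D); the Mesh2D type and the copy_mesh function are hypothetical and are not Runko's actual classes or kernels.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 2D mesh: nx*ny values stored contiguously in row-major order.
struct Mesh2D {
  std::size_t nx, ny;
  std::vector<double> data;

  Mesh2D(std::size_t nx_, std::size_t ny_)
      : nx(nx_), ny(ny_), data(nx_ * ny_, 0.0) {}

  double& operator()(std::size_t i, std::size_t j) { return data[j * nx + i]; }
  double operator()(std::size_t i, std::size_t j) const { return data[j * nx + i]; }
};

// Copy src into dst. The code path is fixed at compile time:
// with VEC_FLD2D defined we use one bulk operation over the whole mesh,
// otherwise we fall back to an element-by-element serial loop.
void copy_mesh(const Mesh2D& src, Mesh2D& dst) {
  assert(src.nx == dst.nx && src.ny == dst.ny);
#ifdef VEC_FLD2D
  // Single contiguous copy that the compiler (or a GPU backend)
  // can map onto SIMD lanes or one kernel launch.
  std::copy(src.data.begin(), src.data.end(), dst.data.begin());
#else
  // Plain serial loop over all mesh points.
  for (std::size_t j = 0; j < src.ny; ++j)
    for (std::size_t i = 0; i < src.nx; ++i)
      dst(i, j) = src(i, j);
#endif
}
```

Both branches produce the same result; only the way the work is issued changes, which is why toggling the flag is a safe way to test whether vectorization helps on a given platform.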
The most common cause of poor performance is the last two flags, VEC_CUR2D and VEC_CUR3D, because they require atomic additions.
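To see why atomic additions can be costly, the sketch below contrasts a serial accumulation into a mesh with an atomic one. When many parallel workers may add into the same cell, each addition must be atomic; the compare-exchange retry loop in atomic_add is what effectively serializes contended writes. The functions atomic_add, deposit_serial and deposit_atomic are hypothetical illustrations, not Runko code, and the atomic loop only runs concurrently when compiled with OpenMP (otherwise the pragma is ignored and it runs sequentially).

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Atomic add for double via compare-exchange. std::atomic<double>::fetch_add
// only exists from C++20, so this loop is the portable pre-C++20 pattern.
inline void atomic_add(std::atomic<double>& target, double value) {
  double expected = target.load(std::memory_order_relaxed);
  // If another writer changed the cell in between, `expected` is refreshed
  // and we retry; under heavy contention these retries serialize the writes.
  while (!target.compare_exchange_weak(expected, expected + value,
                                       std::memory_order_relaxed)) {
  }
}

// Serial path: a single writer, so a plain += is enough.
void deposit_serial(const std::vector<std::size_t>& cell,
                    const std::vector<double>& weight,
                    std::vector<double>& mesh) {
  for (std::size_t p = 0; p < cell.size(); ++p) mesh[cell[p]] += weight[p];
}

// Parallel path: several iterations may target the same cell at once,
// so every update has to be atomic.
void deposit_atomic(const std::vector<std::size_t>& cell,
                    const std::vector<double>& weight,
                    std::vector<std::atomic<double>>& mesh) {
  #pragma omp parallel for
  for (std::size_t p = 0; p < cell.size(); ++p) atomic_add(mesh[cell[p]], weight[p]);
}
```

When many contributions land in the same few cells, the atomic version can end up slower than the plain serial loop, which is the situation in which undefining VEC_CUR2D or VEC_CUR3D may pay off.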
Note
TODO: expand this section.