SIMD/GPU vectorization
======================

Many of the Runko algorithms are vectorized so that they run faster on modern CPU architectures with SIMD support and on GPUs, which require parallel execution across many cores.

Vectorization control flags
---------------------------

On some platforms these vectorizations can become a performance bottleneck. Typical reasons are either the launching of too many micro-kernels (that would be computed more efficiently if grouped together) or blocked memory writes caused by atomic write operations (that would complete faster as plain serial operations). In these cases, we can revert to non-vectorized operations by un-defining special compile-time flags. These are:

- `VEC_FLD2D`: use vectorization for 2D electromagnetic tile mesh copy operations.
- `VEC_FLD3D`: use vectorization for 3D electromagnetic tile mesh copy operations.
- `VEC_CUR2D`: use vectorization for 2D electromagnetic tile mesh addition operations.
- `VEC_CUR3D`: use vectorization for 3D electromagnetic tile mesh addition operations.

The typical cause of poor performance is the last two flags, which require atomic additions.

.. note::

   TODO: expand this section.
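
As a rough illustration of how such a compile-time flag can select between two code paths, consider the minimal sketch below. Only the flag name `VEC_CUR2D` comes from the list above. The function `add_currents`, the use of `std::vector<float>` as a stand-in for a tile mesh, and the loop bodies are hypothetical and are not taken from the Runko sources. The flag is defined inside the file for simplicity; in a real build it would more likely come from the build configuration.

.. code-block:: cpp

   #include <cassert>
   #include <cstddef>
   #include <vector>

   // Assumption: the flag is defined here for illustration only.
   // Comment this line out (un-define the flag) to select the serial path.
   #define VEC_CUR2D

   // Hypothetical helper (not Runko's API): adds the currents in src to dst.
   inline void add_currents(std::vector<float>& dst, const std::vector<float>& src)
   {
     assert(dst.size() == src.size());

   #ifdef VEC_CUR2D
     // "Vectorized" path: one flat, branch-free loop over contiguous memory
     // that the compiler is free to turn into SIMD instructions. A GPU build
     // would instead launch a kernel here, possibly with atomic additions.
     const std::size_t n = dst.size();
     float* d = dst.data();
     const float* s = src.data();
     for (std::size_t i = 0; i < n; ++i) d[i] += s[i];
   #else
     // Serial fallback: plain element-wise addition through the container,
     // with no atomics and no kernel launches.
     for (std::size_t i = 0; i < dst.size(); ++i) dst.at(i) += src.at(i);
   #endif
   }

Both branches produce the same result, so toggling the flag only changes how the addition is carried out. This makes it straightforward to time a run with and without the flag on a given platform.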