Inastemp: A novel intrinsics-as-template library for portable SIMD-vectorization

The development of scientific applications requires highly optimized computational kernels to benefit from modern hardware. In recent years, vectorization has become key to exploiting the processing capabilities of modern CPUs, whose evolution is characterized by wider registers and growing core counts, but stagnating clock frequencies. In particular, vectorization allows a core to complete more than one floating-point operation per clock cycle. However, compilers often fail to vectorize complex code, and pure assembly or intrinsic implementations often suffer from software engineering issues, such as poor readability and maintainability. Moreover, it is difficult for domain scientists to write optimized code without technical support. To address these issues, we propose Inastemp, a lightweight open-source C++ library. Inastemp offers a way to develop hardware-independent computational kernels for the CPU. These kernels are portable across compilers and floating-point precisions, and they are vectorized for SSE(3, 4.1, 4.2), AVX(2), AVX512, or ALTIVEC/VMX instruction sets. Inastemp also provides advanced features, such as an if-else statement that vectorizes branches that cannot be removed. Our performance study shows that Inastemp has the same efficiency as pure intrinsic approaches on modern architectures. As a side result, this study provides micro-benchmarks on the latest HPC architectures for three different computational kernels, emphasizing comparisons between scalar and intrinsic-based code.
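To make the idea of a kernel written once against a vector type, with a vectorized branch, more concrete, here is a minimal sketch. It does not use the Inastemp API: the names `Vec`, `Mask`, `select`, and `branch_kernel` are hypothetical, and plain arrays stand in for the SIMD registers that a real library would drive through intrinsics. The sketch assumes the common mask-and-blend technique, in which both branches are evaluated for every lane and a per-lane mask picks the result.

```cpp
#include <array>
#include <cstddef>

// Hypothetical N-lane vector type. A real library would wrap a SIMD
// register (SSE, AVX, ...) here; plain arrays keep the sketch portable.
template <class T, std::size_t N>
struct Vec {
    std::array<T, N> lane;
};

// Hypothetical per-lane boolean mask produced by comparisons.
template <std::size_t N>
struct Mask {
    std::array<bool, N> lane;
};

template <class T, std::size_t N>
Vec<T, N> operator+(const Vec<T, N>& a, const Vec<T, N>& b) {
    Vec<T, N> r{};
    for (std::size_t i = 0; i < N; ++i) r.lane[i] = a.lane[i] + b.lane[i];
    return r;
}

template <class T, std::size_t N>
Vec<T, N> operator*(const Vec<T, N>& a, const Vec<T, N>& b) {
    Vec<T, N> r{};
    for (std::size_t i = 0; i < N; ++i) r.lane[i] = a.lane[i] * b.lane[i];
    return r;
}

template <class T, std::size_t N>
Mask<N> operator<(const Vec<T, N>& a, const Vec<T, N>& b) {
    Mask<N> m{};
    for (std::size_t i = 0; i < N; ++i) m.lane[i] = a.lane[i] < b.lane[i];
    return m;
}

// Blend: take if_true where the mask is set, if_false elsewhere.
template <class T, std::size_t N>
Vec<T, N> select(const Mask<N>& m, const Vec<T, N>& if_true,
                 const Vec<T, N>& if_false) {
    Vec<T, N> r{};
    for (std::size_t i = 0; i < N; ++i)
        r.lane[i] = m.lane[i] ? if_true.lane[i] : if_false.lane[i];
    return r;
}

// Kernel written once against an abstract vector type V.
// Scalar meaning per lane: (x < threshold) ? x * x : x + x.
// Both branches are computed for all lanes, then blended by the mask.
template <class V>
V branch_kernel(const V& x, const V& threshold) {
    return select(x < threshold, x * x, x + x);
}
```

Instantiating `branch_kernel` with `Vec<double, 4>` processes four values per call, while `Vec<float, 1>` gives the scalar version of the same kernel. That is the sense in which the vector type acts as a template parameter in this sketch.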

