- Add popcnt and bmi
- [API] Add bitwise-shift batch constant api
- Refactor x86 CPU features
- [NEON] Unsigned bitwise shifts are never called
- Improve coverage of emulated architectures
- Introduce count{l,r}_{zero,one} for batch_bool
- Fix emulated mask()
- [neon] Implement bitwise_rshift for 64 bit integers on arm32
- Fix fast_cast int64/uint64→double under -ffast-math
- Small complexity reduction
- Add make_batch_constant from std::array in C++20
- [ci] Use home-baked clang-format action
- Fix apple detection
- [ci] add GCC 10 with AVX-512 to test matrix
- Slightly less pessimistic detection of neon64
- Fix runtime detection of SVE
- [ci] Setup Windows arm64 runner
- iota batch constant and a few overloads
- [test] Improve testing logging and accuracy
- Fix default values for AVX and AVX512 OS state enabled flags
- Implement batch_bool::mask() for riscv
- [ci] Revert emscripten to 4.0.21
- Restore RISCV support
- Implement optimized movemasks for NEON
- Fix limit behavior of atan2 under -ffast-math
- Move to C++14
- New architecture: VMX with VSX extension
- [API] Add `xsimd::bitwise_[l|r]shift<N>(...)` and `xsimd::rot[l|r]<N>(...)`
- [API] Add `xsimd::widen` to widen a batch to a batch twice as big
- [API] Add `xsimd::first()` function to extract the first lane from a batch
- [API] Reorder `xsimd::make_batch_constant` and `xsimd::make_batch_bool_constant` template arguments
- Bump CMake requirement to 3.10
- Provide generic and specialized implementations of `xsimd::reduce_mul`
- Have `xsimd::max/min` behave as `std::max/min` when one argument is NaN
- Optimize batch_bool load/store from/to array of booleans
- Cleaner error when trying to instantiate a batch while no arch is supported
- Fix `XSIMD_INLINE` for compilers that don't have always_inline
- Rename `xsimd::generic` to `xsimd::common`
- Fix `xsimd::log10` implementation under `-ffast-math`, and add `-fast-math-support` to generic math algorithm and tests
- Bump xtl dependency requirement
- Provide a generic implementation of `swizzle` with constant mask
- Enable xsimd with only emulated arch
- Rename `avx512vnni<vbmi>` to `avx512vnni<vbmi2>`
- [SSE2] Fix and improve `xsimd::swizzle` on `[u]int16`
- [AVX512x] Specialize `xsimd::insert`, `xsimd::incr_if`, `xsimd::decr_if`
- [AVX512F,AVX512VBMI] Specialize `xsimd::slide_left` and `xsimd::slide_right`
- [AVX512F] Fix `batch_bool` xor
- [WASM] Fix neq for `batch_bool`
- [AVX/AVX2/AVX512/ARM32] Improve implementation of `xsimd::swizzle`
- [AVX512VBMI2] Specialize `xsimd::compress` and `xsimd::expand`
- [SSE/AVX/AVX512] Improve `xsimd::reduce_add`
- [SSSE3/AVX2] Fix `xsimd::rotate_left` implementation for `[u]int16` and optimize the `[u]int8` implementation
- [AVX2] Fix implementation of `xsimd::rotate_left`
- [AVX512] Disable faulty implementation of `xsimd::rotate_left`
- [ARM64] Improve implementation of comparison operator for 64 bit integers
- [AVX512BW] Optimize `xsimd::shift_left` and `xsimd::shift_right`
- [AVX512F] Fix `batch_const` with 16b and 8b integers
- Added broadcast overload for bool
- Fixed kernel::store for booleans
- Explicitly verify dependency between architectures (like sse2 implies sse2)
- Use default arch alignment as default alignment for xsimd::aligned_allocator
- sse2 version of xsimd::swizzle on [u]int16_t
- avx implementation of transpose for [u]int[8|16]
- Implement [u]int8 and [u]int16 matrix transpose for 128 bit registers
- Fix minor warning
- Fix fma4 support
- Fix rotate_left and rotate_right behavior (it was swapped!)
- Fix compress implementation on RISC-V
- Improve RISC-V CI
- Fix clang-17 compilation on RISC-V
- Validate cmake integration
- Provide xsimd::transpose on 64 and 32 bits on most platforms
- Improve documentation
- Provide xsimd::batch_bool::count
- Fix interaction between xsimd::make_sized_batch_t and xsimd::batch<std::complex, ...>
- Fix vbmi, sve and rvv detection through xsimd::available_architectures
- Fix compilation on MS targets where `small` can be defined.
- Change default install directory for installed headers.
- Support mixed-complex implementations of xsimd::pow()
- Improve xsimd::pow implementation for complex numbers
- Fix uninitialized read in lgamma implementation
- Most xsimd functions are flagged as always_inline
- Fix some xsimd scalar version (abs, bitofsign, signbit, bitwise_cast, exp10)
- Move from batch_constant<batch<T, A>, Csts...> to batch_constant<T, A, Csts...>
- Move from batch_bool_constant<batch<T, A>, Csts...> to batch_bool_constant<T, A, Csts...>
- Provide an as_batch() (resp. as_batch_bool()) method for batch_constant (resp. batch_bool_constant)
- New architecture emulated<N> for batches of N bits emulated using scalar operations.
- Remove the version method from all architectures
- Support xsimd::avg and xsimd::avgr vector operation
- Model i8mm arm extension
- Fix dispatching mechanism
- Update readme with a section on adoption, and a section on the history of the project
- Fix/avx512vnni implementation
- Fix regression on XSIMD_NO_SUPPORTED_ARCHITECTURE
- Fix various problems with architecture version handling
- Specialize xsimd::compress for riscv
- Provide stubs for various avx512xx architectures
- Fix sincos implementation to cope with Emscripten
- Upgraded minimal version of cmake to remove deprecation warning
- Fixed constants::signmask for GCC when using ffast-math
- Add RISC-V Vector support
- Generic, simple implementation for xsimd::compress
- Disable batch of bools, and suggest using batch_bool instead
- Add an option to skip installation
- Provide shuffle operations of floating point batches
- Provide a generic implementation of xsimd::swizzle with dynamic indices
- Implement rotl, rotr, rotate_left and rotate_right
- Let CMake figure out pkgconfig directories
- Add missing boolean operators in xsimd_api.hpp
- Initial Implementation for the new WASM based instruction set
- Provide a generic version for float to uint32_t conversion
- Introduce XSIMD_DEFAULT_ARCH to force default architecture (if any)
- Remove C++ requirement on xsimd::exp10 scalar implementation
- Improve and test documentation
- Provide a generic reducer
- Fix `find_package(xsimd)` for xtl enabled xsimd, reloaded
- Cleanup benchmark code
- Provide avx512f implementation of FMA and variant
- Hexadecimal floating points are not a C++11 feature
- back to slow implementation of exp10 on Windows
- Changed bitwise_cast API
- Provide generic signed/unsigned type conversion
- Fixed sde location
- Feature/incr decr
- Cleanup documentation
- Fix potential ABI issue in SVE support
- Disable fast exp10 on OSX
- Assert on unaligned memory when calling aligned load/store
- Fix warning about uninitialized storage
- Always forward arch parameter
- Do not specialize the behavior of `simd_return_type` for char
- Support broadcasting of complex batches
- Make xsimd compatible with -fno-exceptions
- Provide and test comparison operators overloads that accept scalars
- Fix potential ABI issue in SVE support, making `xsimd::sve` a type alias to size-dependent type.
- Support fixed size SVE
- Fix a bug in SSSE3 `xsimd::swizzle` implementation for `int8` and `int16`
- Rename `xsimd::hadd` into `xsimd::reduce_add`, provide `xsimd::reduce_min` and `xsimd::reduce_max`
- Properly report unsupported double for neon on arm32
- Fill holes in xsimd scalar api
- Fix `find_package(xsimd)` for xtl enabled xsimd
- Replace `xsimd::bool_cast` by `xsimd::batch_bool_cast`
- Native `xsimd::hadd` for float on arm64
- Properly static_assert when trying to instantiate an `xsimd::batch` of xtl complex
- Introduce `xsimd::batch_bool::mask()` and `batch_bool::from_mask(...)`
- Flag some functions with `[[nodiscard]]`
- Accept both relative and absolute libdir and include dir in xsimd.pc
- Implement `xsimd::nearbyint_as_int` for NEON
- Add `xsimd::polar`
- Speedup double -> F32/I32 gathers
- Add `xsimd::slide_left` and `xsimd::slide_right`
- Support integral `xsimd::swizzles` on AVX
- Add `xsimd::gather` and `xsimd::scatter`
- Add `xsimd::nearbyint_as_int`
- Add `xsimd::none`
- Add `xsimd::reciprocal`
- Remove batch constructor from memory address, use `xsimd::batch<...>::load_(un)aligned` instead
- Leave to msvc users the opportunity to manually disable FMA3 on AVX
- Provide `xsimd::insert` to modify a single value from a vector
- Make `xsimd::pow` implementation resilient to `FE_INVALID`
- Reciprocal square root support through `xsimd::rsqrt`
- NEON: Improve `xsimd::any` and `xsimd::all`
- Provide type utility to explicitly require a batch of given size and type
- Implement `xsimd::swizzle` on x86, neon and neon64
- Avx support for `xsimd::zip_lo` and `xsimd::zip_hi`
- Only use `_mm256_unpacklo_epi<N>` on AVX2
- Provide neon/neon64 conversion function from `uint(32|64)_t` to `(float|double)`
- Provide SSE/AVX/AVX2 conversion function from `uint32_t` to `float`
- Provide AVX2 conversion function from `(u)int64_t` to `double`
- Provide better SSE conversion function from `uint64_t` to `double`
- Provide better SSE conversion function to `double`
- Support logical xor for `xsimd::batch_bool`
- Clarify fma support:
  - FMA3 + SSE -> `xsimd::fma3<sse4_2>`
  - FMA3 + AVX -> `xsimd::fma3<avx>`
  - FMA3 + AVX2 -> `xsimd::fma3<avx2>`
  - FMA4 -> `xsimd::fma4`
- Allow `xsimd::transform` to work with complex types
- Add missing scalar version of `xsimd::norm` and `xsimd::conj`
- Fix neon `xsimd::hadd` implementation
- Detect unsupported architectures and set `XSIMD_NO_SUPPORTED_ARCHITECTURE` if needs be
- Provide some conversion operators for `float` -> `uint32`
- Improve code generated for AVX2 signed integer comparisons
- Enable detection of avx512cd and avx512dq, and fix avx512bw detection
- Enable detection of AVX2+FMA
- Pick the best compatible architecture in `xsimd::dispatch`
- Enables support for FMA when AVX2 is detected on Windows
- Add missing includes / forward declaration
- Mark all functions inline and noexcept
- Assert when using incomplete `std::initializer_list`
- Improve CI & testing, no functional change
- Do not use `_mm256_srai_epi32` under AVX, it's an AVX2 instruction
- Fix invalid constexpr `std::make_tuple` usage in neon64