Header files for x86 SIMD intrinsics

procodes 2020. 7. 13. 22:00


Which header files provide the intrinsics for the different x86 SIMD instruction set extensions (MMX, SSE, AVX, ...)? It seems impossible to find such a list online. Correct me if I'm wrong.


<mmintrin.h>  MMX

<xmmintrin.h> SSE

<emmintrin.h> SSE2

<pmmintrin.h> SSE3

<tmmintrin.h> SSSE3

<smmintrin.h> SSE4.1

<nmmintrin.h> SSE4.2

<ammintrin.h> SSE4A

<wmmintrin.h> AES

<immintrin.h> AVX

<zmmintrin.h> AVX512

If you just use

#include <x86intrin.h>

it will include all SSE/AVX headers that are enabled according to compiler switches such as -march=corei7 or just -march=native. In addition, some x86-specific instructions such as bswap or ror become available as intrinsics.
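As a rough illustration of the above, here is a minimal sketch that assumes GCC or Clang targeting x86-64, where SSE2 is always enabled. The function names add4_i32 and swap_bytes are made up for this example; _mm_add_epi32 and __bswapd are the intrinsics that the umbrella header pulls in.

```c
#include <stdint.h>
#include <x86intrin.h>  /* GCC/Clang/ICC umbrella header, x86 targets only */

/* Hypothetical helper: add four 32-bit integer lanes with SSE2. */
static inline void add4_i32(const int32_t *a, const int32_t *b, int32_t *out)
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);  /* unaligned load */
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_add_epi32(va, vb));
}

/* Hypothetical helper: non-vector intrinsic, maps to the bswap instruction. */
static inline int swap_bytes(int x)
{
    return __bswapd(x);
}
```

With inputs {1, 2, 3, 4} and {10, 20, 30, 40}, add4_i32 writes {11, 22, 33, 44}, and swap_bytes(0x11223344) returns 0x44332211.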


Header names depend on your compiler and target architecture.

  • For Microsoft C++ (targeting x86, x86-64 or ARM) and the Intel C/C++ compiler for Windows, use intrin.h
  • For gcc/clang/icc targeting x86/x86-64, use x86intrin.h
  • For gcc/clang/armcc targeting ARM with NEON, use arm_neon.h
  • For gcc/clang/armcc targeting ARM with WMMX, use mmintrin.h
  • For gcc/clang/xlcc targeting PowerPC with VMX (aka Altivec) and/or VSX, use altivec.h
  • For gcc/clang targeting PowerPC with SPE, use spe.h

You can handle all these cases with conditional preprocessing directives:

#if defined(_MSC_VER)
     /* Microsoft C/C++-compatible compiler */
     #include <intrin.h>
#elif defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
     /* GCC-compatible compiler, targeting x86/x86-64 */
     #include <x86intrin.h>
#elif defined(__GNUC__) && (defined(__ARM_NEON) || defined(__ARM_NEON__))
     /* GCC-compatible compiler, targeting ARM with NEON */
     #include <arm_neon.h>
#elif defined(__GNUC__) && defined(__IWMMXT__)
     /* GCC-compatible compiler, targeting ARM with WMMX */
     #include <mmintrin.h>
#elif (defined(__GNUC__) || defined(__xlC__)) && (defined(__VEC__) || defined(__ALTIVEC__))
     /* XLC or GCC-compatible compiler, targeting PowerPC with VMX/VSX */
     #include <altivec.h>
#elif defined(__GNUC__) && defined(__SPE__)
     /* GCC-compatible compiler, targeting PowerPC with SPE */
     #include <spe.h>
#endif
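Once the right header is selected, the code that uses it usually needs the same kind of guard. The following is only a sketch under stated assumptions: the macro names SIMD_PATH_SSE and SIMD_PATH_NEON and the function add4f are hypothetical, only the x86 and NEON cases are covered, and anything else falls back to plain C.

```c
/* SIMD_PATH_SSE / SIMD_PATH_NEON are hypothetical names for this sketch. */
#if defined(_MSC_VER) && (defined(_M_X64) || defined(_M_IX86))
#  include <intrin.h>
#  define SIMD_PATH_SSE 1
#elif defined(__GNUC__) && defined(__SSE__)
#  include <x86intrin.h>
#  define SIMD_PATH_SSE 1
#elif defined(__GNUC__) && (defined(__ARM_NEON) || defined(__ARM_NEON__))
#  include <arm_neon.h>
#  define SIMD_PATH_NEON 1
#endif

/* Add four floats using whichever SIMD path was selected above. */
static inline void add4f(const float *a, const float *b, float *out)
{
#if defined(SIMD_PATH_SSE)
    _mm_storeu_ps(out, _mm_add_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
#elif defined(SIMD_PATH_NEON)
    vst1q_f32(out, vaddq_f32(vld1q_f32(a), vld1q_f32(b)));
#else
    for (int i = 0; i < 4; ++i)   /* portable scalar fallback */
        out[i] = a[i] + b[i];
#endif
}
```

Keeping the fallback branch means the same source still builds on targets that none of the header checks match.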

From this page:

+----------------+------------------------------------------------------------------------------------------+
|     Header     |                                         Purpose                                          |
+----------------+------------------------------------------------------------------------------------------+
| x86intrin.h    | Everything, including non-vector x86 instructions like _rdtsc().                         |
| mmintrin.h     | MMX (Pentium MMX!)                                                                       |
| mm3dnow.h      | 3dnow! (K6-2) (deprecated)                                                               |
| xmmintrin.h    | SSE + MMX (Pentium 3, Athlon XP)                                                         |
| emmintrin.h    | SSE2 + SSE + MMX (Pentium 4, Athlon 64)                                                  |
| pmmintrin.h    | SSE3 + SSE2 + SSE + MMX (Pentium 4 Prescott, Athlon 64 San Diego)                        |
| tmmintrin.h    | SSSE3 + SSE3 + SSE2 + SSE + MMX (Core 2, Bulldozer)                                      |
| popcntintrin.h | POPCNT (Nehalem (Core i7), Phenom)                                                       |
| ammintrin.h    | SSE4A + SSE3 + SSE2 + SSE + MMX (AMD-only, starting with Phenom)                         |
| smmintrin.h    | SSE4_1 + SSSE3 + SSE3 + SSE2 + SSE + MMX (Penryn, Bulldozer)                             |
| nmmintrin.h    | SSE4_2 + SSE4_1 + SSSE3 + SSE3 + SSE2 + SSE + MMX (Nehalem (aka Core i7), Bulldozer)     |
| wmmintrin.h    | AES (Core i7 Westmere, Bulldozer)                                                        |
| immintrin.h    | AVX, AVX2, AVX512, all SSE+MMX (except SSE4A and XOP), popcnt, BMI/BMI2, FMA             |
+----------------+------------------------------------------------------------------------------------------+

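So in general you can just include immintrin.h to get all Intel extensions, or x86intrin.h if you want everything, including _bit_scan_forward and _rdtsc, as well as all vector intrinsics, including the AMD-only ones. If you are against including more than you actually need, you can pick the right include by looking at the table.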
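For the non-vector side of x86intrin.h, a small sketch might look like this. It assumes GCC or Clang on x86; lowest_set_bit and read_tsc are hypothetical wrapper names, and the time stamp counter is read through the double-underscore spelling __rdtsc(), which is the one I am sure both compilers provide.

```c
#include <x86intrin.h>  /* also provides non-vector x86 intrinsics */

/* Hypothetical wrapper: index of the lowest set bit (bsf instruction).
   The result is undefined for x == 0, so callers must rule that out. */
static inline int lowest_set_bit(int x)
{
    return _bit_scan_forward(x);
}

/* Hypothetical wrapper: read the CPU time stamp counter. */
static inline unsigned long long read_tsc(void)
{
    return __rdtsc();
}
```

For example, lowest_set_bit(0x28) returns 3, because 0x28 is 101000 in binary.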

x86intrin.h is the recommended way to get intrinsics for AMD XOP (Bulldozer-only, not future AMD CPUs), rather than XOP having its own header.

Some compilers will still generate error messages if you use intrinsics for instruction-sets you haven't enabled (e.g. _mm_fmadd_ps without enabling fma, even if you include immintrin.h and enable AVX2).
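One way to stay clear of those errors is to guard each intrinsic with the feature macro the compiler defines when the matching switch is on. This is a sketch, assuming GCC or Clang, which define __FMA__ and __AVX__ under -mfma; MSVC does not define __FMA__, so there this code would simply take the scalar loop. The function name muladd is hypothetical.

```c
#include <stddef.h>
#if defined(__FMA__) && defined(__AVX__)
#  include <immintrin.h>
#endif

/* Hypothetical helper: out[i] = a[i] * b[i] + c[i] for i in [0, n). */
static inline void muladd(const float *a, const float *b, const float *c,
                          float *out, size_t n)
{
    size_t i = 0;
#if defined(__FMA__) && defined(__AVX__)
    /* Vector path: only compiled when FMA is enabled (e.g. -mfma). */
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        __m256 vc = _mm256_loadu_ps(c + i);
        _mm256_storeu_ps(out + i, _mm256_fmadd_ps(va, vb, vc));
    }
#endif
    /* Scalar path: handles the tail, or everything when FMA is off. */
    for (; i < n; ++i)
        out[i] = a[i] * b[i] + c[i];
}
```

The guard only checks what the compiler was told to target; it says nothing about the CPU the binary eventually runs on.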


As many of the answers and comments have stated, <x86intrin.h> is the comprehensive header for x86[-64] SIMD intrinsics. It also provides intrinsics supporting instructions for other ISA extensions. gcc, clang, and icc have all settled on this. I needed to do some digging on versions that support the header, and thought it might be useful to list some findings...

  • gcc : support for x86intrin.h first appears in gcc-4.5.0. The gcc-4 release series is no longer being maintained, while gcc-6.x is the current stable release series. gcc-5 also introduced the __has_include extension present in all clang-3.x releases. gcc-7 is in pre-release (regression testing, etc.) and following the current versioning scheme, will be released as gcc-7.1.0.

  • clang : x86intrin.h appears to have been supported for all clang-3.x releases. The latest stable release is clang (LLVM) 3.9.1. The development branch is clang (LLVM) 5.0.0. It's not clear what's happened to the 4.x series.

  • Apple clang : annoyingly, Apple's versioning doesn't correspond with that of the LLVM projects. That said, the current release: clang-800.0.42.1, is based on LLVM 3.9.0. The first LLVM 3.0 based version appears to be Apple clang 2.1 back in Xcode 4.1. LLVM 3.1 first appears with Apple clang 3.1 (a numeric coincidence) in Xcode 4.3.3.

    Apple also defines __apple_build_version__ e.g., 8000042. This seems about the most stable, strictly ascending versioning scheme available. If you don't want to support legacy compilers, make one of these values a minimum requirement.

Any recent version of clang, including Apple versions, should therefore have no issue with x86intrin.h. Of course, along with gcc-5, you can always use the following:

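/* nested, so compilers without __has_include never parse the call */
#if defined(__has_include)
#  if __has_include(<x86intrin.h>)
#    include <x86intrin.h>
#  else
#    error "upgrade your compiler. it's free..."
#  endif
#else
#  error "upgrade your compiler. it's free..."
#endif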

One trick you can't really rely on is using the __GNUC__ versions in clang. The versioning is, for historical reasons, stuck at 4.2.1, a version that precedes the x86intrin.h header. It's occasionally useful for, say, simple GNU C extensions that have remained backwards compatible.

  • icc : as far as I can tell, the x86intrin.h header is supported since at least Intel C++ 16.0. The version test can be performed with: #if (__INTEL_COMPILER >= 1600). This version (and possibly earlier versions) also provides support for the __has_include extension.

  • MSVC : It appears that MSVC++ 12.0 (Visual Studio 2013) is the first version to provide the intrin.h header - not x86intrin.h... this suggests: #if (_MSC_VER >= 1800) as a version test. Of course, if you're trying to write code that's portable across all these different compilers, the header name on this platform will be the least of your problems.
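Putting the version notes above together, a compile-time check could be sketched as follows. The macro HAVE_X86INTRIN_H and the function x86intrin_available are hypothetical names, and the thresholds are simply the versions quoted in the list, with the Apple build number used as an assumed minimum rather than a verified one.

```c
/* HAVE_X86INTRIN_H is a hypothetical macro name used only in this sketch. */
#if !(defined(__x86_64__) || defined(__i386__))
#  define HAVE_X86INTRIN_H 0   /* not a GCC-style x86 target (includes MSVC) */
#elif defined(__INTEL_COMPILER)
#  define HAVE_X86INTRIN_H (__INTEL_COMPILER >= 1600)
#elif defined(__clang__)
   /* Test clang before __GNUC__: clang reports itself as GCC 4.2.1. */
#  if defined(__apple_build_version__)
#    define HAVE_X86INTRIN_H (__apple_build_version__ >= 8000042)
#  else
#    define HAVE_X86INTRIN_H (__clang_major__ >= 3)
#  endif
#elif defined(__GNUC__)
#  define HAVE_X86INTRIN_H \
     (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 5))
#else
#  define HAVE_X86INTRIN_H 0
#endif

#if HAVE_X86INTRIN_H
#  include <x86intrin.h>
#endif

/* Report the compile-time decision at run time (1 = header was included). */
static inline int x86intrin_available(void)
{
    return HAVE_X86INTRIN_H;
}
```

Where __has_include is available it is the simpler test; version macros like these are mainly useful for older compilers that lack it.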

Reference URL: https://stackoverflow.com/questions/11228855/header-files-for-x86-simd-intrinsics
