SHORT Double Precision MFLOP/s

EVENTSET
FIXC0 INSTR_RETIRED_ANY
FIXC1 CPU_CLK_UNHALTED_CORE
FIXC2 CPU_CLK_UNHALTED_REF
PMC0  UOPS_RETIRED_SCALAR_SIMD
PMC1  UOPS_RETIRED_PACKED_SIMD

METRICS
Runtime (RDTSC) [s] time
Runtime unhalted [s] FIXC1*inverseClock
Clock [MHz]  1.E-06*(FIXC1/FIXC2)/inverseClock
CPI  FIXC1/FIXC0
DP [MFLOP/s] (SSE assumed) 1.0E-06*((PMC1*2.0)+PMC0)/time
DP [MFLOP/s] (AVX assumed) 1.0E-06*((PMC1*4.0)+PMC0)/time
DP [MFLOP/s] (AVX512 assumed) 1.0E-06*((PMC1*8.0)+PMC0)/time
Packed [MUOPS/s]   1.0E-06*(PMC1)/time
Scalar [MUOPS/s] 1.0E-06*PMC0/time

LONG
Formulas:
DP [MFLOP/s] (SSE assumed) = 1.0E-06*(UOPS_RETIRED_PACKED_SIMD*2+UOPS_RETIRED_SCALAR_SIMD)/runtime
DP [MFLOP/s] (AVX assumed) = 1.0E-06*(UOPS_RETIRED_PACKED_SIMD*4+UOPS_RETIRED_SCALAR_SIMD)/runtime
DP [MFLOP/s] (AVX512 assumed) = 1.0E-06*(UOPS_RETIRED_PACKED_SIMD*8+UOPS_RETIRED_SCALAR_SIMD)/runtime
Packed [MUOPS/s] = 1.0E-06*(UOPS_RETIRED_PACKED_SIMD)/runtime
Scalar [MUOPS/s] = 1.0E-06*UOPS_RETIRED_SCALAR_SIMD/runtime
-
AVX/SSE scalar and packed double precision FLOP rates. The Xeon Phi (Knights Landing) provides
no possibility to differentiate between double and single precision FLOP/s. Therefore, we only
assume that the printed [MFLOP/s] value is for double-precision code. Moreover, there is no way
to distinguish between SSE, AVX or AVX512 packed SIMD operations. Therefore, this group prints
out the [MFLOP/s] for different SIMD techniques.
WARNING: The events also count for integer arithmetics
