python (12.9k questions)
javascript (9.2k questions)
reactjs (4.7k questions)
java (4.2k questions)
c# (3.5k questions)
html (3.3k questions)
ARM NEON: Convert a binary 8-bit-per-pixel image (only 0/1) to 1-bit-per-pixel?
I am working on a task to convert a large binary label image, which has 8 bits (uint8_t) per pixel and each pixel can only be 0 or 1 (or 255), to an array of uint64_t numbers and each bit in uint64_t ...
debug_all_the_time
Votes: 0
Answers: 3
How to load vector registers from integer registers in Arm64? (M1)
This is a question about SIMD instructions on AArch64 on an M1.
I am working on a routine that works entirely inside the registers. All the memory reads and writes occur outside of the main loop. The ...

JON-ERIK STORM
Votes: 0
Answers: 1
Efficient C vectors for generic SIMD (SSE, AVX, NEON) test for zero matches. (find FP max absolute value and index)
I want to see if it's possible to write some generic SIMD code that can compile efficiently. Mostly for SSE, AVX, and NEON. A simplified version of the problem is: Find the maximum absolute value of...
TrentP
Votes: 0
Answers: 3
What is the most efficient way to handle integer multiplication overflow with saturation with ARM Neon intrinsics?
I have the following multiplication between two 16-bit vectors:
int16x8_t dx;
int16x8_t dy;
int16x8_t dxdy = vmulq_s16(dx, dy);
In case dx and dy are both large enough, the result will overflow.
I woul...

Elad Maimoni
Votes: 0
Answers: 1
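For the saturating-multiplication question above, one common approach (not necessarily the accepted answer, which is not shown here) is to widen each half to 32 bits with vmull_s16 and then narrow with saturation using vqmovn_s32. The sketch below assumes that approach; sat_mul_s16 and sat_mul_s16x8 are hypothetical helper names, and a scalar fallback is included so the code also builds on non-ARM targets.

```c
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Scalar reference (hypothetical helper): multiply in 32 bits,
   then clamp the product to the int16_t range. */
static int16_t sat_mul_s16(int16_t a, int16_t b) {
    int32_t p = (int32_t)a * (int32_t)b;
    if (p > INT16_MAX) return INT16_MAX;
    if (p < INT16_MIN) return INT16_MIN;
    return (int16_t)p;
}

/* Hypothetical helper: saturating element-wise product of 8 int16 lanes. */
static void sat_mul_s16x8(const int16_t *dx, const int16_t *dy, int16_t *out) {
#if defined(__ARM_NEON)
    int16x8_t a = vld1q_s16(dx);
    int16x8_t b = vld1q_s16(dy);
    /* Widening multiply: each 16x16 product fits in 32 bits, so no overflow here. */
    int32x4_t lo = vmull_s16(vget_low_s16(a), vget_low_s16(b));
    int32x4_t hi = vmull_s16(vget_high_s16(a), vget_high_s16(b));
    /* Saturating narrow back to 16 bits, then recombine the two halves. */
    vst1q_s16(out, vcombine_s16(vqmovn_s32(lo), vqmovn_s32(hi)));
#else
    /* Portable fallback with the same results as the NEON path. */
    for (int i = 0; i < 8; ++i) {
        out[i] = sat_mul_s16(dx[i], dy[i]);
    }
#endif
}
```

Whether this is the most efficient option depends on the target core; it costs two widening multiplies and two narrowing instructions per eight lanes, so it is worth benchmarking against alternatives.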