simd - Can multiple processes hide latency of SSE instructions? -


I'm in need of high-performance merging, and came across "Efficient Implementation of Sorting on Multi-Core SIMD CPU Architecture" by Jatin Chhugani et al.

Their aim is to get the most performance out of one CPU, and one part of the solution is to use a bitonic sorting network at the SIMD level. To hide the latency of the min/max and shuffle operations, they perform 4 sorting networks simultaneously (though I think they meant interleaved). This gives a claimed 3.25x increase in performance.
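To make "interleaved" concrete, here is a minimal sketch in C with SSE intrinsics. It is not the paper's code: the function name `sort_columns_x2`, the `CMPX` macro, and the 4x4 row-major layout are assumptions chosen for illustration. Each network sorts the four columns of a 4x4 block of floats with a 5-comparator sorting network built from `_mm_min_ps`/`_mm_max_ps`. Within one network, each level depends on the previous one, so writing the steps of two independent networks side by side gives the core unrelated work to issue while a min/max result is still in flight.

```c
#include <assert.h>
#include <xmmintrin.h>  /* SSE: __m128, _mm_loadu_ps, _mm_min_ps, _mm_max_ps */

/* Compare-exchange: afterwards lo holds the lane-wise minima,
   hi the lane-wise maxima of the two registers. */
#define CMPX(lo, hi)                         \
    do {                                     \
        __m128 t_ = _mm_min_ps((lo), (hi));  \
        (hi) = _mm_max_ps((lo), (hi));       \
        (lo) = t_;                           \
    } while (0)

/* Hypothetical helper: p and q each point to 16 floats, read as a
   row-major 4x4 block. On return every column of each block is in
   ascending order. The two blocks are independent, so their
   compare-exchange steps are written interleaved. */
void sort_columns_x2(float *p, float *q)
{
    assert(p && q);

    __m128 a0 = _mm_loadu_ps(p),     a1 = _mm_loadu_ps(p + 4);
    __m128 a2 = _mm_loadu_ps(p + 8), a3 = _mm_loadu_ps(p + 12);
    __m128 b0 = _mm_loadu_ps(q),     b1 = _mm_loadu_ps(q + 4);
    __m128 b2 = _mm_loadu_ps(q + 8), b3 = _mm_loadu_ps(q + 12);

    /* Level 1: rows (0,1) and (2,3) */
    CMPX(a0, a1); CMPX(b0, b1);
    CMPX(a2, a3); CMPX(b2, b3);
    /* Level 2: rows (0,2) and (1,3), depends on level 1 */
    CMPX(a0, a2); CMPX(b0, b2);
    CMPX(a1, a3); CMPX(b1, b3);
    /* Level 3: rows (1,2), depends on level 2 */
    CMPX(a1, a2); CMPX(b1, b2);

    _mm_storeu_ps(p, a0);     _mm_storeu_ps(p + 4, a1);
    _mm_storeu_ps(p + 8, a2); _mm_storeu_ps(p + 12, a3);
    _mm_storeu_ps(q, b0);     _mm_storeu_ps(q + 4, b1);
    _mm_storeu_ps(q + 8, b2); _mm_storeu_ps(q + 12, b3);
}
```

The source order only exposes the independent work. An optimizing compiler and an out-of-order core may reschedule these instructions anyway, so whether hand interleaving helps on a given CPU is something to measure rather than assume.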

My problem is more relaxed: I have multiple pairs of arrays that need to be processed (read: independent), so I can run multiple processes and gain higher throughput.

Though, if I oversubscribe the number of processes relative to the available cores, would that hide the latency as well? Is the interleaving then induced at a higher level? Or am I treading here in the realm of hyperthreading, so that I'll never pass the limit of 2 processes sharing the same functional units in a CPU core?

I could of course just try, but changing the existing code is rather involved, and I'd like to hear some theories first.

No, threading is not an effective solution to pipeline bubbles. The granularity doesn't fit: context switching takes hundreds of cycles, whereas the sort of stall caused by a naive implementation of bitonic sorting comes in 2-4 cycle pieces.

With that said, it's not clear what your use case is, or what the bottleneck will turn out to be, so multiprocessing might still help. There's one way to find out.
