Abstract: Energy efficiency is of major concern in HPC. DSP architectures have the potential to offer highly competitive energy efficiency for applications requiring 64-bit floating-point precision. For STREAM, we achieved 1.47 GB/J energy efficiency and 96% DDR3 memory bandwidth utilization on the Texas Instruments TMS320C6678 DSP by using its DMA engines for prefetching to avoid cache misses, which cause pipeline stalls in the DSP’s cores, and to prevent write-allocate loads, which would significantly reduce performance. The DMA engines were also used to c...
Topics: 
Embedded system
Computer hardware