The SPLASH-2 programs: characterization and methodological considerations. The SPLASH-2 suite of parallel applications has recently been released to facilitate the study of centralized and distributed shared-address-space multiprocessors. In this context, this paper has two goals. One is to quantitatively characterize the SPLASH-2 programs in terms of fundamental properties and architectural interactions that are important to understand them well. The properties we study include the computational load balance, communication-to-computation ratio and traffic needs, important working set sizes, and issues related to spatial locality, as well as how these properties scale with problem size and the number of processors. The other, related goal is methodological: to assist people who will use the programs in architectural evaluations to prune the space of application and machine parameters in an informed and meaningful way. For example, by characterizing the working sets of the applications, we describe which operating points in terms of cache size and problem size are representative of realistic situations, which are not, and which are redundant. Using SPLASH-2 as an example, we hope to convey the importance of understanding the interplay of problem size, number of processors, and working sets in designing experiments and interpreting their results.

References in zbMATH (referenced in 41 articles)

Showing results 1 to 20 of 41.
Sorted by year

  1. Wardi, Y.; Seatzu, C.; Chen, X.; Yalamanchili, S.: Performance regulation of event-driven dynamical systems using infinitesimal perturbation analysis (2016)
  2. Sánchez, Daniel; Cebrián, Juan M.; García, José M.; Aragón, Juan L.: Soft-error mitigation by means of decoupled transactional memory threads (2015)
  3. Wang, Hui; Wang, Rui; Luan, Zhongzhi; Qian, Xuehai; Qian, Depei: Improving multiprocessor performance with fine-grain coherence bypass (2015)
  4. Liu, Peng; Fang, Lei; Huang, Michael C.: DEAM: decoupled, expressive, area-efficient metadata cache (2014)
  5. Munir, Arslan; Gordon-Ross, Ann; Ranka, Sanjay; Koushanfar, Farinaz: A queueing theoretic approach for performance evaluation of low-power multi-core embedded systems (2014)
  6. Mushtaq, Hamid; Al-Ars, Zaid; Bertels, Koen: Efficient and highly portable deterministic multithreading (DetLock) (2014)
  7. Pankratius, Victor; Adl-Tabatabai, Ali-Reza: Software engineering with transactional memory versus locks in practice (2014)
  8. Abellán, José L.; Fernández, Juan; Acacio, Manuel E.: Design of an efficient communication infrastructure for highly contended locks in many-core CMPs (2013)
  9. Oz, Isil; Topcuoglu, Haluk Rahmi; Kandemir, Mahmut; Tosun, Oguz: Thread vulnerability in parallel applications (2012)
  10. Sahelices, Benjamín; De Dios, Agustín; Ibáñez, Pablo; Viñals-Yúfera, Víctor; Llabería, José María: Efficient handling of lock hand-off in DSM multiprocessors with buffering coherence controllers (2012)
  11. Zhou, Xu; Lu, Kai; Wang, Xiaoping; Li, Xu: Exploiting parallelism in deterministic shared memory multiprocessing (2012)
  12. Chiu, Yung-Chang; Shieh, Ce-Kuen; Huang, Tzu-Chi; Liang, Tyng-Yeu; Chu, Kuo-Chih: Data race avoidance and replay scheme for developing and debugging parallel programs on distributed shared memory systems (2011)
  13. Fensch, Christian; Cintra, Marcelo: An evaluation of an OS-based coherence scheme for tiled CMPs (2011)
  14. Hammoud, Mohammad; Cho, Sangyeun; Melhem, Rami: C-AMTE: A location mechanism for flexible cache management in chip multiprocessors (2011)
  15. Hoffmann, Ralf; Rauber, Thomas: Adaptive task pools: Efficiently balancing large number of tasks on shared-address spaces (2011)
  16. Kim, Hyunhee; Kim, Jihong: A leakage-aware L2 cache management technique for producer-consumer sharing in low-power chip multiprocessors (2011)
  17. Rutzig, Mateus B.; Beck, Antonio C.S.; Madruga, Felipe; Alves, Marco A.; Freitas, Henrique C.; Maillard, Nicolas; Navaux, Philippe O.A.; Carro, Luigi: Boosting parallel applications performance on applying DIM technique in a multiprocessing environment (2011)
  18. Akram, Shoaib; Papakonstantinou, Alexandros; Kumar, Rakesh; Chen, Deming: A workload-adaptive and reconfigurable bus architecture for multicore processors (2010)
  19. Guironnet de Massas, Pierre; Pétrot, Frédéric: Evaluation of the implementation cost of cache coherence protocols using omniscient actions (2010)
  20. Harmanci, Derin; Gramoli, Vincent; Felber, Pascal; Fetzer, Christof: Extensible transactional memory testbed (2010)
