PAPI (Performance Application Programmer’s Interface) is designed to efficiently access the performance hardware counters on modern computer processors. PAPI is being developed at the University of Tennessee’s Innovative Computing Laboratory in the Computer Science Department. (Source:

References in zbMATH (referenced in 28 articles)

Showing results 1 to 20 of 28.
Sorted by year

  1. Iwen, M.A.; Ong, B.W.: A distributed and incremental SVD algorithm for agglomerative data analysis on large networks (2016)
  2. Cebrian, Juan M.; Jahre, Magnus; Natvig, Lasse: ParVec: vectorizing the PARSEC benchmark suite (2015)
  3. Cebrián, Juan M.; Natvig, Lasse; Meyer, Jan Christian: Performance and energy impact of parallelization and vectorization techniques in modern microprocessors (2014)
  4. Ding, Chen; Xiang, Xiaoya; Bao, Bin; Luo, Hao; Luo, Ying-Wei; Wang, Xiao-Lin: Performance metrics and models for shared cache (2014)
  5. Zhang, Wei; Wei, Wenjie; Cai, Xing: Performance modeling of serial and parallel implementations of the fractional Adams-Bashforth-Moulton method (2014)
  6. Bock, Nicolas; Challacombe, Matt: An optimized sparse approximate matrix multiply for matrices with decay (2013)
  7. Buttari, Alfredo: Fine-grained multithreading for the multifrontal $QR$ factorization of sparse matrices (2013)
  8. Gai, Jiading; Obeid, Nady; Holtrop, Joseph L.; Wu, Xiao-Long; Lam, Fan; Fu, Maojing; Haldar, Justin P.; Hwu, Wen-mei W.; Liang, Zhi-Pei; Sutton, Bradley P.: More IMPATIENT: a gridding-accelerated Toeplitz-based strategy for non-Cartesian high-resolution 3D MRI on GPUs (2013)
  9. Gracioli, Giovani; Fröhlich, Antônio Augusto; Pellizzoni, Rodolfo; Fischmeister, Sebastian: Implementation and evaluation of global and partitioned scheduling in a real-time OS (2013)
  10. Russell, Francis P.; Kelly, Paul H.J.: Optimized code generation for finite element local assembly using symbolic manipulation (2013)
  11. Rutar, Nick; Hollingsworth, Jeffrey K.: Data centric techniques for mapping performance data to program variables (2012)
  12. Yaseen, Ashraf; Li, Yaohang: Accelerating knowledge-based energy evaluation in protein structure modeling with graphics processing units (2012)
  13. Kalinnik, Natalia; Korch, Matthias; Rauber, Thomas: An efficient time-step-based self-adaptive algorithm for predictor-corrector methods of Runge-Kutta type (2011)
  14. Askitis, Nikolas; Sinha, Ranjan: Engineering scalable, cache and space efficient tries for strings (2010)
  15. Hejazialhosseini, Babak; Rossinelli, Diego; Bergdorf, Michael; Koumoutsakos, Petros: High order finite volume methods on wavelet-adapted grids with local time-stepping on multicore architectures for the simulation of shock-bubble interactions (2010)
  16. Hölldobler, Steffen; Manthey, Norbert; Saptawijaya, Ari: Improving resource-unaware SAT solvers (2010)
  17. Dooley, Isaac; Mangala, Sandhya; Kale, Laxmikant; Geubelle, Philippe: Parallel simulations of dynamic fracture using extrinsic cohesive elements (2009)
  18. Fürlinger, Karl; Moore, Shirley: Capturing and analyzing the execution control flow of OpenMP applications (2009)
  19. Mahajan, Reema; Kranzlmüller, Dieter; Volkert, Jens; Hansmann, Ulrich H.E.; Höfinger, Siegfried: Detecting secondary bottlenecks in parallel quantum chemistry applications using MPI (2008)
  20. Grigori, Laura; Li, Xiaoye S.: Towards an accurate performance modeling of parallel sparse factorization (2007)

Further publications can be found at: