PAPI (Performance Application Programmer’s Interface) is designed to efficiently access the performance hardware counters on modern computer processors. PAPI is being developed at the University of Tennessee’s Innovative Computing Laboratory in the Computer Science Department.

References in zbMATH (referenced in 39 articles)

Showing results 1 to 20 of 39.

  1. Ramachandran, Prabhu; Bhosale, Aditya; Puri, Kunal; Negi, Pawan; Muta, Abhinav; Dinesh, A.; Menon, Dileep; Govind, Rahul; Sanka, Suraj; Sebastian, Amal S.; Sen, Ananyo; Kaushik, Rohan; Kumar, Anshuman; Kurapati, Vikas; Patil, Mrinalgouda; Tavker, Deep; Pandey, Pankaj; Kaushik, Chandrashekhar; Dutt, Arkopal; Agarwal, Arpit: PySPH: a Python-based framework for smoothed particle hydrodynamics (2021)
  2. Benitez, Domingo; Escobar, J. M.; Montenegro, R.; Rodriguez, E.: Performance comparison and workload analysis of mesh untangling and smoothing algorithms (2019)
  3. Bremer, Maximilian; Kazhyken, Kazbek; Kaiser, Hartmut; Michoski, Craig; Dawson, Clint: Performance comparison of HPX versus traditional parallelization strategies for the discontinuous Galerkin method (2019)
  4. Gadioli, Davide; Vitali, Emanuele; Palermo, Gianluca; Silvano, Cristina: mARGOt: a dynamic autotuning framework for self-aware approximate computing (2019)
  5. Burstedde, Carsten; Fonseca, Jose A.; Kollet, Stefan: Enhancing speed and scalability of the ParFlow simulation code (2018)
  6. Chen, Xinwei; Wardi, Yorai; Yalamanchili, Sudhakar: Instruction-throughput regulation in computer processors with data-center applications (2018)
  7. Cebrián, Juan M.; Cecilia, José M.; Hernández, Mario; García, José M.: Code modernization strategies to 3-D stencil-based applications on Intel Xeon Phi: KNC and KNL (2017)
  8. Hoske, Daniel; Lukarski, Dimitar; Meyerhenke, Henning; Wegner, Michael: Engineering a combinatorial Laplacian solver: lessons learned (2016)
  9. Iwen, M. A.; Ong, B. W.: A distributed and incremental SVD algorithm for agglomerative data analysis on large networks (2016)
  10. Li, Shengguo; Liao, Xiangke; Liu, Jie; Jiang, Hao: New fast divide-and-conquer algorithms for the symmetric tridiagonal eigenvalue problem (2016)
  11. Cebrian, Juan M.; Jahre, Magnus; Natvig, Lasse: ParVec: vectorizing the PARSEC benchmark suite (2015)
  12. Cebrián, Juan M.; Natvig, Lasse; Meyer, Jan Christian: Performance and energy impact of parallelization and vectorization techniques in modern microprocessors (2014)
  13. de la Cruz, Raúl; Araya-Polo, Mauricio: Algorithm 942: Semi-stencil (2014)
  14. Ding, Chen; Xiang, Xiaoya; Bao, Bin; Luo, Hao; Luo, Ying-Wei; Wang, Xiao-Lin: Performance metrics and models for shared cache (2014)
  15. Zhang, Wei; Wei, Wenjie; Cai, Xing: Performance modeling of serial and parallel implementations of the fractional Adams-Bashforth-Moulton method (2014)
  16. Bock, Nicolas; Challacombe, Matt: An optimized sparse approximate matrix multiply for matrices with decay (2013)
  17. Buttari, Alfredo: Fine-grained multithreading for the multifrontal (QR) factorization of sparse matrices (2013)
  18. Gai, Jiading; Obeid, Nady; Holtrop, Joseph L.; Wu, Xiao-Long; Lam, Fan; Fu, Maojing; Haldar, Justin P.; Hwu, Wen-mei W.; Liang, Zhi-Pei; Sutton, Bradley P.: More IMPATIENT: a gridding-accelerated Toeplitz-based strategy for non-Cartesian high-resolution 3D MRI on GPUs (2013)
  19. Gracioli, Giovani; Fröhlich, Antônio Augusto; Pellizzoni, Rodolfo; Fischmeister, Sebastian: Implementation and evaluation of global and partitioned scheduling in a real-time OS (2013)
  20. Russell, Francis P.; Kelly, Paul H. J.: Optimized code generation for finite element local assembly using symbolic manipulation (2013)

Further publications can be found at: