A Runtime Library for Platform-Independent Task Parallelism

With the increasing diversity of computing systems and the rapid performance improvement of commodity hardware, heterogeneous clusters have become the dominant platform for low-cost, high-performance computing. Grid-enabled and heterogeneous implementations of MPI have established it as the de facto programming model for these environments. At the same time, task parallelism provides a natural way to exploit their hierarchical architecture, a hierarchy that has been further extended with the advent of general-purpose GPU devices. In this paper, we present the implementation of an MPI-based task library for heterogeneous and GPU clusters. The library offers an intuitive programming interface for multilevel task parallelism with transparent data management and load balancing. We discuss design and implementation issues regarding heterogeneity support and report performance results on heterogeneous cluster computing environments.