ProGen: Provenance database generator for large-scale data set

Researchers, and scientists in particular, rely on provenance to judge the correctness and timeliness of data and experiments. In connection with techniques such as view materialization and data annotation, provenance has emerged as a new research topic. An appropriate provenance data set is the foundation for verifying the accuracy and functionality of new techniques and algorithms for provenance management. A synthetic provenance data set is also important, because it allows algorithms to be verified and improved before real provenance data has been collected to the expected extent. In this paper, we propose ProGen, a novel provenance database generator that produces a provenance database of a specified data volume from an input data schema and provenance annotations. The evaluation indicates that our design and implementation are efficient and scalable.