A 2589 line topology optimization code written for the graphics card. We investigate topology optimization based on the solid isotropic material with penalization approach on compute unified device architecture enabled graphics cards in three dimensions. Linear elasticity is solved entirely on the GPU by a matrix-free conjugate gradient method using finite elements. Due to the unique requirements of the single instruction, multiple data stream processors, special attention is given to the procedural generation of matrix-vector products entirely on the graphics card. The GPU code is found to be extremely efficient, being faster than a 48 core shared memory CPU system. CPU and GPU implementations show different performance bottlenecks. The sources are available at url{ schmidt/gputop}.