Paper accepted at IEEE INFOCOM 2020
Models of parallel processing systems typically assume that one has l servers and jobs are split into an equal number of k=l tasks. This seemingly simple approximation has surprisingly large consequences for the resulting stability and performance bounds. In reality, best practices for modern map-reduce systems indicate that a job's partitioning factor should be much larger than the number of servers available, with some researchers going to far as to advocate for a ``tiny tasks'' regime, where jobs are split into over 10,000 tasks. In this paper we use recent advances in stochastic network calculus to fundamentally understand the effects of task granularity on parallel systems' scaling, stability, and performance. For the split-merge model, we show that when one allows for tiny tasks, the stability region is actually much better than had previously been concluded. For the single-queue fork-join model, we show that sojourn times quickly approach the optimal case when l ``big tasks'' are sub-divided into k>>l ``tiny tasks''. Our results are validated using extensive simulations, and the applicability of the models used is validated by experiments on an Apache Spark cluster.