In my program, I iteratively merge potentially many RDDs into one with repeated calls of sRDD = sRDD.union(tmpRDD).cache(). All RDDs get the same partitioner so that no re-shuffling occurs. This works well at the beginning of the execution, but at a certain point execution stalls, the log shows "Asked to send map output locations for shuffle", and most executors are either idle or have only one task assigned.

I tried several approaches, such as setting the number of CPU cores per executor to 1 and changing the executor memory settings (I assumed the executors were requesting more memory than they could get), but nothing solved the issue. I suspect there is a practical limit on the number of RDDs that can safely be merged with union(). Fewer than 10 is no problem; more than 50 always triggers the problem. I'm now changing my program to perform fewer unions.
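
One possible way to perform fewer unions is to collect the temporary RDDs first and merge them with a single call to SparkContext.union, which accepts a whole list of RDDs. The sketch below is in PySpark and is only an illustration, not the code from my program: the helper name union_all, the demo function, and the sample data are made up for this example, and I have not confirmed that this removes the stall described above.

```python
def union_all(sc, rdds):
    """Merge many RDDs with one n-ary union instead of a chain of pairwise unions.

    `sc` is assumed to be a pyspark.SparkContext and `rdds` an iterable of RDDs.
    The helper name is hypothetical; only sc.union is part of the PySpark API.
    """
    rdds = list(rdds)
    if not rdds:
        raise ValueError("union_all needs at least one RDD")
    if len(rdds) == 1:
        # Nothing to merge, so no union call is needed.
        return rdds[0]
    # SparkContext.union takes the whole list at once, so the result sits one
    # union step above its inputs rather than one nested step per input.
    return sc.union(rdds)


def demo():
    """Hypothetical usage; requires a local PySpark installation."""
    # Imported lazily so that union_all itself has no hard dependency on pyspark.
    from pyspark import SparkContext

    sc = SparkContext.getOrCreate()
    try:
        # 60 small RDDs stand in for the temporary RDDs produced by the program.
        parts = [sc.parallelize(range(i * 10, (i + 1) * 10)) for i in range(60)]
        merged = union_all(sc, parts).cache()
        print(merged.count())
    finally:
        sc.stop()
```

Compared with sRDD = sRDD.union(tmpRDD).cache() in a loop, this caches only the final result and issues one union call. If the RDDs only become available one at a time, they could be buffered in a list and merged in batches the same way.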