When conducting runtime experiments with a Spark cluster in standalone mode (no YARN or Mesos), I stumbled over the options for executors. By default, only one executor was spawned per worker, which fully utilized all CPU cores but underutilized the available memory. I wanted to run several executors with fewer cores each. The cause of the default behavior is that, in standalone mode, the default number of cores of one executor is the total number of cores of its worker. This was not clear to me from the documentation (and is supposed to be different for YARN). The following settings were relevant in my case:
Run the driver on the master node. That's convenient, because otherwise the driver gets deployed on a randomly chosen worker, which makes debugging harder.
The total number of executors that I plan to run in parallel. The documentation is a bit fuzzy here. If more executors fit onto the workers, Spark spawns more of them, so this setting seems to be irrelevant in standalone mode.
The amount of memory one executor has. This obviously has to fit into the main memory of one worker. If it is set too large, fewer executors than num-executors are spawned.
The number of CPU cores per executor. This has to fit into the number of CPU cores of one worker, otherwise there will be fewer executors than planned.
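To make the memory and core limits above concrete, here is a minimal back-of-the-envelope sketch in Python. It assumes a simplified model in which a worker can host as many executors as fit by cores and by memory, whichever is smaller. The function name `executors_per_worker` and the worker sizes (16 cores, 64 GB) are hypothetical, and the model ignores any memory the worker keeps for itself.

```python
def executors_per_worker(worker_cores, worker_memory_gb,
                         executor_cores, executor_memory_gb):
    """Estimate how many executors fit on one worker.

    Simplified model (an assumption, not Spark's actual scheduler code):
    the count is limited by whichever resource runs out first.
    """
    if executor_cores <= 0 or executor_memory_gb <= 0:
        raise ValueError("executor cores and memory must be positive")
    by_cores = worker_cores // executor_cores
    by_memory = worker_memory_gb // executor_memory_gb
    return min(by_cores, by_memory)


# Hypothetical worker with 16 cores and 64 GB of memory.
print(executors_per_worker(16, 64, 4, 8))    # cores are the limit: 4
print(executors_per_worker(16, 64, 4, 24))   # memory is the limit: 2
print(executors_per_worker(16, 64, 16, 8))   # one executor takes all cores: 1
```

The last call mirrors the default behavior I ran into: if an executor claims all cores of a worker, only one executor fits, no matter how much memory is left over.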
Memory for the driver: