While playing around with Spark on my local virtual cluster, I ran into resource problems, even though I had 3 workers running on 3 nodes. The following message appeared repeatedly while I was working with data in the spark-shell:

Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory

After googling around a bit, I discovered this was due to low memory (Java Heap Size) settings. If you're using Cloudera Manager, as I am, you can change the memory settings as follows:

  • On the homepage of Cloudera Manager, where it shows your cluster ('Cluster 1' in my case), click on the Spark service.
  • Now you see an overview of the Spark service, with a status summary of the Master and the Workers. Click on “Configuration > View and Edit”.
  • In the Category section, first choose Master Default Group, and edit the Java Heap Size of Master in Bytes to give it a larger amount of memory. In my case it was 64 MiB (which is probably far too little), so I just clicked default value, which set the Heap Size to 512 MiB.
  • In the Category section, now choose Worker Default Group, and edit the Java Heap Size of Worker in Bytes to give it a larger amount of memory. In my case it was also 64 MiB (again, too little), so I set it to the default value as well.
  • Now also edit the Total Java Heap Sizes of Worker's Executors in Bytes to give the Executors more memory. I set it to 1 GiB, which is half the memory my nodes have. I have a feeling that this is actually the most important setting for dealing with the aforementioned error.
  • Now you have to restart the Spark service:
    • On the homepage of CM, it should show a restart icon next to the service name. Click it.
    • You can review the changes, and then click Restart.
    • Select Re-deploy client configuration and click Restart Now, and the service should now restart.
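
To illustrate why the executor setting seems to matter most, here is a minimal sketch in plain Python (no Spark needed). It rests on my assumption that the message shows up when an application asks for more memory per executor than any single worker is allowed to hand out. The names parse_memory and executor_fits are made up for this example and are not part of Spark or Cloudera Manager, and the 512m request is only an example value.

```python
import re

# Multipliers for JVM-style size suffixes (binary units).
_UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3, "t": 1024 ** 4}


def parse_memory(value: str) -> int:
    """Convert a size string such as '512m' or '1g' to a number of bytes."""
    match = re.fullmatch(r"(\d+)([kmgt]?)b?", value.strip().lower())
    if match is None:
        raise ValueError(f"unrecognised memory size: {value!r}")
    number, unit = match.groups()
    # No suffix means the value is already in bytes.
    return int(number) * _UNITS.get(unit, 1)


def executor_fits(executor_memory: str, worker_executor_total: str) -> bool:
    """Hypothetical check: can one worker host an executor of this size?

    Assumption: a worker can only launch an executor if the requested
    memory is no larger than the total it may give to its executors.
    """
    return parse_memory(executor_memory) <= parse_memory(worker_executor_total)


# A 512 MiB executor request against a 64 MiB total, then against 1 GiB.
print(executor_fits("512m", "64m"))
print(executor_fits("512m", "1g"))
```

With a 64 MiB total the request does not fit on any worker, which would match what I saw; with 1 GiB it does.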

I can't actually tell you how much memory you should give the Master, Workers and Executors, because that depends on a lot of factors: how large are your nodes? What else is running on your nodes? And most importantly: how large are the datasets you're trying to process? If someone from Cloudera reads this, maybe they could chime in with better information on how to tune Spark's memory.
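
As a rough starting point only, the arithmetic can be sketched like this. The function executor_budget is a made-up helper, and the 512 MiB reserved for the operating system and other services is purely my assumption, chosen so that the numbers match my own 2 GiB nodes; it is not a recommendation.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def executor_budget(node_memory: int, daemon_heaps: list, reserved_for_other: int) -> int:
    """Memory left for executors on one node, in bytes.

    node_memory        -- total RAM of the node
    daemon_heaps       -- heap sizes of the Spark daemons on that node
    reserved_for_other -- assumed headroom for the OS and other services
    """
    budget = node_memory - sum(daemon_heaps) - reserved_for_other
    if budget <= 0:
        raise ValueError("nothing left for executors on this node")
    return budget


# My setup: 2 GiB node, 512 MiB Worker heap, 512 MiB reserved (assumption).
budget = executor_budget(2 * GIB, [512 * MIB], 512 * MIB)
print(budget // MIB, "MiB available for executors")
```

On a node that also runs the Master, its heap would go into daemon_heaps as well, which leaves correspondingly less for the executors.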


