Facing Java Heap Space issue with large files

6 messages

Facing Java Heap Space issue with large files

Shail Shah
Hi,

I am trying to run the following commands in spark-shell, where sample-2g is a 2 GB file. I keep getting a Java heap space error. The same program runs fine with sample-1g (a 1 GB file). Alluxio is set up with the following configuration:
ALLUXIO_TASK_LOG=${ALLUXIO_TASK_LOG:-"/home/ec2-user/alluxio-final/alluxio-1.1.0/logs/"}
ALLUXIO_MASTER_HOSTNAME=${ALLUXIO_MASTER_HOSTNAME:-"localhost"}
ALLUXIO_WORKER_MEMORY_SIZE=${ALLUXIO_WORKER_MEMORY_SIZE:-"20454MB"}

I also tried allocating driver memory = 5G and executor memory = 5G.
  • val alluxioFile = sc.textFile("alluxio://localhost:19998/sample-2g")
  • alluxioFile.count()

Could you please help me debug this issue?
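For reference, the driver and executor memory settings mentioned above can be passed when launching the shell. A minimal sketch (these are standard spark-shell/spark-submit options in Spark 1.6; the path to the Spark installation is assumed):

```shell
# Launch spark-shell with explicit driver and executor heap sizes.
# Run from the root of a Spark 1.6 installation.
./bin/spark-shell \
  --driver-memory 5g \
  --executor-memory 5g
```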

--
You received this message because you are subscribed to the Google Groups "Alluxio Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/d/optout.

Re: Facing Java Heap Space issue with large files

Jiří Šimša
Hi Shail,

Let me see if I can help you troubleshoot your problem. To do that, I need to understand your environment better.

What version of Alluxio and Spark are you using? What UFS are you using? How much memory does the workstation on which you are running your program have? Could you also please attach the complete output of your program as well as the Alluxio master, worker, and user logs (these can be found in ${ALLUXIO_HOME}/logs)?
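Those logs can be gathered with something like the following (assuming a default Alluxio 1.x layout, where the master and worker write master.log and worker.log under ${ALLUXIO_HOME}/logs):

```shell
# List the Alluxio log directory, then show the tail of the
# master and worker logs (default file names for Alluxio 1.x).
ls ${ALLUXIO_HOME}/logs
tail -n 200 ${ALLUXIO_HOME}/logs/master.log ${ALLUXIO_HOME}/logs/worker.log
```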

Best,

--
Jiří Šimša


Re: Facing Java Heap Space issue with large files

Shail Shah
Hi Jiri,

Alluxio version: alluxio-1.1.0
Spark version: spark-1.6.2-bin-hadoop2.6
I am running Alluxio locally.

Please find the runtime log at http://paste.ubuntu.com/22829467/

The count runs perfectly fine on the 1 GB file, but it gives a heap space error on the 2 GB file. I am running the program on a machine with 30 GB of RAM.

Thanks,
Shail Shah


Re: Facing Java Heap Space issue with large files

Jiří Šimša
Hi Shail,

Can you try increasing the driver-memory option to 4 GB and see if that fixes your issue?

Best,

--
Jiří Šimša


Re: Facing Java Heap Space issue with large files

Shail Shah
Hi Jiri,

I tried increasing the driver memory, the number of executors, and the executor memory, but none of it helped; I raised each of them to 8 GB.
I then found what was causing the Java heap space error. Whenever I ran against the sample-2g file mentioned in the Alluxio docs, one of the Spark tasks got stuck at a particular offset in the file. While all the other tasks completed in a matter of seconds, that one executor would block and its Java heap would keep growing. When I tried a different 2 GB file, it ran fine without even increasing the driver or executor memory.
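One way to narrow down skew like this is to force more, smaller input partitions, so the problematic region of the file is confined to one small task. A minimal sketch for spark-shell (the minPartitions argument is the standard second parameter of SparkContext.textFile; the path is the one from this thread). Note this will not help if the culprit is a single enormous line, since textFile splits on newlines and such a record must still be materialized by one task:

```scala
// Read the file with more input splits than the default (64 is an
// arbitrary illustrative value), limiting how much of the file any
// single task has to hold.
val alluxioFile = sc.textFile("alluxio://localhost:19998/sample-2g", 64)

// Count lines as before; watch the Spark UI to see which partition
// (and hence which byte range of the file) gets stuck.
alluxioFile.count()
```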

Thanks,
Shail Shah




Re: Facing Java Heap Space issue with large files

Jiří Šimša
Hi Shail,

I am glad to hear you were able to track down the root cause. Thanks for letting me know.

Best,

--
Jiří Šimša
