Connection reset error on load

Jais Sebastian
When we run a load test with multiple concurrent requests, we sometimes get the error below. It is very intermittent.

18/06/01 08:28:05.344 : [, ]  585198 [Executor task launch worker for task 21454] WARN alluxio.AbstractClient  - Failed to connect (0) with FileSystemMasterClient @ <alluxio master>:19998: java.net.SocketException: Connection reset
18/06/01 08:28:05.345 : [, ]  585199 [Executor task launch worker for task 21454] INFO org.apache.spark.executor.Executor  - Executor interrupted and killed task 83.0 in stage 2392.0 (TID 21454), reason: stage cancelled

What configuration is required to solve this issue?

Regards,
Jais

--
You received this message because you are subscribed to the Google Groups "Alluxio Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/d/optout.

Re: Connection reset error on load

Jais Sebastian
Attached our configuration for reference 

AlluxioConfig.txt (11K) Download Attachment

Re: Connection reset error on load

binfeng
Hi Jais,

It could be that your master RPC connection thread pool is depleted. Try increasing alluxio.master.worker.threads.max and see if that improves your situation.

Thanks,
Bin Feng
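
For readers applying the suggestion above, here is a minimal sketch of the change. It assumes the property is set in conf/alluxio-site.properties on the Alluxio master and that the master is restarted afterwards. The value shown is purely illustrative and does not come from this thread; choose one based on how many clients connect concurrently.

```properties
# conf/alluxio-site.properties on the master node (assumed location)
# Upper bound on the master's RPC worker threads.
# 4096 is a hypothetical example value, not a recommendation.
alluxio.master.worker.threads.max=4096
```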

Re: Connection reset error on load

Jais Sebastian
Thanks Feng. That worked for us.

Re: Connection reset error on load

binfeng
Glad that helped!
