ResourceLeakDetector netty error on proxy

francois.autaa.pro
Alluxio 1.8.0
Spark 2.4.0
Java 8
Proxy xmx=4g

Hi, I'm facing an issue when trying to access a large batch of small files.

I've set up a cluster with 5 nodes, and on each node I'm running Spark with Mesos, plus an Alluxio worker and proxy.

When I run small batches everything runs smoothly, but when I go bigger (a 5 GB batch of files) I get this exact error:

2018-10-12 15:21:56,157 ERROR ResourceLeakDetector - LEAK: ByteBuf.release() was not called before it's garbage-collected. See http://netty.io/wiki/reference-counted-objects.html for more information.
Recent access records:
2018-10-12 15:21:56,213 INFO  NettyChannelPool - Created netty channel with netty bootstrap Bootstrap(group: EpollEventLoopGroup, channelFactory: EpollSocketChannel.class, options: {SO_KEEPALIVE=true, TCP_NODELAY=true, ALLOCATOR=PooledByteBufAllocator(directByDefault: true), EPOLL_MODE=LEVEL_TRIGGERED}, handler: alluxio.network.netty.NettyClient$1@d89989, remoteAddress: cluster-node-02/10.94.21.195:29999).
...

It looks like a Netty issue, but I have no idea where it comes from. My batch crashes straight away with a Read Time Out error.

Any ideas?

Regards

--
You received this message because you are subscribed to the Google Groups "Alluxio Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/d/optout.
Re: ResourceLeakDetector netty error on proxy

Andrew Audibert
Hello,

The leak could be an issue with the compute framework not closing Alluxio input/output streams. Can you share more details about what sort of job you're running, as well as what under storage system you're using? It would also help to know the exact error message you're getting, including stack trace if there is one.
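To illustrate what "closing the streams" means in practice, here is a minimal sketch using plain java.io types. TrackingStream is a hypothetical stand-in for an Alluxio file stream (it is not an Alluxio class) and only records whether close() was called. The point is the try-with-resources pattern, which closes the stream even when a read fails partway through.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class StreamCloseSketch {

    /** Hypothetical stand-in for a file stream; records whether close() ran. */
    static class TrackingStream extends ByteArrayInputStream {
        boolean closed = false;

        TrackingStream(byte[] data) {
            super(data);
        }

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    /** Reads the stream to the end and always closes it, even if read() throws. */
    static long countBytes(InputStream in) throws IOException {
        // try-with-resources calls stream.close() on every exit path
        try (InputStream stream = in) {
            byte[] buffer = new byte[4096];
            long total = 0;
            int n;
            while ((n = stream.read(buffer)) != -1) {
                total += n;
            }
            return total;
        }
    }

    public static void main(String[] args) throws IOException {
        TrackingStream s = new TrackingStream(new byte[] {1, 2, 3});
        long total = countBytes(s);
        System.out.println("bytes read: " + total + ", closed: " + s.closed);
    }
}
```

If the job or handler opens streams in a loop over many small files, each one would need to be closed this way; a stream that is simply dropped may leave its buffers to be reclaimed only at garbage collection, which is the situation the leak detector reports.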

You could potentially fix the timeout issue by increasing the client timeout alluxio.user.network.netty.timeout.ms, which defaults to 30 seconds.
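As a rough sketch, raising that timeout could look like the fragment below. It assumes the property is picked up from conf/alluxio-site.properties on each node running the proxy, and that a bare number is read as milliseconds. The value 60000 is only illustrative; please check the property name and value format against the configuration docs for your Alluxio version.

```properties
# conf/alluxio-site.properties (assumed location; restart the proxy after editing)
# Illustrative value: 60 seconds instead of the 30-second default
alluxio.user.network.netty.timeout.ms=60000
```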

Best,
Andrew

On Friday, October 12, 2018 at 9:11:25 AM UTC-7, [hidden email] wrote:
> Hi I'm facing an issue when trying to access a big bunch of small file.
> [...]
Re: ResourceLeakDetector netty error on proxy

francois.autaa.pro
The error was caused by my S3RestHandler modification to allow range reads (the Alluxio team will soon have a pull request for it).

This topic can be deleted

Regards
