An incredible experience when using Alluxio


An incredible experience when using Alluxio

Hector Zhang
Yesterday I found that file operations were blocking on my Alluxio cluster, for example the ls operation:
./bin/alluxio fs ls /

Then I restarted the Alluxio master in debug mode and found that the list status method in the file system master was blocked at the lock inode path operation. This means that another write lock was pending on the root inode. Finally I found that this write lock was itself blocked by a read lock, held by a call that was getting the file info of a path. That path is a mount point of an HDFS path such as hdfs://thost:tport/tpath, and the real blocking point was the communication with HDFS.

The HDFS name node on thost:tport belonged to an HDFS environment that I had shut down some days before. But another HDFS server of an older version had since started up on the same port, so the Alluxio master was trying to get HDFS file info from this mismatched HDFS server, and it blocked.
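The chain of waits described above can be reproduced with a plain read/write lock. This is only a minimal sketch, not Alluxio code: the class name, the method name, and the use of a fair `ReentrantReadWriteLock` for the root inode are assumptions made for illustration. One reader stalls (standing in for the HDFS call), a writer queues behind it, and a new reader (standing in for `ls`) then cannot get the lock either.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical demo (not Alluxio code): a stuck reader plus a queued writer stalls new readers. */
class LockChainDemo {

  /** Returns true if a new reader gets the lock within waitMillis, false if it times out. */
  static boolean newReaderGetsLock(long waitMillis) {
    // Assumption: the root inode is guarded by a fair read/write lock.
    ReentrantReadWriteLock rootLock = new ReentrantReadWriteLock(true);
    CountDownLatch readerHoldsLock = new CountDownLatch(1);
    CountDownLatch release = new CountDownLatch(1);

    // Stands in for the get-file-info call that waits on HDFS while holding a read lock.
    Thread stuckReader = new Thread(() -> {
      rootLock.readLock().lock();
      try {
        readerHoldsLock.countDown();
        release.await(); // simulated UFS call that does not return
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      } finally {
        rootLock.readLock().unlock();
      }
    });

    // Stands in for any operation that needs the write lock on the root inode.
    Thread writer = new Thread(() -> {
      rootLock.writeLock().lock();
      rootLock.writeLock().unlock();
    });

    try {
      stuckReader.start();
      readerHoldsLock.await();
      writer.start();
      while (!rootLock.hasQueuedThread(writer)) {
        Thread.sleep(10); // wait until the writer is really queued behind the reader
      }

      // Stands in for "ls /": a new reader arriving while the writer is waiting.
      boolean acquired = rootLock.readLock().tryLock(waitMillis, TimeUnit.MILLISECONDS);
      if (acquired) {
        rootLock.readLock().unlock();
      }

      release.countDown(); // let the simulated UFS call finish so the threads can exit
      stuckReader.join();
      writer.join();
      return acquired;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("new reader acquired lock: " + newReaderGetsLock(200));
  }
}
```

In this sketch the new reader is expected to time out, even though the only lock actually held is a read lock, because it has to queue behind the waiting writer.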

--
You received this message because you are subscribed to the Google Groups "Alluxio Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/d/optout.

Re: An incredible experience when using Alluxio

Calvin Jia
Hi,

Thanks for reporting this issue, do you mind opening a JIRA ticket with similar information? 

In general any UFS operation while holding an Alluxio lock can be a risky dependency, and we can consider putting a blanket limit on waiting for a response. In this case, was the HDFS client being used completely stuck or just had a long timeout/retry value?
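As a rough illustration of what a blanket limit on waiting could look like, here is a minimal sketch that runs a call on a helper thread and gives up after a deadline. It is not how Alluxio implements anything: `BoundedUfsCall`, `callWithTimeout`, and the `ufs-call` thread name are hypothetical, and the `Callable` only stands in for a real UFS operation.

```java
import java.util.Optional;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper (not an Alluxio API): bound the time spent waiting on a UFS call. */
class BoundedUfsCall {

  /** Returns the result, or Optional.empty() if the call did not finish within timeoutMillis. */
  static <T> Optional<T> callWithTimeout(Callable<T> ufsCall, long timeoutMillis) {
    // Daemon thread, so a call that never returns cannot keep the JVM alive.
    ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
      Thread t = new Thread(r, "ufs-call");
      t.setDaemon(true);
      return t;
    });
    Future<T> future = executor.submit(ufsCall);
    try {
      return Optional.ofNullable(future.get(timeoutMillis, TimeUnit.MILLISECONDS));
    } catch (TimeoutException e) {
      future.cancel(true); // best effort: interrupts the worker thread
      return Optional.empty();
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("UFS call failed", e.getCause());
    } finally {
      executor.shutdownNow();
    }
  }

  public static void main(String[] args) {
    // A call that returns promptly, and one that sleeps far past the deadline.
    System.out.println(callWithTimeout(() -> "file-info", 1000));
    System.out.println(callWithTimeout(() -> {
      Thread.sleep(60_000);
      return "never";
    }, 100));
  }
}
```

The caller gets control back after the deadline and can release its lock, but the worker thread is only interrupted. A thread blocked in socket I/O may ignore the interrupt, so a limit like this bounds the wait without necessarily cleaning up the stuck call.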

Cheers,
Calvin


Re: An incredible experience when using Alluxio

Hector Zhang
Well, putting a blanket limit on waiting for a response may be a good idea. I will open a JIRA.

I just mounted the HDFS path with all the default configurations supported by Alluxio. I debugged the source code and found that there is a default socket timeout limit. So I think it was the RPC request that blocked: because the HDFS client and server are different versions, the socket connected, but the RPC request could not complete.
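The "socket connects, but the request never completes" situation can be reproduced without HDFS at all. The following is only a sketch under stated assumptions: `SilentServerDemo` and `probe` are made-up names, and a local server socket that never answers stands in for the mismatched name node. It shows that a connect timeout alone does not help, while a read timeout turns the hang into an error.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

/** Hypothetical demo: a server that accepts TCP connections but never replies. */
class SilentServerDemo {

  /** Connects to a silent local server, sends a few bytes, and reports what happened. */
  static String probe(int connectTimeoutMillis, int readTimeoutMillis) {
    InetAddress loopback = InetAddress.getLoopbackAddress();
    // The OS completes the TCP handshake even though accept() is never called.
    try (ServerSocket silentServer = new ServerSocket(0, 1, loopback);
         Socket client = new Socket()) {
      client.connect(
          new InetSocketAddress(loopback, silentServer.getLocalPort()), connectTimeoutMillis);
      client.setSoTimeout(readTimeoutMillis); // without this, read() would block indefinitely
      client.getOutputStream().write(new byte[] {1, 2, 3}); // stand-in for an RPC request
      try {
        client.getInputStream().read();
        return "connected, got reply";
      } catch (SocketTimeoutException e) {
        return "connected, read timed out";
      }
    } catch (IOException e) {
      return "connect failed: " + e.getMessage();
    }
  }

  public static void main(String[] args) {
    System.out.println(probe(1000, 200));
  }
}
```

Whether the HDFS client in this incident had an equivalent limit on the RPC response is not established in this thread; the sketch only illustrates the failure mode.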


On Wednesday, August 15, 2018 at 6:03:34 AM UTC+8, Calvin Jia wrote:
Hi,

Thanks for reporting this issue, do you mind opening a JIRA ticket (https://alluxio.atlassian.net/projects/ALLUXIO/issues) with similar information?

In general any UFS operation while holding an Alluxio lock can be a risky dependency, and we can consider putting a blanket limit on waiting for a response. In this case, was the HDFS client being used completely stuck or just had a long timeout/retry value?

Cheers,
Calvin
