Slow Init accessing to alluxio with s3 as underfs using alluxio.hadoop.FileSystem

Slow Init accessing to alluxio with s3 as underfs using alluxio.hadoop.FileSystem

ycao
Hi,
  Has anyone else encountered this? The first access to an Alluxio URL pointing at a folder backed by S3 as the under filesystem, using alluxio.hadoop.FileSystem as the client, is always quite slow (>8 minutes, usually 10 minutes and up). The Alluxio UI is also stuck for browsing during that time; it seems the Alluxio service is doing busy work under the hood, even when I fetch just one record from one file in that folder.
I tuned the thread count for accessing S3, which doesn't help, and running "alluxio runTests" ahead of time doesn't warm it up either.
Sample commands:

   val text = spark.read.text("alluxio://path/to/a/s3/folder/containing/tensOfThousandsFiles")
   text.take(1)


  What could explain this, and is there a fix?
Thanks!

--
You received this message because you are subscribed to the Google Groups "Alluxio Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/d/optout.
Re: Slow Init accessing to alluxio with s3 as underfs using alluxio.hadoop.FileSystem

ycao
BTW: I'm using the latest version, 1.7.1.

On Friday, April 13, 2018 at 11:17:01 AM UTC-7, [hidden email] wrote:

Re: Slow Init accessing to alluxio with s3 as underfs using alluxio.hadoop.FileSystem

Calvin Jia
Hi,

On the first access, Alluxio may need to load the metadata of the files in the under store (in your case, s3) to discover the files. This is usually a one-time operation unless you are frequently generating thousands of new files in a short period in the under store without going through Alluxio.

You can try running bin/alluxio fs ls -f alluxio://path/to/a/s3/folder/containing/tensOfThousandsFiles ahead of time so this operation does not slow down the job itself. The metadata load still has to happen once, however, and will take time (the ~10 minutes you mentioned).
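A minimal sketch of that warm-up step, run before submitting the Spark job (the path below is the placeholder from this thread, and this assumes the default Alluxio client configuration so the master is resolved from alluxio-site.properties):

```shell
# One-time warm-up: -f forces Alluxio to load metadata for the S3-backed
# folder from the under store, so the listing cost is paid here rather
# than on the first Spark read.
bin/alluxio fs ls -f alluxio://path/to/a/s3/folder/containing/tensOfThousandsFiles

# After this completes, the Spark job's first access should no longer
# trigger the multi-minute metadata scan:
#   val text = spark.read.text("alluxio://path/to/a/s3/folder/containing/tensOfThousandsFiles")
#   text.take(1)
```

This only moves the cost, as noted above: the initial listing of tens of thousands of S3 objects still takes roughly the same time, but subsequent accesses through Alluxio reuse the cached metadata.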

Hope this helps,
Calvin
