Alluxio write on spark -- taking a long time -- what can be tuned?


William Callaghan
Hi there,

I'm new to Alluxio. I'm running a Spark job using the Spark Job Server (https://github.com/spark-jobserver/spark-jobserver), with Alluxio running in a Docker container with a master and a worker.

Alluxio v 1.1.0
Spark 1.6.0 (1 master, 2 workers)
Alluxio Config (see attached)
OS: Ubuntu 14.04

When I run my Spark job, saving a small text file (less than 500 bytes) takes about 6 seconds.
The Spark logs show a number of Alluxio operations, such as:
- getting working directories
- getting file status
- listing status
- renaming, deleting, creating.

All I'm doing is saving an RDD using saveAsTextFile("alluxio://<master>:<port>/<path>"). The fact that it takes 6 seconds makes me believe that I have something configured improperly.
Any help would be greatly appreciated.






--
You received this message because you are subscribed to the Google Groups "Alluxio Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/d/optout.

Attachment: alluxio_config.txt (2K)
Re: Alluxio write on spark -- taking a long time -- what can be tuned?

Pei Sun
Hi William,
Saving such a small file is almost all metadata operations. On my end, it took less than a second. To better understand your problem, can you send me the following:

1. Machine specs (e.g. RAM size, disk size, number of CPUs).
2. How you start Alluxio and Spark (e.g. is the Alluxio worker colocated with the Spark worker).
3. Spark log.
4. Alluxio master and worker logs.

Pei
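
To illustrate the point about metadata operations, here is a minimal sketch of a commit-style save of one tiny partition against the local filesystem. The sequence of steps is an assumption, loosely modelled on how Hadoop-style output committers stage output in a temporary directory and then rename it into place. It is not taken from the Spark or Alluxio source, and `save_small_file` and the operation labels are hypothetical names chosen to mirror the log entries listed in the question.

```python
import os
import shutil


def save_small_file(output_dir, data):
    """Stage one tiny partition, commit it, and return the filesystem calls made.

    Simplified, assumed model of a commit-style save; not Spark or Alluxio code.
    """
    ops = []
    temp_root = os.path.join(output_dir, "_temporary")
    attempt_dir = os.path.join(temp_root, "attempt_0")
    staged = os.path.join(attempt_dir, "part-00000")

    ops.append("getFileStatus")      # check that the output path is free
    if os.path.exists(output_dir):
        raise FileExistsError(output_dir)

    ops.append("mkdirs")             # create the staging directory
    os.makedirs(attempt_dir)

    ops.append("create")             # the only call that moves payload bytes
    with open(staged, "wb") as f:
        f.write(data)

    ops.append("listStatus")         # discover staged files to commit
    staged_names = os.listdir(attempt_dir)

    for name in staged_names:
        ops.append("rename")         # move each staged file into place
        os.rename(os.path.join(attempt_dir, name), os.path.join(output_dir, name))

    ops.append("delete")             # clean up the staging tree
    shutil.rmtree(temp_root)

    ops.append("create")             # write an empty completion marker
    open(os.path.join(output_dir, "_SUCCESS"), "wb").close()

    return ops
```

In this model a single small partition triggers seven filesystem calls, only one of which carries data. If each call is a round trip to a remote master, the total time would scale with per-call latency rather than with the size of the file.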


Re: Alluxio write on spark -- taking a long time -- what can be tuned?

William Callaghan
Pei,

Machine Specs:
- 21 GB RAM
- 4 cores
- 500 GB of available disk space

Alluxio is on the same machine as Spark, but they are running in different Docker containers.


Re: Alluxio write on spark -- taking a long time -- what can be tuned?

Pei Sun


From my earlier list:

2. How you start Alluxio and Spark (e.g. is the Alluxio worker colocated with the Spark worker).
3. Spark log.
4. Alluxio master and worker logs.

Can you share these with me as well? Thank you.

Pei


Re: Alluxio write on spark -- taking a long time -- what can be tuned?

Pei Sun
Hi William,
   Was your problem resolved?

Pei



Re: Alluxio write on spark -- taking a long time -- what can be tuned?

Pei Sun
Hi William,
I am glad to hear that your problem has been resolved.

Thank you
Pei

On Mon, Aug 8, 2016 at 10:09 AM, William Callaghan <[hidden email]> wrote:
Yes. The issue was that although Alluxio and Spark were on the same physical host, they were in separate Docker containers. Even with "--net=host" I ran into a lot of problems, both speed-wise and networking-wise. Putting Alluxio and Spark in the same container solved the problem.
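
For anyone hitting a similar slowdown, one way to check whether the network path between containers is the slow part is to time plain TCP connections from the Spark side to the Alluxio master. The sketch below is only a rough diagnostic under stated assumptions: `time_connect` is a hypothetical helper, and the host and port you pass are whatever your own setup uses for the master. Nothing here comes from the original thread.

```python
import socket
import time


def time_connect(host, port, attempts=5, timeout=5.0):
    """Return the seconds each attempt took to resolve `host` and open a TCP connection."""
    timings = []
    for _ in range(attempts):
        start = time.perf_counter()
        # create_connection resolves the hostname and then connects,
        # so slow name resolution shows up in the measurement too.
        with socket.create_connection((host, port), timeout=timeout):
            pass
        timings.append(time.perf_counter() - start)
    return timings
```

Running it from inside the Spark container with the same `<master>` and `<port>` used in the alluxio:// URI, and again from inside the Alluxio container, gives a quick comparison. Connection times near a second or more on a single host would point at networking or name resolution rather than at Alluxio tuning.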
