Spark on Alluxio was slower than HDFS

4 messages

Spark on Alluxio was slower than HDFS

lazyman322
Why was Spark on Alluxio slower than on HDFS? How can I demonstrate the performance benefit of Spark on Alluxio with a few simple test cases?
The following is my test pseudocode.

Read from Alluxio ("master:19998" below stands in for your Alluxio master address):
val rdd = sc.textFile("alluxio://master:19998/test.txt")
rdd.count()                                   // first count warms the Alluxio cache
val startTime = System.currentTimeMillis
rdd.count()                                   // second count is the timed run
println(System.currentTimeMillis - startTime)

Read from HDFS, cached in Spark memory:
import org.apache.spark.storage.StorageLevel
val rdd = sc.textFile("test.txt")             // resolved against the default filesystem (HDFS)
rdd.persist(StorageLevel.MEMORY_ONLY)
rdd.count()                                   // first count materializes the cache
val startTime = System.currentTimeMillis
rdd.count()                                   // second count is the timed run
println(System.currentTimeMillis - startTime)
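A single timed run like the snippets above is sensitive to one-off noise (JVM warm-up, GC pauses, page-cache state). As a sketch of a sturdier timing pattern — written in plain Python rather than the Spark API, and with a helper name (`median_time`) of my own invention, not from the thread — one untimed warm-up pass followed by the median of several timed runs makes a comparison like Alluxio vs. HDFS far less jittery:

```python
# Sketch of a fairer micro-benchmark loop: do untimed warm-up runs first,
# then report the median of several timed runs, so a single GC pause or
# cold cache does not decide the comparison.
import statistics
import time

def median_time(action, runs=5, warmup=1):
    """Return the median wall-clock seconds of `action` over `runs` calls."""
    for _ in range(warmup):
        action()                              # untimed warm-up
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        action()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

# Example: timing a cheap in-memory computation.
elapsed = median_time(lambda: sum(range(100_000)))
```

The same shape carries over to the spark-shell directly: warm up with one `rdd.count()`, then take the median of several timed `rdd.count()` calls instead of a single measurement.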

--
You received this message because you are subscribed to the Google Groups "Alluxio Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/d/optout.

Re: Spark on Alluxio was slower than HDFS

Pei Sun
Can you share the following information?

Machine spec: RAM, CPU
Spark configs: deploy-mode, driver-memory, executor-memory
Input spec: size of "test.txt"
Alluxio configs: WORKER_MEMORY_SIZE

Alluxio can help if the input size is too small compared to machine size. I can give more insights if you share more information with me.
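Most of the machine-side details requested above can be collected from the driver host. A rough sketch follows; the Linux `/proc/meminfo` path and the use of "test.txt" as a local copy of the input are my assumptions, not anything from the thread:

```python
# Sketch: collect machine spec and input size (Linux-specific /proc path;
# "test.txt" is the thread's hypothetical input file).
import os

cores = os.cpu_count()                        # CPU core count

mem_total_kb = None
if os.path.exists("/proc/meminfo"):           # Linux only
    with open("/proc/meminfo") as f:
        mem_total_kb = int(f.readline().split()[1])   # first line: MemTotal in kB

input_bytes = os.path.getsize("test.txt") if os.path.exists("test.txt") else None

print(f"cores={cores} ram_kb={mem_total_kb} input_bytes={input_bytes}")
```

The Spark-side values (deploy-mode, driver-memory, executor-memory) can be read in the spark-shell via `sc.getConf.toDebugString`, and WORKER_MEMORY_SIZE is typically set in Alluxio's conf/alluxio-env.sh.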

Pei

On Tue, Jul 19, 2016 at 9:28 AM, lazyman322 <[hidden email]> wrote:
> Why was Spark on Alluxio slower than on HDFS? [...]



--
Pei Sun


Re: Spark on Alluxio was slower than HDFS

Pei Sun


On Tue, Jul 19, 2016 at 10:08 AM, Pei Sun <[hidden email]> wrote:
> [...]
> Alluxio can help if the input size is too small compared to machine size. I can give more insights if you share more information with me.

Typo here: that should read "Alluxio can help if the input size is NOT too small compared to machine size."
 




--
Pei Sun


Re: Spark on Alluxio was slower than HDFS

Pei Sun
Was your problem resolved?

Pei

On Tue, Jul 19, 2016 at 10:21 AM, Pei Sun <[hidden email]> wrote:
> [...]



--
Pei Sun
