Re: Spark OFF_HEAP occurs "java.lang.RuntimeException: org.apache.spark.storage.BlockNotFoundException"


Re: Spark OFF_HEAP occurs "java.lang.RuntimeException: org.apache.spark.storage.BlockNotFoundException"

苗海泉
Hello everybody, I am using Spark 1.6.2 and Alluxio 1.2.0. When I test persist in spark-shell with the following code, what is happening?
Is it because the new version of Alluxio no longer supports this?
If you know the reason, please tell me. Thank you very much!

scala> val afile = sc.textFile("hdfs://spark29:9000/home/logs/nat/nat_1467220740000_1467220800000")
afile: org.apache.spark.rdd.RDD[String] = hdfs://spark29:9000/home/logs/nat/nat_1467220740000_1467220800000 MapPartitionsRDD[9] at textFile at <console>:27

scala> afile.count
res4: Long = 88                                                                 

scala> import org.apache.spark.storage.StorageLevel
import org.apache.spark.storage.StorageLevel

scala> afile.persist(StorageLevel.OFF_HEAP)
res5: afile.type = hdfs://spark29:9000/home/logs/nat/nat_1467220740000_1467220800000 MapPartitionsRDD[9] at textFile at <console>:27

scala> afile.count
[Stage 1:>                                                          (0 + 2) / 2]16/07/27 14:32:58 WARN TaskSetManager: Lost task 0.0 in stage 1.0 (TID 2, spark24): org.apache.spark.storage.BlockException: Block manager failed to return cached value for rdd_9_0!
at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:158)
at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:78)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:268)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)

16/07/27 14:32:58 ERROR TaskSetManager: Task 0 in stage 1.0 failed 4 times; aborting job
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 6, spark24): org.apache.spark.storage.BlockException: Block manager failed to return cached value for rdd_9_0!
at org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:158)

On Thursday, March 26, 2015 at 9:27:42 AM UTC+8, [hidden email] wrote:
      I run the job with Spark 1.3 / Tachyon 0.6.1 / Hadoop 2.5.0-cdh5.2.0. When Tachyon's memory reaches 100% used, it throws the exception "java.lang.RuntimeException: org.apache.spark.storage.BlockNotFoundException". I use rdd.persist(StorageLevel.OFF_HEAP). Doesn't this mode store the RDD in the under file system? When there are too many RDDs to fit in Tachyon, how can I avoid losing them?

Thank you for any help!


Re: Spark OFF_HEAP occurs "java.lang.RuntimeException: org.apache.spark.storage.BlockNotFoundException"

Pei Sun
Hi,
    It is not recommended to use persist(OFF_HEAP) anymore; that feature was removed in Spark 2.0.0. You can use saveAsTextFile or saveAsObjectFile (slower) instead.
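For example, here is a minimal sketch of the saveAsTextFile alternative; the alluxio:// host, port and output path below are illustrative assumptions, not taken from your setup:

// Instead of persist(StorageLevel.OFF_HEAP), write the RDD out explicitly.
val afile = sc.textFile("hdfs://spark29:9000/home/logs/nat/nat_1467220740000_1467220800000")
// Hypothetical Alluxio master address and output path; adjust to your cluster.
afile.saveAsTextFile("alluxio://spark29:19998/home/miaohq/nat_saved")

// Later jobs can read it back from Alluxio like any other Hadoop-compatible path.
val reloaded = sc.textFile("alluxio://spark29:19998/home/miaohq/nat_saved")
reloaded.count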

Pei


Re: Spark OFF_HEAP occurs "java.lang.RuntimeException: org.apache.spark.storage.BlockNotFoundException"

苗海泉
Hello, I want to persist some data to HDFS, so I ran the following command:
alluxio fs persist /home/miaohq/shigzNatRadius
It tells me the path is already persisted, but I cannot see it in HDFS. Why?
Then I tried to copy it to the local filesystem with:
alluxio fs copyToLocal /home/miaohq/shigzNatRadius /data/spark/miaohq/dpidata/dpilog
The command copies part of the data and then blocks. shigzNatRadius is a directory with 121 subdirectories, and each subdirectory holds 2-3 files.
Please tell me why. Thank you very much!


Java version: 1.8.0_77
Alluxio version: 1.2.0
OS: Linux spark29 2.6.32-431.el6.x86_64 #1 SMP Sun Nov 10 22:19:54 EST 2013 x86_64 x86_64 x86_64 GNU/Linux


Re: Spark OFF_HEAP occurs "java.lang.RuntimeException: org.apache.spark.storage.BlockNotFoundException"

Pei Sun
Hey,
   Can you share your Alluxio logs? Also, if it says the file already exists, can you run ls on the HDFS directory and see what is in it?

Pei


Re: Spark OFF_HEAP occurs "java.lang.RuntimeException: org.apache.spark.storage.BlockNotFoundException"

Pei Sun
Hi,
   1. If you have linked the Alluxio client jar into Spark, you will see Alluxio-related information in the Spark client logs. You can control where the Spark logs are written by modifying ${YOUR_SPARK_HOME}/conf/log4j.properties.
   2. You need to put the Alluxio client jar on Spark's classpath and then use an alluxio:// URI in your Spark application. You can find the documentation here.
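To illustrate point 2, a minimal sketch from spark-shell; the master host/port and paths are assumptions for illustration, and the exact client jar location depends on your Alluxio installation:

// Start spark-shell with the Alluxio client jar on the driver and executor classpaths,
// for example via --conf spark.driver.extraClassPath=<alluxio client jar> and
// --conf spark.executor.extraClassPath=<alluxio client jar>.

// Once the jar is on the classpath, alluxio:// URIs work like any Hadoop-compatible filesystem:
val data = sc.textFile("alluxio://spark29:19998/home/miaohq/input")
data.saveAsTextFile("alluxio://spark29:19998/home/miaohq/output")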
Pei 

I would be glad to share my Alluxio logs, but I found nothing about the Spark persist contents in them.

I am not sure how Spark finds Alluxio to persist its cached data. Our Alluxio is deployed separately from the Spark cluster; it is not an Alluxio instance running inside Spark.

Is this wrong? Am I missing some configuration for Spark or Alluxio?
