heterogeneous cluster: what if all Spark nodes don't have Alluxio workers?


paul
Dear List,

Let's suppose I have a Spark cluster spanning N machines.  Some of them have Gluster + Alluxio running, some of them don't.  Is Alluxio / Spark smart enough to schedule file-using tasks on the Gluster/Alluxio nodes?  What if each node has an Alluxio worker, but only *some* nodes have Gluster bricks?

Sub-question: do *all* Spark workers need to have the alluxio jar installed?  I've got the jar added to a spark-shell session, but I'm seeing "No FileSystem for scheme: alluxio" when I try to run a simple map() on alluxio-stored data (and AFAICT that map task is indeed running on the remote Spark worker, which has alluxio running too).

Cheers,
-Paul

Re: heterogeneous cluster: what if all Spark nodes don't have Alluxio workers?

paul
Also, dumb question: in the Gluster docs, where it says to point Alluxio to the Gluster volume-- that's a mounted Gluster volume (via the glusterfs client) right? Or is it something else?

-Paul

Re: heterogeneous cluster: what if all Spark nodes don't have Alluxio workers?

Bin Fan
In reply to this post by paul
Hi Paul,

* For Spark workers to be able to talk to Alluxio, they need the Alluxio client jar on their classpath. And yes, you need to do this on each machine running Spark workers.

* It is fine for a Spark worker to talk to the Alluxio cluster even if it has no local Alluxio worker. The Alluxio client and master will negotiate and figure out where those workers are.

* Are you running your Spark cluster in standalone mode? In that case, Spark should respect data locality when scheduling tasks.
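For example, a minimal sketch of one way to put the client jar on the classpath (the jar path below is an assumption based on an Alluxio 1.1 source build; adjust it to wherever the client jar actually lives on your nodes):

In File:
spark/conf/spark-defaults.conf  (on every node that runs a Spark driver or executor)

Set:
spark.driver.extraClassPath /opt/alluxio/core/client/target/alluxio-core-client-1.1.0-jar-with-dependencies.jar
spark.executor.extraClassPath /opt/alluxio/core/client/target/alluxio-core-client-1.1.0-jar-with-dependencies.jar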



Re: heterogeneous cluster: what if all Spark nodes don't have Alluxio workers?

Bin Fan
In reply to this post by paul
Yes, that is supposed to be a mounted Gluster volume.
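For reference, a minimal sketch of that route on an Alluxio worker node (the volume name myvol and mount point /mnt/gluster are assumptions; this mounts the volume with the glusterfs FUSE client and points Alluxio's under storage at the mount):

# Mount the Gluster volume via the glusterfs client:
mount -t glusterfs host2:/myvol /mnt/gluster

In File:
alluxio/conf/alluxio-env.sh

Set:
ALLUXIO_UNDERFS_ADDRESS=/mnt/gluster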

Re: heterogeneous cluster: what if all Spark nodes don't have Alluxio workers?

paul
Thanks all!!  I'll put the jar on my spark workers and see if I can repro the NODE_LOCAL stuff shown in the tutorial.

Re: heterogeneous cluster: what if all Spark nodes don't have Alluxio workers?

paul
Hi Bin,

I'm running into an odd "No available Alluxio worker found" error now... how carefully do I need to configure alluxio hostname / IP settings?

Here's my general setup:

host1 (two NICs):
spark master
gluster brick

host2 (two NICs):
spark worker
alluxio master
alluxio worker
gluster brick
gluster vol mounted for alluxio

My job looks like the following:

sc.binaryFiles("alluxio://host2:19998/my.file").map(x => x._2.toArray().length).collect()


So basically I want to just print the number of bytes in a file that I know exists.


I can see the file in the Alluxio webui and it appears to correctly report that the file is on host2.  


I think there might be a problem with docker networking.  Both hosts have DNS entries, and `hostname -I` on host2 looks like:

172.18.42.1 10.0.2.10 172.17.0.1 10.43.0.11


I have tried overriding ALLUXIO_MASTER_HOSTNAME to host2's DNS name and also setting alluxio.worker.bind.host=host2, and these changes don't appear to help.

The stack trace ends here, which seems like a really bad place to have a null worker address:
https://github.com/Alluxio/alluxio/blob/master/core/client/src/main/java/alluxio/client/block/AlluxioBlockStore.java#L166

Full stack:

16/06/15 17:31:43 INFO type: getFileStatus(alluxio://host2:19998/my.file.1.bin)
16/06/15 17:31:43 INFO type: Alluxio client (version 1.1.0) is trying to connect with FileSystemMasterClient master @ host2/10.0.2.10:19998
16/06/15 17:31:43 INFO type: Client registered with FileSystemMasterClient master @ host2/10.0.2.10:19998
16/06/15 17:31:43 INFO type: getFileStatus(alluxio://host2:19998/my.file.2.bin)
16/06/15 17:31:43 INFO FileInputFormat: Total input paths to process : 2
16/06/15 17:31:43 INFO type: getFileStatus(alluxio://host2:19998/my.file.1.bin)
16/06/15 17:31:43 INFO type: getFileStatus(alluxio://host2:19998/my.file.2.bin)
16/06/15 17:31:43 INFO FileInputFormat: Total input paths to process : 2
16/06/15 17:31:43 INFO CombineFileInputFormat: DEBUG: Terminated node allocation with : CompletedNodes: 1, size left: 0
16/06/15 17:31:43 INFO SparkContext: Starting job: collect at <console>:30
16/06/15 17:31:43 INFO DAGScheduler: Got job 0 (collect at <console>:30) with 1 output partitions
16/06/15 17:31:43 INFO DAGScheduler: Final stage: ResultStage 0 (collect at <console>:30)
16/06/15 17:31:43 INFO DAGScheduler: Parents of final stage: List()
16/06/15 17:31:43 INFO DAGScheduler: Missing parents: List()
16/06/15 17:31:43 INFO DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at map at <console>:30), which has no missing parents
16/06/15 17:31:43 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 2.7 KB, free 231.2 KB)
16/06/15 17:31:43 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 1652.0 B, free 232.9 KB)
16/06/15 17:31:43 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on 10.0.10.246:58686 (size: 1652.0 B, free: 511.1 MB)
16/06/15 17:31:43 INFO SparkContext: Created broadcast 1 from broadcast at DAGScheduler.scala:1006
16/06/15 17:31:43 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at map at <console>:30)
16/06/15 17:31:43 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
16/06/15 17:31:43 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, host2, partition 0,NODE_LOCAL, 2283 bytes)
16/06/15 17:31:43 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on host2:33051 (size: 1652.0 B, free: 511.1 MB)
16/06/15 17:31:43 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on host2:33051 (size: 19.4 KB, free: 511.1 MB)
16/06/15 17:31:44 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, host2): java.lang.RuntimeException: No available Alluxio worker found
  at alluxio.client.block.AlluxioBlockStore.getOutStream(AlluxioBlockStore.java:167)
  at alluxio.client.file.FileInStream.updateCacheStream(FileInStream.java:473)
  at alluxio.client.file.FileInStream.updateStreams(FileInStream.java:416)
  at alluxio.client.file.FileInStream.close(FileInStream.java:147)
  at alluxio.hadoop.HdfsFileInputStream.close(HdfsFileInputStream.java:115)
  at java.io.FilterInputStream.close(FilterInputStream.java:181)
  at org.spark-project.guava.io.Closeables.close(Closeables.java:77)
  at org.apache.spark.input.PortableDataStream.toArray(PortableDataStream.scala:188)
  at $line19.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:30)
  at $line19.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1.apply(<console>:30)
  at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
  at scala.collection.Iterator$class.foreach(Iterator.scala:727)
  at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
  at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
  at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
  at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
  at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
  at scala.collection.AbstractIterator.to(Iterator.scala:1157)
  at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
  at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
  at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
  at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
  at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:927)
  at org.apache.spark.rdd.RDD$$anonfun$collect$1$$anonfun$12.apply(RDD.scala:927)
  at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
  at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
  at org.apache.spark.scheduler.Task.run(Task.scala:89)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
  at java.lang.Thread.run(Thread.java:745)



Re: heterogeneous cluster: what if all Spark nodes don't have Alluxio workers?

paul
The job definitely hits the master OK; in the alluxio logs I see:

2016-06-16 00:59:07,170 DEBUG logger.type (MountTable.java:resolve) - Resolving /my.file


Re: heterogeneous cluster: what if all Spark nodes don't have Alluxio workers?

Gene Pang
Hi,

If you are getting the "No available Alluxio worker found" error, could you look at the Alluxio worker logs to see why they are not connecting with the Alluxio master?

Thanks,
Gene

Re: heterogeneous cluster: what if all Spark nodes don't have Alluxio workers?

paul
Hmm, all I see in logs/worker.log is:

2016-06-16 17:20:21,439 DEBUG logger.type (Sessions.java:getTimedOutSessions) - Worker is checking all sessions' status for timeouts.


Re: heterogeneous cluster: what if all Spark nodes don't have Alluxio workers?

paul
FWIW I also see the following in the Spark logs before the job crashes:

16/06/16 10:36:22 DEBUG BinaryFileRDD: Failed to use InputSplit#getLocationInfo.
java.lang.NullPointerException
  at scala.collection.mutable.ArrayOps$ofRef$.length$extension(ArrayOps.scala:114)
  at scala.collection.mutable.ArrayOps$ofRef.length(ArrayOps.scala:114)
  at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:32)
  at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
  at org.apache.spark.rdd.HadoopRDD$.convertSplitLocationInfo(HadoopRDD.scala:412)
  at org.apache.spark.rdd.NewHadoopRDD.getPreferredLocations(NewHadoopRDD.scala:233)

https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/NewHadoopRDD.scala#L254

It looks like the splits that Spark is getting might be bad...


Re: heterogeneous cluster: what if all Spark nodes don't have Alluxio workers?

paul
OK, I got debug logging on my Spark workers, and I see this:

16/06/17 11:46:33 INFO type: open(alluxio://alluxio-master:19998/my.file, 65536)
16/06/17 11:46:33 DEBUG type: HdfsFileInputStream(/my.file, Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, 65536, 0 bytes read, 0 bytes written, 3 read ops, 0 large read ops, 0 write ops, {})
16/06/17 11:46:33 DEBUG type: Init FileInStream with options InStreamOptions{locationPolicy=LocalFirstPolicy{localHostName=172.18.42.1}, readType=CACHE, cachePartiallyReadBlock=true, seekBufferSize=1048576}
16/06/17 11:46:33 DEBUG type: Failed to get BlockInStream for block with ID 117474066432, using UFS instead. java.io.IOException: Block 117474066432 is not available in Alluxio
16/06/17 11:46:33 DEBUG type: Failed to get BlockInStream for block with ID 117474066432, using UFS instead. java.io.IOException: Block 117474066432 is not available in Alluxio

So 172.18.42.1 is not routable; it's a docker IP. I've also been careful to set alluxio.worker.bind.host to a hostname and/or routable IP (neither works).

****
It looks like Alluxio is trying to get the host IP internally and doing it incorrectly; this is a classic problem with Spark :( but they have env vars to work around it. Is there a surefire way to override the worker hostname / IP?
****

Re: heterogeneous cluster: what if all Spark nodes don't have Alluxio workers?

paul
Yea, so the problem is here on the client side (i.e., in the alluxio jar that the Spark worker is using):
https://github.com/Alluxio/alluxio/blob/baa79fbb85519ac79725b4e8b9589145f49c05a3/core/common/src/main/java/alluxio/util/network/NetworkAddressUtils.java#L360

Alluxio is fine with picking a docker network interface that doesn't actually work. Spark allows using an env var to override this sort of behavior; I think Alluxio will need a similar feature.
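As a quick sanity check, here is a diagnostic sketch using plain JDK calls (Alluxio's NetworkAddressUtils does more interface filtering than this, so treat it only as an approximation) to see which address the client-side JVM resolves for itself; run it in a spark-shell on the affected node:

scala> java.net.InetAddress.getLocalHost.getHostAddress
res0: String = 172.18.42.1    // hypothetical output showing the docker bridge address winning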

Re: heterogeneous cluster: what if all Spark nodes don't have Alluxio workers?

Pengfei Xuan
Hi Paul,

Alluxio supports multihomed networks. Technically, you can specify the bind and access addresses for the master and workers via configuration. Check the following example to see if it helps:

https://github.com/Alluxio/alluxio/blob/branch-1.0/conf/alluxio-env.sh.template#L14
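For example, a minimal sketch for host2 (a sketch, assuming Alluxio 1.x property names; host2 stands for the routable DNS name, and binding to 0.0.0.0 is an assumption to cover the docker interfaces):

In File:
alluxio/conf/alluxio-site.properties

Set:
# Address the master advertises to clients:
alluxio.master.hostname=host2
# Address the worker registers with the master:
alluxio.worker.hostname=host2
# Listen on all interfaces inside the container:
alluxio.master.bind.host=0.0.0.0
alluxio.worker.bind.host=0.0.0.0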

Re: heterogeneous cluster: what if all Spark nodes don't have Alluxio workers?

paul
Hi Pengfei,

Right, I'm currently setting the master that way and the worker host via the alluxio.worker.bind.host setting.

But as mentioned above, this problem is happening in the *alluxio client*, not the alluxio master or worker.  The alluxio client appears to get the wrong hostname / ip (in particular, a hostname / ip that doesn't match the master or worker).  The log trace in my last email is from the Spark worker node, not the Alluxio worker node (whose logs show it appears to be getting the correct hostname).

-Paul 

Re: heterogeneous cluster: what if all Spark nodes don't have Alluxio workers?

Pengfei Xuan
I see. Thanks for clarifying!
It looks like Spark executors have an unmatched address (e.g. 172.18.42.1). Can you override it by setting `SPARK_LOCAL_HOSTNAME`? For example,

In File:
spark/conf/spark-env.sh

Set:
SPARK_LOCAL_HOSTNAME=`hostname -A | cut -d" " -f1`



Re: heterogeneous cluster: what if all Spark nodes don't have Alluxio workers?

paul
Hmm, so I tried out this patch and set ALLUXIO_HOST_IP to exactly what the Spark worker is using:

https://gist.github.com/cpwais/611dff495f2b4bd8ee1813acfe2d122f


But I still get the same error:

https://gist.github.com/cpwais/b83e1d8d79187a5505369bf6c8cad6aa





Reply | Threaded
Open this post in threaded view
|

Re: heterogeneous cluster: what if all Spark nodes don't have Alluxio workers ?

Pengfei Xuan
Could you check whether you can get a corresponding hostname (FQDN) for the IP address 10.0.2.243? You can try the following command to see the result:

host 10.0.2.243
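
If the reverse lookup works, it is also worth confirming that the forward lookup maps back to the same address, since a stale /etc/hosts entry can make the two disagree. A quick sanity check (just a sketch; it assumes the usual one-line `host` output on Linux):

host 10.0.2.243                                   # reverse: IP -> FQDN
FQDN=$(host 10.0.2.243 | awk '{print $NF}' | sed 's/\.$//')
host "$FQDN"                                      # forward: FQDN -> IP, should print 10.0.2.243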

On Monday, June 20, 2016 at 4:54:46 PM UTC-4, Paul Wais wrote:
Hmm, so I tried out this patch and set ALLUXIO_HOST_IP to exactly what
the Spark worker is using:

https://gist.github.com/cpwais/611dff495f2b4bd8ee1813acfe2d122f

But I still get the same error:

https://gist.github.com/cpwais/b83e1d8d79187a5505369bf6c8cad6aa

On Fri, Jun 17, 2016 at 1:40 PM, Pengfei Xuan wrote:

>
> I see. Thanks for clarifying!
> It looks like Spark executors have an unmatched address (e.g. 172.18.42.1). Can you override it by setting `SPARK_LOCAL_HOSTNAME`? For example,
>
> In File:
> spark/conf/spark-env.sh
>
> Set:
> SPARK_LOCAL_HOSTNAME=`hostname -A | cut -d" " -f1`
>
>
>
> On Friday, June 17, 2016 at 4:31:15 PM UTC-4, Paul Wais wrote:
>>
>> Hi Pengfei,
>>
>> Right, I'm currently setting the master that way and the worker host via the alluxio.worker.bind.host setting.
>>
>> But as mentioned above, this problem is happening in the *alluxio client*, not the alluxio master or worker.  The alluxio client appears to get the wrong hostname / ip (in particular, a hostname / ip that doesn't match the master or worker).  The log trace in my last email is from the Spark worker node, not the Alluxio worker node (whose logs show it appears to be getting the correct hostname).
>>
>> -Paul
>>
>> On Fri, Jun 17, 2016 at 1:21 PM, Pengfei Xuan  wrote:
>>>
>>> Hi Paul,
>>>
>>> Alluxio supports multihomed networks. Technically, you can specify the bind and access addresses for the master and workers programmatically. Check the following example to see if it helps:
>>>
>>> https://github.com/Alluxio/alluxio/blob/branch-1.0/conf/alluxio-env.sh.template#L14
>>>
>>>
>>> On Friday, June 17, 2016 at 4:06:02 PM UTC-4, [hidden email] wrote:
>>>>
>>>> Yea, so the problem is in here on the client side (i.e. the alluxio jar that the spark worker is using):
>>>> https://github.com/Alluxio/alluxio/blob/baa79fbb85519ac79725b4e8b9589145f49c05a3/core/common/src/main/java/alluxio/util/network/NetworkAddressUtils.java#L360
>>>>
>>>> Alluxio is fine taking a docker network interface that doesn't actually work.  So, Spark allows using an env var to override this sort of behavior.  I think Alluxio will need a similar feature.
>>>>
>>>> On Friday, June 17, 2016 at 11:52:42 AM UTC-7, Paul Wais wrote:
>>>>>
>>>>> ok, I got debug logging on my Spark workers and I see this:
>>>>>
>>>>> 16/06/17 11:46:33 INFO type: open(alluxio://alluxio-master:19998/my.file, 65536)
>>>>> 16/06/17 11:46:33 DEBUG type: HdfsFileInputStream(/my.file,
>>>>> Configuration: core-default.xml, core-site.xml, mapred-default.xml,
>>>>> mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml,
>>>>> hdfs-site.xml, 65536, 0 bytes read, 0 bytes written, 3 read ops, 0
>>>>> large read ops, 0 write ops, {})
>>>>> 16/06/17 11:46:33 DEBUG type: Init FileInStream with options
>>>>> InStreamOptions{locationPolicy=LocalFirstPolicy{localHostName=172.18.42.1},
>>>>> readType=CACHE, cachePartiallyReadBlock=true, seekBufferSize=1048576}
>>>>> 16/06/17 11:46:33 DEBUG type: Failed to get BlockInStream for block
>>>>> with ID 117474066432, using UFS instead. java.io.IOException: Block
>>>>> 117474066432 is not available in Alluxio
>>>>> 16/06/17 11:46:33 DEBUG type: Failed to get BlockInStream for block
>>>>> with ID 117474066432, using UFS instead. java.io.IOException: Block
>>>>> 117474066432 is not available in Alluxio
>>>>>
>>>>> So 172.18.42.1 is not routable; it's a docker IP.  I've also been
>>>>> careful to set alluxio.worker.bind.host to a hostname and/or routable
>>>>> IP (neither work).
>>>>>
>>>>> ****
>>>>> It looks like Alluxio is trying to get the host IP internally and
>>>>> doing it incorrectly; this is a classic problem with Spark :( but they
>>>>> have env vars to work around it.   Is there a surefire way to override
>>>>> the worker hostname / ip ?
>>>>> ****

Re: heterogeneous cluster: what if all Spark nodes don't have Alluxio workers ?

paul
Yep, `host` works for both the IP and the FQDN.
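
Since DNS resolves consistently in both directions, the remaining suspect is which local address the Alluxio client picks at runtime. Two quick checks that narrow this down (a sketch; `getent` and `ip` assume a Linux host):

getent hosts $(hostname)                          # what the libc resolver maps the local hostname to
ip -4 addr show | awk '/inet /{print $2, $NF}'    # every IPv4 address per interface; look for the docker bridge (172.18.42.1 here)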

On Mon, Jun 20, 2016 at 9:04 PM, Pengfei Xuan wrote:

> Could you check if you can get a corresponding hostname (FQDN) for the IP
> address 10.0.2.243? You can try the following command to see the result:
>
> host 10.0.2.243
>
>
> On Monday, June 20, 2016 at 4:54:46 PM UTC-4, Paul Wais wrote:
>>
>> Hmm, so I tried out this patch and set ALLUXIO_HOST_IP to exactly what
>> the Spark worker is using:
>>
>> https://gist.github.com/cpwais/611dff495f2b4bd8ee1813acfe2d122f
>>
>>
>> But I still get the same error:
>>
>> https://gist.github.com/cpwais/b83e1d8d79187a5505369bf6c8cad6aa
>>
>>
>>
>>
>>
>> On Fri, Jun 17, 2016 at 1:40 PM, Pengfei Xuan
>> >
>> > I see. Thanks for clarifying!
>> > It looks like Spark executors have an unmatched address (e.g.
>> > 172.18.42.1). Can you override it by setting `SPARK_LOCAL_HOSTNAME`. For
>> > example,
>> >
>> > In File:
>> > spark/conf/spark-env.sh
>> >
>> > Set:
>> > SPARK_LOCAL_HOSTNAME=`hostname -A | cut -d" " -f1`
>> >
>> >
>> >
>> > On Friday, June 17, 2016 at 4:31:15 PM UTC-4, Paul Wais wrote:
>> >>
>> >> Hi Pengfei,
>> >>
>> >> Right, I'm currently setting the master that way and the worker host
>> >> via the alluxio.worker.bind.host setting.
>> >>
>> >> But as mentioned above, this problem is happening in the *alluxio
>> >> client*, not the alluxio master or worker.  The alluxio client appears to
>> >> get the wrong hostname / ip (in particular, a hostname / ip that doesn't
>> >> match the master or worker).  The log trace in my last email is from the
>> >> Spark worker node, not the Alluxio worker node (whose logs show it appears
>> >> to be getting the correct hostname).
>> >>
>> >> -Paul
>> >>
>> >> On Fri, Jun 17, 2016 at 1:21 PM, Pengfei Xuan  wrote:
>> >>>
>> >>> Hi Paul,
>> >>>
>> >>> Alluxio supports the multihomed networks. Technically, you can specify
>> >>> the bind and access address for the master and workers programmably. Check
>> >>> the following example see if it helps:
>> >>>
>> >>>
>> >>> https://github.com/Alluxio/alluxio/blob/branch-1.0/conf/alluxio-env.sh.template#L14
>> >>>
>> >>>
>> >>> On Friday, June 17, 2016 at 4:06:02 PM UTC-4, [hidden email]
>> >>> wrote:
>> >>>>
>> >>>> Yea, so the problem is in here on the client side (i.e. the alluxio
>> >>>> jar that the spark worker is using):
>> >>>>
>> >>>> https://github.com/Alluxio/alluxio/blob/baa79fbb85519ac79725b4e8b9589145f49c05a3/core/common/src/main/java/alluxio/util/network/NetworkAddressUtils.java#L360
>> >>>>
>> >>>> Alluxio is fine taking a docker network interface that doesn't
>> >>>> actually work.  So, Spark allows using an env var to override this sort of
>> >>>> behavior.  I think Alluxio will need a similar feature.
>> >>>>
>> >>>> On Friday, June 17, 2016 at 11:52:42 AM UTC-7, Paul Wais wrote:
>> >>>>>
>> >>>>> ok, I got debug logging on my Spark workers and I see this:
>> >>>>>
>> >>>>> 16/06/17 11:46:33 INFO type:
>> >>>>> open(alluxio://alluxio-master:19998/my.file, 65536)
>> >>>>> 16/06/17 11:46:33 DEBUG type: HdfsFileInputStream(/my.file,
>> >>>>> Configuration: core-default.xml, core-site.xml, mapred-default.xml,
>> >>>>> mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml,
>> >>>>> hdfs-site.xml, 65536, 0 bytes read, 0 bytes written, 3 read ops, 0
>> >>>>> large read ops, 0 write ops, {})
>> >>>>> 16/06/17 11:46:33 DEBUG type: Init FileInStream with options
>> >>>>>
>> >>>>> InStreamOptions{locationPolicy=LocalFirstPolicy{localHostName=172.18.42.1},
>> >>>>> readType=CACHE, cachePartiallyReadBlock=true,
>> >>>>> seekBufferSize=1048576}
>> >>>>> 16/06/17 11:46:33 DEBUG type: Failed to get BlockInStream for block
>> >>>>> with ID 117474066432, using UFS instead. java.io.IOException: Block
>> >>>>> 117474066432 is not available in Alluxio
>> >>>>> 16/06/17 11:46:33 DEBUG type: Failed to get BlockInStream for block
>> >>>>> with ID 117474066432, using UFS instead. java.io.IOException: Block
>> >>>>> 117474066432 is not available in Alluxio
>> >>>>>
>> >>>>> So 172.18.42.1 is not routable; it's a docker IP.  I've also been
>> >>>>> careful to set alluxio.worker.bind.host to a hostname and/or
>> >>>>> routable
>> >>>>> IP (neither work).
>> >>>>>
>> >>>>> ****
>> >>>>> It looks like Alluxio is trying to get the host IP internally and
>> >>>>> doing it incorrectly; this is a classic problem with Spark :( but
>> >>>>> they
>> >>>>> have env vars to work around it.   Is there a surefire way to
>> >>>>> override
>> >>>>> the worker hostname / ip ?
>> >>>>> ****
>> >>>
>> >>> --
>> >>> You received this message because you are subscribed to a topic in the
>> >>> Google Groups "Alluxio Users" group.
>> >>> To unsubscribe from this topic, visit
>> >>> https://groups.google.com/d/topic/alluxio-users/rfEHbjb0ovU/unsubscribe.
>> >>> To unsubscribe from this group and all its topics, send an email to
>> >>> [hidden email].
>> >>> For more options, visit https://groups.google.com/d/optout.
>> >>
>> >>
>> > --
>> > You received this message because you are subscribed to a topic in the
>> > Google Groups "Alluxio Users" group.
>> > To unsubscribe from this topic, visit
>> > https://groups.google.com/d/topic/alluxio-users/rfEHbjb0ovU/unsubscribe.
>> > To unsubscribe from this group and all its topics, send an email to
>> > [hidden email].
>> > For more options, visit https://groups.google.com/d/optout.
>
> --
> You received this message because you are subscribed to a topic in the
> Google Groups "Alluxio Users" group.
> To unsubscribe from this topic, visit
> https://groups.google.com/d/topic/alluxio-users/rfEHbjb0ovU/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to
> [hidden email].
> For more options, visit https://groups.google.com/d/optout.

--
You received this message because you are subscribed to the Google Groups "Alluxio Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/d/optout.
Reply | Threaded
Open this post in threaded view
|

Re: heterogeneous cluster: what if all Spark nodes don't have Alluxio workers ?

Gene Pang
Hi Paul,

Were you able to resolve your issue?

Thanks,
Gene


Re: heterogeneous cluster: what if all Spark nodes don't have Alluxio workers ?

paul
I was not, unfortunately, and I ran out of time for this project.  I
hypothesize that I may have had a bad line in /etc/hosts; however, the
ALLUXIO_HOST_IP patch is really important for keeping Alluxio from
choosing a non-working (Docker) network interface.  I'll probably have
time to return to this in the next month or two, though.  I would
strongly suggest Alluxio adopt a Spark-style IP override, either
through the suggested ALLUXIO_HOST_IP or some other mechanism.
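
For anyone who hits the same wall, two things are worth trying. First, pin the routable address in /etc/hosts so the hostname resolves to it rather than to the Docker bridge (a sketch; 10.0.2.243 and worker1.example.com stand in for your worker's real IP and FQDN):

# /etc/hosts -- the routable IP, not 172.18.42.1, should map to the hostname
10.0.2.243   worker1.example.com worker1

Second, the override I'm proposing would mirror Spark's SPARK_LOCAL_IP / SPARK_LOCAL_HOSTNAME, e.g. set in conf/alluxio-env.sh:

# hypothetical -- ALLUXIO_HOST_IP only exists in my patch, not in stock Alluxio
export ALLUXIO_HOST_IP=10.0.2.243

There is also a client-side property, alluxio.user.hostname, intended for this kind of override; it's worth checking whether your Alluxio version supports it.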
