issue with running spark-submit


issue with running spark-submit

Deema Yatsyuk
I did the following: installed Ambari HDP, and then:

1. set up passwordless SSH access to all servers in the cluster from the master node(s)
2. create the /opt/alluxio directory
3. download Alluxio from http://www.alluxio.org/download and unarchive it into /opt/alluxio
3.1. mkdir -p /etc/alluxio
4. cp /opt/alluxio/conf/alluxio-site.properties.template /etc/alluxio/alluxio-site.properties
4.1. in /etc/alluxio/alluxio-site.properties, set the master node and the path to the HDFS folder; the content of the file:
alluxio.master.hostname=hn0-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net
alluxio.underfs.address=hdfs://hn0-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net/alluxio
alluxio.security.authorization.permission.enabled=false

5. copy core-site.xml and hdfs-site.xml from the Hadoop distribution into /opt/alluxio/conf
6. edit /opt/alluxio/conf/masters and set the master node name
7. edit /opt/alluxio/conf/workers and set the list of worker nodes
8. scp /etc/alluxio/alluxio-site.properties and /opt/alluxio/ to all nodes in the cluster (see the loop sketch after this list)
9. format Alluxio
/opt/alluxio/bin/alluxio format
10. start the master node
/opt/alluxio/bin/alluxio-start.sh master
11. start the workers
/opt/alluxio/bin/alluxio-start.sh workers
12. check that the cluster is up at
http://ip-of-master-node:19999/workers
13. run the Alluxio internal tests
/opt/alluxio/bin/alluxio runTests
they all pass
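
For step 8, a minimal distribution sketch (my shorthand; it assumes the workers file lists one hostname per line and relies on the passwordless SSH from step 1):

# push the site properties and the Alluxio install to every worker
for host in $(cat /opt/alluxio/conf/workers); do
  scp /etc/alluxio/alluxio-site.properties "${host}":/etc/alluxio/
  scp -r /opt/alluxio "${host}":/opt/
done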

but when I run /opt/alluxio/integration/checker/bin/alluxio-checker.sh spark yarn

it fails

driver log
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/bigstream/bigstreamNSD/spark-2.1.1-BIGSTREAM-bin-bigstream-spark-yarn-h2.7.2/nsd-jars-2.1/alluxio-1.8.0-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.6.2.38-1/spark2/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.6.2.38-1/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.6.2.38-1/spark_llap/spark-llap-assembly-1.0.0.2.6.2.38-1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
18/09/03 21:50:32 INFO CoarseGrainedExecutorBackend: Started daemon with process name: 3811@wn2-nsd-hd
18/09/03 21:50:32 INFO SignalUtils: Registered signal handler for TERM
18/09/03 21:50:32 INFO SignalUtils: Registered signal handler for HUP
18/09/03 21:50:32 INFO SignalUtils: Registered signal handler for INT
18/09/03 21:50:33 INFO SecurityManager: Changing view acls to: yarn,sshuser
18/09/03 21:50:33 INFO SecurityManager: Changing modify acls to: yarn,sshuser
18/09/03 21:50:33 INFO SecurityManager: Changing view acls groups to:
18/09/03 21:50:33 INFO SecurityManager: Changing modify acls groups to:
18/09/03 21:50:33 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(yarn, sshuser); groups with view permissions: Set(); users  with modify permissions: Set(yarn, sshuser); groups with modify permissions: Set()
18/09/03 21:50:33 INFO TransportClientFactory: Successfully created connection to /10.0.0.6:44337 after 53 ms (0 ms spent in bootstraps)
18/09/03 21:50:33 INFO SecurityManager: Changing view acls to: yarn,sshuser
18/09/03 21:50:33 INFO SecurityManager: Changing modify acls to: yarn,sshuser
18/09/03 21:50:33 INFO SecurityManager: Changing view acls groups to:
18/09/03 21:50:33 INFO SecurityManager: Changing modify acls groups to:
18/09/03 21:50:33 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(yarn, sshuser); groups with view permissions: Set(); users  with modify permissions: Set(yarn, sshuser); groups with modify permissions: Set()
18/09/03 21:50:34 INFO TransportClientFactory: Successfully created connection to /10.0.0.6:44337 after 5 ms (0 ms spent in bootstraps)
18/09/03 21:50:34 INFO DiskBlockManager: Created local directory at /mnt/resource/hadoop/yarn/local/usercache/sshuser/appcache/application_1536006861159_0028/blockmgr-1ade914e-66e0-4eed-a16c-c91732a33d56
18/09/03 21:50:34 INFO MemoryStore: MemoryStore started with capacity 5.2 GB
18/09/03 21:50:34 INFO CoarseGrainedExecutorBackend: Connecting to driver: spark://CoarseGrainedScheduler@10.0.0.6:44337
18/09/03 21:50:34 INFO CoarseGrainedExecutorBackend: Successfully registered with driver
18/09/03 21:50:34 INFO Executor: Starting executor ID 1 on host wn2-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net
18/09/03 21:50:34 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 34847.
18/09/03 21:50:34 INFO NettyBlockTransferService: Server created on wn2-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net:34847
18/09/03 21:50:34 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
18/09/03 21:50:34 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(1, wn2-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net, 34847, None)
18/09/03 21:50:34 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(1, wn2-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net, 34847, None)
18/09/03 21:50:34 INFO BlockManager: Initialized BlockManager: BlockManagerId(1, wn2-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net, 34847, None)
18/09/03 21:50:35 INFO CoarseGrainedExecutorBackend: Got assigned task 3
18/09/03 21:50:35 INFO CoarseGrainedExecutorBackend: Got assigned task 7
18/09/03 21:50:35 INFO Executor: Running task 3.0 in stage 0.0 (TID 3)
18/09/03 21:50:35 INFO Executor: Running task 7.0 in stage 0.0 (TID 7)
18/09/03 21:50:35 INFO TorrentBroadcast: Started reading broadcast variable 0
18/09/03 21:50:35 INFO TransportClientFactory: Successfully created connection to /10.0.0.6:34251 after 5 ms (0 ms spent in bootstraps)
18/09/03 21:50:36 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 2.3 KB, free 5.2 GB)
18/09/03 21:50:36 INFO TorrentBroadcast: Reading broadcast variable 0 took 128 ms
18/09/03 21:50:36 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 4.0 KB, free 5.2 GB)
18/09/03 21:50:36 INFO Executor: Finished task 7.0 in stage 0.0 (TID 7). 1702 bytes result sent to driver
18/09/03 21:50:36 INFO Executor: Finished task 3.0 in stage 0.0 (TID 3). 2430 bytes result sent to driver
18/09/03 21:50:36 INFO CoarseGrainedExecutorBackend: Driver commanded a shutdown
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.executor.filesystem.file.largeRead_ops, value=0
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.executor.filesystem.file.read_bytes, value=0
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.executor.filesystem.file.read_ops, value=0
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.executor.filesystem.file.write_bytes, value=0
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.executor.filesystem.file.write_ops, value=0
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.executor.filesystem.hdfs.largeRead_ops, value=0
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.executor.filesystem.hdfs.read_bytes, value=0
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.executor.filesystem.hdfs.read_ops, value=0
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.executor.filesystem.hdfs.write_bytes, value=0
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.executor.filesystem.hdfs.write_ops, value=0
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.executor.threadpool.activeTasks, value=0
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.executor.threadpool.completeTasks, value=2
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.executor.threadpool.currentPool_size, value=2
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.executor.threadpool.maxPool_size, value=2147483647
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.G1-Old-Generation.count, value=0
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.G1-Old-Generation.time, value=0
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.G1-Young-Generation.count, value=4
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.G1-Young-Generation.time, value=58
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.heap.committed, value=924844032
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.heap.init, value=924844032
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.heap.max, value=9663676416
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.heap.usage, value=0.011385394467247857
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.heap.used, value=112121920
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.non-heap.committed, value=50593792
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.non-heap.init, value=2555904
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.non-heap.max, value=-1
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.non-heap.usage, value=-4.957636E7
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.non-heap.used, value=49584424
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.Code-Cache.committed, value=7995392
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.Code-Cache.init, value=2555904
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.Code-Cache.max, value=251658240
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.Code-Cache.usage, value=0.031464640299479166
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.Code-Cache.used, value=7918336
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.Compressed-Class-Space.committed, value=5111808
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.Compressed-Class-Space.init, value=0
18/09/03 21:50:36 INFO CoarseGrainedExecutorBackend: Driver from 10.0.0.6:44337 disconnected during shutdown
18/09/03 21:50:36 INFO CoarseGrainedExecutorBackend: Driver from 10.0.0.6:44337 disconnected during shutdown
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.Compressed-Class-Space.max, value=1073741824
18/09/03 21:50:37 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.Compressed-Class-Space.usage, value=0.004624105989933014
18/09/03 21:50:37 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.Compressed-Class-Space.used, value=4965096
18/09/03 21:50:37 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.G1-Eden-Space.committed, value=557842432
18/09/03 21:50:37 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.G1-Eden-Space.init, value=50331648
18/09/03 21:50:37 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.G1-Eden-Space.max, value=-1
18/09/03 21:50:37 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.G1-Eden-Space.usage, value=0.041353383458646614
18/09/03 21:50:37 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.G1-Eden-Space.used, value=23068672
18/09/03 21:50:37 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.G1-Old-Gen.committed, value=341835776
18/09/03 21:50:37 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.G1-Old-Gen.init, value=874512384
18/09/03 21:50:37 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.G1-Old-Gen.max, value=9663676416
18/09/03 21:50:37 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.G1-Old-Gen.usage, value=0.006828102800581191
18/09/03 21:50:37 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.G1-Old-Gen.used, value=65984576
18/09/03 21:50:37 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.G1-Survivor-Space.committed, value=25165824
18/09/03 21:50:37 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.G1-Survivor-Space.init, value=0
18/09/03 21:50:37 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.G1-Survivor-Space.max, value=-1
18/09/03 21:50:37 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.G1-Survivor-Space.usage, value=1.0
18/09/03 21:50:37 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.G1-Survivor-Space.used, value=25165824
18/09/03 21:50:37 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.Metaspace.committed, value=37486592
18/09/03 21:50:37 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.Metaspace.init, value=0
18/09/03 21:50:37 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.Metaspace.max, value=-1
18/09/03 21:50:37 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.Metaspace.usage, value=0.980258754916958
18/09/03 21:50:37 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.Metaspace.used, value=36746560
18/09/03 21:50:37 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.total.committed, value=975503360
18/09/03 21:50:37 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.total.init, value=927399936
18/09/03 21:50:37 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.total.max, value=9663676415
18/09/03 21:50:37 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.total.used, value=163924840
18/09/03 21:50:37 INFO metrics: type=COUNTER, name=application_1536006861159_0028.1.HiveExternalCatalog.fileCacheHits, count=0
18/09/03 21:50:37 INFO metrics: type=COUNTER, name=application_1536006861159_0028.1.HiveExternalCatalog.filesDiscovered, count=0
18/09/03 21:50:37 INFO metrics: type=COUNTER, name=application_1536006861159_0028.1.HiveExternalCatalog.hiveClientCalls, count=0
18/09/03 21:50:37 INFO metrics: type=COUNTER, name=application_1536006861159_0028.1.HiveExternalCatalog.parallelListingJobCount, count=0
18/09/03 21:50:37 INFO metrics: type=COUNTER, name=application_1536006861159_0028.1.HiveExternalCatalog.partitionsFetched, count=0
18/09/03 21:50:37 INFO metrics: type=HISTOGRAM, name=application_1536006861159_0028.1.CodeGenerator.compilationTime, count=0, min=0, max=0, mean=0.0, stddev=0.0, median=0.0, p75=0.0, p95=0.0, p98=0.0, p99=0.0, p999=0.0
18/09/03 21:50:37 INFO metrics: type=HISTOGRAM, name=application_1536006861159_0028.1.CodeGenerator.generatedClassSize, count=0, min=0, max=0, mean=0.0, stddev=0.0, median=0.0, p75=0.0, p95=0.0, p98=0.0, p99=0.0, p999=0.0
18/09/03 21:50:37 INFO metrics: type=HISTOGRAM, name=application_1536006861159_0028.1.CodeGenerator.generatedMethodSize, count=0, min=0, max=0, mean=0.0, stddev=0.0, median=0.0, p75=0.0, p95=0.0, p98=0.0, p99=0.0, p999=0.0
18/09/03 21:50:37 INFO metrics: type=HISTOGRAM, name=application_1536006861159_0028.1.CodeGenerator.sourceCodeSize, count=0, min=0, max=0, mean=0.0, stddev=0.0, median=0.0, p75=0.0, p95=0.0, p98=0.0, p99=0.0, p999=0.0
18/09/03 21:50:37 INFO MemoryStore: MemoryStore cleared
18/09/03 21:50:37 INFO BlockManager: BlockManager stopped
18/09/03 21:50:37 INFO ShutdownHookManager: Shutdown hook called



executor log is identical to the driver log above


The main issue is when I'm trying to run spark-submit:
spark-submit --conf spark.logConf=true --master yarn --class co.bigstream.benchmark.TPCSQ3 tpcds_2.11-1.0.jar 2 alluxio:///alluxio/avro avro

/opt/alluxio/client/alluxio-1.8.0-client.jar is also added to the extra classpath for both the driver and the executor
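
Concretely, the classpath flags are of this shape (a sketch of my settings; spark.driver.extraClassPath and spark.executor.extraClassPath are the standard Spark properties for prepending jars):

--conf spark.driver.extraClassPath=/opt/alluxio/client/alluxio-1.8.0-client.jar
--conf spark.executor.extraClassPath=/opt/alluxio/client/alluxio-1.8.0-client.jar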

Exception in thread "main" java.lang.NullPointerException: URI hostname must not be null
at alluxio.core.client.runtime.com.google.common.base.Preconditions.checkNotNull(Preconditions.java:208)
at alluxio.hadoop.AbstractFileSystem.initializeInternal(AbstractFileSystem.java:506)
at alluxio.hadoop.AbstractFileSystem.initialize(AbstractFileSystem.java:483)
at alluxio.hadoop.FileSystem.initialize(FileSystem.java:27)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2795)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:99)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2829)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2811)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:390)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$14.apply(DataSource.scala:372)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$14.apply(DataSource.scala:370)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
at scala.collection.immutable.List.flatMap(List.scala:344)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:370)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:135)
at co.bigstream.benchmark.TPCSQ3$.main(TPCSQ3.scala:64)
at co.bigstream.benchmark.TPCSQ3.main(TPCSQ3.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:751)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
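
From the trace, alluxio.hadoop.AbstractFileSystem.initializeInternal rejects the URI because alluxio:///alluxio/avro carries no hostname. For reference, the fully qualified spelling of the same submit would be (19998 is the default Alluxio master RPC port; using it here is my assumption):

spark-submit --conf spark.logConf=true --master yarn --class co.bigstream.benchmark.TPCSQ3 tpcds_2.11-1.0.jar 2 alluxio://hn0-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net:19998/alluxio/avro avro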

Any help would be awesome
Thanks


Re: issue with running spark-submit

Deema Yatsyuk
I followed the post https://groups.google.com/forum/?fromgroups#!searchin/alluxio-users/URI$20hostname$20must$20not$20be$20null%7Csort:date/alluxio-users/45aUj37huwg/cUDIPPhYAAAJ
and tried to run spark-submit as
spark-submit --conf spark.hadoop.defaultFS="-Dalluxio://hn0-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net" --master yarn --class co.bigstream.benchmark.TPCSQ3 tpcds_2.11-1.0.jar 2 alluxio:///avro avro

but got the same issue.
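
(For comparison, a sketch of the JVM-property spelling of the same idea, passing alluxio.master.hostname, the property from step 4.1, to the driver and executors via Spark's extraJavaOptions:)

--conf spark.driver.extraJavaOptions=-Dalluxio.master.hostname=hn0-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net
--conf spark.executor.extraJavaOptions=-Dalluxio.master.hostname=hn0-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net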


On Tuesday, September 4, 2018 at 12:02:55 PM UTC+3, Deema Yatsyuk wrote:
I did the following
 install Ambari HDP and then

1. setup ssh password-less access to all servers in a cluster from master node/nodes
2. create /opt/alluxio directory
3. download alluxio from <a href="http://www.alluxio.org/download" rel="nofollow" style="color:rgb(0,82,204);font-family:-apple-system,system-ui,&quot;Segoe UI&quot;,Roboto,Oxygen,Ubuntu,&quot;Fira Sans&quot;,&quot;Droid Sans&quot;,&quot;Helvetica Neue&quot;,sans-serif;font-size:14px;background-color:rgb(244,245,247)" target="_blank" onmousedown="this.href=&#39;http://www.google.com/url?q\x3dhttp%3A%2F%2Fwww.alluxio.org%2Fdownload\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNHTkzyntKz5ExJGo3pI9rUkfQdtdw&#39;;return true;" onclick="this.href=&#39;http://www.google.com/url?q\x3dhttp%3A%2F%2Fwww.alluxio.org%2Fdownload\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNHTkzyntKz5ExJGo3pI9rUkfQdtdw&#39;;return true;">http://www.alluxio.org/download and unarchive into to /opt/alluxio
3.1. mkdir -p /etc/alluxio
4. cp /opt/alluxio/conf/alluxio-site.properties.template /etc/alluxio/alluxio-site.properties 
content of a file

4.1. set in /etc/alluxio/alluxio-site.properties master node and path to hdfs folder
alluxio.master.hostname=<a href="http://hn0-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net" target="_blank" rel="nofollow" onmousedown="this.href=&#39;http://www.google.com/url?q\x3dhttp%3A%2F%2Fhn0-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNFvMvF1pf9l6KqFhY9tWpNowlN9pw&#39;;return true;" onclick="this.href=&#39;http://www.google.com/url?q\x3dhttp%3A%2F%2Fhn0-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNFvMvF1pf9l6KqFhY9tWpNowlN9pw&#39;;return true;">hn0-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net
alluxio.underfs.address=hdfs://<a href="http://hn0-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net/alluxio" target="_blank" rel="nofollow" onmousedown="this.href=&#39;http://www.google.com/url?q\x3dhttp%3A%2F%2Fhn0-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net%2Falluxio\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNGFL9m6cWuaa33sQH4lUsf3upVxBA&#39;;return true;" onclick="this.href=&#39;http://www.google.com/url?q\x3dhttp%3A%2F%2Fhn0-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net%2Falluxio\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNGFL9m6cWuaa33sQH4lUsf3upVxBA&#39;;return true;">hn0-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net/alluxio
alluxio.security.authorization.permission.enabled=false

5. copy core-site.xml and hdfs-site.xml from hadoop distributive into 
/opt/alluxio/conf
6. edit /opt/alluxio/conf/masters
set master node name
7. edit /opt/alluxio/conf/workers
set list of worker nodes
8. scp /etc/alluxio/alluxio-site.properties and /opt/alluxio/ on all nodes in a cluster
9. format alluxio
/opt/alluxio/bin/alluxio format
10. run master node 
/opt/alluxio/bin/alluxio-start.sh master
11. start workers
/opt/alluxio/bin/alluxio-start.sh workers
12. check if cluster is operated
<a href="http://ip-of-master-node:19999/workers" rel="nofollow" style="color:rgb(0,82,204);font-family:-apple-system,system-ui,&quot;Segoe UI&quot;,Roboto,Oxygen,Ubuntu,&quot;Fira Sans&quot;,&quot;Droid Sans&quot;,&quot;Helvetica Neue&quot;,sans-serif;font-size:14px;background-color:rgb(244,245,247)" target="_blank" onmousedown="this.href=&#39;http://www.google.com/url?q\x3dhttp%3A%2F%2Fip-of-master-node%3A19999%2Fworkers\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNErwCPZ53trIijMhi2BZxCJwub-Ug&#39;;return true;" onclick="this.href=&#39;http://www.google.com/url?q\x3dhttp%3A%2F%2Fip-of-master-node%3A19999%2Fworkers\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNErwCPZ53trIijMhi2BZxCJwub-Ug&#39;;return true;">http://ip-of-master-node:19999/workers
13. run alluxio internal tests
/opt/alluxio/bin/alluxio runTests
they are fine

but when i run opt/alluxio/integration/checker/bin/alluxio-checker.sh spark yarn

they are failed 

driver log
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/bigstream/bigstreamNSD/spark-2.1.1-BIGSTREAM-bin-bigstream-spark-yarn-h2.7.2/nsd-jars-2.1/alluxio-1.8.0-client.jar!/org/slf4j/impl/StaticLog
gerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.6.2.38-1/spark2/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.6.2.38-1/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.6.2.38-1/spark_llap/spark-llap-assembly-1.0.0.2.6.2.38-1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See <a href="http://www.slf4j.org/codes.html#multiple_bindings" target="_blank" rel="nofollow" onmousedown="this.href=&#39;http://www.google.com/url?q\x3dhttp%3A%2F%2Fwww.slf4j.org%2Fcodes.html%23multiple_bindings\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNHviJfiUvjiSFk2iaVebG-XCN1w-Q&#39;;return true;" onclick="this.href=&#39;http://www.google.com/url?q\x3dhttp%3A%2F%2Fwww.slf4j.org%2Fcodes.html%23multiple_bindings\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNHviJfiUvjiSFk2iaVebG-XCN1w-Q&#39;;return true;">http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
18/09/03 21:50:32 INFO CoarseGrainedExecutorBackend: Started daemon with process name: 3811@wn2-nsd-hd
18/09/03 21:50:32 INFO SignalUtils: Registered signal handler for TERM
18/09/03 21:50:32 INFO SignalUtils: Registered signal handler for HUP
18/09/03 21:50:32 INFO SignalUtils: Registered signal handler for INT
18/09/03 21:50:33 INFO SecurityManager: Changing view acls to: yarn,sshuser
18/09/03 21:50:33 INFO SecurityManager: Changing modify acls to: yarn,sshuser
18/09/03 21:50:33 INFO SecurityManager: Changing view acls groups to:
18/09/03 21:50:33 INFO SecurityManager: Changing modify acls groups to:
18/09/03 21:50:33 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(yarn, sshuser); groups with view permissi
ons: Set(); users  with modify permissions: Set(yarn, sshuser); groups with modify permissions: Set()
18/09/03 21:50:33 INFO TransportClientFactory: Successfully created connection to /<a href="http://10.0.0.6:44337" target="_blank" rel="nofollow" onmousedown="this.href=&#39;http://www.google.com/url?q\x3dhttp%3A%2F%2F10.0.0.6%3A44337\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNEgBn7SxUhgpS9ao-II0Kc_8k7_8A&#39;;return true;" onclick="this.href=&#39;http://www.google.com/url?q\x3dhttp%3A%2F%2F10.0.0.6%3A44337\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNEgBn7SxUhgpS9ao-II0Kc_8k7_8A&#39;;return true;">10.0.0.6:44337 after 53 ms (0 ms spent in bootstraps)
18/09/03 21:50:33 INFO SecurityManager: Changing view acls to: yarn,sshuser
18/09/03 21:50:33 INFO SecurityManager: Changing modify acls to: yarn,sshuser
18/09/03 21:50:33 INFO SecurityManager: Changing view acls groups to:
18/09/03 21:50:33 INFO SecurityManager: Changing modify acls groups to:
18/09/03 21:50:33 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(yarn, sshuser); groups with view permissi
ons: Set(); users  with modify permissions: Set(yarn, sshuser); groups with modify permissions: Set()
18/09/03 21:50:34 INFO TransportClientFactory: Successfully created connection to /<a href="http://10.0.0.6:44337" target="_blank" rel="nofollow" onmousedown="this.href=&#39;http://www.google.com/url?q\x3dhttp%3A%2F%2F10.0.0.6%3A44337\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNEgBn7SxUhgpS9ao-II0Kc_8k7_8A&#39;;return true;" onclick="this.href=&#39;http://www.google.com/url?q\x3dhttp%3A%2F%2F10.0.0.6%3A44337\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNEgBn7SxUhgpS9ao-II0Kc_8k7_8A&#39;;return true;">10.0.0.6:44337 after 5 ms (0 ms spent in bootstraps)
18/09/03 21:50:34 INFO DiskBlockManager: Created local directory at /mnt/resource/hadoop/yarn/local/usercache/sshuser/appcache/application_1536006861159_0028/blockmgr-1ade914e
-66e0-4eed-a16c-c91732a33d56
18/09/03 21:50:34 INFO MemoryStore: MemoryStore started with capacity 5.2 GB
18/09/03 21:50:34 INFO CoarseGrainedExecutorBackend: Connecting to driver: spark://<a href="http://CoarseGrainedScheduler@10.0.0.6:44337" target="_blank" rel="nofollow" onmousedown="this.href=&#39;http://www.google.com/url?q\x3dhttp%3A%2F%2FCoarseGrainedScheduler%4010.0.0.6%3A44337\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNEMNK4p0UuRJAMgNFpyUtv-NDqwrg&#39;;return true;" onclick="this.href=&#39;http://www.google.com/url?q\x3dhttp%3A%2F%2FCoarseGrainedScheduler%4010.0.0.6%3A44337\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNEMNK4p0UuRJAMgNFpyUtv-NDqwrg&#39;;return true;">CoarseGrainedScheduler@10.0.0.6:44337
18/09/03 21:50:34 INFO CoarseGrainedExecutorBackend: Successfully registered with driver
18/09/03 21:50:34 INFO Executor: Starting executor ID 1 on host <a href="http://wn2-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net" target="_blank" rel="nofollow" onmousedown="this.href=&#39;http://www.google.com/url?q\x3dhttp%3A%2F%2Fwn2-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNH3n5FivH73SHpoxtWH9rGtYp3VHg&#39;;return true;" onclick="this.href=&#39;http://www.google.com/url?q\x3dhttp%3A%2F%2Fwn2-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNH3n5FivH73SHpoxtWH9rGtYp3VHg&#39;;return true;">wn2-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net
18/09/03 21:50:34 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 34847.
18/09/03 21:50:34 INFO NettyBlockTransferService: Server created on <a href="http://wn2-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net:34847" target="_blank" rel="nofollow" onmousedown="this.href=&#39;http://www.google.com/url?q\x3dhttp%3A%2F%2Fwn2-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net%3A34847\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNHxapY-zO1q8KFJxDTNwZxjNzi_Cw&#39;;return true;" onclick="this.href=&#39;http://www.google.com/url?q\x3dhttp%3A%2F%2Fwn2-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net%3A34847\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNHxapY-zO1q8KFJxDTNwZxjNzi_Cw&#39;;return true;">wn2-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net:34847
18/09/03 21:50:34 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
18/09/03 21:50:34 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(1, <a href="http://wn2-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net" target="_blank" rel="nofollow" onmousedown="this.href=&#39;http://www.google.com/url?q\x3dhttp%3A%2F%2Fwn2-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNH3n5FivH73SHpoxtWH9rGtYp3VHg&#39;;return true;" onclick="this.href=&#39;http://www.google.com/url?q\x3dhttp%3A%2F%2Fwn2-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNH3n5FivH73SHpoxtWH9rGtYp3VHg&#39;;return true;">wn2-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net, 34847, None)
18/09/03 21:50:34 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(1, <a href="http://wn2-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net" target="_blank" rel="nofollow" onmousedown="this.href=&#39;http://www.google.com/url?q\x3dhttp%3A%2F%2Fwn2-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNH3n5FivH73SHpoxtWH9rGtYp3VHg&#39;;return true;" onclick="this.href=&#39;http://www.google.com/url?q\x3dhttp%3A%2F%2Fwn2-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNH3n5FivH73SHpoxtWH9rGtYp3VHg&#39;;return true;">wn2-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net, 34847, None)
18/09/03 21:50:34 INFO BlockManager: Initialized BlockManager: BlockManagerId(1, <a href="http://wn2-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net" target="_blank" rel="nofollow" onmousedown="this.href=&#39;http://www.google.com/url?q\x3dhttp%3A%2F%2Fwn2-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNH3n5FivH73SHpoxtWH9rGtYp3VHg&#39;;return true;" onclick="this.href=&#39;http://www.google.com/url?q\x3dhttp%3A%2F%2Fwn2-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNH3n5FivH73SHpoxtWH9rGtYp3VHg&#39;;return true;">wn2-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net, 34847, None)
18/09/03 21:50:35 INFO CoarseGrainedExecutorBackend: Got assigned task 3
18/09/03 21:50:35 INFO CoarseGrainedExecutorBackend: Got assigned task 7
18/09/03 21:50:35 INFO Executor: Running task 3.0 in stage 0.0 (TID 3)
18/09/03 21:50:35 INFO Executor: Running task 7.0 in stage 0.0 (TID 7)
18/09/03 21:50:35 INFO TorrentBroadcast: Started reading broadcast variable 0
18/09/03 21:50:35 INFO TransportClientFactory: Successfully created connection to /<a href="http://10.0.0.6:34251" target="_blank" rel="nofollow" onmousedown="this.href=&#39;http://www.google.com/url?q\x3dhttp%3A%2F%2F10.0.0.6%3A34251\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNGSPXFwTs3G9RUdBEMlK4cHLMnVwA&#39;;return true;" onclick="this.href=&#39;http://www.google.com/url?q\x3dhttp%3A%2F%2F10.0.0.6%3A34251\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNGSPXFwTs3G9RUdBEMlK4cHLMnVwA&#39;;return true;">10.0.0.6:34251 after 5 ms (0 ms spent in bootstraps)
18/09/03 21:50:36 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 2.3 KB, free 5.2 GB)
18/09/03 21:50:36 INFO TorrentBroadcast: Reading broadcast variable 0 took 128 ms
18/09/03 21:50:36 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 4.0 KB, free 5.2 GB)
18/09/03 21:50:36 INFO Executor: Finished task 7.0 in stage 0.0 (TID 7). 1702 bytes result sent to driver
18/09/03 21:50:36 INFO Executor: Finished task 3.0 in stage 0.0 (TID 3). 2430 bytes result sent to driver
18/09/03 21:50:36 INFO CoarseGrainedExecutorBackend: Driver commanded a shutdown
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.executor.filesystem.file.largeRead_ops, value=0
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.executor.filesystem.file.read_bytes, value=0
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.executor.filesystem.file.read_ops, value=0
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.executor.filesystem.file.write_bytes, value=0
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.executor.filesystem.file.write_ops, value=0
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.executor.filesystem.hdfs.largeRead_ops, value=0
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.executor.filesystem.hdfs.read_bytes, value=0
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.executor.filesystem.hdfs.read_ops, value=0
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.executor.filesystem.hdfs.write_bytes, value=0
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.executor.filesystem.hdfs.write_ops, value=0
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.executor.threadpool.activeTasks, value=0
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.executor.threadpool.completeTasks, value=2
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.executor.threadpool.currentPool_size, value=2
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.executor.threadpool.maxPool_size, value=2147483647
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.G1-Old-Generation.count, value=0
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.G1-Old-Generation.time, value=0
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.G1-Young-Generation.count, value=4
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.G1-Young-Generation.time, value=58
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.heap.committed, value=924844032
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.heap.init, value=924844032
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.heap.max, value=9663676416
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.heap.usage, value=0.011385394467247857
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.heap.used, value=112121920
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.non-heap.committed, value=50593792
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.non-heap.init, value=2555904
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.non-heap.max, value=-1
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.non-heap.usage, value=-4.957636E7
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.non-heap.used, value=49584424
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.Code-Cache.committed, value=7995392
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.Code-Cache.init, value=2555904
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.Code-Cache.max, value=251658240
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.Code-Cache.usage, value=0.031464640299479166
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.Code-Cache.used, value=7918336
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.Compressed-Class-Space.committed, value=5111808
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.Compressed-Class-Space.init, value=0
18/09/03 21:50:36 INFO CoarseGrainedExecutorBackend: Driver from <a href="http://10.0.0.6:44337" target="_blank" rel="nofollow" onmousedown="this.href=&#39;http://www.google.com/url?q\x3dhttp%3A%2F%2F10.0.0.6%3A44337\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNEgBn7SxUhgpS9ao-II0Kc_8k7_8A&#39;;return true;" onclick="this.href=&#39;http://www.google.com/url?q\x3dhttp%3A%2F%2F10.0.0.6%3A44337\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNEgBn7SxUhgpS9ao-II0Kc_8k7_8A&#39;;return true;">10.0.0.6:44337 disconnected during shutdown
18/09/03 21:50:36 INFO CoarseGrainedExecutorBackend: Driver from <a href="http://10.0.0.6:44337" target="_blank" rel="nofollow" onmousedown="this.href=&#39;http://www.google.com/url?q\x3dhttp%3A%2F%2F10.0.0.6%3A44337\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNEgBn7SxUhgpS9ao-II0Kc_8k7_8A&#39;;return true;" onclick="this.href=&#39;http://www.google.com/url?q\x3dhttp%3A%2F%2F10.0.0.6%3A44337\x26sa\x3dD\x26sntz\x3d1\x26usg\x3dAFQjCNEgBn7SxUhgpS9ao-II0Kc_8k7_8A&#39;;return true;">10.0.0.6:44337 disconnected during shutdown
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.Compressed-Class-Space.max, value=1073741824
18/09/03 21:50:37 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.Compressed-Class-Space.usage, value=0.004624105989933014
18/09/03 21:50:37 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.Compressed-Class-Space.used, value=4965096
18/09/03 21:50:37 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.G1-Eden-Space.committed, value=557842432
18/09/03 21:50:37 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.G1-Eden-Space.init, value=50331648
18/09/03 21:50:37 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.G1-Eden-Space.max, value=-1
18/09/03 21:50:37 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.G1-Eden-Space.usage, value=0.041353383458646614
18/09/03 21:50:37 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.G1-Eden-Space.used, value=23068672
18/09/03 21:50:37 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.G1-Old-Gen.committed, value=341835776
18/09/03 21:50:37 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.G1-Old-Gen.init, value=874512384
18/09/03 21:50:37 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.G1-Old-Gen.max, value=9663676416
18/09/03 21:50:37 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.G1-Old-Gen.usage, value=0.006828102800581191
18/09/03 21:50:37 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.G1-Old-Gen.used, value=65984576
18/09/03 21:50:37 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.G1-Survivor-Space.committed, value=25165824
18/09/03 21:50:37 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.G1-Survivor-Space.init, value=0
18/09/03 21:50:37 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.G1-Survivor-Space.max, value=-1
18/09/03 21:50:37 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.G1-Survivor-Space.usage, value=1.0
18/09/03 21:50:37 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.G1-Survivor-Space.used, value=25165824
18/09/03 21:50:37 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.Metaspace.committed, value=37486592
18/09/03 21:50:37 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.Metaspace.init, value=0
18/09/03 21:50:37 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.Metaspace.max, value=-1
18/09/03 21:50:37 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.Metaspace.usage, value=0.980258754916958
18/09/03 21:50:37 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.Metaspace.used, value=36746560
18/09/03 21:50:37 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.total.committed, value=975503360
18/09/03 21:50:37 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.total.init, value=927399936
18/09/03 21:50:37 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.total.max, value=9663676415
18/09/03 21:50:37 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.total.used, value=163924840
18/09/03 21:50:37 INFO metrics: type=COUNTER, name=application_1536006861159_0028.1.HiveExternalCatalog.fileCacheHits, count=0
18/09/03 21:50:37 INFO metrics: type=COUNTER, name=application_1536006861159_0028.1.HiveExternalCatalog.filesDiscovered, count=0
18/09/03 21:50:37 INFO metrics: type=COUNTER, name=application_1536006861159_0028.1.HiveExternalCatalog.hiveClientCalls, count=0
18/09/03 21:50:37 INFO metrics: type=COUNTER, name=application_1536006861159_0028.1.HiveExternalCatalog.parallelListingJobCount, count=0
18/09/03 21:50:37 INFO metrics: type=COUNTER, name=application_1536006861159_0028.1.HiveExternalCatalog.partitionsFetched, count=0
18/09/03 21:50:37 INFO metrics: type=HISTOGRAM, name=application_1536006861159_0028.1.CodeGenerator.compilationTime, count=0, min=0, max=0, mean=0.0, stddev=0.0, median=0.0, p75=0.0, p95=0.0, p98=0.0, p99=0.0, p999=0.0
18/09/03 21:50:37 INFO metrics: type=HISTOGRAM, name=application_1536006861159_0028.1.CodeGenerator.generatedClassSize, count=0, min=0, max=0, mean=0.0, stddev=0.0, median=0.0, p75=0.0, p95=0.0, p98=0.0, p99=0.0, p999=0.0
18/09/03 21:50:37 INFO metrics: type=HISTOGRAM, name=application_1536006861159_0028.1.CodeGenerator.generatedMethodSize, count=0, min=0, max=0, mean=0.0, stddev=0.0, median=0.0, p75=0.0, p95=0.0, p98=0.0, p99=0.0, p999=0.0
18/09/03 21:50:37 INFO metrics: type=HISTOGRAM, name=application_1536006861159_0028.1.CodeGenerator.sourceCodeSize, count=0, min=0, max=0, mean=0.0, stddev=0.0, median=0.0, p75=0.0, p95=0.0, p98=0.0, p99=0.0, p999=0.0
18/09/03 21:50:37 INFO MemoryStore: MemoryStore cleared
18/09/03 21:50:37 INFO BlockManager: BlockManager stopped
18/09/03 21:50:37 INFO ShutdownHookManager: Shutdown hook called



executor log
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/bigstream/bigstreamNSD/spark-2.1.1-BIGSTREAM-bin-bigstream-spark-yarn-h2.7.2/nsd-jars-2.1/alluxio-1.8.0-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.6.2.38-1/spark2/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.6.2.38-1/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.6.2.38-1/spark_llap/spark-llap-assembly-1.0.0.2.6.2.38-1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
18/09/03 21:50:32 INFO CoarseGrainedExecutorBackend: Started daemon with process name: 3811@wn2-nsd-hd
18/09/03 21:50:32 INFO SignalUtils: Registered signal handler for TERM
18/09/03 21:50:32 INFO SignalUtils: Registered signal handler for HUP
18/09/03 21:50:32 INFO SignalUtils: Registered signal handler for INT
18/09/03 21:50:33 INFO SecurityManager: Changing view acls to: yarn,sshuser
18/09/03 21:50:33 INFO SecurityManager: Changing modify acls to: yarn,sshuser
18/09/03 21:50:33 INFO SecurityManager: Changing view acls groups to:
18/09/03 21:50:33 INFO SecurityManager: Changing modify acls groups to:
18/09/03 21:50:33 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(yarn, sshuser); groups with view permissions: Set(); users  with modify permissions: Set(yarn, sshuser); groups with modify permissions: Set()
18/09/03 21:50:33 INFO TransportClientFactory: Successfully created connection to /10.0.0.6:44337 after 53 ms (0 ms spent in bootstraps)
18/09/03 21:50:33 INFO SecurityManager: Changing view acls to: yarn,sshuser
18/09/03 21:50:33 INFO SecurityManager: Changing modify acls to: yarn,sshuser
18/09/03 21:50:33 INFO SecurityManager: Changing view acls groups to:
18/09/03 21:50:33 INFO SecurityManager: Changing modify acls groups to:
18/09/03 21:50:33 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(yarn, sshuser); groups with view permissions: Set(); users  with modify permissions: Set(yarn, sshuser); groups with modify permissions: Set()
18/09/03 21:50:34 INFO TransportClientFactory: Successfully created connection to /10.0.0.6:44337 after 5 ms (0 ms spent in bootstraps)
18/09/03 21:50:34 INFO DiskBlockManager: Created local directory at /mnt/resource/hadoop/yarn/local/usercache/sshuser/appcache/application_1536006861159_0028/blockmgr-1ade914e-66e0-4eed-a16c-c91732a33d56
18/09/03 21:50:34 INFO MemoryStore: MemoryStore started with capacity 5.2 GB
18/09/03 21:50:34 INFO CoarseGrainedExecutorBackend: Connecting to driver: spark://CoarseGrainedScheduler@10.0.0.6:44337
18/09/03 21:50:34 INFO CoarseGrainedExecutorBackend: Successfully registered with driver
18/09/03 21:50:34 INFO Executor: Starting executor ID 1 on host wn2-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net
18/09/03 21:50:34 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 34847.
18/09/03 21:50:34 INFO NettyBlockTransferService: Server created on wn2-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net:34847
18/09/03 21:50:34 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
18/09/03 21:50:34 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(1, wn2-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net, 34847, None)
18/09/03 21:50:34 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(1, wn2-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net, 34847, None)
18/09/03 21:50:34 INFO BlockManager: Initialized BlockManager: BlockManagerId(1, wn2-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net, 34847, None)
18/09/03 21:50:35 INFO CoarseGrainedExecutorBackend: Got assigned task 3
18/09/03 21:50:35 INFO CoarseGrainedExecutorBackend: Got assigned task 7
18/09/03 21:50:35 INFO Executor: Running task 3.0 in stage 0.0 (TID 3)
18/09/03 21:50:35 INFO Executor: Running task 7.0 in stage 0.0 (TID 7)
18/09/03 21:50:35 INFO TorrentBroadcast: Started reading broadcast variable 0
18/09/03 21:50:35 INFO TransportClientFactory: Successfully created connection to /10.0.0.6:34251 after 5 ms (0 ms spent in bootstraps)
18/09/03 21:50:36 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 2.3 KB, free 5.2 GB)
18/09/03 21:50:36 INFO TorrentBroadcast: Reading broadcast variable 0 took 128 ms
18/09/03 21:50:36 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 4.0 KB, free 5.2 GB)
18/09/03 21:50:36 INFO Executor: Finished task 7.0 in stage 0.0 (TID 7). 1702 bytes result sent to driver
18/09/03 21:50:36 INFO Executor: Finished task 3.0 in stage 0.0 (TID 3). 2430 bytes result sent to driver
18/09/03 21:50:36 INFO CoarseGrainedExecutorBackend: Driver commanded a shutdown
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.executor.filesystem.file.largeRead_ops, value=0
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.executor.filesystem.file.read_bytes, value=0
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.executor.filesystem.file.read_ops, value=0
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.executor.filesystem.file.write_bytes, value=0
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.executor.filesystem.file.write_ops, value=0
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.executor.filesystem.hdfs.largeRead_ops, value=0
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.executor.filesystem.hdfs.read_bytes, value=0
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.executor.filesystem.hdfs.read_ops, value=0
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.executor.filesystem.hdfs.write_bytes, value=0
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.executor.filesystem.hdfs.write_ops, value=0
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.executor.threadpool.activeTasks, value=0
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.executor.threadpool.completeTasks, value=2
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.executor.threadpool.currentPool_size, value=2
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.executor.threadpool.maxPool_size, value=2147483647
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.G1-Old-Generation.count, value=0
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.G1-Old-Generation.time, value=0
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.G1-Young-Generation.count, value=4
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.G1-Young-Generation.time, value=58
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.heap.committed, value=924844032
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.heap.init, value=924844032
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.heap.max, value=9663676416
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.heap.usage, value=0.011385394467247857
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.heap.used, value=112121920
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.non-heap.committed, value=50593792
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.non-heap.init, value=2555904
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.non-heap.max, value=-1
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.non-heap.usage, value=-4.957636E7
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.non-heap.used, value=49584424
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.Code-Cache.committed, value=7995392
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.Code-Cache.init, value=2555904
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.Code-Cache.max, value=251658240
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.Code-Cache.usage, value=0.031464640299479166
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.Code-Cache.used, value=7918336
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.Compressed-Class-Space.committed, value=5111808
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.Compressed-Class-Space.init, value=0
18/09/03 21:50:36 INFO CoarseGrainedExecutorBackend: Driver from 10.0.0.6:44337 disconnected during shutdown
18/09/03 21:50:36 INFO CoarseGrainedExecutorBackend: Driver from 10.0.0.6:44337 disconnected during shutdown
18/09/03 21:50:36 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.Compressed-Class-Space.max, value=1073741824
18/09/03 21:50:37 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.Compressed-Class-Space.usage, value=0.004624105989933014
18/09/03 21:50:37 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.Compressed-Class-Space.used, value=4965096
18/09/03 21:50:37 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.G1-Eden-Space.committed, value=557842432
18/09/03 21:50:37 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.G1-Eden-Space.init, value=50331648
18/09/03 21:50:37 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.G1-Eden-Space.max, value=-1
18/09/03 21:50:37 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.G1-Eden-Space.usage, value=0.041353383458646614
18/09/03 21:50:37 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.G1-Eden-Space.used, value=23068672
18/09/03 21:50:37 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.G1-Old-Gen.committed, value=341835776
18/09/03 21:50:37 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.G1-Old-Gen.init, value=874512384
18/09/03 21:50:37 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.G1-Old-Gen.max, value=9663676416
18/09/03 21:50:37 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.G1-Old-Gen.usage, value=0.006828102800581191
18/09/03 21:50:37 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.G1-Old-Gen.used, value=65984576
18/09/03 21:50:37 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.G1-Survivor-Space.committed, value=25165824
18/09/03 21:50:37 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.G1-Survivor-Space.init, value=0
18/09/03 21:50:37 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.G1-Survivor-Space.max, value=-1
18/09/03 21:50:37 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.G1-Survivor-Space.usage, value=1.0
18/09/03 21:50:37 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.G1-Survivor-Space.used, value=25165824
18/09/03 21:50:37 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.Metaspace.committed, value=37486592
18/09/03 21:50:37 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.Metaspace.init, value=0
18/09/03 21:50:37 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.Metaspace.max, value=-1
18/09/03 21:50:37 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.Metaspace.usage, value=0.980258754916958
18/09/03 21:50:37 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.pools.Metaspace.used, value=36746560
18/09/03 21:50:37 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.total.committed, value=975503360
18/09/03 21:50:37 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.total.init, value=927399936
18/09/03 21:50:37 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.total.max, value=9663676415
18/09/03 21:50:37 INFO metrics: type=GAUGE, name=application_1536006861159_0028.1.jvm.total.used, value=163924840
18/09/03 21:50:37 INFO metrics: type=COUNTER, name=application_1536006861159_0028.1.HiveExternalCatalog.fileCacheHits, count=0
18/09/03 21:50:37 INFO metrics: type=COUNTER, name=application_1536006861159_0028.1.HiveExternalCatalog.filesDiscovered, count=0
18/09/03 21:50:37 INFO metrics: type=COUNTER, name=application_1536006861159_0028.1.HiveExternalCatalog.hiveClientCalls, count=0
18/09/03 21:50:37 INFO metrics: type=COUNTER, name=application_1536006861159_0028.1.HiveExternalCatalog.parallelListingJobCount, count=0
18/09/03 21:50:37 INFO metrics: type=COUNTER, name=application_1536006861159_0028.1.HiveExternalCatalog.partitionsFetched, count=0
18/09/03 21:50:37 INFO metrics: type=HISTOGRAM, name=application_1536006861159_0028.1.CodeGenerator.compilationTime, count=0, min=0, max=0, mean=0.0, stddev=0.0, median=0.0, p75=0.0, p95=0.0, p98=0.0, p99=0.0, p999=0.0
18/09/03 21:50:37 INFO metrics: type=HISTOGRAM, name=application_1536006861159_0028.1.CodeGenerator.generatedClassSize, count=0, min=0, max=0, mean=0.0, stddev=0.0, median=0.0, p75=0.0, p95=0.0, p98=0.0, p99=0.0, p999=0.0
18/09/03 21:50:37 INFO metrics: type=HISTOGRAM, name=application_1536006861159_0028.1.CodeGenerator.generatedMethodSize, count=0, min=0, max=0, mean=0.0, stddev=0.0, median=0.0, p75=0.0, p95=0.0, p98=0.0, p99=0.0, p999=0.0
18/09/03 21:50:37 INFO metrics: type=HISTOGRAM, name=application_1536006861159_0028.1.CodeGenerator.sourceCodeSize, count=0, min=0, max=0, mean=0.0, stddev=0.0, median=0.0, p75=0.0, p95=0.0, p98=0.0, p99=0.0, p999=0.0
18/09/03 21:50:37 INFO MemoryStore: MemoryStore cleared
18/09/03 21:50:37 INFO BlockManager: BlockManager stopped
18/09/03 21:50:37 INFO ShutdownHookManager: Shutdown hook called


The main issue shows up when I'm trying to run spark-submit:
spark-submit --conf spark.logConf=true --master yarn --class co.bigstream.benchmark.TPCSQ3 tpcds_2.11-1.0.jar 2 alluxio:///alluxio/avro avro

/opt/alluxio/client/alluxio-1.8.0-client.jar is also added to the extra classpath for both the driver and the executor.
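Roughly, the jar is attached with the standard Spark classpath properties — a sketch, not the exact command used:

```
# Sketch: put the Alluxio client jar on both the driver and executor classpaths.
# spark.driver.extraClassPath / spark.executor.extraClassPath are standard Spark properties.
spark-submit \
  --conf spark.driver.extraClassPath=/opt/alluxio/client/alluxio-1.8.0-client.jar \
  --conf spark.executor.extraClassPath=/opt/alluxio/client/alluxio-1.8.0-client.jar \
  --master yarn --class co.bigstream.benchmark.TPCSQ3 \
  tpcds_2.11-1.0.jar 2 alluxio:///alluxio/avro avro
```

Even with the jar on the classpath, the submit fails immediately: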

Exception in thread "main" java.lang.NullPointerException: URI hostname must not be null
at alluxio.core.client.runtime.com.google.common.base.Preconditions.checkNotNull(Preconditions.java:208)
at alluxio.hadoop.AbstractFileSystem.initializeInternal(AbstractFileSystem.java:506)
at alluxio.hadoop.AbstractFileSystem.initialize(AbstractFileSystem.java:483)
at alluxio.hadoop.FileSystem.initialize(FileSystem.java:27)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2795)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:99)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2829)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2811)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:390)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$14.apply(DataSource.scala:372)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$14.apply(DataSource.scala:370)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
at scala.collection.immutable.List.flatMap(List.scala:344)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:370)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:135)
at co.bigstream.benchmark.TPCSQ3$.main(TPCSQ3.scala:64)
at co.bigstream.benchmark.TPCSQ3.main(TPCSQ3.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:751)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
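
The trace points at the URI itself: `alluxio:///alluxio/avro` has an empty authority, so `alluxio.hadoop.AbstractFileSystem.initializeInternal` sees a null hostname (that is the `Preconditions.checkNotNull` frame at the top). A sketch of the two URI shapes, assuming Alluxio's default RPC port 19998:

```
# Empty authority: no hostname, fails unless alluxio.master.hostname is supplied elsewhere
alluxio:///alluxio/avro
# Fully qualified: hostname and RPC port explicit (19998 is the Alluxio default)
alluxio://hn0-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net:19998/alluxio/avro
```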

Any help will be awesome 
Thanks


Re: issue with running spark-submit

Deema Yatsyuk
In reply to this post by Deema Yatsyuk
and tried to run spark-submit
spark-submit --conf spark.hadoop.defaultFS="-Dalluxio://hn0-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net" --master yarn --class co.bigstream.benchmark.TPCSQ3 tpcds_2.11-1.0.jar 2 alluxio:///avro avro

but got the same issue:

Exception in thread "main" java.lang.NullPointerException: URI hostname must not be null
at alluxio.core.client.runtime.com.google.common.base.Preconditions.checkNotNull(Preconditions.java:208)
at alluxio.hadoop.AbstractFileSystem.initializeInternal(AbstractFileSystem.java:506)
at alluxio.hadoop.AbstractFileSystem.initialize(AbstractFileSystem.java:483)
at alluxio.hadoop.FileSystem.initialize(FileSystem.java:27)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2795)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:99)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2829)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2811)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:390)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$14.apply(DataSource.scala:372)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$14.apply(DataSource.scala:370)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
at scala.collection.immutable.List.flatMap(List.scala:344)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:370)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:135)
at co.bigstream.benchmark.TPCSQ3$.main(TPCSQ3.scala:64)
at co.bigstream.benchmark.TPCSQ3.main(TPCSQ3.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:751)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
18/09/04 09:23:53 INFO SparkContext: Invoking stop() from shutdown hook 


Re: issue with running spark-submit

Lu Qiu
Hi Dmitry,

Sorry for the late reply.

Could you try the spark shell and see if spark shell is able to connect to Alluxio?

```
> val s = sc.textFile("alluxio://hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net:19998/Input")
> val double = s.map(line => line + line)
> double.saveAsTextFile("alluxio://hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net:19998/Output")
``` 
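
(For the shell to resolve the `alluxio://` scheme at all, the Alluxio client jar has to be on its classpath; assuming the jar location used earlier in the thread, roughly:)

```
# Sketch: launch spark-shell with the Alluxio 1.8 client jar on the classpath
spark-shell --master yarn --jars /opt/alluxio/client/alluxio-1.8.0-client.jar
```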

Could you try it on your master node (10.0.0.21) first and then try it again on the node where you ran spark-submit before (maybe 10.0.0.6)?

Thanks,
Lu


On Thu, Sep 6, 2018 at 9:54 AM, Dmitry Yatsyuk <[hidden email]> wrote:
here is also the log from /opt/alluxio/integration/checker/bin/alluxio-checker.sh spark yarn


On Thu, Sep 6, 2018 at 7:46 PM Dmitry Yatsyuk <[hidden email]> wrote:
Hello
Maybe you have more suggestions for me.
Many thanks

On Wed, Sep 5, 2018 at 10:23 PM Dmitry Yatsyuk <[hidden email]> wrote:
and yes, I run spark-submit from the node where the Alluxio master is installed

On Wed, Sep 5, 2018 at 10:22 PM Dmitry Yatsyuk <[hidden email]> wrote:
Hello

All workers are live

live Workers

   wn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net 0 In Service 36.69GB 444.88MB   99%Free
   wn1-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net 0 In Service 36.69GB 593.25MB   99%Free
   wn4-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net 0 In Service 36.69GB 94.63MB     99%Free



Alluxio Summary

            Started:                                      09-05-2018 12:40:10:248
             Uptime:                        0 day(s), 6 hour(s), 41 minute(s), and 28 second(s)
            Version:                                               1.8.0
        Running Workers:                                             3
   Startup Consistency Check:                                     COMPLETE
   Server Configuration Check:                                     PASSED

Cluster Usage Summary

    Workers Capacity:         110.08GB
   Workers Free / Used: 108.97GB / 1132.76MB
    UnderFS Capacity:        1177.79GB
   UnderFS Free / Used: 1177.79GB / 192.00KB

Storage Usage Summary

   Storage Alias   Space Capacity   Space Used   Space Usage
   MEM             110.08GB         1132.76MB    99%Free



On Wed, Sep 5, 2018 at 8:07 PM, Lu Qiu <[hidden email]> wrote:
Hi,

The error 
Failed to connect to MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998 after 1 attempts

is usually caused by:

(1) the input hostname (or port) is wrong, or the system cannot resolve the hostname (especially when Spark and Alluxio are on different nodes).
Did you run the spark-submit on the Alluxio master node?

(2) the Alluxio cluster is not running normally. You could visit http://hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net:19999 to see if the Alluxio master is alive, and visit the Workers page to see if the workers are alive.
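
Both can be checked from the node that runs spark-submit with standard tools (a sketch; 19998 is the master RPC port, 19999 the web UI):

```
# Does the submit node resolve the master hostname and reach the RPC port?
nc -vz hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net 19998
# Is the master web UI answering?
curl -sf http://hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net:19999/ > /dev/null && echo "master UI up"
```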


Thanks,
Lu

On Wed, Sep 5, 2018 at 4:29 AM, Dmitry Yatsyuk <[hidden email]> wrote:
I changed spark-submit to

spark-submit --conf 'spark.driver.extraJavaOptions=-Dalluxio.master.hostname=hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net' --conf 'spark.executor.extraJavaOptions=-Dalluxio.master.hostname=hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net' --master yarn --class co.bigstream.benchmark.TPCSQ3 tpcds_2.11-1.0.jar 2 alluxio://hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net:19998/avro avro

and now the error on the executor is

18/09/05 11:25:44 ERROR ClientMasterSync: Failed to heartbeat to the metrics master: {}
alluxio.exception.status.UnavailableException: Failed to connect to MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998 after 1 attempts
	at alluxio.AbstractClient.connect(AbstractClient.java:325)
	at alluxio.client.metrics.MetricsMasterClient.heartbeat(MetricsMasterClient.java:84)
	at alluxio.client.metrics.ClientMasterSync.heartbeat(ClientMasterSync.java:63)
	at alluxio.heartbeat.HeartbeatThread.run(HeartbeatThread.java:74)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)


On Wed, Sep 5, 2018 at 2:20 PM, Dmitry Yatsyuk <[hidden email]> wrote:
Hello
I tried your suggestion but now the error is the following:
Exception in thread "main" java.lang.ExceptionInInitializerError
at alluxio.hadoop.AbstractFileSystem.initializeInternal(AbstractFileSystem.java:514)
at alluxio.hadoop.AbstractFileSystem.initialize(AbstractFileSystem.java:483)
at alluxio.hadoop.FileSystem.initialize(FileSystem.java:27)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2795)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:99)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2829)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2811)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:390)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.spark.sql.execution.datasources.DataSource.hasMetadata(DataSource.scala:301)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:344)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:135)
at co.bigstream.benchmark.TPCSQ3$.main(TPCSQ3.scala:64)
at co.bigstream.benchmark.TPCSQ3.main(TPCSQ3.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:751)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.IllegalArgumentException: port out of range:-1
at java.net.InetSocketAddress.checkPort(InetSocketAddress.java:143)
at java.net.InetSocketAddress.<init>(InetSocketAddress.java:224)
at alluxio.util.network.NetworkAddressUtils.getConnectAddress(NetworkAddressUtils.java:218)
at alluxio.util.network.NetworkAddressUtils.getConnectAddress(NetworkAddressUtils.java:204)
at alluxio.client.lineage.LineageContext.reset(LineageContext.java:64)
at alluxio.client.lineage.LineageContext.<init>(LineageContext.java:35)
at alluxio.client.lineage.LineageContext.<clinit>(LineageContext.java:27)

The command I ran is:
spark-submit --conf 'spark.driver.extraJavaOptions=-Dalluxio.master.hostname=hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net' --conf 'spark.executor.extraJavaOptions=-Dalluxio.master.hostname=hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net' --master yarn --class co.bigstream.benchmark.TPCSQ3 tpcds_2.11-1.0.jar 2 alluxio://hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/avro avro
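
The `port out of range:-1` likely comes from the path argument: `alluxio://hn0-.../avro` names a host but no port, so the client ends up with -1 when it builds the connect address. A sketch of the same command with the default RPC port added to the URI:

```
# Sketch: same submit, with :19998 (Alluxio's default master RPC port) in the URI
spark-submit \
  --conf 'spark.driver.extraJavaOptions=-Dalluxio.master.hostname=hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net' \
  --conf 'spark.executor.extraJavaOptions=-Dalluxio.master.hostname=hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net' \
  --master yarn --class co.bigstream.benchmark.TPCSQ3 \
  tpcds_2.11-1.0.jar 2 alluxio://hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net:19998/avro avro
```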

On Wed, Sep 5, 2018 at 4:11 AM, Lu Qiu <[hidden email]> wrote:
Hi Deema,

The more common way to set Alluxio configuration through spark-submit command-line options is:

spark-submit \
--conf 'spark.driver.extraJavaOptions=-Dalluxio.user.file.writetype.default=CACHE_THROUGH' \
--conf 'spark.executor.extraJavaOptions=-Dalluxio.user.file.writetype.default=CACHE_THROUGH' \
...


In your case, you could try

spark-submit --conf 'spark.driver.extraJavaOptions=-Dalluxio.master.hostname=hn0-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net' --conf 'spark.executor.extraJavaOptions=-Dalluxio.master.hostname=hn0-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net' --master yarn --class co.bigstream.benchmark.TPCSQ3 tpcds_2.11-1.0.jar 2 alluxio:///avro avro

Hope that one of the ways work for you!

Thanks, 
Lu

On Tue, Sep 4, 2018 at 5:55 PM, Lu Qiu <[hidden email]> wrote:
Hi Deema,


If you only use the alluxio to provide the input file, you could try passing the whole alluxio path like 
`
spark-submit --master yarn --class co.bigstream.benchmark.TPCSQ3 tpcds_2.11-1.0.jar 2 alluxio://hn0-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net:19998/avro avro
`

If you are using `alluxio:///` in the jar, Spark needs to know the Alluxio master hostname.
You could add the master hostname to the `core-site.xml` in your Spark home conf directory.

<configuration>
  <property>
    <name>fs.alluxio.impl</name>
    <value>alluxio.hadoop.FileSystem</value>
  </property>
  <property>
    <name>alluxio.master.hostname</name>
    <value>hn0-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net</value>
  </property>
</configuration>

Or trying

`
spark-submit --conf spark.hadoop.defaultFS="alluxio://hn0-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net:19998" --master yarn --class co.bigstream.benchmark.TPCSQ3 tpcds_2.11-1.0.jar 2 alluxio://hn0-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net:19998/avro avro
`
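
Since the Alluxio 1.8 client also picks up `alluxio.*` keys from the Hadoop configuration (the executor logs later in this thread show `HadoopConfigurationUtils: Loading Alluxio properties from Hadoop configuration`), a third option is Spark's `spark.hadoop.*` passthrough — a sketch, not verified on this cluster:

```
# Sketch: spark.hadoop.* keys are copied into the Hadoop Configuration
# that alluxio.hadoop.FileSystem reads on initialization
spark-submit \
  --conf spark.hadoop.alluxio.master.hostname=hn0-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net \
  --master yarn --class co.bigstream.benchmark.TPCSQ3 \
  tpcds_2.11-1.0.jar 2 alluxio:///avro avro
```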

In addition, could you share the console output of `/opt/alluxio/integration/checker/bin/alluxio-checker.sh spark yarn` so that we can better understand why the Spark checker failed?

Thanks,
Lu



Re: issue with running spark-submit

Deema Yatsyuk
Hello
It hangs on the master node at this stage:

[Stage 0:>                                                          (0 + 2) / 2]

and the same issue appears in the executor log:

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/bigstream/bigstreamNSD/spark-2.1.1-BIGSTREAM-bin-bigstream-spark-yarn-h2.7.2/nsd-jars-2.1/alluxio-1.8.0-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.6.2.38-1/spark2/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.6.2.38-1/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.6.2.38-1/spark_llap/spark-llap-assembly-1.0.0.2.6.2.38-1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
18/09/06 21:42:03 INFO CoarseGrainedExecutorBackend: Started daemon with process name: 28074@wn1-nsd-bg
18/09/06 21:42:03 INFO SignalUtils: Registered signal handler for TERM
18/09/06 21:42:03 INFO SignalUtils: Registered signal handler for HUP
18/09/06 21:42:03 INFO SignalUtils: Registered signal handler for INT
18/09/06 21:42:04 INFO SecurityManager: Changing view acls to: yarn,sshuser
18/09/06 21:42:04 INFO SecurityManager: Changing modify acls to: yarn,sshuser
18/09/06 21:42:04 INFO SecurityManager: Changing view acls groups to: 
18/09/06 21:42:04 INFO SecurityManager: Changing modify acls groups to: 
18/09/06 21:42:04 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(yarn, sshuser); groups with view permissions: Set(); users  with modify permissions: Set(yarn, sshuser); groups with modify permissions: Set()
18/09/06 21:42:04 INFO TransportClientFactory: Successfully created connection to /10.0.0.21:33843 after 60 ms (0 ms spent in bootstraps)
18/09/06 21:42:04 INFO SecurityManager: Changing view acls to: yarn,sshuser
18/09/06 21:42:04 INFO SecurityManager: Changing modify acls to: yarn,sshuser
18/09/06 21:42:04 INFO SecurityManager: Changing view acls groups to: 
18/09/06 21:42:04 INFO SecurityManager: Changing modify acls groups to: 
18/09/06 21:42:04 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(yarn, sshuser); groups with view permissions: Set(); users  with modify permissions: Set(yarn, sshuser); groups with modify permissions: Set()
18/09/06 21:42:04 INFO TransportClientFactory: Successfully created connection to /10.0.0.21:33843 after 1 ms (0 ms spent in bootstraps)
18/09/06 21:42:04 INFO DiskBlockManager: Created local directory at /mnt/resource/hadoop/yarn/local/usercache/sshuser/appcache/application_1536145998851_0026/blockmgr-aa883160-f530-41f6-a683-5d13cd04113a
18/09/06 21:42:04 INFO MemoryStore: MemoryStore started with capacity 5.2 GB
18/09/06 21:42:04 INFO CoarseGrainedExecutorBackend: Connecting to driver: spark://CoarseGrainedScheduler@10.0.0.21:33843
18/09/06 21:42:04 INFO CoarseGrainedExecutorBackend: Successfully registered with driver
18/09/06 21:42:04 INFO Executor: Starting executor ID 4 on host wn1-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net
18/09/06 21:42:04 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 40203.
18/09/06 21:42:04 INFO NettyBlockTransferService: Server created on wn1-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net:40203
18/09/06 21:42:04 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
18/09/06 21:42:04 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(4, wn1-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net, 40203, None)
18/09/06 21:42:04 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(4, wn1-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net, 40203, None)
18/09/06 21:42:04 INFO BlockManager: Initialized BlockManager: BlockManagerId(4, wn1-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net, 40203, None)
18/09/06 21:42:04 INFO Executor: Using REPL class URI: spark://10.0.0.21:33843/classes
18/09/06 21:44:51 INFO CoarseGrainedExecutorBackend: Got assigned task 0
18/09/06 21:44:51 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
18/09/06 21:44:51 INFO TorrentBroadcast: Started reading broadcast variable 1
18/09/06 21:44:51 INFO TransportClientFactory: Successfully created connection to /10.0.0.21:32875 after 2 ms (0 ms spent in bootstraps)
18/09/06 21:44:51 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 37.5 KB, free 5.2 GB)
18/09/06 21:44:51 INFO TorrentBroadcast: Reading broadcast variable 1 took 133 ms
18/09/06 21:44:51 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 97.7 KB, free 5.2 GB)
18/09/06 21:44:51 INFO TransportClientFactory: Successfully created connection to /10.0.0.21:33843 after 16 ms (0 ms spent in bootstraps)
18/09/06 21:44:52 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
18/09/06 21:44:52 INFO deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
18/09/06 21:44:52 INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
18/09/06 21:44:52 INFO deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
18/09/06 21:44:52 INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
18/09/06 21:44:52 INFO HadoopRDD: Input split: alluxio://hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net:19998/Input:0+13423
18/09/06 21:44:52 INFO TorrentBroadcast: Started reading broadcast variable 0
18/09/06 21:44:52 INFO TransportClientFactory: Successfully created connection to wn1-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.4:44275 after 1 ms (0 ms spent in bootstraps)
18/09/06 21:44:52 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 34.1 KB, free 5.2 GB)
18/09/06 21:44:52 INFO TorrentBroadcast: Reading broadcast variable 0 took 70 ms
18/09/06 21:44:52 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 469.5 KB, free 5.2 GB)
18/09/06 21:44:52 INFO MetricsConfig: loaded properties from hadoop-metrics2-azure-file-system.properties
18/09/06 21:44:52 INFO WasbAzureIaasSink: Init starting.
18/09/06 21:44:52 INFO AzureIaasSink: Init starting. Initializing MdsLogger.
18/09/06 21:44:52 INFO AzureIaasSink: Init completed.
18/09/06 21:44:52 INFO WasbAzureIaasSink: Init completed.
18/09/06 21:44:52 INFO MetricsSinkAdapter: Sink azurefs2 started
18/09/06 21:44:52 INFO MetricsSystemImpl: Scheduled snapshot period at 60 second(s).
18/09/06 21:44:52 INFO MetricsSystemImpl: azure-file-system metrics system started
18/09/06 21:44:52 INFO HadoopConfigurationUtils: Loading Alluxio properties from Hadoop configuration: {fs.azure.account.key.testryui.blob.core.windows.net=MIIB/QYJKoZIhvcNAQcDoIIB7jCCAeoCAQAxggFdMIIBWQIBADBBMC0xKzApBgNVBAMTImRiZW5jcnlwdGlvbi5oZGluc2lnaHRzZXJ2aWNlcy5uZXQCEGgOgBofxf6kRbb5r8cSuagwDQYJKoZIhvcNAQEBBQAEggEAKgwuNMr9JwNtvHY7VRE+8t991LCO8n0uJUOtJc2vQET+pZnK/2jviZh8AaIDra0AhDs8eGB71G8Xjet0fSyNZXb80660ZV3BBHKaDVNybwD12HLvjwLoOEZ3O9NqGls7D7owsoWO7gw0sGgu96dnEjvDvB8uSck8UveXMSk/tZNeU9+jYjfctu8TGRP+B4YCgJWcLMeGHczhPx6NuBPdmg/0eoe3w8VVPCi6vIs3lsyzYEi1LVrMkIJ2aBWuwyG7bUKu4tLhfYHY13YDO5PtBfNtWX3QkzejxvY9lewGuPU3w9Fw3GFK/YrgzFBtBycyBFMYl6lQyIFPNLtzPv2l9jCBgwYJKoZIhvcNAQcBMBQGCCqGSIb3DQMHBAgO9GVL4NR2GoBgn8fCaRg2y303wKFrgNOukYYmLoOKq5cQXP/9EY+3WD9BSMjE9/WyQ87FIhPeeEiq6JPNdBIoTZagQpj11b+TL7uMmz+j1Jx4LDozmAoLPQItbwaNnFZlYRWrgY8OiOuE}
18/09/06 21:44:52 INFO MetricsSystem: Starting sinks with config: {}.
18/09/06 21:44:52 INFO FileSystemContext: Created filesystem context with id app-3381863313617109164. This ID will be used for identifying info from the client, such as metrics. It can be set manually through the alluxio.user.app.id property
18/09/06 21:44:52 INFO HadoopConfigurationUtils: Loading Alluxio properties from Hadoop configuration: {fs.azure.account.key.testryui.blob.core.windows.net=MIIB/QYJKoZIhvcNAQcDoIIB7jCCAeoCAQAxggFdMIIBWQIBADBBMC0xKzApBgNVBAMTImRiZW5jcnlwdGlvbi5oZGluc2lnaHRzZXJ2aWNlcy5uZXQCEGgOgBofxf6kRbb5r8cSuagwDQYJKoZIhvcNAQEBBQAEggEAKgwuNMr9JwNtvHY7VRE+8t991LCO8n0uJUOtJc2vQET+pZnK/2jviZh8AaIDra0AhDs8eGB71G8Xjet0fSyNZXb80660ZV3BBHKaDVNybwD12HLvjwLoOEZ3O9NqGls7D7owsoWO7gw0sGgu96dnEjvDvB8uSck8UveXMSk/tZNeU9+jYjfctu8TGRP+B4YCgJWcLMeGHczhPx6NuBPdmg/0eoe3w8VVPCi6vIs3lsyzYEi1LVrMkIJ2aBWuwyG7bUKu4tLhfYHY13YDO5PtBfNtWX3QkzejxvY9lewGuPU3w9Fw3GFK/YrgzFBtBycyBFMYl6lQyIFPNLtzPv2l9jCBgwYJKoZIhvcNAQcBMBQGCCqGSIb3DQMHBAgO9GVL4NR2GoBgn8fCaRg2y303wKFrgNOukYYmLoOKq5cQXP/9EY+3WD9BSMjE9/WyQ87FIhPeeEiq6JPNdBIoTZagQpj11b+TL7uMmz+j1Jx4LDozmAoLPQItbwaNnFZlYRWrgY8OiOuE}
18/09/06 21:44:52 INFO AbstractClient: Alluxio client (version 1.8.0) is trying to bootstrap-connect with hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
18/09/06 21:44:52 INFO AbstractClient: Alluxio client has bootstrap-connected with hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
18/09/06 21:44:52 INFO AbstractClient: Alluxio client (version 1.8.0) is trying to connect with MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
18/09/06 21:44:52 INFO AbstractClient: Client registered with MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
18/09/06 21:44:53 INFO HeartbeatThread: Hearbeat Master Metrics Sync is interrupted.
18/09/06 21:44:53 INFO AbstractClient: Alluxio client (version 1.8.0) is trying to connect with MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
18/09/06 21:44:53 INFO AbstractClient: Client registered with MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
18/09/06 21:44:53 INFO AbstractClient: Alluxio client (version 1.8.0) is trying to connect with FileSystemMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
18/09/06 21:44:53 INFO AbstractClient: Client registered with FileSystemMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
18/09/06 21:44:53 INFO FileSystemContext: Created filesystem context with id app-4410116450193773659. This ID will be used for identifying info from the client, such as metrics. It can be set manually through the alluxio.user.app.id property
18/09/06 21:44:53 INFO AbstractClient: Alluxio client (version 1.8.0) is trying to connect with MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
18/09/06 21:44:53 WARN AbstractClient: Failed to connect (1) with MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998: Peer indicated failure: Plain authentication failed: User yarn is not configured for any impersonation. impersonationUser: sshuser
18/09/06 21:44:53 ERROR ClientMasterSync: Failed to heartbeat to the metrics master: {}
alluxio.exception.status.UnavailableException: Failed to connect to MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998 after 1 attempts
	at alluxio.AbstractClient.connect(AbstractClient.java:325)
	at alluxio.client.metrics.MetricsMasterClient.heartbeat(MetricsMasterClient.java:84)
	at alluxio.client.metrics.ClientMasterSync.heartbeat(ClientMasterSync.java:63)
	at alluxio.heartbeat.HeartbeatThread.run(HeartbeatThread.java:74)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
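
The WARN line just before the heartbeat failure is the actual refusal: `User yarn is not configured for any impersonation. impersonationUser: sshuser`. On YARN the executors run as `yarn` but act on behalf of `sshuser`, and Alluxio 1.8 rejects that unless impersonation is configured. A sketch of the usual fix, using Alluxio's `alluxio.master.security.impersonation.<user>.users` property (apply on the master node, then restart the master):

```
# Sketch: let the yarn user impersonate any user (narrow '*' to a list if preferred)
echo 'alluxio.master.security.impersonation.yarn.users=*' >> /etc/alluxio/alluxio-site.properties
/opt/alluxio/bin/alluxio-stop.sh master
/opt/alluxio/bin/alluxio-start.sh master
```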


On Thu, Sep 6, 2018 at 11:21 PM Lu Qiu <[hidden email]> wrote:
Hi Dmitry,

Sorry for the late reply.

Could you try the spark shell and see if spark shell is able to connect to Alluxio?

```
> val s = sc.textFile("alluxio://hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net:19998/Input")
> val double = s.map(line => line + line)
> double.saveAsTextFile("alluxio://  hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net:19998/Output")
``` 

Could you try it in your master node(10.0.0.21) first and then try it again in the node you run spark-submit before (maybe 10.0.0.6)?

Thanks,
Lu


On Thu, Sep 6, 2018 at 9:54 AM, Dmitry Yatsyuk <[hidden email]> wrote:
Here is also the log from /opt/alluxio/integration/checker/bin/alluxio-checker.sh spark yarn


On Thu, Sep 6, 2018 at 7:46 PM Dmitry Yatsyuk <[hidden email]> wrote:
Hello
Maybe you have more suggestions for me.
Many thanks

On Wed, Sep 5, 2018 at 10:23 PM Dmitry Yatsyuk <[hidden email]> wrote:
and yes, I run spark-submit from the node where the Alluxio master is installed

On Wed, Sep 5, 2018 at 10:22 PM Dmitry Yatsyuk <[hidden email]> wrote:
Hello

All workers are live

live Workers

   wn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net 0 In Service 36.69GB 444.88MB   99%Free
   wn1-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net 0 In Service 36.69GB 593.25MB   99%Free
   wn4-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net 0 In Service 36.69GB 94.63MB     99%Free



Alluxio Summary

            Started:                                      09-05-2018 12:40:10:248
             Uptime:                        0 day(s), 6 hour(s), 41 minute(s), and 28 second(s)
            Version:                                               1.8.0
        Running Workers:                                             3
   Startup Consistency Check:                                     COMPLETE
   Server Configuration Check:                                     PASSED

Cluster Usage Summary

    Workers Capacity:         110.08GB
   Workers Free / Used: 108.97GB / 1132.76MB
    UnderFS Capacity:        1177.79GB
   UnderFS Free / Used: 1177.79GB / 192.00KB

Storage Usage Summary

   Storage Alias   Space Capacity   Space Used   Space Usage
   MEM             110.08GB         1132.76MB    99% Free



ср, 5 сент. 2018 г. в 20:07, Lu Qiu <[hidden email]>:
Hi,

The error 
Failed to connect to MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998 after 1 attempts

is usually caused by:

(1) the input hostname (or port) is wrong, or the system cannot resolve the hostname (especially when Spark and Alluxio are on different nodes).
Did you run the spark-submit on the Alluxio master node?

(2) the Alluxio cluster is not running normally. You could visit http://hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net:19999 to see if the Alluxio master is alive, and visit the Workers page to see if the workers are alive.
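
Both causes can be checked quickly from the node that runs spark-submit; a sketch using the standard Alluxio CLI:

```
# can this node resolve the master hostname and reach the RPC port?
/opt/alluxio/bin/alluxio fs ls /
# exercise reads and writes through the running cluster
/opt/alluxio/bin/alluxio runTests
```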


Thanks,
Lu

On Wed, Sep 5, 2018 at 4:29 AM, Dmitry Yatsyuk <[hidden email]> wrote:
I changed spark-submit to

spark-submit --conf 'spark.driver.extraJavaOptions=-Dalluxio.master.hostname=hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net' --conf 'spark.executor.extraJavaOptions=-Dalluxio.master.hostname=hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net' --master yarn --class co.bigstream.benchmark.TPCSQ3 tpcds_2.11-1.0.jar 2 alluxio://hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net:19998/avro avro

and now the error on the executor is

18/09/05 11:25:44 ERROR ClientMasterSync: Failed to heartbeat to the metrics master: {}
alluxio.exception.status.UnavailableException: Failed to connect to MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998 after 1 attempts
	at alluxio.AbstractClient.connect(AbstractClient.java:325)
	at alluxio.client.metrics.MetricsMasterClient.heartbeat(MetricsMasterClient.java:84)
	at alluxio.client.metrics.ClientMasterSync.heartbeat(ClientMasterSync.java:63)
	at alluxio.heartbeat.HeartbeatThread.run(HeartbeatThread.java:74)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)


On Wed, Sep 5, 2018 at 2:20 PM, Dmitry Yatsyuk <[hidden email]> wrote:
Hello
I tried your suggestion but now the error is the following:
Exception in thread "main" java.lang.ExceptionInInitializerError
at alluxio.hadoop.AbstractFileSystem.initializeInternal(AbstractFileSystem.java:514)
at alluxio.hadoop.AbstractFileSystem.initialize(AbstractFileSystem.java:483)
at alluxio.hadoop.FileSystem.initialize(FileSystem.java:27)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2795)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:99)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2829)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2811)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:390)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.spark.sql.execution.datasources.DataSource.hasMetadata(DataSource.scala:301)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:344)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:135)
at co.bigstream.benchmark.TPCSQ3$.main(TPCSQ3.scala:64)
at co.bigstream.benchmark.TPCSQ3.main(TPCSQ3.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:751)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.IllegalArgumentException: port out of range:-1
at java.net.InetSocketAddress.checkPort(InetSocketAddress.java:143)
at java.net.InetSocketAddress.<init>(InetSocketAddress.java:224)
at alluxio.util.network.NetworkAddressUtils.getConnectAddress(NetworkAddressUtils.java:218)
at alluxio.util.network.NetworkAddressUtils.getConnectAddress(NetworkAddressUtils.java:204)
at alluxio.client.lineage.LineageContext.reset(LineageContext.java:64)
at alluxio.client.lineage.LineageContext.<init>(LineageContext.java:35)
at alluxio.client.lineage.LineageContext.<clinit>(LineageContext.java:27)

The command I ran is
spark-submit --conf 'spark.driver.extraJavaOptions=-Dalluxio.master.hostname=hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net' --conf 'spark.executor.extraJavaOptions=-Dalluxio.master.hostname=hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net' --master yarn --class co.bigstream.benchmark.TPCSQ3 tpcds_2.11-1.0.jar 2 alluxio://hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/avro avro
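
Note the alluxio:// URI above carries no port, and the stack trace dies resolving a connect address with port -1. Spelling the master RPC port out, as the follow-up attempt quoted above does, removes that failure mode; a sketch of the corrected invocation:

```
spark-submit \
  --conf 'spark.driver.extraJavaOptions=-Dalluxio.master.hostname=hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net' \
  --conf 'spark.executor.extraJavaOptions=-Dalluxio.master.hostname=hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net' \
  --master yarn --class co.bigstream.benchmark.TPCSQ3 \
  tpcds_2.11-1.0.jar 2 \
  alluxio://hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net:19998/avro avro
```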

On Wed, Sep 5, 2018 at 4:11 AM, Lu Qiu <[hidden email]> wrote:
Hi Deema,

The more common way to set Alluxio configuration through spark-submit command-line options is:

spark-submit \
--conf 'spark.driver.extraJavaOptions=-Dalluxio.user.file.writetype.default=CACHE_THROUGH' \
--conf 'spark.executor.extraJavaOptions=-Dalluxio.user.file.writetype.default=CACHE_THROUGH' \
...


In your case, you could try

spark-submit --conf 'spark.driver.extraJavaOptions=-Dalluxio.master.hostname=hn0-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net' --conf 'spark.executor.extraJavaOptions=-Dalluxio.master.hostname=hn0-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net' --master yarn --class co.bigstream.benchmark.TPCSQ3 tpcds_2.11-1.0.jar 2 alluxio:///avro avro
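
The same command wrapped for readability; quoting mistakes (such as a missing closing quote on a --conf value) are much easier to spot in this form:

```
spark-submit \
  --conf 'spark.driver.extraJavaOptions=-Dalluxio.master.hostname=hn0-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net' \
  --conf 'spark.executor.extraJavaOptions=-Dalluxio.master.hostname=hn0-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net' \
  --master yarn --class co.bigstream.benchmark.TPCSQ3 \
  tpcds_2.11-1.0.jar 2 alluxio:///avro avro
```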

Hope that one of these ways works for you!

Thanks, 
Lu

On Tue, Sep 4, 2018 at 5:55 PM, Lu Qiu <[hidden email]> wrote:
Hi Deema,


If you only use Alluxio to provide the input file, you could try passing the whole Alluxio path, like
`
spark-submit --master yarn --class co.bigstream.benchmark.TPCSQ3 tpcds_2.11-1.0.jar 2 alluxio://hn0-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net:19998/avro avro
`

If you are using `alluxio:///` in the jar, Spark needs to know the Alluxio master hostname.
You could add the master hostname to the `core-site.xml` in your Spark home conf directory.

<configuration>
  <property>
    <name>fs.alluxio.impl</name>
    <value>alluxio.hadoop.FileSystem</value>
  </property>
  <property>
    <name>alluxio.master.hostname</name>
    <value>hn0-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net</value>
  </property>
</configuration>

Or try

`
spark-submit --conf spark.hadoop.defaultFS="alluxio://hn0-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net:19998" --master yarn --class co.bigstream.benchmark.TPCSQ3 tpcds_2.11-1.0.jar 2 alluxio://hn0-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net:19998/avro avro
`

In addition, could you share the console output of `/opt/alluxio/integration/checker/bin/alluxio-checker.sh spark yarn` so that we can better understand why the Spark checker failed?

Thanks,
Lu


On Tue, Sep 4, 2018 at 2:25 AM, Deema Yatsyuk <[hidden email]> wrote:
and tried to run spark-submit
spark-submit --conf spark.hadoop.defaultFS="-Dalluxio://hn0-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net" --master yarn --class co.bigstream.benchmark.TPCSQ3 tpcds_2.11-1.0.jar 2 alluxio:///avro avro

 but got the same issue

Exception in thread "main" java.lang.NullPointerException: URI hostname must not be null
at alluxio.core.client.runtime.com.google.common.base.Preconditions.checkNotNull(Preconditions.java:208)
at alluxio.hadoop.AbstractFileSystem.initializeInternal(AbstractFileSystem.java:506)
at alluxio.hadoop.AbstractFileSystem.initialize(AbstractFileSystem.java:483)
at alluxio.hadoop.FileSystem.initialize(FileSystem.java:27)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2795)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:99)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2829)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2811)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:390)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$14.apply(DataSource.scala:372)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$14.apply(DataSource.scala:370)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
at scala.collection.immutable.List.flatMap(List.scala:344)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:370)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:135)
at co.bigstream.benchmark.TPCSQ3$.main(TPCSQ3.scala:64)
at co.bigstream.benchmark.TPCSQ3.main(TPCSQ3.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:751)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
18/09/04 09:23:53 INFO SparkContext: Invoking stop() from shutdown hook 

Re: issue with running spark-submit

Deema Yatsyuk
Also, now when I try to restart the master node I get the following exception:

2018-09-06 21:55:57,795 ERROR UfsJournalCheckpointThread - FileSystemMaster: Failed to run journal checkpoint thread, crashing.
java.lang.IllegalStateException: Journal entries are missing between sequence number 0 (inclusive) and 436 (exclusive).
at alluxio.master.journal.ufs.UfsJournalReader.read(UfsJournalReader.java:160)
at alluxio.master.journal.ufs.UfsJournalCheckpointThread.runInternal(UfsJournalCheckpointThread.java:141)
at alluxio.master.journal.ufs.UfsJournalCheckpointThread.run(UfsJournalCheckpointThread.java:123)
2018-09-06 21:55:57,796 ERROR UfsJournalCheckpointThread - BlockMaster: Failed to run journal checkpoint thread, crashing.
java.lang.IllegalStateException: Journal entries are missing between sequence number 0 (inclusive) and 49 (exclusive).
at alluxio.master.journal.ufs.UfsJournalReader.read(UfsJournalReader.java:160)
at alluxio.master.journal.ufs.UfsJournalCheckpointThread.runInternal(UfsJournalCheckpointThread.java:141)
at alluxio.master.journal.ufs.UfsJournalCheckpointThread.run(UfsJournalCheckpointThread.java:123)
2018-09-06 21:55:57,799 INFO  UfsJournalCheckpointThread - BlockMaster: Journal shutdown complete
2018-09-06 21:55:57,799 ERROR ProcessUtils - Uncaught exception while running Alluxio master @hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998, stopping it and exiting.
java.lang.IllegalStateException
at com.google.common.base.Preconditions.checkState(Preconditions.java:133)
at alluxio.master.journal.ufs.UfsJournalCheckpointThread.getNextSequenceNumber(UfsJournalCheckpointThread.java:116)
at alluxio.master.journal.ufs.UfsJournal.gainPrimacy(UfsJournal.java:207)
at alluxio.master.journal.ufs.UfsJournalSystem.gainPrimacy(UfsJournalSystem.java:68)
at alluxio.master.AlluxioMasterProcess.start(AlluxioMasterProcess.java:226)
at alluxio.ProcessUtils.run(ProcessUtils.java:32)
at alluxio.master.AlluxioMaster.main(AlluxioMaster.java:55)
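
To see what state the journal is in before restarting, one could list the journal folders; a guess, assuming the default alluxio.master.journal.folder under the install directory:

```
# UFS journal layout in Alluxio 1.8 (paths assumed; adjust to your journal folder)
ls -l /opt/alluxio/journal/BlockMaster/v1/logs
ls -l /opt/alluxio/journal/FileSystemMaster/v1/logs
```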

On Fri, Sep 7, 2018 at 12:48 AM Dmitry Yatsyuk <[hidden email]> wrote:
Hello
When run from the master node it hangs at stage

[Stage 0:>                                                          (0 + 2) / 2]

and the same issue appears in the executor log

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/bigstream/bigstreamNSD/spark-2.1.1-BIGSTREAM-bin-bigstream-spark-yarn-h2.7.2/nsd-jars-2.1/alluxio-1.8.0-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.6.2.38-1/spark2/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.6.2.38-1/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.6.2.38-1/spark_llap/spark-llap-assembly-1.0.0.2.6.2.38-1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
18/09/06 21:42:03 INFO CoarseGrainedExecutorBackend: Started daemon with process name: 28074@wn1-nsd-bg
18/09/06 21:42:03 INFO SignalUtils: Registered signal handler for TERM
18/09/06 21:42:03 INFO SignalUtils: Registered signal handler for HUP
18/09/06 21:42:03 INFO SignalUtils: Registered signal handler for INT
18/09/06 21:42:04 INFO SecurityManager: Changing view acls to: yarn,sshuser
18/09/06 21:42:04 INFO SecurityManager: Changing modify acls to: yarn,sshuser
18/09/06 21:42:04 INFO SecurityManager: Changing view acls groups to: 
18/09/06 21:42:04 INFO SecurityManager: Changing modify acls groups to: 
18/09/06 21:42:04 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(yarn, sshuser); groups with view permissions: Set(); users  with modify permissions: Set(yarn, sshuser); groups with modify permissions: Set()
18/09/06 21:42:04 INFO TransportClientFactory: Successfully created connection to /10.0.0.21:33843 after 60 ms (0 ms spent in bootstraps)
18/09/06 21:42:04 INFO SecurityManager: Changing view acls to: yarn,sshuser
18/09/06 21:42:04 INFO SecurityManager: Changing modify acls to: yarn,sshuser
18/09/06 21:42:04 INFO SecurityManager: Changing view acls groups to: 
18/09/06 21:42:04 INFO SecurityManager: Changing modify acls groups to: 
18/09/06 21:42:04 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(yarn, sshuser); groups with view permissions: Set(); users  with modify permissions: Set(yarn, sshuser); groups with modify permissions: Set()
18/09/06 21:42:04 INFO TransportClientFactory: Successfully created connection to /10.0.0.21:33843 after 1 ms (0 ms spent in bootstraps)
18/09/06 21:42:04 INFO DiskBlockManager: Created local directory at /mnt/resource/hadoop/yarn/local/usercache/sshuser/appcache/application_1536145998851_0026/blockmgr-aa883160-f530-41f6-a683-5d13cd04113a
18/09/06 21:42:04 INFO MemoryStore: MemoryStore started with capacity 5.2 GB
18/09/06 21:42:04 INFO CoarseGrainedExecutorBackend: Connecting to driver: spark://CoarseGrainedScheduler@10.0.0.21:33843
18/09/06 21:42:04 INFO CoarseGrainedExecutorBackend: Successfully registered with driver
18/09/06 21:42:04 INFO Executor: Starting executor ID 4 on host wn1-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net
18/09/06 21:42:04 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 40203.
18/09/06 21:42:04 INFO NettyBlockTransferService: Server created on wn1-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net:40203
18/09/06 21:42:04 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
18/09/06 21:42:04 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(4, wn1-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net, 40203, None)
18/09/06 21:42:04 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(4, wn1-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net, 40203, None)
18/09/06 21:42:04 INFO BlockManager: Initialized BlockManager: BlockManagerId(4, wn1-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net, 40203, None)
18/09/06 21:42:04 INFO Executor: Using REPL class URI: spark://10.0.0.21:33843/classes
18/09/06 21:44:51 INFO CoarseGrainedExecutorBackend: Got assigned task 0
18/09/06 21:44:51 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
18/09/06 21:44:51 INFO TorrentBroadcast: Started reading broadcast variable 1
18/09/06 21:44:51 INFO TransportClientFactory: Successfully created connection to /10.0.0.21:32875 after 2 ms (0 ms spent in bootstraps)
18/09/06 21:44:51 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 37.5 KB, free 5.2 GB)
18/09/06 21:44:51 INFO TorrentBroadcast: Reading broadcast variable 1 took 133 ms
18/09/06 21:44:51 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 97.7 KB, free 5.2 GB)
18/09/06 21:44:51 INFO TransportClientFactory: Successfully created connection to /10.0.0.21:33843 after 16 ms (0 ms spent in bootstraps)
18/09/06 21:44:52 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
18/09/06 21:44:52 INFO deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
18/09/06 21:44:52 INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
18/09/06 21:44:52 INFO deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
18/09/06 21:44:52 INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
18/09/06 21:44:52 INFO HadoopRDD: Input split: alluxio://hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net:19998/Input:0+13423
18/09/06 21:44:52 INFO TorrentBroadcast: Started reading broadcast variable 0
18/09/06 21:44:52 INFO TransportClientFactory: Successfully created connection to wn1-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.4:44275 after 1 ms (0 ms spent in bootstraps)
18/09/06 21:44:52 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 34.1 KB, free 5.2 GB)
18/09/06 21:44:52 INFO TorrentBroadcast: Reading broadcast variable 0 took 70 ms
18/09/06 21:44:52 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 469.5 KB, free 5.2 GB)
18/09/06 21:44:52 INFO MetricsConfig: loaded properties from hadoop-metrics2-azure-file-system.properties
18/09/06 21:44:52 INFO WasbAzureIaasSink: Init starting.
18/09/06 21:44:52 INFO AzureIaasSink: Init starting. Initializing MdsLogger.
18/09/06 21:44:52 INFO AzureIaasSink: Init completed.
18/09/06 21:44:52 INFO WasbAzureIaasSink: Init completed.
18/09/06 21:44:52 INFO MetricsSinkAdapter: Sink azurefs2 started
18/09/06 21:44:52 INFO MetricsSystemImpl: Scheduled snapshot period at 60 second(s).
18/09/06 21:44:52 INFO MetricsSystemImpl: azure-file-system metrics system started
18/09/06 21:44:52 INFO HadoopConfigurationUtils: Loading Alluxio properties from Hadoop configuration: {fs.azure.account.key.testryui.blob.core.windows.net=MIIB/QYJKoZIhvcNAQcDoIIB7jCCAeoCAQAxggFdMIIBWQIBADBBMC0xKzApBgNVBAMTImRiZW5jcnlwdGlvbi5oZGluc2lnaHRzZXJ2aWNlcy5uZXQCEGgOgBofxf6kRbb5r8cSuagwDQYJKoZIhvcNAQEBBQAEggEAKgwuNMr9JwNtvHY7VRE+8t991LCO8n0uJUOtJc2vQET+pZnK/2jviZh8AaIDra0AhDs8eGB71G8Xjet0fSyNZXb80660ZV3BBHKaDVNybwD12HLvjwLoOEZ3O9NqGls7D7owsoWO7gw0sGgu96dnEjvDvB8uSck8UveXMSk/tZNeU9+jYjfctu8TGRP+B4YCgJWcLMeGHczhPx6NuBPdmg/0eoe3w8VVPCi6vIs3lsyzYEi1LVrMkIJ2aBWuwyG7bUKu4tLhfYHY13YDO5PtBfNtWX3QkzejxvY9lewGuPU3w9Fw3GFK/YrgzFBtBycyBFMYl6lQyIFPNLtzPv2l9jCBgwYJKoZIhvcNAQcBMBQGCCqGSIb3DQMHBAgO9GVL4NR2GoBgn8fCaRg2y303wKFrgNOukYYmLoOKq5cQXP/9EY+3WD9BSMjE9/WyQ87FIhPeeEiq6JPNdBIoTZagQpj11b+TL7uMmz+j1Jx4LDozmAoLPQItbwaNnFZlYRWrgY8OiOuE}
18/09/06 21:44:52 INFO MetricsSystem: Starting sinks with config: {}.
18/09/06 21:44:52 INFO FileSystemContext: Created filesystem context with id app-3381863313617109164. This ID will be used for identifying info from the client, such as metrics. It can be set manually through the alluxio.user.app.id property
18/09/06 21:44:52 INFO HadoopConfigurationUtils: Loading Alluxio properties from Hadoop configuration: {fs.azure.account.key.testryui.blob.core.windows.net=MIIB/QYJKoZIhvcNAQcDoIIB7jCCAeoCAQAxggFdMIIBWQIBADBBMC0xKzApBgNVBAMTImRiZW5jcnlwdGlvbi5oZGluc2lnaHRzZXJ2aWNlcy5uZXQCEGgOgBofxf6kRbb5r8cSuagwDQYJKoZIhvcNAQEBBQAEggEAKgwuNMr9JwNtvHY7VRE+8t991LCO8n0uJUOtJc2vQET+pZnK/2jviZh8AaIDra0AhDs8eGB71G8Xjet0fSyNZXb80660ZV3BBHKaDVNybwD12HLvjwLoOEZ3O9NqGls7D7owsoWO7gw0sGgu96dnEjvDvB8uSck8UveXMSk/tZNeU9+jYjfctu8TGRP+B4YCgJWcLMeGHczhPx6NuBPdmg/0eoe3w8VVPCi6vIs3lsyzYEi1LVrMkIJ2aBWuwyG7bUKu4tLhfYHY13YDO5PtBfNtWX3QkzejxvY9lewGuPU3w9Fw3GFK/YrgzFBtBycyBFMYl6lQyIFPNLtzPv2l9jCBgwYJKoZIhvcNAQcBMBQGCCqGSIb3DQMHBAgO9GVL4NR2GoBgn8fCaRg2y303wKFrgNOukYYmLoOKq5cQXP/9EY+3WD9BSMjE9/WyQ87FIhPeeEiq6JPNdBIoTZagQpj11b+TL7uMmz+j1Jx4LDozmAoLPQItbwaNnFZlYRWrgY8OiOuE}
18/09/06 21:44:52 INFO AbstractClient: Alluxio client (version 1.8.0) is trying to bootstrap-connect with hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
18/09/06 21:44:52 INFO AbstractClient: Alluxio client has bootstrap-connected with hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
18/09/06 21:44:52 INFO AbstractClient: Alluxio client (version 1.8.0) is trying to connect with MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
18/09/06 21:44:52 INFO AbstractClient: Client registered with MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
18/09/06 21:44:53 INFO HeartbeatThread: Hearbeat Master Metrics Sync is interrupted.
18/09/06 21:44:53 INFO AbstractClient: Alluxio client (version 1.8.0) is trying to connect with MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
18/09/06 21:44:53 INFO AbstractClient: Client registered with MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
18/09/06 21:44:53 INFO AbstractClient: Alluxio client (version 1.8.0) is trying to connect with FileSystemMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
18/09/06 21:44:53 INFO AbstractClient: Client registered with FileSystemMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
18/09/06 21:44:53 INFO FileSystemContext: Created filesystem context with id app-4410116450193773659. This ID will be used for identifying info from the client, such as metrics. It can be set manually through the alluxio.user.app.id property
18/09/06 21:44:53 INFO AbstractClient: Alluxio client (version 1.8.0) is trying to connect with MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
18/09/06 21:44:53 WARN AbstractClient: Failed to connect (1) with MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998: Peer indicated failure: Plain authentication failed: User yarn is not configured for any impersonation. impersonationUser: sshuser
18/09/06 21:44:53 ERROR ClientMasterSync: Failed to heartbeat to the metrics master: {}
alluxio.exception.status.UnavailableException: Failed to connect to MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998 after 1 attempts
	at alluxio.AbstractClient.connect(AbstractClient.java:325)
	at alluxio.client.metrics.MetricsMasterClient.heartbeat(MetricsMasterClient.java:84)
	at alluxio.client.metrics.ClientMasterSync.heartbeat(ClientMasterSync.java:63)
	at alluxio.heartbeat.HeartbeatThread.run(HeartbeatThread.java:74)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)


Re: issue with running spark-submit

Lu Qiu
Hi Dmitry,

It looks like you formatted Alluxio while it was running.

The recommended process is:
./bin/alluxio-stop.sh <parameters>
./bin/alluxio format
./bin/alluxio-start.sh <parameters>
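
Concretely, with /opt/alluxio as the install directory, that sequence would look something like:

```
/opt/alluxio/bin/alluxio-stop.sh all       # stop master and workers first
/opt/alluxio/bin/alluxio format            # only safe while nothing is running
/opt/alluxio/bin/alluxio-start.sh master
/opt/alluxio/bin/alluxio-start.sh workers
```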

format is a very powerful command and will cause issues if run while nodes are still running.

Could you follow the previous steps to stop-format-start again?

Thanks,
Lu


On Thu, Sep 6, 2018 at 3:03 PM, Dmitry Yatsyuk <[hidden email]> wrote:
also now when i try to restart master node i have the following exception

2018-09-06 21:55:57,795 ERROR UfsJournalCheckpointThread - FileSystemMaster: Failed to run journal checkpoint thread, crashing.
java.lang.IllegalStateException: Journal entries are missing between sequence number 0 (inclusive) and 436 (exclusive).
at alluxio.master.journal.ufs.UfsJournalReader.read(UfsJournalReader.java:160)
at alluxio.master.journal.ufs.UfsJournalCheckpointThread.runInternal(UfsJournalCheckpointThread.java:141)
at alluxio.master.journal.ufs.UfsJournalCheckpointThread.run(UfsJournalCheckpointThread.java:123)
2018-09-06 21:55:57,796 ERROR UfsJournalCheckpointThread - BlockMaster: Failed to run journal checkpoint thread, crashing.
java.lang.IllegalStateException: Journal entries are missing between sequence number 0 (inclusive) and 49 (exclusive).
at alluxio.master.journal.ufs.UfsJournalReader.read(UfsJournalReader.java:160)
at alluxio.master.journal.ufs.UfsJournalCheckpointThread.runInternal(UfsJournalCheckpointThread.java:141)
at alluxio.master.journal.ufs.UfsJournalCheckpointThread.run(UfsJournalCheckpointThread.java:123)
2018-09-06 21:55:57,799 INFO  UfsJournalCheckpointThread - BlockMaster: Journal shutdown complete
2018-09-06 21:55:57,799 ERROR ProcessUtils - Uncaught exception while running Alluxio master @hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998, stopping it and exiting.
java.lang.IllegalStateException
at com.google.common.base.Preconditions.checkState(Preconditions.java:133)
at alluxio.master.journal.ufs.UfsJournalCheckpointThread.getNextSequenceNumber(UfsJournalCheckpointThread.java:116)
at alluxio.master.journal.ufs.UfsJournal.gainPrimacy(UfsJournal.java:207)
at alluxio.master.journal.ufs.UfsJournalSystem.gainPrimacy(UfsJournalSystem.java:68)
at alluxio.master.AlluxioMasterProcess.start(AlluxioMasterProcess.java:226)
at alluxio.ProcessUtils.run(ProcessUtils.java:32)
at alluxio.master.AlluxioMaster.main(AlluxioMaster.java:55)

On Fri, Sep 7, 2018 at 12:48 AM Dmitry Yatsyuk <[hidden email]> wrote:
Hello
It hangs from master node on stage

[Stage 0:>                                                          (0 + 2) / 2]

and the same issue on executor log

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/bigstream/bigstreamNSD/spark-2.1.1-BIGSTREAM-bin-bigstream-spark-yarn-h2.7.2/nsd-jars-2.1/alluxio-1.8.0-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.6.2.38-1/spark2/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.6.2.38-1/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.6.2.38-1/spark_llap/spark-llap-assembly-1.0.0.2.6.2.38-1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
18/09/06 21:42:03 INFO CoarseGrainedExecutorBackend: Started daemon with process name: 28074@wn1-nsd-bg
18/09/06 21:42:03 INFO SignalUtils: Registered signal handler for TERM
18/09/06 21:42:03 INFO SignalUtils: Registered signal handler for HUP
18/09/06 21:42:03 INFO SignalUtils: Registered signal handler for INT
18/09/06 21:42:04 INFO SecurityManager: Changing view acls to: yarn,sshuser
18/09/06 21:42:04 INFO SecurityManager: Changing modify acls to: yarn,sshuser
18/09/06 21:42:04 INFO SecurityManager: Changing view acls groups to: 
18/09/06 21:42:04 INFO SecurityManager: Changing modify acls groups to: 
18/09/06 21:42:04 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(yarn, sshuser); groups with view permissions: Set(); users  with modify permissions: Set(yarn, sshuser); groups with modify permissions: Set()
18/09/06 21:42:04 INFO TransportClientFactory: Successfully created connection to /10.0.0.21:33843 after 60 ms (0 ms spent in bootstraps)
18/09/06 21:42:04 INFO SecurityManager: Changing view acls to: yarn,sshuser
18/09/06 21:42:04 INFO SecurityManager: Changing modify acls to: yarn,sshuser
18/09/06 21:42:04 INFO SecurityManager: Changing view acls groups to: 
18/09/06 21:42:04 INFO SecurityManager: Changing modify acls groups to: 
18/09/06 21:42:04 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(yarn, sshuser); groups with view permissions: Set(); users  with modify permissions: Set(yarn, sshuser); groups with modify permissions: Set()
18/09/06 21:42:04 INFO TransportClientFactory: Successfully created connection to /10.0.0.21:33843 after 1 ms (0 ms spent in bootstraps)
18/09/06 21:42:04 INFO DiskBlockManager: Created local directory at /mnt/resource/hadoop/yarn/local/usercache/sshuser/appcache/application_1536145998851_0026/blockmgr-aa883160-f530-41f6-a683-5d13cd04113a
18/09/06 21:42:04 INFO MemoryStore: MemoryStore started with capacity 5.2 GB
18/09/06 21:42:04 INFO CoarseGrainedExecutorBackend: Connecting to driver: spark://CoarseGrainedScheduler@10.0.0.21:33843
18/09/06 21:42:04 INFO CoarseGrainedExecutorBackend: Successfully registered with driver
18/09/06 21:42:04 INFO Executor: Starting executor ID 4 on host wn1-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net
18/09/06 21:42:04 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 40203.
18/09/06 21:42:04 INFO NettyBlockTransferService: Server created on wn1-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net:40203
18/09/06 21:42:04 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
18/09/06 21:42:04 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(4, wn1-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net, 40203, None)
18/09/06 21:42:04 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(4, wn1-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net, 40203, None)
18/09/06 21:42:04 INFO BlockManager: Initialized BlockManager: BlockManagerId(4, wn1-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net, 40203, None)
18/09/06 21:42:04 INFO Executor: Using REPL class URI: spark://10.0.0.21:33843/classes
18/09/06 21:44:51 INFO CoarseGrainedExecutorBackend: Got assigned task 0
18/09/06 21:44:51 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
18/09/06 21:44:51 INFO TorrentBroadcast: Started reading broadcast variable 1
18/09/06 21:44:51 INFO TransportClientFactory: Successfully created connection to /10.0.0.21:32875 after 2 ms (0 ms spent in bootstraps)
18/09/06 21:44:51 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 37.5 KB, free 5.2 GB)
18/09/06 21:44:51 INFO TorrentBroadcast: Reading broadcast variable 1 took 133 ms
18/09/06 21:44:51 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 97.7 KB, free 5.2 GB)
18/09/06 21:44:51 INFO TransportClientFactory: Successfully created connection to /10.0.0.21:33843 after 16 ms (0 ms spent in bootstraps)
18/09/06 21:44:52 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
18/09/06 21:44:52 INFO deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
18/09/06 21:44:52 INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
18/09/06 21:44:52 INFO deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
18/09/06 21:44:52 INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
18/09/06 21:44:52 INFO HadoopRDD: Input split: alluxio://hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net:19998/Input:0+13423
18/09/06 21:44:52 INFO TorrentBroadcast: Started reading broadcast variable 0
18/09/06 21:44:52 INFO TransportClientFactory: Successfully created connection to wn1-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.4:44275 after 1 ms (0 ms spent in bootstraps)
18/09/06 21:44:52 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 34.1 KB, free 5.2 GB)
18/09/06 21:44:52 INFO TorrentBroadcast: Reading broadcast variable 0 took 70 ms
18/09/06 21:44:52 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 469.5 KB, free 5.2 GB)
18/09/06 21:44:52 INFO MetricsConfig: loaded properties from hadoop-metrics2-azure-file-system.properties
18/09/06 21:44:52 INFO WasbAzureIaasSink: Init starting.
18/09/06 21:44:52 INFO AzureIaasSink: Init starting. Initializing MdsLogger.
18/09/06 21:44:52 INFO AzureIaasSink: Init completed.
18/09/06 21:44:52 INFO WasbAzureIaasSink: Init completed.
18/09/06 21:44:52 INFO MetricsSinkAdapter: Sink azurefs2 started
18/09/06 21:44:52 INFO MetricsSystemImpl: Scheduled snapshot period at 60 second(s).
18/09/06 21:44:52 INFO MetricsSystemImpl: azure-file-system metrics system started
18/09/06 21:44:52 INFO HadoopConfigurationUtils: Loading Alluxio properties from Hadoop configuration: {fs.azure.account.key.testryui.blob.core.windows.net=MIIB/QYJKoZIhvcNAQcDoIIB7jCCAeoCAQAxggFdMIIBWQIBADBBMC0xKzApBgNVBAMTImRiZW5jcnlwdGlvbi5oZGluc2lnaHRzZXJ2aWNlcy5uZXQCEGgOgBofxf6kRbb5r8cSuagwDQYJKoZIhvcNAQEBBQAEggEAKgwuNMr9JwNtvHY7VRE+8t991LCO8n0uJUOtJc2vQET+pZnK/2jviZh8AaIDra0AhDs8eGB71G8Xjet0fSyNZXb80660ZV3BBHKaDVNybwD12HLvjwLoOEZ3O9NqGls7D7owsoWO7gw0sGgu96dnEjvDvB8uSck8UveXMSk/tZNeU9+jYjfctu8TGRP+B4YCgJWcLMeGHczhPx6NuBPdmg/0eoe3w8VVPCi6vIs3lsyzYEi1LVrMkIJ2aBWuwyG7bUKu4tLhfYHY13YDO5PtBfNtWX3QkzejxvY9lewGuPU3w9Fw3GFK/YrgzFBtBycyBFMYl6lQyIFPNLtzPv2l9jCBgwYJKoZIhvcNAQcBMBQGCCqGSIb3DQMHBAgO9GVL4NR2GoBgn8fCaRg2y303wKFrgNOukYYmLoOKq5cQXP/9EY+3WD9BSMjE9/WyQ87FIhPeeEiq6JPNdBIoTZagQpj11b+TL7uMmz+j1Jx4LDozmAoLPQItbwaNnFZlYRWrgY8OiOuE}
18/09/06 21:44:52 INFO MetricsSystem: Starting sinks with config: {}.
18/09/06 21:44:52 INFO FileSystemContext: Created filesystem context with id app-3381863313617109164. This ID will be used for identifying info from the client, such as metrics. It can be set manually through the alluxio.user.app.id property
18/09/06 21:44:52 INFO HadoopConfigurationUtils: Loading Alluxio properties from Hadoop configuration: {fs.azure.account.key.testryui.blob.core.windows.net=MIIB/QYJKoZIhvcNAQcDoIIB7jCCAeoCAQAxggFdMIIBWQIBADBBMC0xKzApBgNVBAMTImRiZW5jcnlwdGlvbi5oZGluc2lnaHRzZXJ2aWNlcy5uZXQCEGgOgBofxf6kRbb5r8cSuagwDQYJKoZIhvcNAQEBBQAEggEAKgwuNMr9JwNtvHY7VRE+8t991LCO8n0uJUOtJc2vQET+pZnK/2jviZh8AaIDra0AhDs8eGB71G8Xjet0fSyNZXb80660ZV3BBHKaDVNybwD12HLvjwLoOEZ3O9NqGls7D7owsoWO7gw0sGgu96dnEjvDvB8uSck8UveXMSk/tZNeU9+jYjfctu8TGRP+B4YCgJWcLMeGHczhPx6NuBPdmg/0eoe3w8VVPCi6vIs3lsyzYEi1LVrMkIJ2aBWuwyG7bUKu4tLhfYHY13YDO5PtBfNtWX3QkzejxvY9lewGuPU3w9Fw3GFK/YrgzFBtBycyBFMYl6lQyIFPNLtzPv2l9jCBgwYJKoZIhvcNAQcBMBQGCCqGSIb3DQMHBAgO9GVL4NR2GoBgn8fCaRg2y303wKFrgNOukYYmLoOKq5cQXP/9EY+3WD9BSMjE9/WyQ87FIhPeeEiq6JPNdBIoTZagQpj11b+TL7uMmz+j1Jx4LDozmAoLPQItbwaNnFZlYRWrgY8OiOuE}
18/09/06 21:44:52 INFO AbstractClient: Alluxio client (version 1.8.0) is trying to bootstrap-connect with hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
18/09/06 21:44:52 INFO AbstractClient: Alluxio client has bootstrap-connected with hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
18/09/06 21:44:52 INFO AbstractClient: Alluxio client (version 1.8.0) is trying to connect with MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
18/09/06 21:44:52 INFO AbstractClient: Client registered with MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
18/09/06 21:44:53 INFO HeartbeatThread: Hearbeat Master Metrics Sync is interrupted.
18/09/06 21:44:53 INFO AbstractClient: Alluxio client (version 1.8.0) is trying to connect with MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
18/09/06 21:44:53 INFO AbstractClient: Client registered with MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
18/09/06 21:44:53 INFO AbstractClient: Alluxio client (version 1.8.0) is trying to connect with FileSystemMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
18/09/06 21:44:53 INFO AbstractClient: Client registered with FileSystemMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
18/09/06 21:44:53 INFO FileSystemContext: Created filesystem context with id app-4410116450193773659. This ID will be used for identifying info from the client, such as metrics. It can be set manually through the alluxio.user.app.id property
18/09/06 21:44:53 INFO AbstractClient: Alluxio client (version 1.8.0) is trying to connect with MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
18/09/06 21:44:53 WARN AbstractClient: Failed to connect (1) with MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998: Peer indicated failure: Plain authentication failed: User yarn is not configured for any impersonation. impersonationUser: sshuser
18/09/06 21:44:53 ERROR ClientMasterSync: Failed to heartbeat to the metrics master: {}
alluxio.exception.status.UnavailableException: Failed to connect to MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998 after 1 attempts
	at alluxio.AbstractClient.connect(AbstractClient.java:325)
	at alluxio.client.metrics.MetricsMasterClient.heartbeat(MetricsMasterClient.java:84)
	at alluxio.client.metrics.ClientMasterSync.heartbeat(ClientMasterSync.java:63)
	at alluxio.heartbeat.HeartbeatThread.run(HeartbeatThread.java:74)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)


On Thu, Sep 6, 2018 at 11:21 PM Lu Qiu <[hidden email]> wrote:
Hi Dmitry,

Sorry for the late reply.

Could you try the spark shell and see if spark shell is able to connect to Alluxio?

```
> val s = sc.textFile("alluxio://hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net:19998/Input")
> val double = s.map(line => line + line)
> double.saveAsTextFile("alluxio://  hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net:19998/Output")
``` 

Could you try it in your master node(10.0.0.21) first and then try it again in the node you run spark-submit before (maybe 10.0.0.6)?

Thanks,
Lu


On Thu, Sep 6, 2018 at 9:54 AM, Dmitry Yatsyuk <[hidden email]> wrote:
here is also log from /opt/alluxio/integration/checker/bin/alluxio-checker.sh spark yarn


On Thu, Sep 6, 2018 at 7:46 PM Dmitry Yatsyuk <[hidden email]> wrote:
Hello
May be you have more suggestions to me.
Many thanks

On Wed, Sep 5, 2018 at 10:23 PM Dmitry Yatsyuk <[hidden email]> wrote:
and yes I run spark-submit from a node where alluxio master installed

On Wed, Sep 5, 2018 at 10:22 PM Dmitry Yatsyuk <[hidden email]> wrote:
Hello

All workers are live

live Workers

   wn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net 0 In Service 36.69GB 444.88MB   99%Free
   wn1-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net 0 In Service 36.69GB 593.25MB   99%Free
   wn4-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net 0 In Service 36.69GB 94.63MB     99%Free



Alluxio Summary

            Started:                                      09-05-2018 12:40:10:248
             Uptime:                        0 day(s), 6 hour(s), 41 minute(s), and 28 second(s)
            Version:                                               1.8.0
        Running Workers:                                             3
   Startup Consistency Check:                                     COMPLETE
   Server Configuration Check:                                     PASSED

Cluster Usage Summary

    Workers Capacity:         110.08GB
   Workers Free / Used: 108.97GB / 1132.76MB
    UnderFS Capacity:        1177.79GB
   UnderFS Free / Used: 1177.79GB / 192.00KB

Storage Usage Summary

   Storage Alias
   Space Capacity
   Space Used
   Space Usage
   MEM 110.08GB 1132.76MB
   99%Free



ср, 5 сент. 2018 г. в 20:07, Lu Qiu <[hidden email]>:
Hi,

The error 
Failed to connect to MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998 after 1 attempts

is usually caused by:

(1) the input hostname(or port) is wrong or the system cannot resolve the hostname(especially when spark and alluxio are on different nodes).
Did you run the spark-submit on the Alluxio master node?

(2) Alluxio cluster is not running normally. You could visit alluxio://
-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net:19999 to see if Alluxio master is alive and visit the Workers page to see if workers are alive.


Thanks,
Lu

On Wed, Sep 5, 2018 at 4:29 AM, Dmitry Yatsyuk <[hidden email]> wrote:
I changed spark-submit to

spark-submit --conf 'spark.driver.extraJavaOptions=-Dalluxio.master.hostname=hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net' --conf 'spark.executor.extraJavaOptions=-Dalluxio.master.hostname=hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net' --master yarn --class co.bigstream.benchmark.TPCSQ3 tpcds_2.11-1.0.jar 2 alluxio://hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net:19998/avro avro

and now the error on the executor is:

18/09/05 11:25:44 ERROR ClientMasterSync: Failed to heartbeat to the metrics master: {}
alluxio.exception.status.UnavailableException: Failed to connect to MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998 after 1 attempts
	at alluxio.AbstractClient.connect(AbstractClient.java:325)
	at alluxio.client.metrics.MetricsMasterClient.heartbeat(MetricsMasterClient.java:84)
	at alluxio.client.metrics.ClientMasterSync.heartbeat(ClientMasterSync.java:63)
	at alluxio.heartbeat.HeartbeatThread.run(HeartbeatThread.java:74)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)


On Wed, Sep 5, 2018 at 2:20 PM, Dmitry Yatsyuk <[hidden email]> wrote:
Hello
I tried your suggestion but now the error is the following:
Exception in thread "main" java.lang.ExceptionInInitializerError
at alluxio.hadoop.AbstractFileSystem.initializeInternal(AbstractFileSystem.java:514)
at alluxio.hadoop.AbstractFileSystem.initialize(AbstractFileSystem.java:483)
at alluxio.hadoop.FileSystem.initialize(FileSystem.java:27)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2795)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:99)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2829)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2811)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:390)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.spark.sql.execution.datasources.DataSource.hasMetadata(DataSource.scala:301)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:344)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:135)
at co.bigstream.benchmark.TPCSQ3$.main(TPCSQ3.scala:64)
at co.bigstream.benchmark.TPCSQ3.main(TPCSQ3.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:751)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.IllegalArgumentException: port out of range:-1
at java.net.InetSocketAddress.checkPort(InetSocketAddress.java:143)
at java.net.InetSocketAddress.<init>(InetSocketAddress.java:224)
at alluxio.util.network.NetworkAddressUtils.getConnectAddress(NetworkAddressUtils.java:218)
at alluxio.util.network.NetworkAddressUtils.getConnectAddress(NetworkAddressUtils.java:204)
at alluxio.client.lineage.LineageContext.reset(LineageContext.java:64)
at alluxio.client.lineage.LineageContext.<init>(LineageContext.java:35)
at alluxio.client.lineage.LineageContext.<clinit>(LineageContext.java:27)

The command I ran is:
spark-submit --conf 'spark.driver.extraJavaOptions=-Dalluxio.master.hostname=hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net' --conf 'spark.executor.extraJavaOptions=-Dalluxio.master.hostname=hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net' --master yarn --class co.bigstream.benchmark.TPCSQ3 tpcds_2.11-1.0.jar 2 alluxio://hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/avro avro
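Note the `Caused by: java.lang.IllegalArgumentException: port out of range:-1` above: the alluxio:// URI in this command carries no port, so the client ends up with -1 when it builds the master address. A sketch of the same command with the default master RPC port 19998 spelled out (the extra -Dalluxio.master.port=19998 is an assumption, not part of the original command):

```
spark-submit \
  --conf 'spark.driver.extraJavaOptions=-Dalluxio.master.hostname=hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net -Dalluxio.master.port=19998' \
  --conf 'spark.executor.extraJavaOptions=-Dalluxio.master.hostname=hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net -Dalluxio.master.port=19998' \
  --master yarn --class co.bigstream.benchmark.TPCSQ3 tpcds_2.11-1.0.jar \
  2 alluxio://hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net:19998/avro avro
```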

On Wed, Sep 5, 2018 at 4:11 AM, Lu Qiu <[hidden email]> wrote:
Hi Deema,

The more common way to set Alluxio configuration through spark-submit command-line options is:

spark-submit \
--conf 'spark.driver.extraJavaOptions=-Dalluxio.user.file.writetype.default=CACHE_THROUGH' \
--conf 'spark.executor.extraJavaOptions=-Dalluxio.user.file.writetype.default=CACHE_THROUGH' \
...


In your case, you could try

spark-submit --conf 'spark.driver.extraJavaOptions=-Dalluxio.master.hostname=hn0-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net' --conf 'spark.executor.extraJavaOptions=-Dalluxio.master.hostname=hn0-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net' --master yarn --class co.bigstream.benchmark.TPCSQ3 tpcds_2.11-1.0.jar 2 alluxio:///avro avro

Hope one of these ways works for you!

Thanks, 
Lu

On Tue, Sep 4, 2018 at 5:55 PM, Lu Qiu <[hidden email]> wrote:
Hi Deema,


If you only use Alluxio to provide the input file, you could try passing the full Alluxio path, like:
```
spark-submit --master yarn --class co.bigstream.benchmark.TPCSQ3 tpcds_2.11-1.0.jar 2 alluxio://hn0-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net:19998/avro avro
```

If you are using `alluxio:///` in the jar, Spark needs to know the Alluxio master hostname.
You could add the master hostname to the `core-site.xml` in your Spark home conf directory:

<configuration>
  <property>
    <name>fs.alluxio.impl</name>
    <value>alluxio.hadoop.FileSystem</value>
  </property>
  <property>
    <name>alluxio.master.hostname</name>
    <value>hn0-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net</value>
  </property>
</configuration>

Or try:
```
spark-submit --conf spark.hadoop.defaultFS="alluxio://hn0-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net:19998" --master yarn --class co.bigstream.benchmark.TPCSQ3 tpcds_2.11-1.0.jar 2 alluxio://hn0-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net:19998/avro avro
```

In addition, could you share the console output of `/opt/alluxio/integration/checker/bin/alluxio-checker.sh spark yarn` so that we can better understand why the Spark checker failed?

Thanks,
Lu


On Tue, Sep 4, 2018 at 2:25 AM, Deema Yatsyuk <[hidden email]> wrote:
and I tried to run spark-submit:
spark-submit --conf spark.hadoop.defaultFS="-Dalluxio://hn0-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net" --master yarn --class co.bigstream.benchmark.TPCSQ3 tpcds_2.11-1.0.jar 2 alluxio:///avro avro

but got the same issue:

Exception in thread "main" java.lang.NullPointerException: URI hostname must not be null
at alluxio.core.client.runtime.com.google.common.base.Preconditions.checkNotNull(Preconditions.java:208)
at alluxio.hadoop.AbstractFileSystem.initializeInternal(AbstractFileSystem.java:506)
at alluxio.hadoop.AbstractFileSystem.initialize(AbstractFileSystem.java:483)
at alluxio.hadoop.FileSystem.initialize(FileSystem.java:27)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2795)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:99)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2829)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2811)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:390)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$14.apply(DataSource.scala:372)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$14.apply(DataSource.scala:370)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
at scala.collection.immutable.List.flatMap(List.scala:344)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:370)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:135)
at co.bigstream.benchmark.TPCSQ3$.main(TPCSQ3.scala:64)
at co.bigstream.benchmark.TPCSQ3.main(TPCSQ3.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:751)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
18/09/04 09:23:53 INFO SparkContext: Invoking stop() from shutdown hook 

Reply | Threaded
Open this post in threaded view
|

Re: issue with running spark-submit

Deema Yatsyuk
In reply to this post by Deema Yatsyuk
I have formatted Alluxio and now the master and workers are fine; I have copied the test files back, but the original issue is the same:

 ERROR ClientMasterSync: Failed to heartbeat to the metrics master: {}
alluxio.exception.status.UnavailableException: Failed to connect to MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998 after 1 attempts
	at alluxio.AbstractClient.connect(AbstractClient.java:325)
	at alluxio.client.metrics.MetricsMasterClient.heartbeat(MetricsMasterClient.java:84)
	at alluxio.client.metrics.ClientMasterSync.heartbeat(ClientMasterSync.java:63)
	at alluxio.heartbeat.HeartbeatThread.run(HeartbeatThread.java:74)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

If you want, I can provide you with SSH access and admin access to Ambari.

On Fri, Sep 7, 2018 at 1:03 AM Dmitry Yatsyuk <[hidden email]> wrote:
Also, now when I try to restart the master node I get the following exception:

2018-09-06 21:55:57,795 ERROR UfsJournalCheckpointThread - FileSystemMaster: Failed to run journal checkpoint thread, crashing.
java.lang.IllegalStateException: Journal entries are missing between sequence number 0 (inclusive) and 436 (exclusive).
at alluxio.master.journal.ufs.UfsJournalReader.read(UfsJournalReader.java:160)
at alluxio.master.journal.ufs.UfsJournalCheckpointThread.runInternal(UfsJournalCheckpointThread.java:141)
at alluxio.master.journal.ufs.UfsJournalCheckpointThread.run(UfsJournalCheckpointThread.java:123)
2018-09-06 21:55:57,796 ERROR UfsJournalCheckpointThread - BlockMaster: Failed to run journal checkpoint thread, crashing.
java.lang.IllegalStateException: Journal entries are missing between sequence number 0 (inclusive) and 49 (exclusive).
at alluxio.master.journal.ufs.UfsJournalReader.read(UfsJournalReader.java:160)
at alluxio.master.journal.ufs.UfsJournalCheckpointThread.runInternal(UfsJournalCheckpointThread.java:141)
at alluxio.master.journal.ufs.UfsJournalCheckpointThread.run(UfsJournalCheckpointThread.java:123)
2018-09-06 21:55:57,799 INFO  UfsJournalCheckpointThread - BlockMaster: Journal shutdown complete
2018-09-06 21:55:57,799 ERROR ProcessUtils - Uncaught exception while running Alluxio master @hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998, stopping it and exiting.
java.lang.IllegalStateException
at com.google.common.base.Preconditions.checkState(Preconditions.java:133)
at alluxio.master.journal.ufs.UfsJournalCheckpointThread.getNextSequenceNumber(UfsJournalCheckpointThread.java:116)
at alluxio.master.journal.ufs.UfsJournal.gainPrimacy(UfsJournal.java:207)
at alluxio.master.journal.ufs.UfsJournalSystem.gainPrimacy(UfsJournalSystem.java:68)
at alluxio.master.AlluxioMasterProcess.start(AlluxioMasterProcess.java:226)
at alluxio.ProcessUtils.run(ProcessUtils.java:32)
at alluxio.master.AlluxioMaster.main(AlluxioMaster.java:55)
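The "Journal entries are missing" errors mean the UFS journal under the configured HDFS path is no longer a complete sequence, so the master cannot replay it. If nothing stored in Alluxio needs to be preserved, reformatting recreates the journal; a sketch, assuming the /opt/alluxio install path used elsewhere in this thread (note this wipes Alluxio metadata):

```
# Stop everything, recreate the journal, start the cluster again.
/opt/alluxio/bin/alluxio-stop.sh all
/opt/alluxio/bin/alluxio format
/opt/alluxio/bin/alluxio-start.sh master
/opt/alluxio/bin/alluxio-start.sh workers
```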

On Fri, Sep 7, 2018 at 12:48 AM Dmitry Yatsyuk <[hidden email]> wrote:
Hello
It hangs on the master node at stage

[Stage 0:>                                                          (0 + 2) / 2]

and the same issue appears in the executor log:

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/bigstream/bigstreamNSD/spark-2.1.1-BIGSTREAM-bin-bigstream-spark-yarn-h2.7.2/nsd-jars-2.1/alluxio-1.8.0-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.6.2.38-1/spark2/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.6.2.38-1/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.6.2.38-1/spark_llap/spark-llap-assembly-1.0.0.2.6.2.38-1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
18/09/06 21:42:03 INFO CoarseGrainedExecutorBackend: Started daemon with process name: 28074@wn1-nsd-bg
18/09/06 21:42:03 INFO SignalUtils: Registered signal handler for TERM
18/09/06 21:42:03 INFO SignalUtils: Registered signal handler for HUP
18/09/06 21:42:03 INFO SignalUtils: Registered signal handler for INT
18/09/06 21:42:04 INFO SecurityManager: Changing view acls to: yarn,sshuser
18/09/06 21:42:04 INFO SecurityManager: Changing modify acls to: yarn,sshuser
18/09/06 21:42:04 INFO SecurityManager: Changing view acls groups to: 
18/09/06 21:42:04 INFO SecurityManager: Changing modify acls groups to: 
18/09/06 21:42:04 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(yarn, sshuser); groups with view permissions: Set(); users  with modify permissions: Set(yarn, sshuser); groups with modify permissions: Set()
18/09/06 21:42:04 INFO TransportClientFactory: Successfully created connection to /10.0.0.21:33843 after 60 ms (0 ms spent in bootstraps)
18/09/06 21:42:04 INFO SecurityManager: Changing view acls to: yarn,sshuser
18/09/06 21:42:04 INFO SecurityManager: Changing modify acls to: yarn,sshuser
18/09/06 21:42:04 INFO SecurityManager: Changing view acls groups to: 
18/09/06 21:42:04 INFO SecurityManager: Changing modify acls groups to: 
18/09/06 21:42:04 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(yarn, sshuser); groups with view permissions: Set(); users  with modify permissions: Set(yarn, sshuser); groups with modify permissions: Set()
18/09/06 21:42:04 INFO TransportClientFactory: Successfully created connection to /10.0.0.21:33843 after 1 ms (0 ms spent in bootstraps)
18/09/06 21:42:04 INFO DiskBlockManager: Created local directory at /mnt/resource/hadoop/yarn/local/usercache/sshuser/appcache/application_1536145998851_0026/blockmgr-aa883160-f530-41f6-a683-5d13cd04113a
18/09/06 21:42:04 INFO MemoryStore: MemoryStore started with capacity 5.2 GB
18/09/06 21:42:04 INFO CoarseGrainedExecutorBackend: Connecting to driver: spark://CoarseGrainedScheduler@10.0.0.21:33843
18/09/06 21:42:04 INFO CoarseGrainedExecutorBackend: Successfully registered with driver
18/09/06 21:42:04 INFO Executor: Starting executor ID 4 on host wn1-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net
18/09/06 21:42:04 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 40203.
18/09/06 21:42:04 INFO NettyBlockTransferService: Server created on wn1-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net:40203
18/09/06 21:42:04 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
18/09/06 21:42:04 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(4, wn1-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net, 40203, None)
18/09/06 21:42:04 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(4, wn1-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net, 40203, None)
18/09/06 21:42:04 INFO BlockManager: Initialized BlockManager: BlockManagerId(4, wn1-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net, 40203, None)
18/09/06 21:42:04 INFO Executor: Using REPL class URI: spark://10.0.0.21:33843/classes
18/09/06 21:44:51 INFO CoarseGrainedExecutorBackend: Got assigned task 0
18/09/06 21:44:51 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
18/09/06 21:44:51 INFO TorrentBroadcast: Started reading broadcast variable 1
18/09/06 21:44:51 INFO TransportClientFactory: Successfully created connection to /10.0.0.21:32875 after 2 ms (0 ms spent in bootstraps)
18/09/06 21:44:51 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 37.5 KB, free 5.2 GB)
18/09/06 21:44:51 INFO TorrentBroadcast: Reading broadcast variable 1 took 133 ms
18/09/06 21:44:51 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 97.7 KB, free 5.2 GB)
18/09/06 21:44:51 INFO TransportClientFactory: Successfully created connection to /10.0.0.21:33843 after 16 ms (0 ms spent in bootstraps)
18/09/06 21:44:52 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
18/09/06 21:44:52 INFO deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
18/09/06 21:44:52 INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
18/09/06 21:44:52 INFO deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
18/09/06 21:44:52 INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
18/09/06 21:44:52 INFO HadoopRDD: Input split: alluxio://hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net:19998/Input:0+13423
18/09/06 21:44:52 INFO TorrentBroadcast: Started reading broadcast variable 0
18/09/06 21:44:52 INFO TransportClientFactory: Successfully created connection to wn1-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.4:44275 after 1 ms (0 ms spent in bootstraps)
18/09/06 21:44:52 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 34.1 KB, free 5.2 GB)
18/09/06 21:44:52 INFO TorrentBroadcast: Reading broadcast variable 0 took 70 ms
18/09/06 21:44:52 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 469.5 KB, free 5.2 GB)
18/09/06 21:44:52 INFO MetricsConfig: loaded properties from hadoop-metrics2-azure-file-system.properties
18/09/06 21:44:52 INFO WasbAzureIaasSink: Init starting.
18/09/06 21:44:52 INFO AzureIaasSink: Init starting. Initializing MdsLogger.
18/09/06 21:44:52 INFO AzureIaasSink: Init completed.
18/09/06 21:44:52 INFO WasbAzureIaasSink: Init completed.
18/09/06 21:44:52 INFO MetricsSinkAdapter: Sink azurefs2 started
18/09/06 21:44:52 INFO MetricsSystemImpl: Scheduled snapshot period at 60 second(s).
18/09/06 21:44:52 INFO MetricsSystemImpl: azure-file-system metrics system started
18/09/06 21:44:52 INFO HadoopConfigurationUtils: Loading Alluxio properties from Hadoop configuration: {fs.azure.account.key.testryui.blob.core.windows.net=MIIB/QYJKoZIhvcNAQcDoIIB7jCCAeoCAQAxggFdMIIBWQIBADBBMC0xKzApBgNVBAMTImRiZW5jcnlwdGlvbi5oZGluc2lnaHRzZXJ2aWNlcy5uZXQCEGgOgBofxf6kRbb5r8cSuagwDQYJKoZIhvcNAQEBBQAEggEAKgwuNMr9JwNtvHY7VRE+8t991LCO8n0uJUOtJc2vQET+pZnK/2jviZh8AaIDra0AhDs8eGB71G8Xjet0fSyNZXb80660ZV3BBHKaDVNybwD12HLvjwLoOEZ3O9NqGls7D7owsoWO7gw0sGgu96dnEjvDvB8uSck8UveXMSk/tZNeU9+jYjfctu8TGRP+B4YCgJWcLMeGHczhPx6NuBPdmg/0eoe3w8VVPCi6vIs3lsyzYEi1LVrMkIJ2aBWuwyG7bUKu4tLhfYHY13YDO5PtBfNtWX3QkzejxvY9lewGuPU3w9Fw3GFK/YrgzFBtBycyBFMYl6lQyIFPNLtzPv2l9jCBgwYJKoZIhvcNAQcBMBQGCCqGSIb3DQMHBAgO9GVL4NR2GoBgn8fCaRg2y303wKFrgNOukYYmLoOKq5cQXP/9EY+3WD9BSMjE9/WyQ87FIhPeeEiq6JPNdBIoTZagQpj11b+TL7uMmz+j1Jx4LDozmAoLPQItbwaNnFZlYRWrgY8OiOuE}
18/09/06 21:44:52 INFO MetricsSystem: Starting sinks with config: {}.
18/09/06 21:44:52 INFO FileSystemContext: Created filesystem context with id app-3381863313617109164. This ID will be used for identifying info from the client, such as metrics. It can be set manually through the alluxio.user.app.id property
18/09/06 21:44:52 INFO HadoopConfigurationUtils: Loading Alluxio properties from Hadoop configuration: {fs.azure.account.key.testryui.blob.core.windows.net=MIIB/QYJKoZIhvcNAQcDoIIB7jCCAeoCAQAxggFdMIIBWQIBADBBMC0xKzApBgNVBAMTImRiZW5jcnlwdGlvbi5oZGluc2lnaHRzZXJ2aWNlcy5uZXQCEGgOgBofxf6kRbb5r8cSuagwDQYJKoZIhvcNAQEBBQAEggEAKgwuNMr9JwNtvHY7VRE+8t991LCO8n0uJUOtJc2vQET+pZnK/2jviZh8AaIDra0AhDs8eGB71G8Xjet0fSyNZXb80660ZV3BBHKaDVNybwD12HLvjwLoOEZ3O9NqGls7D7owsoWO7gw0sGgu96dnEjvDvB8uSck8UveXMSk/tZNeU9+jYjfctu8TGRP+B4YCgJWcLMeGHczhPx6NuBPdmg/0eoe3w8VVPCi6vIs3lsyzYEi1LVrMkIJ2aBWuwyG7bUKu4tLhfYHY13YDO5PtBfNtWX3QkzejxvY9lewGuPU3w9Fw3GFK/YrgzFBtBycyBFMYl6lQyIFPNLtzPv2l9jCBgwYJKoZIhvcNAQcBMBQGCCqGSIb3DQMHBAgO9GVL4NR2GoBgn8fCaRg2y303wKFrgNOukYYmLoOKq5cQXP/9EY+3WD9BSMjE9/WyQ87FIhPeeEiq6JPNdBIoTZagQpj11b+TL7uMmz+j1Jx4LDozmAoLPQItbwaNnFZlYRWrgY8OiOuE}
18/09/06 21:44:52 INFO AbstractClient: Alluxio client (version 1.8.0) is trying to bootstrap-connect with hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
18/09/06 21:44:52 INFO AbstractClient: Alluxio client has bootstrap-connected with hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
18/09/06 21:44:52 INFO AbstractClient: Alluxio client (version 1.8.0) is trying to connect with MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
18/09/06 21:44:52 INFO AbstractClient: Client registered with MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
18/09/06 21:44:53 INFO HeartbeatThread: Hearbeat Master Metrics Sync is interrupted.
18/09/06 21:44:53 INFO AbstractClient: Alluxio client (version 1.8.0) is trying to connect with MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
18/09/06 21:44:53 INFO AbstractClient: Client registered with MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
18/09/06 21:44:53 INFO AbstractClient: Alluxio client (version 1.8.0) is trying to connect with FileSystemMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
18/09/06 21:44:53 INFO AbstractClient: Client registered with FileSystemMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
18/09/06 21:44:53 INFO FileSystemContext: Created filesystem context with id app-4410116450193773659. This ID will be used for identifying info from the client, such as metrics. It can be set manually through the alluxio.user.app.id property
18/09/06 21:44:53 INFO AbstractClient: Alluxio client (version 1.8.0) is trying to connect with MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
18/09/06 21:44:53 WARN AbstractClient: Failed to connect (1) with MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998: Peer indicated failure: Plain authentication failed: User yarn is not configured for any impersonation. impersonationUser: sshuser
18/09/06 21:44:53 ERROR ClientMasterSync: Failed to heartbeat to the metrics master: {}
alluxio.exception.status.UnavailableException: Failed to connect to MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998 after 1 attempts
	at alluxio.AbstractClient.connect(AbstractClient.java:325)
	at alluxio.client.metrics.MetricsMasterClient.heartbeat(MetricsMasterClient.java:84)
	at alluxio.client.metrics.ClientMasterSync.heartbeat(ClientMasterSync.java:63)
	at alluxio.heartbeat.HeartbeatThread.run(HeartbeatThread.java:74)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
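The WARN line just before the heartbeat error looks like the real cause: the executor runs as user yarn but acts on behalf of sshuser, and the master rejects the connection because yarn is not allowed to impersonate anyone. Alluxio 1.8 supports impersonation through the alluxio.master.security.impersonation.<user>.users property; a sketch of enabling it on the master (the wildcard value and the conf path are assumptions, tighten them as needed):

```
# On the master node: allow the yarn user to impersonate any user,
# then restart the master so the property takes effect.
# (conf path assumed; use whichever alluxio-site.properties your master reads)
echo 'alluxio.master.security.impersonation.yarn.users=*' >> /etc/alluxio/alluxio-site.properties
/opt/alluxio/bin/alluxio-stop.sh master
/opt/alluxio/bin/alluxio-start.sh master
```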


Reply | Threaded
Open this post in threaded view
|

Re: issue with running spark-submit

Lu Qiu
Hi Dmitry,

We took another look at your issue and didn't find any outstanding Alluxio usage problems.
Perhaps there's a firewall issue; a firewall may make the port reachable only from certain addresses.

Try running `runTests` on the same node where you run `spark-submit`, and double-check the firewall.
For example, EC2 has security groups, where some ports may be exposed and some may not.
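A quick way to check that from the submitting node (a sketch; 19998 is the master RPC port and 19999 the web UI port):

```
# Can this node reach the master's RPC and web UI ports?
nc -zv hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net 19998
nc -zv hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net 19999

# Then run the built-in tests from the same node used for spark-submit:
/opt/alluxio/bin/alluxio runTests
```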

Thanks,
Lu

On Thu, Sep 6, 2018 at 3:17 PM, Dmitry Yatsyuk <[hidden email]> wrote:
I have formated alluxio and now master and workers are fine, i have copied back test files. but the original issue is the same

 ERROR ClientMasterSync: Failed to heartbeat to the metrics master: {}
alluxio.exception.status.UnavailableException: Failed to connect to MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998 after 1 attempts
	at alluxio.AbstractClient.connect(AbstractClient.java:325)
	at alluxio.client.metrics.MetricsMasterClient.heartbeat(MetricsMasterClient.java:84)
	at alluxio.client.metrics.ClientMasterSync.heartbeat(ClientMasterSync.java:63)
	at alluxio.heartbeat.HeartbeatThread.run(HeartbeatThread.java:74)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

If you want I can provide you ssh access and admin access to ambari

On Fri, Sep 7, 2018 at 1:03 AM Dmitry Yatsyuk <[hidden email]> wrote:
also now when i try to restart master node i have the following exception

2018-09-06 21:55:57,795 ERROR UfsJournalCheckpointThread - FileSystemMaster: Failed to run journal checkpoint thread, crashing.
java.lang.IllegalStateException: Journal entries are missing between sequence number 0 (inclusive) and 436 (exclusive).
at alluxio.master.journal.ufs.UfsJournalReader.read(UfsJournalReader.java:160)
at alluxio.master.journal.ufs.UfsJournalCheckpointThread.runInternal(UfsJournalCheckpointThread.java:141)
at alluxio.master.journal.ufs.UfsJournalCheckpointThread.run(UfsJournalCheckpointThread.java:123)
2018-09-06 21:55:57,796 ERROR UfsJournalCheckpointThread - BlockMaster: Failed to run journal checkpoint thread, crashing.
java.lang.IllegalStateException: Journal entries are missing between sequence number 0 (inclusive) and 49 (exclusive).
at alluxio.master.journal.ufs.UfsJournalReader.read(UfsJournalReader.java:160)
at alluxio.master.journal.ufs.UfsJournalCheckpointThread.runInternal(UfsJournalCheckpointThread.java:141)
at alluxio.master.journal.ufs.UfsJournalCheckpointThread.run(UfsJournalCheckpointThread.java:123)
2018-09-06 21:55:57,799 INFO  UfsJournalCheckpointThread - BlockMaster: Journal shutdown complete
2018-09-06 21:55:57,799 ERROR ProcessUtils - Uncaught exception while running Alluxio master @hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998, stopping it and exiting.
java.lang.IllegalStateException
at com.google.common.base.Preconditions.checkState(Preconditions.java:133)
at alluxio.master.journal.ufs.UfsJournalCheckpointThread.getNextSequenceNumber(UfsJournalCheckpointThread.java:116)
at alluxio.master.journal.ufs.UfsJournal.gainPrimacy(UfsJournal.java:207)
at alluxio.master.journal.ufs.UfsJournalSystem.gainPrimacy(UfsJournalSystem.java:68)
at alluxio.master.AlluxioMasterProcess.start(AlluxioMasterProcess.java:226)
at alluxio.ProcessUtils.run(ProcessUtils.java:32)
at alluxio.master.AlluxioMaster.main(AlluxioMaster.java:55)

On Fri, Sep 7, 2018 at 12:48 AM Dmitry Yatsyuk <[hidden email]> wrote:
Hello
It hangs from master node on stage

[Stage 0:>                                                          (0 + 2) / 2]

and the same issue on executor log

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/bigstream/bigstreamNSD/spark-2.1.1-BIGSTREAM-bin-bigstream-spark-yarn-h2.7.2/nsd-jars-2.1/alluxio-1.8.0-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.6.2.38-1/spark2/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.6.2.38-1/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.6.2.38-1/spark_llap/spark-llap-assembly-1.0.0.2.6.2.38-1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
18/09/06 21:42:03 INFO CoarseGrainedExecutorBackend: Started daemon with process name: 28074@wn1-nsd-bg
18/09/06 21:42:03 INFO SignalUtils: Registered signal handler for TERM
18/09/06 21:42:03 INFO SignalUtils: Registered signal handler for HUP
18/09/06 21:42:03 INFO SignalUtils: Registered signal handler for INT
18/09/06 21:42:04 INFO SecurityManager: Changing view acls to: yarn,sshuser
18/09/06 21:42:04 INFO SecurityManager: Changing modify acls to: yarn,sshuser
18/09/06 21:42:04 INFO SecurityManager: Changing view acls groups to: 
18/09/06 21:42:04 INFO SecurityManager: Changing modify acls groups to: 
18/09/06 21:42:04 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(yarn, sshuser); groups with view permissions: Set(); users  with modify permissions: Set(yarn, sshuser); groups with modify permissions: Set()
18/09/06 21:42:04 INFO TransportClientFactory: Successfully created connection to /10.0.0.21:33843 after 60 ms (0 ms spent in bootstraps)
18/09/06 21:42:04 INFO SecurityManager: Changing view acls to: yarn,sshuser
18/09/06 21:42:04 INFO SecurityManager: Changing modify acls to: yarn,sshuser
18/09/06 21:42:04 INFO SecurityManager: Changing view acls groups to: 
18/09/06 21:42:04 INFO SecurityManager: Changing modify acls groups to: 
18/09/06 21:42:04 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(yarn, sshuser); groups with view permissions: Set(); users  with modify permissions: Set(yarn, sshuser); groups with modify permissions: Set()
18/09/06 21:42:04 INFO TransportClientFactory: Successfully created connection to /10.0.0.21:33843 after 1 ms (0 ms spent in bootstraps)
18/09/06 21:42:04 INFO DiskBlockManager: Created local directory at /mnt/resource/hadoop/yarn/local/usercache/sshuser/appcache/application_1536145998851_0026/blockmgr-aa883160-f530-41f6-a683-5d13cd04113a
18/09/06 21:42:04 INFO MemoryStore: MemoryStore started with capacity 5.2 GB
18/09/06 21:42:04 INFO CoarseGrainedExecutorBackend: Connecting to driver: spark://CoarseGrainedScheduler@10.0.0.21:33843
18/09/06 21:42:04 INFO CoarseGrainedExecutorBackend: Successfully registered with driver
18/09/06 21:42:04 INFO Executor: Starting executor ID 4 on host wn1-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net
18/09/06 21:42:04 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 40203.
18/09/06 21:42:04 INFO NettyBlockTransferService: Server created on wn1-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net:40203
18/09/06 21:42:04 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
18/09/06 21:42:04 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(4, wn1-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net, 40203, None)
18/09/06 21:42:04 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(4, wn1-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net, 40203, None)
18/09/06 21:42:04 INFO BlockManager: Initialized BlockManager: BlockManagerId(4, wn1-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net, 40203, None)
18/09/06 21:42:04 INFO Executor: Using REPL class URI: spark://10.0.0.21:33843/classes
18/09/06 21:44:51 INFO CoarseGrainedExecutorBackend: Got assigned task 0
18/09/06 21:44:51 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
18/09/06 21:44:51 INFO TorrentBroadcast: Started reading broadcast variable 1
18/09/06 21:44:51 INFO TransportClientFactory: Successfully created connection to /10.0.0.21:32875 after 2 ms (0 ms spent in bootstraps)
18/09/06 21:44:51 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 37.5 KB, free 5.2 GB)
18/09/06 21:44:51 INFO TorrentBroadcast: Reading broadcast variable 1 took 133 ms
18/09/06 21:44:51 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 97.7 KB, free 5.2 GB)
18/09/06 21:44:51 INFO TransportClientFactory: Successfully created connection to /10.0.0.21:33843 after 16 ms (0 ms spent in bootstraps)
18/09/06 21:44:52 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
18/09/06 21:44:52 INFO deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
18/09/06 21:44:52 INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
18/09/06 21:44:52 INFO deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
18/09/06 21:44:52 INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
18/09/06 21:44:52 INFO HadoopRDD: Input split: alluxio://hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net:19998/Input:0+13423
18/09/06 21:44:52 INFO TorrentBroadcast: Started reading broadcast variable 0
18/09/06 21:44:52 INFO TransportClientFactory: Successfully created connection to wn1-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.4:44275 after 1 ms (0 ms spent in bootstraps)
18/09/06 21:44:52 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 34.1 KB, free 5.2 GB)
18/09/06 21:44:52 INFO TorrentBroadcast: Reading broadcast variable 0 took 70 ms
18/09/06 21:44:52 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 469.5 KB, free 5.2 GB)
18/09/06 21:44:52 INFO MetricsConfig: loaded properties from hadoop-metrics2-azure-file-system.properties
18/09/06 21:44:52 INFO WasbAzureIaasSink: Init starting.
18/09/06 21:44:52 INFO AzureIaasSink: Init starting. Initializing MdsLogger.
18/09/06 21:44:52 INFO AzureIaasSink: Init completed.
18/09/06 21:44:52 INFO WasbAzureIaasSink: Init completed.
18/09/06 21:44:52 INFO MetricsSinkAdapter: Sink azurefs2 started
18/09/06 21:44:52 INFO MetricsSystemImpl: Scheduled snapshot period at 60 second(s).
18/09/06 21:44:52 INFO MetricsSystemImpl: azure-file-system metrics system started
18/09/06 21:44:52 INFO HadoopConfigurationUtils: Loading Alluxio properties from Hadoop configuration: {fs.azure.account.key.testryui.blob.core.windows.net=MIIB/QYJKoZIhvcNAQcDoIIB7jCCAeoCAQAxggFdMIIBWQIBADBBMC0xKzApBgNVBAMTImRiZW5jcnlwdGlvbi5oZGluc2lnaHRzZXJ2aWNlcy5uZXQCEGgOgBofxf6kRbb5r8cSuagwDQYJKoZIhvcNAQEBBQAEggEAKgwuNMr9JwNtvHY7VRE+8t991LCO8n0uJUOtJc2vQET+pZnK/2jviZh8AaIDra0AhDs8eGB71G8Xjet0fSyNZXb80660ZV3BBHKaDVNybwD12HLvjwLoOEZ3O9NqGls7D7owsoWO7gw0sGgu96dnEjvDvB8uSck8UveXMSk/tZNeU9+jYjfctu8TGRP+B4YCgJWcLMeGHczhPx6NuBPdmg/0eoe3w8VVPCi6vIs3lsyzYEi1LVrMkIJ2aBWuwyG7bUKu4tLhfYHY13YDO5PtBfNtWX3QkzejxvY9lewGuPU3w9Fw3GFK/YrgzFBtBycyBFMYl6lQyIFPNLtzPv2l9jCBgwYJKoZIhvcNAQcBMBQGCCqGSIb3DQMHBAgO9GVL4NR2GoBgn8fCaRg2y303wKFrgNOukYYmLoOKq5cQXP/9EY+3WD9BSMjE9/WyQ87FIhPeeEiq6JPNdBIoTZagQpj11b+TL7uMmz+j1Jx4LDozmAoLPQItbwaNnFZlYRWrgY8OiOuE}
18/09/06 21:44:52 INFO MetricsSystem: Starting sinks with config: {}.
18/09/06 21:44:52 INFO FileSystemContext: Created filesystem context with id app-3381863313617109164. This ID will be used for identifying info from the client, such as metrics. It can be set manually through the alluxio.user.app.id property
18/09/06 21:44:52 INFO HadoopConfigurationUtils: Loading Alluxio properties from Hadoop configuration: {fs.azure.account.key.testryui.blob.core.windows.net=MIIB/QYJKoZIhvcNAQcDoIIB7jCCAeoCAQAxggFdMIIBWQIBADBBMC0xKzApBgNVBAMTImRiZW5jcnlwdGlvbi5oZGluc2lnaHRzZXJ2aWNlcy5uZXQCEGgOgBofxf6kRbb5r8cSuagwDQYJKoZIhvcNAQEBBQAEggEAKgwuNMr9JwNtvHY7VRE+8t991LCO8n0uJUOtJc2vQET+pZnK/2jviZh8AaIDra0AhDs8eGB71G8Xjet0fSyNZXb80660ZV3BBHKaDVNybwD12HLvjwLoOEZ3O9NqGls7D7owsoWO7gw0sGgu96dnEjvDvB8uSck8UveXMSk/tZNeU9+jYjfctu8TGRP+B4YCgJWcLMeGHczhPx6NuBPdmg/0eoe3w8VVPCi6vIs3lsyzYEi1LVrMkIJ2aBWuwyG7bUKu4tLhfYHY13YDO5PtBfNtWX3QkzejxvY9lewGuPU3w9Fw3GFK/YrgzFBtBycyBFMYl6lQyIFPNLtzPv2l9jCBgwYJKoZIhvcNAQcBMBQGCCqGSIb3DQMHBAgO9GVL4NR2GoBgn8fCaRg2y303wKFrgNOukYYmLoOKq5cQXP/9EY+3WD9BSMjE9/WyQ87FIhPeeEiq6JPNdBIoTZagQpj11b+TL7uMmz+j1Jx4LDozmAoLPQItbwaNnFZlYRWrgY8OiOuE}
18/09/06 21:44:52 INFO AbstractClient: Alluxio client (version 1.8.0) is trying to bootstrap-connect with hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
18/09/06 21:44:52 INFO AbstractClient: Alluxio client has bootstrap-connected with hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
18/09/06 21:44:52 INFO AbstractClient: Alluxio client (version 1.8.0) is trying to connect with MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
18/09/06 21:44:52 INFO AbstractClient: Client registered with MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
18/09/06 21:44:53 INFO HeartbeatThread: Hearbeat Master Metrics Sync is interrupted.
18/09/06 21:44:53 INFO AbstractClient: Alluxio client (version 1.8.0) is trying to connect with MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
18/09/06 21:44:53 INFO AbstractClient: Client registered with MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
18/09/06 21:44:53 INFO AbstractClient: Alluxio client (version 1.8.0) is trying to connect with FileSystemMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
18/09/06 21:44:53 INFO AbstractClient: Client registered with FileSystemMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
18/09/06 21:44:53 INFO FileSystemContext: Created filesystem context with id app-4410116450193773659. This ID will be used for identifying info from the client, such as metrics. It can be set manually through the alluxio.user.app.id property
18/09/06 21:44:53 INFO AbstractClient: Alluxio client (version 1.8.0) is trying to connect with MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
18/09/06 21:44:53 WARN AbstractClient: Failed to connect (1) with MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998: Peer indicated failure: Plain authentication failed: User yarn is not configured for any impersonation. impersonationUser: sshuser
18/09/06 21:44:53 ERROR ClientMasterSync: Failed to heartbeat to the metrics master: {}
alluxio.exception.status.UnavailableException: Failed to connect to MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998 after 1 attempts
	at alluxio.AbstractClient.connect(AbstractClient.java:325)
	at alluxio.client.metrics.MetricsMasterClient.heartbeat(MetricsMasterClient.java:84)
	at alluxio.client.metrics.ClientMasterSync.heartbeat(ClientMasterSync.java:63)
	at alluxio.heartbeat.HeartbeatThread.run(HeartbeatThread.java:74)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)


On Thu, Sep 6, 2018 at 11:21 PM Lu Qiu <[hidden email]> wrote:
Hi Dmitry,

Sorry for the late reply.

Could you try the spark shell and see if spark shell is able to connect to Alluxio?

```
> val s = sc.textFile("alluxio://hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net:19998/Input")
> val double = s.map(line => line + line)
> double.saveAsTextFile("alluxio://  hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net:19998/Output")
``` 

Could you try it in your master node(10.0.0.21) first and then try it again in the node you run spark-submit before (maybe 10.0.0.6)?

Thanks,
Lu


On Thu, Sep 6, 2018 at 9:54 AM, Dmitry Yatsyuk <[hidden email]> wrote:
here is also log from /opt/alluxio/integration/checker/bin/alluxio-checker.sh spark yarn


On Thu, Sep 6, 2018 at 7:46 PM Dmitry Yatsyuk <[hidden email]> wrote:
Hello
May be you have more suggestions to me.
Many thanks

On Wed, Sep 5, 2018 at 10:23 PM Dmitry Yatsyuk <[hidden email]> wrote:
and yes I run spark-submit from a node where alluxio master installed

On Wed, Sep 5, 2018 at 10:22 PM Dmitry Yatsyuk <[hidden email]> wrote:
Hello

All workers are live

live Workers

   wn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net 0 In Service 36.69GB 444.88MB   99%Free
   wn1-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net 0 In Service 36.69GB 593.25MB   99%Free
   wn4-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net 0 In Service 36.69GB 94.63MB     99%Free



Alluxio Summary

            Started:                                      09-05-2018 12:40:10:248
             Uptime:                        0 day(s), 6 hour(s), 41 minute(s), and 28 second(s)
            Version:                                               1.8.0
        Running Workers:                                             3
   Startup Consistency Check:                                     COMPLETE
   Server Configuration Check:                                     PASSED

Cluster Usage Summary

    Workers Capacity:         110.08GB
   Workers Free / Used: 108.97GB / 1132.76MB
    UnderFS Capacity:        1177.79GB
   UnderFS Free / Used: 1177.79GB / 192.00KB

Storage Usage Summary

   Storage Alias
   Space Capacity
   Space Used
   Space Usage
   MEM 110.08GB 1132.76MB
   99%Free



ср, 5 сент. 2018 г. в 20:07, Lu Qiu <[hidden email]>:
Hi,

The error 
Failed to connect to MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998 after 1 attempts

is usually caused by:

(1) the input hostname(or port) is wrong or the system cannot resolve the hostname(especially when spark and alluxio are on different nodes).
Did you run the spark-submit on the Alluxio master node?

(2) the Alluxio cluster is not running normally. You could visit http://hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net:19999 to see if the Alluxio master is alive, and visit the Workers page to see if the workers are alive.


Thanks,
Lu

On Wed, Sep 5, 2018 at 4:29 AM, Dmitry Yatsyuk <[hidden email]> wrote:
I changed spark-submit to

spark-submit --conf 'spark.driver.extraJavaOptions=-Dalluxio.master.hostname=hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net' --conf 'spark.executor.extraJavaOptions=-Dalluxio.master.hostname=hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net' --master yarn --class co.bigstream.benchmark.TPCSQ3 tpcds_2.11-1.0.jar 2 alluxio://hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net:19998/avro avro

and now the error on the executor is

18/09/05 11:25:44 ERROR ClientMasterSync: Failed to heartbeat to the metrics master: {}
alluxio.exception.status.UnavailableException: Failed to connect to MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998 after 1 attempts
	at alluxio.AbstractClient.connect(AbstractClient.java:325)
	at alluxio.client.metrics.MetricsMasterClient.heartbeat(MetricsMasterClient.java:84)
	at alluxio.client.metrics.ClientMasterSync.heartbeat(ClientMasterSync.java:63)
	at alluxio.heartbeat.HeartbeatThread.run(HeartbeatThread.java:74)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)


On Wed, Sep 5, 2018 at 2:20 PM, Dmitry Yatsyuk <[hidden email]> wrote:
Hello
I tried your suggestion but now the error is the following:
Exception in thread "main" java.lang.ExceptionInInitializerError
at alluxio.hadoop.AbstractFileSystem.initializeInternal(AbstractFileSystem.java:514)
at alluxio.hadoop.AbstractFileSystem.initialize(AbstractFileSystem.java:483)
at alluxio.hadoop.FileSystem.initialize(FileSystem.java:27)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2795)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:99)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2829)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2811)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:390)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.spark.sql.execution.datasources.DataSource.hasMetadata(DataSource.scala:301)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:344)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:135)
at co.bigstream.benchmark.TPCSQ3$.main(TPCSQ3.scala:64)
at co.bigstream.benchmark.TPCSQ3.main(TPCSQ3.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:751)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.IllegalArgumentException: port out of range:-1
at java.net.InetSocketAddress.checkPort(InetSocketAddress.java:143)
at java.net.InetSocketAddress.<init>(InetSocketAddress.java:224)
at alluxio.util.network.NetworkAddressUtils.getConnectAddress(NetworkAddressUtils.java:218)
at alluxio.util.network.NetworkAddressUtils.getConnectAddress(NetworkAddressUtils.java:204)
at alluxio.client.lineage.LineageContext.reset(LineageContext.java:64)
at alluxio.client.lineage.LineageContext.<init>(LineageContext.java:35)
at alluxio.client.lineage.LineageContext.<clinit>(LineageContext.java:27)

The command I ran is
spark-submit --conf 'spark.driver.extraJavaOptions=-Dalluxio.master.hostname=hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net' --conf 'spark.executor.extraJavaOptions=-Dalluxio.master.hostname=hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net' --master yarn --class co.bigstream.benchmark.TPCSQ3 tpcds_2.11-1.0.jar 2 alluxio://hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/avro avro

On Wed, Sep 5, 2018 at 4:11 AM, Lu Qiu <[hidden email]> wrote:
Hi Deema,

The more common way to set Alluxio configuration through spark-submit command-line options is:

spark-submit \
--conf 'spark.driver.extraJavaOptions=-Dalluxio.user.file.writetype.default=CACHE_THROUGH' \
--conf 'spark.executor.extraJavaOptions=-Dalluxio.user.file.writetype.default=CACHE_THROUGH' \
...


In your case, you could try

spark-submit --conf 'spark.driver.extraJavaOptions=-Dalluxio.master.hostname=hn0-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net' --conf 'spark.executor.extraJavaOptions=-Dalluxio.master.hostname=hn0-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net' --master yarn --class co.bigstream.benchmark.TPCSQ3 tpcds_2.11-1.0.jar 2 alluxio:///avro avro

Hope that one of these ways works for you!

Thanks, 
Lu

On Tue, Sep 4, 2018 at 5:55 PM, Lu Qiu <[hidden email]> wrote:
Hi Deema,


If you only use Alluxio to provide the input file, you could try passing the full Alluxio path, like
`
spark-submit --master yarn --class co.bigstream.benchmark.TPCSQ3 tpcds_2.11-1.0.jar 2 alluxio://hn0-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net:19998/avro avro
`

If you are using `alluxio:///` in the jar, Spark needs to know the Alluxio master hostname.
You could add the master hostname to the `core-site.xml` in your Spark home conf directory:

<configuration>
  <property>
    <name>fs.alluxio.impl</name>
    <value>alluxio.hadoop.FileSystem</value>
  </property>
  <property>
    <name>alluxio.master.hostname</name>
    <value>hn0-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net</value>
  </property>
</configuration>

Or try

`
spark-submit --conf spark.hadoop.defaultFS="alluxio://hn0-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net:19998" --master yarn --class co.bigstream.benchmark.TPCSQ3 tpcds_2.11-1.0.jar 2 alluxio://hn0-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net:19998/avro avro
`

In addition, could you share the console output of `/opt/alluxio/integration/checker/bin/alluxio-checker.sh spark yarn` so that we can better understand why the Spark checker failed?

Thanks,
Lu
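
A third equivalent is sketched below, on the assumption that Spark's `spark.hadoop.*` passthrough reaches the Alluxio client (the executor logs in this thread do show "Loading Alluxio properties from Hadoop configuration", which suggests it does). This is not from the original suggestion, only an editorial sketch:

```
# Sketch: forward fs.alluxio.impl and the Alluxio master hostname via Spark's
# spark.hadoop.* passthrough instead of editing core-site.xml.
spark-submit \
  --conf spark.hadoop.fs.alluxio.impl=alluxio.hadoop.FileSystem \
  --conf spark.hadoop.alluxio.master.hostname=hn0-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net \
  --master yarn --class co.bigstream.benchmark.TPCSQ3 tpcds_2.11-1.0.jar 2 alluxio:///avro avro
```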


On Tue, Sep 4, 2018 at 2:25 AM, Deema Yatsyuk <[hidden email]> wrote:
and I tried to run spark-submit
spark-submit --conf spark.hadoop.defaultFS="-Dalluxio://hn0-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net" --master yarn --class co.bigstream.benchmark.TPCSQ3 tpcds_2.11-1.0.jar 2 alluxio:///avro avro

but got the same issue:

Exception in thread "main" java.lang.NullPointerException: URI hostname must not be null
at alluxio.core.client.runtime.com.google.common.base.Preconditions.checkNotNull(Preconditions.java:208)
at alluxio.hadoop.AbstractFileSystem.initializeInternal(AbstractFileSystem.java:506)
at alluxio.hadoop.AbstractFileSystem.initialize(AbstractFileSystem.java:483)
at alluxio.hadoop.FileSystem.initialize(FileSystem.java:27)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2795)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:99)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2829)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2811)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:390)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$14.apply(DataSource.scala:372)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$14.apply(DataSource.scala:370)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
at scala.collection.immutable.List.flatMap(List.scala:344)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:370)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:135)
at co.bigstream.benchmark.TPCSQ3$.main(TPCSQ3.scala:64)
at co.bigstream.benchmark.TPCSQ3.main(TPCSQ3.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:751)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
18/09/04 09:23:53 INFO SparkContext: Invoking stop() from shutdown hook 

Reply | Threaded
Open this post in threaded view
|

Re: issue with running spark-submit

Deema Yatsyuk
Hello
In Azure, all ports inside the same network are open, and runTests also works fine without any issues, so this can't be a firewall issue.

On Fri, Sep 7, 2018 at 2:25 AM, Lu Qiu <[hidden email]> wrote:
Hi Dmitry,

We took another look at your issue and didn't find any outstanding Alluxio usage issues.
Perhaps there is a firewall issue; a firewall may make the port available only from certain addresses.

Try running `runTests` on the same node where you run `spark-submit`, and double-check the firewall;
for example, EC2 has security groups, where some ports may be exposed and some may not.

Thanks,
Lu
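
One quick way to test that firewall hypothesis from the submitting node (a sketch; the hostname and ports are taken from the error messages earlier in this thread):

```
# Probe the Alluxio master RPC (19998) and web UI (19999) ports
# from the node where spark-submit runs.
nc -vz hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net 19998
nc -vz hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net 19999
```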

On Thu, Sep 6, 2018 at 3:17 PM, Dmitry Yatsyuk <[hidden email]> wrote:
I have formatted Alluxio, and now the master and workers are fine; I have copied the test files back, but the original issue is the same:

 ERROR ClientMasterSync: Failed to heartbeat to the metrics master: {}
alluxio.exception.status.UnavailableException: Failed to connect to MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998 after 1 attempts
	at alluxio.AbstractClient.connect(AbstractClient.java:325)
	at alluxio.client.metrics.MetricsMasterClient.heartbeat(MetricsMasterClient.java:84)
	at alluxio.client.metrics.ClientMasterSync.heartbeat(ClientMasterSync.java:63)
	at alluxio.heartbeat.HeartbeatThread.run(HeartbeatThread.java:74)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

If you want, I can provide you SSH access and admin access to Ambari.

On Fri, Sep 7, 2018 at 1:03 AM Dmitry Yatsyuk <[hidden email]> wrote:
Also, now when I try to restart the master node, I get the following exception:

2018-09-06 21:55:57,795 ERROR UfsJournalCheckpointThread - FileSystemMaster: Failed to run journal checkpoint thread, crashing.
java.lang.IllegalStateException: Journal entries are missing between sequence number 0 (inclusive) and 436 (exclusive).
at alluxio.master.journal.ufs.UfsJournalReader.read(UfsJournalReader.java:160)
at alluxio.master.journal.ufs.UfsJournalCheckpointThread.runInternal(UfsJournalCheckpointThread.java:141)
at alluxio.master.journal.ufs.UfsJournalCheckpointThread.run(UfsJournalCheckpointThread.java:123)
2018-09-06 21:55:57,796 ERROR UfsJournalCheckpointThread - BlockMaster: Failed to run journal checkpoint thread, crashing.
java.lang.IllegalStateException: Journal entries are missing between sequence number 0 (inclusive) and 49 (exclusive).
at alluxio.master.journal.ufs.UfsJournalReader.read(UfsJournalReader.java:160)
at alluxio.master.journal.ufs.UfsJournalCheckpointThread.runInternal(UfsJournalCheckpointThread.java:141)
at alluxio.master.journal.ufs.UfsJournalCheckpointThread.run(UfsJournalCheckpointThread.java:123)
2018-09-06 21:55:57,799 INFO  UfsJournalCheckpointThread - BlockMaster: Journal shutdown complete
2018-09-06 21:55:57,799 ERROR ProcessUtils - Uncaught exception while running Alluxio master @hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998, stopping it and exiting.
java.lang.IllegalStateException
at com.google.common.base.Preconditions.checkState(Preconditions.java:133)
at alluxio.master.journal.ufs.UfsJournalCheckpointThread.getNextSequenceNumber(UfsJournalCheckpointThread.java:116)
at alluxio.master.journal.ufs.UfsJournal.gainPrimacy(UfsJournal.java:207)
at alluxio.master.journal.ufs.UfsJournalSystem.gainPrimacy(UfsJournalSystem.java:68)
at alluxio.master.AlluxioMasterProcess.start(AlluxioMasterProcess.java:226)
at alluxio.ProcessUtils.run(ProcessUtils.java:32)
at alluxio.master.AlluxioMaster.main(AlluxioMaster.java:55)
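
For reference, the usual way out of a corrupted UFS journal like this is to reformat it, which erases Alluxio's metadata but not the under-storage files (a sketch using this cluster's layout; re-load any cached test files afterwards, as was done above):

```
# Sketch: stop the master, reformat the journal, then start the master again.
# Warning: `alluxio format` wipes Alluxio metadata.
/opt/alluxio/bin/alluxio-stop.sh master
/opt/alluxio/bin/alluxio format
/opt/alluxio/bin/alluxio-start.sh master
```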

On Fri, Sep 7, 2018 at 12:48 AM Dmitry Yatsyuk <[hidden email]> wrote:
Hello
It hangs from the master node at stage

[Stage 0:>                                                          (0 + 2) / 2]

and the same issue in the executor log:

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/bigstream/bigstreamNSD/spark-2.1.1-BIGSTREAM-bin-bigstream-spark-yarn-h2.7.2/nsd-jars-2.1/alluxio-1.8.0-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.6.2.38-1/spark2/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.6.2.38-1/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.6.2.38-1/spark_llap/spark-llap-assembly-1.0.0.2.6.2.38-1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
18/09/06 21:42:03 INFO CoarseGrainedExecutorBackend: Started daemon with process name: 28074@wn1-nsd-bg
18/09/06 21:42:03 INFO SignalUtils: Registered signal handler for TERM
18/09/06 21:42:03 INFO SignalUtils: Registered signal handler for HUP
18/09/06 21:42:03 INFO SignalUtils: Registered signal handler for INT
18/09/06 21:42:04 INFO SecurityManager: Changing view acls to: yarn,sshuser
18/09/06 21:42:04 INFO SecurityManager: Changing modify acls to: yarn,sshuser
18/09/06 21:42:04 INFO SecurityManager: Changing view acls groups to: 
18/09/06 21:42:04 INFO SecurityManager: Changing modify acls groups to: 
18/09/06 21:42:04 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(yarn, sshuser); groups with view permissions: Set(); users  with modify permissions: Set(yarn, sshuser); groups with modify permissions: Set()
18/09/06 21:42:04 INFO TransportClientFactory: Successfully created connection to /10.0.0.21:33843 after 60 ms (0 ms spent in bootstraps)
18/09/06 21:42:04 INFO SecurityManager: Changing view acls to: yarn,sshuser
18/09/06 21:42:04 INFO SecurityManager: Changing modify acls to: yarn,sshuser
18/09/06 21:42:04 INFO SecurityManager: Changing view acls groups to: 
18/09/06 21:42:04 INFO SecurityManager: Changing modify acls groups to: 
18/09/06 21:42:04 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(yarn, sshuser); groups with view permissions: Set(); users  with modify permissions: Set(yarn, sshuser); groups with modify permissions: Set()
18/09/06 21:42:04 INFO TransportClientFactory: Successfully created connection to /10.0.0.21:33843 after 1 ms (0 ms spent in bootstraps)
18/09/06 21:42:04 INFO DiskBlockManager: Created local directory at /mnt/resource/hadoop/yarn/local/usercache/sshuser/appcache/application_1536145998851_0026/blockmgr-aa883160-f530-41f6-a683-5d13cd04113a
18/09/06 21:42:04 INFO MemoryStore: MemoryStore started with capacity 5.2 GB
18/09/06 21:42:04 INFO CoarseGrainedExecutorBackend: Connecting to driver: spark://CoarseGrainedScheduler@10.0.0.21:33843
18/09/06 21:42:04 INFO CoarseGrainedExecutorBackend: Successfully registered with driver
18/09/06 21:42:04 INFO Executor: Starting executor ID 4 on host wn1-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net
18/09/06 21:42:04 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 40203.
18/09/06 21:42:04 INFO NettyBlockTransferService: Server created on wn1-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net:40203
18/09/06 21:42:04 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
18/09/06 21:42:04 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(4, wn1-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net, 40203, None)
18/09/06 21:42:04 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(4, wn1-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net, 40203, None)
18/09/06 21:42:04 INFO BlockManager: Initialized BlockManager: BlockManagerId(4, wn1-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net, 40203, None)
18/09/06 21:42:04 INFO Executor: Using REPL class URI: spark://10.0.0.21:33843/classes
18/09/06 21:44:51 INFO CoarseGrainedExecutorBackend: Got assigned task 0
18/09/06 21:44:51 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
18/09/06 21:44:51 INFO TorrentBroadcast: Started reading broadcast variable 1
18/09/06 21:44:51 INFO TransportClientFactory: Successfully created connection to /10.0.0.21:32875 after 2 ms (0 ms spent in bootstraps)
18/09/06 21:44:51 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 37.5 KB, free 5.2 GB)
18/09/06 21:44:51 INFO TorrentBroadcast: Reading broadcast variable 1 took 133 ms
18/09/06 21:44:51 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 97.7 KB, free 5.2 GB)
18/09/06 21:44:51 INFO TransportClientFactory: Successfully created connection to /10.0.0.21:33843 after 16 ms (0 ms spent in bootstraps)
18/09/06 21:44:52 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
18/09/06 21:44:52 INFO deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
18/09/06 21:44:52 INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
18/09/06 21:44:52 INFO deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
18/09/06 21:44:52 INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
18/09/06 21:44:52 INFO HadoopRDD: Input split: alluxio://hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net:19998/Input:0+13423
18/09/06 21:44:52 INFO TorrentBroadcast: Started reading broadcast variable 0
18/09/06 21:44:52 INFO TransportClientFactory: Successfully created connection to wn1-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.4:44275 after 1 ms (0 ms spent in bootstraps)
18/09/06 21:44:52 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 34.1 KB, free 5.2 GB)
18/09/06 21:44:52 INFO TorrentBroadcast: Reading broadcast variable 0 took 70 ms
18/09/06 21:44:52 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 469.5 KB, free 5.2 GB)
18/09/06 21:44:52 INFO MetricsConfig: loaded properties from hadoop-metrics2-azure-file-system.properties
18/09/06 21:44:52 INFO WasbAzureIaasSink: Init starting.
18/09/06 21:44:52 INFO AzureIaasSink: Init starting. Initializing MdsLogger.
18/09/06 21:44:52 INFO AzureIaasSink: Init completed.
18/09/06 21:44:52 INFO WasbAzureIaasSink: Init completed.
18/09/06 21:44:52 INFO MetricsSinkAdapter: Sink azurefs2 started
18/09/06 21:44:52 INFO MetricsSystemImpl: Scheduled snapshot period at 60 second(s).
18/09/06 21:44:52 INFO MetricsSystemImpl: azure-file-system metrics system started
18/09/06 21:44:52 INFO HadoopConfigurationUtils: Loading Alluxio properties from Hadoop configuration: {fs.azure.account.key.testryui.blob.core.windows.net=MIIB/QYJKoZIhvcNAQcDoIIB7jCCAeoCAQAxggFdMIIBWQIBADBBMC0xKzApBgNVBAMTImRiZW5jcnlwdGlvbi5oZGluc2lnaHRzZXJ2aWNlcy5uZXQCEGgOgBofxf6kRbb5r8cSuagwDQYJKoZIhvcNAQEBBQAEggEAKgwuNMr9JwNtvHY7VRE+8t991LCO8n0uJUOtJc2vQET+pZnK/2jviZh8AaIDra0AhDs8eGB71G8Xjet0fSyNZXb80660ZV3BBHKaDVNybwD12HLvjwLoOEZ3O9NqGls7D7owsoWO7gw0sGgu96dnEjvDvB8uSck8UveXMSk/tZNeU9+jYjfctu8TGRP+B4YCgJWcLMeGHczhPx6NuBPdmg/0eoe3w8VVPCi6vIs3lsyzYEi1LVrMkIJ2aBWuwyG7bUKu4tLhfYHY13YDO5PtBfNtWX3QkzejxvY9lewGuPU3w9Fw3GFK/YrgzFBtBycyBFMYl6lQyIFPNLtzPv2l9jCBgwYJKoZIhvcNAQcBMBQGCCqGSIb3DQMHBAgO9GVL4NR2GoBgn8fCaRg2y303wKFrgNOukYYmLoOKq5cQXP/9EY+3WD9BSMjE9/WyQ87FIhPeeEiq6JPNdBIoTZagQpj11b+TL7uMmz+j1Jx4LDozmAoLPQItbwaNnFZlYRWrgY8OiOuE}
18/09/06 21:44:52 INFO MetricsSystem: Starting sinks with config: {}.
18/09/06 21:44:52 INFO FileSystemContext: Created filesystem context with id app-3381863313617109164. This ID will be used for identifying info from the client, such as metrics. It can be set manually through the alluxio.user.app.id property
18/09/06 21:44:52 INFO HadoopConfigurationUtils: Loading Alluxio properties from Hadoop configuration: {fs.azure.account.key.testryui.blob.core.windows.net=MIIB/QYJKoZIhvcNAQcDoIIB7jCCAeoCAQAxggFdMIIBWQIBADBBMC0xKzApBgNVBAMTImRiZW5jcnlwdGlvbi5oZGluc2lnaHRzZXJ2aWNlcy5uZXQCEGgOgBofxf6kRbb5r8cSuagwDQYJKoZIhvcNAQEBBQAEggEAKgwuNMr9JwNtvHY7VRE+8t991LCO8n0uJUOtJc2vQET+pZnK/2jviZh8AaIDra0AhDs8eGB71G8Xjet0fSyNZXb80660ZV3BBHKaDVNybwD12HLvjwLoOEZ3O9NqGls7D7owsoWO7gw0sGgu96dnEjvDvB8uSck8UveXMSk/tZNeU9+jYjfctu8TGRP+B4YCgJWcLMeGHczhPx6NuBPdmg/0eoe3w8VVPCi6vIs3lsyzYEi1LVrMkIJ2aBWuwyG7bUKu4tLhfYHY13YDO5PtBfNtWX3QkzejxvY9lewGuPU3w9Fw3GFK/YrgzFBtBycyBFMYl6lQyIFPNLtzPv2l9jCBgwYJKoZIhvcNAQcBMBQGCCqGSIb3DQMHBAgO9GVL4NR2GoBgn8fCaRg2y303wKFrgNOukYYmLoOKq5cQXP/9EY+3WD9BSMjE9/WyQ87FIhPeeEiq6JPNdBIoTZagQpj11b+TL7uMmz+j1Jx4LDozmAoLPQItbwaNnFZlYRWrgY8OiOuE}
18/09/06 21:44:52 INFO AbstractClient: Alluxio client (version 1.8.0) is trying to bootstrap-connect with hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
18/09/06 21:44:52 INFO AbstractClient: Alluxio client has bootstrap-connected with hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
18/09/06 21:44:52 INFO AbstractClient: Alluxio client (version 1.8.0) is trying to connect with MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
18/09/06 21:44:52 INFO AbstractClient: Client registered with MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
18/09/06 21:44:53 INFO HeartbeatThread: Hearbeat Master Metrics Sync is interrupted.
18/09/06 21:44:53 INFO AbstractClient: Alluxio client (version 1.8.0) is trying to connect with MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
18/09/06 21:44:53 INFO AbstractClient: Client registered with MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
18/09/06 21:44:53 INFO AbstractClient: Alluxio client (version 1.8.0) is trying to connect with FileSystemMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
18/09/06 21:44:53 INFO AbstractClient: Client registered with FileSystemMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
18/09/06 21:44:53 INFO FileSystemContext: Created filesystem context with id app-4410116450193773659. This ID will be used for identifying info from the client, such as metrics. It can be set manually through the alluxio.user.app.id property
18/09/06 21:44:53 INFO AbstractClient: Alluxio client (version 1.8.0) is trying to connect with MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
18/09/06 21:44:53 WARN AbstractClient: Failed to connect (1) with MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998: Peer indicated failure: Plain authentication failed: User yarn is not configured for any impersonation. impersonationUser: sshuser
18/09/06 21:44:53 ERROR ClientMasterSync: Failed to heartbeat to the metrics master: {}
alluxio.exception.status.UnavailableException: Failed to connect to MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998 after 1 attempts
	at alluxio.AbstractClient.connect(AbstractClient.java:325)
	at alluxio.client.metrics.MetricsMasterClient.heartbeat(MetricsMasterClient.java:84)
	at alluxio.client.metrics.ClientMasterSync.heartbeat(ClientMasterSync.java:63)
	at alluxio.heartbeat.HeartbeatThread.run(HeartbeatThread.java:74)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
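
Note the warning a few lines above the heartbeat error: `Plain authentication failed: User yarn is not configured for any impersonation. impersonationUser: sshuser`. That would explain why `runTests` (run directly as the login user) passes while YARN executors (running as the `yarn` user on behalf of `sshuser`) are rejected. A minimal sketch of a fix, assuming Alluxio 1.8's impersonation property name below and the site-properties path used earlier in this thread, applied on the master node:

```
# Sketch (property name assumed from Alluxio 1.8 impersonation support):
# allow the yarn user to impersonate any user, then restart the master.
echo 'alluxio.master.security.impersonation.yarn.users=*' >> /etc/alluxio/alluxio-site.properties
/opt/alluxio/bin/alluxio-stop.sh master
/opt/alluxio/bin/alluxio-start.sh master
```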


Reply | Threaded
Open this post in threaded view
|

Re: issue with running spark-submit

Deema Yatsyuk
I can telnet to ports 19999 and 19998 on the master node from all machines.
Also, here is the runTests log:

2018-09-07 06:24:30,676 INFO  MetricsSystem - Starting sinks with config: {}.
2018-09-07 06:24:30,689 INFO  FileSystemContext - Created filesystem context with id app-2459436255833188551. This ID will be used for identifying info from the client, such as metrics. It can be set manually through the alluxio.user.app.id property
2018-09-07 06:24:30,749 INFO  AbstractClient - Alluxio client (version 1.8.0) is trying to bootstrap-connect with hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
2018-09-07 06:24:30,791 INFO  AbstractClient - Alluxio client has bootstrap-connected with hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
2018-09-07 06:24:30,791 INFO  AbstractClient - Alluxio client (version 1.8.0) is trying to connect with MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
2018-09-07 06:24:30,791 INFO  AbstractClient - Alluxio client (version 1.8.0) is trying to connect with FileSystemMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
2018-09-07 06:24:30,799 INFO  AbstractClient - Client registered with FileSystemMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
2018-09-07 06:24:30,799 INFO  AbstractClient - Client registered with MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
runTest BASIC CACHE_PROMOTE MUST_CACHE
2018-09-07 06:24:30,866 INFO  TieredIdentityFactory - Initialized tiered identity TieredIdentity(node=hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net, rack=null)
2018-09-07 06:24:30,954 INFO  AbstractClient - Alluxio client (version 1.8.0) is trying to connect with BlockMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
2018-09-07 06:24:30,955 INFO  AbstractClient - Client registered with BlockMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
2018-09-07 06:24:31,090 INFO  NettyChannelPool - Created netty channel with netty bootstrap Bootstrap(group: EpollEventLoopGroup, channelFactory: EpollSocketChannel.class, options: {SO_KEEPALIVE=true, TCP_NODELAY=true, ALLOCATOR=PooledByteBufAllocator(directByDefault: true), EPOLL_MODE=LEVEL_TRIGGERED}, handler: alluxio.network.netty.NettyClient$1@765d7657, remoteAddress: wn1-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.4:29999).
2018-09-07 06:24:31,167 INFO  BasicOperations - writeFile to file /default_tests_files/BASIC_CACHE_PROMOTE_MUST_CACHE took 298 ms.
2018-09-07 06:24:31,228 INFO  BasicOperations - readFile file /default_tests_files/BASIC_CACHE_PROMOTE_MUST_CACHE took 61 ms.
Passed the test!
runTest BASIC_NON_BYTE_BUFFER CACHE_PROMOTE MUST_CACHE
2018-09-07 06:24:31,243 INFO  BasicNonByteBufferOperations - writeFile to file /default_tests_files/BASIC_NON_BYTE_BUFFER_CACHE_PROMOTE_MUST_CACHE took 11 ms.
2018-09-07 06:24:31,251 INFO  BasicNonByteBufferOperations - readFile file /default_tests_files/BASIC_NON_BYTE_BUFFER_CACHE_PROMOTE_MUST_CACHE took 8 ms.
Passed the test!
runTest BASIC CACHE_PROMOTE CACHE_THROUGH
2018-09-07 06:24:31,521 INFO  NettyChannelPool - Created netty channel with netty bootstrap Bootstrap(group: EpollEventLoopGroup, channelFactory: EpollSocketChannel.class, options: {SO_KEEPALIVE=true, TCP_NODELAY=true, ALLOCATOR=PooledByteBufAllocator(directByDefault: true), EPOLL_MODE=LEVEL_TRIGGERED}, handler: alluxio.network.netty.NettyClient$1@7690781, remoteAddress: wn4-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.8:29999).
2018-09-07 06:24:32,540 INFO  BasicOperations - writeFile to file /default_tests_files/BASIC_CACHE_PROMOTE_CACHE_THROUGH took 1289 ms.
2018-09-07 06:24:32,553 INFO  BasicOperations - readFile file /default_tests_files/BASIC_CACHE_PROMOTE_CACHE_THROUGH took 13 ms.
Passed the test!
runTest BASIC_NON_BYTE_BUFFER CACHE_PROMOTE CACHE_THROUGH
2018-09-07 06:24:32,561 INFO  NettyChannelPool - Created netty channel with netty bootstrap Bootstrap(group: EpollEventLoopGroup, channelFactory: EpollSocketChannel.class, options: {SO_KEEPALIVE=true, TCP_NODELAY=true, ALLOCATOR=PooledByteBufAllocator(directByDefault: true), EPOLL_MODE=LEVEL_TRIGGERED}, handler: alluxio.network.netty.NettyClient$1@10959ece, remoteAddress: wn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.9:29999).
2018-09-07 06:24:32,578 INFO  NettyChannelPool - Created netty channel with netty bootstrap Bootstrap(group: EpollEventLoopGroup, channelFactory: EpollSocketChannel.class, options: {SO_KEEPALIVE=true, TCP_NODELAY=true, ALLOCATOR=PooledByteBufAllocator(directByDefault: true), EPOLL_MODE=LEVEL_TRIGGERED}, handler: alluxio.network.netty.NettyClient$1@10959ece, remoteAddress: wn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.9:29999).
2018-09-07 06:24:34,334 INFO  BasicNonByteBufferOperations - writeFile to file /default_tests_files/BASIC_NON_BYTE_BUFFER_CACHE_PROMOTE_CACHE_THROUGH took 1773 ms.
2018-09-07 06:24:34,349 INFO  BasicNonByteBufferOperations - readFile file /default_tests_files/BASIC_NON_BYTE_BUFFER_CACHE_PROMOTE_CACHE_THROUGH took 12 ms.
Passed the test!
runTest BASIC CACHE_PROMOTE THROUGH
2018-09-07 06:24:34,932 INFO  BasicOperations - writeFile to file /default_tests_files/BASIC_CACHE_PROMOTE_THROUGH took 583 ms.
2018-09-07 06:24:35,562 INFO  BasicOperations - readFile file /default_tests_files/BASIC_CACHE_PROMOTE_THROUGH took 630 ms.
Passed the test!
runTest BASIC_NON_BYTE_BUFFER CACHE_PROMOTE THROUGH
2018-09-07 06:24:36,153 INFO  BasicNonByteBufferOperations - writeFile to file /default_tests_files/BASIC_NON_BYTE_BUFFER_CACHE_PROMOTE_THROUGH took 585 ms.
2018-09-07 06:24:36,214 INFO  BasicNonByteBufferOperations - readFile file /default_tests_files/BASIC_NON_BYTE_BUFFER_CACHE_PROMOTE_THROUGH took 61 ms.
Passed the test!
runTest BASIC CACHE_PROMOTE ASYNC_THROUGH
2018-09-07 06:24:36,252 INFO  BasicOperations - writeFile to file /default_tests_files/BASIC_CACHE_PROMOTE_ASYNC_THROUGH took 37 ms.
2018-09-07 06:24:36,259 INFO  BasicOperations - readFile file /default_tests_files/BASIC_CACHE_PROMOTE_ASYNC_THROUGH took 7 ms.
Passed the test!
runTest BASIC_NON_BYTE_BUFFER CACHE_PROMOTE ASYNC_THROUGH
2018-09-07 06:24:36,275 INFO  BasicNonByteBufferOperations - writeFile to file /default_tests_files/BASIC_NON_BYTE_BUFFER_CACHE_PROMOTE_ASYNC_THROUGH took 12 ms.
2018-09-07 06:24:36,285 INFO  BasicNonByteBufferOperations - readFile file /default_tests_files/BASIC_NON_BYTE_BUFFER_CACHE_PROMOTE_ASYNC_THROUGH took 10 ms.
Passed the test!
runTest BASIC CACHE MUST_CACHE
2018-09-07 06:24:36,306 INFO  BasicOperations - writeFile to file /default_tests_files/BASIC_CACHE_MUST_CACHE took 21 ms.
2018-09-07 06:24:36,312 INFO  BasicOperations - readFile file /default_tests_files/BASIC_CACHE_MUST_CACHE took 6 ms.
Passed the test!
runTest BASIC_NON_BYTE_BUFFER CACHE MUST_CACHE
2018-09-07 06:24:36,333 INFO  BasicNonByteBufferOperations - writeFile to file /default_tests_files/BASIC_NON_BYTE_BUFFER_CACHE_MUST_CACHE took 10 ms.
2018-09-07 06:24:36,341 INFO  BasicNonByteBufferOperations - readFile file /default_tests_files/BASIC_NON_BYTE_BUFFER_CACHE_MUST_CACHE took 8 ms.
Passed the test!
runTest BASIC CACHE CACHE_THROUGH
2018-09-07 06:24:36,734 INFO  BasicOperations - writeFile to file /default_tests_files/BASIC_CACHE_CACHE_THROUGH took 393 ms.
2018-09-07 06:24:36,742 INFO  BasicOperations - readFile file /default_tests_files/BASIC_CACHE_CACHE_THROUGH took 8 ms.
Passed the test!
runTest BASIC_NON_BYTE_BUFFER CACHE CACHE_THROUGH
2018-09-07 06:24:37,176 INFO  BasicNonByteBufferOperations - writeFile to file /default_tests_files/BASIC_NON_BYTE_BUFFER_CACHE_CACHE_THROUGH took 429 ms.
2018-09-07 06:24:37,183 INFO  BasicNonByteBufferOperations - readFile file /default_tests_files/BASIC_NON_BYTE_BUFFER_CACHE_CACHE_THROUGH took 6 ms.
Passed the test!
runTest BASIC CACHE THROUGH
2018-09-07 06:24:37,636 INFO  BasicOperations - writeFile to file /default_tests_files/BASIC_CACHE_THROUGH took 452 ms.
2018-09-07 06:24:37,659 INFO  BasicOperations - readFile file /default_tests_files/BASIC_CACHE_THROUGH took 23 ms.
Passed the test!
runTest BASIC_NON_BYTE_BUFFER CACHE THROUGH
2018-09-07 06:24:38,528 INFO  BasicNonByteBufferOperations - writeFile to file /default_tests_files/BASIC_NON_BYTE_BUFFER_CACHE_THROUGH took 864 ms.
2018-09-07 06:24:38,546 INFO  BasicNonByteBufferOperations - readFile file /default_tests_files/BASIC_NON_BYTE_BUFFER_CACHE_THROUGH took 18 ms.
Passed the test!
runTest BASIC CACHE ASYNC_THROUGH
2018-09-07 06:24:38,563 INFO  BasicOperations - writeFile to file /default_tests_files/BASIC_CACHE_ASYNC_THROUGH took 17 ms.
2018-09-07 06:24:38,567 INFO  BasicOperations - readFile file /default_tests_files/BASIC_CACHE_ASYNC_THROUGH took 4 ms.
Passed the test!
runTest BASIC_NON_BYTE_BUFFER CACHE ASYNC_THROUGH
2018-09-07 06:24:38,586 INFO  BasicNonByteBufferOperations - writeFile to file /default_tests_files/BASIC_NON_BYTE_BUFFER_CACHE_ASYNC_THROUGH took 11 ms.
2018-09-07 06:24:38,592 INFO  BasicNonByteBufferOperations - readFile file /default_tests_files/BASIC_NON_BYTE_BUFFER_CACHE_ASYNC_THROUGH took 6 ms.
Passed the test!
runTest BASIC NO_CACHE MUST_CACHE
2018-09-07 06:24:38,606 INFO  BasicOperations - writeFile to file /default_tests_files/BASIC_NO_CACHE_MUST_CACHE took 14 ms.
2018-09-07 06:24:38,610 INFO  BasicOperations - readFile file /default_tests_files/BASIC_NO_CACHE_MUST_CACHE took 4 ms.
Passed the test!
runTest BASIC_NON_BYTE_BUFFER NO_CACHE MUST_CACHE
2018-09-07 06:24:38,623 INFO  BasicNonByteBufferOperations - writeFile to file /default_tests_files/BASIC_NON_BYTE_BUFFER_NO_CACHE_MUST_CACHE took 9 ms.
2018-09-07 06:24:38,626 INFO  BasicNonByteBufferOperations - readFile file /default_tests_files/BASIC_NON_BYTE_BUFFER_NO_CACHE_MUST_CACHE took 3 ms.
Passed the test!
runTest BASIC NO_CACHE CACHE_THROUGH
2018-09-07 06:24:39,849 INFO  BasicOperations - writeFile to file /default_tests_files/BASIC_NO_CACHE_CACHE_THROUGH took 1222 ms.
2018-09-07 06:24:39,855 INFO  BasicOperations - readFile file /default_tests_files/BASIC_NO_CACHE_CACHE_THROUGH took 6 ms.
Passed the test!
runTest BASIC_NON_BYTE_BUFFER NO_CACHE CACHE_THROUGH
2018-09-07 06:24:40,659 INFO  BasicNonByteBufferOperations - writeFile to file /default_tests_files/BASIC_NON_BYTE_BUFFER_NO_CACHE_CACHE_THROUGH took 800 ms.
2018-09-07 06:24:40,663 INFO  BasicNonByteBufferOperations - readFile file /default_tests_files/BASIC_NON_BYTE_BUFFER_NO_CACHE_CACHE_THROUGH took 4 ms.
Passed the test!
runTest BASIC NO_CACHE THROUGH
2018-09-07 06:24:41,182 INFO  BasicOperations - writeFile to file /default_tests_files/BASIC_NO_CACHE_THROUGH took 518 ms.
2018-09-07 06:24:41,196 INFO  BasicOperations - readFile file /default_tests_files/BASIC_NO_CACHE_THROUGH took 14 ms.
Passed the test!
runTest BASIC_NON_BYTE_BUFFER NO_CACHE THROUGH
2018-09-07 06:24:41,541 INFO  BasicNonByteBufferOperations - writeFile to file /default_tests_files/BASIC_NON_BYTE_BUFFER_NO_CACHE_THROUGH took 340 ms.
2018-09-07 06:24:41,557 INFO  BasicNonByteBufferOperations - readFile file /default_tests_files/BASIC_NON_BYTE_BUFFER_NO_CACHE_THROUGH took 16 ms.
Passed the test!
runTest BASIC NO_CACHE ASYNC_THROUGH
2018-09-07 06:24:41,569 INFO  BasicOperations - writeFile to file /default_tests_files/BASIC_NO_CACHE_ASYNC_THROUGH took 11 ms.
2018-09-07 06:24:41,572 INFO  BasicOperations - readFile file /default_tests_files/BASIC_NO_CACHE_ASYNC_THROUGH took 3 ms.
Passed the test!
runTest BASIC_NON_BYTE_BUFFER NO_CACHE ASYNC_THROUGH
2018-09-07 06:24:41,586 INFO  BasicNonByteBufferOperations - writeFile to file /default_tests_files/BASIC_NON_BYTE_BUFFER_NO_CACHE_ASYNC_THROUGH took 10 ms.
2018-09-07 06:24:41,590 INFO  BasicNonByteBufferOperations - readFile file /default_tests_files/BASIC_NON_BYTE_BUFFER_NO_CACHE_ASYNC_THROUGH took 4 ms.
Passed the test!

It is the same (all green) from both the master and the worker nodes.
Thanks for your cooperation.


18/09/06 21:44:52 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
18/09/06 21:44:52 INFO deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
18/09/06 21:44:52 INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
18/09/06 21:44:52 INFO deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
18/09/06 21:44:52 INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
18/09/06 21:44:52 INFO HadoopRDD: Input split: alluxio://hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net:19998/Input:0+13423
18/09/06 21:44:52 INFO TorrentBroadcast: Started reading broadcast variable 0
18/09/06 21:44:52 INFO TransportClientFactory: Successfully created connection to wn1-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.4:44275 after 1 ms (0 ms spent in bootstraps)
18/09/06 21:44:52 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 34.1 KB, free 5.2 GB)
18/09/06 21:44:52 INFO TorrentBroadcast: Reading broadcast variable 0 took 70 ms
18/09/06 21:44:52 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 469.5 KB, free 5.2 GB)
18/09/06 21:44:52 INFO MetricsConfig: loaded properties from hadoop-metrics2-azure-file-system.properties
18/09/06 21:44:52 INFO WasbAzureIaasSink: Init starting.
18/09/06 21:44:52 INFO AzureIaasSink: Init starting. Initializing MdsLogger.
18/09/06 21:44:52 INFO AzureIaasSink: Init completed.
18/09/06 21:44:52 INFO WasbAzureIaasSink: Init completed.
18/09/06 21:44:52 INFO MetricsSinkAdapter: Sink azurefs2 started
18/09/06 21:44:52 INFO MetricsSystemImpl: Scheduled snapshot period at 60 second(s).
18/09/06 21:44:52 INFO MetricsSystemImpl: azure-file-system metrics system started
18/09/06 21:44:52 INFO HadoopConfigurationUtils: Loading Alluxio properties from Hadoop configuration: {fs.azure.account.key.testryui.blob.core.windows.net=MIIB/QYJKoZIhvcNAQcDoIIB7jCCAeoCAQAxggFdMIIBWQIBADBBMC0xKzApBgNVBAMTImRiZW5jcnlwdGlvbi5oZGluc2lnaHRzZXJ2aWNlcy5uZXQCEGgOgBofxf6kRbb5r8cSuagwDQYJKoZIhvcNAQEBBQAEggEAKgwuNMr9JwNtvHY7VRE+8t991LCO8n0uJUOtJc2vQET+pZnK/2jviZh8AaIDra0AhDs8eGB71G8Xjet0fSyNZXb80660ZV3BBHKaDVNybwD12HLvjwLoOEZ3O9NqGls7D7owsoWO7gw0sGgu96dnEjvDvB8uSck8UveXMSk/tZNeU9+jYjfctu8TGRP+B4YCgJWcLMeGHczhPx6NuBPdmg/0eoe3w8VVPCi6vIs3lsyzYEi1LVrMkIJ2aBWuwyG7bUKu4tLhfYHY13YDO5PtBfNtWX3QkzejxvY9lewGuPU3w9Fw3GFK/YrgzFBtBycyBFMYl6lQyIFPNLtzPv2l9jCBgwYJKoZIhvcNAQcBMBQGCCqGSIb3DQMHBAgO9GVL4NR2GoBgn8fCaRg2y303wKFrgNOukYYmLoOKq5cQXP/9EY+3WD9BSMjE9/WyQ87FIhPeeEiq6JPNdBIoTZagQpj11b+TL7uMmz+j1Jx4LDozmAoLPQItbwaNnFZlYRWrgY8OiOuE}
18/09/06 21:44:52 INFO MetricsSystem: Starting sinks with config: {}.
18/09/06 21:44:52 INFO FileSystemContext: Created filesystem context with id app-3381863313617109164. This ID will be used for identifying info from the client, such as metrics. It can be set manually through the alluxio.user.app.id property
18/09/06 21:44:52 INFO HadoopConfigurationUtils: Loading Alluxio properties from Hadoop configuration: {fs.azure.account.key.testryui.blob.core.windows.net=MIIB/QYJKoZIhvcNAQcDoIIB7jCCAeoCAQAxggFdMIIBWQIBADBBMC0xKzApBgNVBAMTImRiZW5jcnlwdGlvbi5oZGluc2lnaHRzZXJ2aWNlcy5uZXQCEGgOgBofxf6kRbb5r8cSuagwDQYJKoZIhvcNAQEBBQAEggEAKgwuNMr9JwNtvHY7VRE+8t991LCO8n0uJUOtJc2vQET+pZnK/2jviZh8AaIDra0AhDs8eGB71G8Xjet0fSyNZXb80660ZV3BBHKaDVNybwD12HLvjwLoOEZ3O9NqGls7D7owsoWO7gw0sGgu96dnEjvDvB8uSck8UveXMSk/tZNeU9+jYjfctu8TGRP+B4YCgJWcLMeGHczhPx6NuBPdmg/0eoe3w8VVPCi6vIs3lsyzYEi1LVrMkIJ2aBWuwyG7bUKu4tLhfYHY13YDO5PtBfNtWX3QkzejxvY9lewGuPU3w9Fw3GFK/YrgzFBtBycyBFMYl6lQyIFPNLtzPv2l9jCBgwYJKoZIhvcNAQcBMBQGCCqGSIb3DQMHBAgO9GVL4NR2GoBgn8fCaRg2y303wKFrgNOukYYmLoOKq5cQXP/9EY+3WD9BSMjE9/WyQ87FIhPeeEiq6JPNdBIoTZagQpj11b+TL7uMmz+j1Jx4LDozmAoLPQItbwaNnFZlYRWrgY8OiOuE}
18/09/06 21:44:52 INFO AbstractClient: Alluxio client (version 1.8.0) is trying to bootstrap-connect with hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
18/09/06 21:44:52 INFO AbstractClient: Alluxio client has bootstrap-connected with hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
18/09/06 21:44:52 INFO AbstractClient: Alluxio client (version 1.8.0) is trying to connect with MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
18/09/06 21:44:52 INFO AbstractClient: Client registered with MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
18/09/06 21:44:53 INFO HeartbeatThread: Hearbeat Master Metrics Sync is interrupted.
18/09/06 21:44:53 INFO AbstractClient: Alluxio client (version 1.8.0) is trying to connect with MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
18/09/06 21:44:53 INFO AbstractClient: Client registered with MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
18/09/06 21:44:53 INFO AbstractClient: Alluxio client (version 1.8.0) is trying to connect with FileSystemMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
18/09/06 21:44:53 INFO AbstractClient: Client registered with FileSystemMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
18/09/06 21:44:53 INFO FileSystemContext: Created filesystem context with id app-4410116450193773659. This ID will be used for identifying info from the client, such as metrics. It can be set manually through the alluxio.user.app.id property
18/09/06 21:44:53 INFO AbstractClient: Alluxio client (version 1.8.0) is trying to connect with MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
18/09/06 21:44:53 WARN AbstractClient: Failed to connect (1) with MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998: Peer indicated failure: Plain authentication failed: User yarn is not configured for any impersonation. impersonationUser: sshuser
18/09/06 21:44:53 ERROR ClientMasterSync: Failed to heartbeat to the metrics master: {}
alluxio.exception.status.UnavailableException: Failed to connect to MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998 after 1 attempts
	at alluxio.AbstractClient.connect(AbstractClient.java:325)
	at alluxio.client.metrics.MetricsMasterClient.heartbeat(MetricsMasterClient.java:84)
	at alluxio.client.metrics.ClientMasterSync.heartbeat(ClientMasterSync.java:63)
	at alluxio.heartbeat.HeartbeatThread.run(HeartbeatThread.java:74)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)


On Thu, Sep 6, 2018 at 11:21 PM Lu Qiu <[hidden email]> wrote:
Hi Dmitry,

Sorry for the late reply.

Could you try the Spark shell and see if it is able to connect to Alluxio?

```
> val s = sc.textFile("alluxio://hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net:19998/Input")
> val double = s.map(line => line + line)
> double.saveAsTextFile("alluxio://hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net:19998/Output")
``` 
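
For that snippet to resolve the `alluxio://` scheme, the Alluxio client jar has to be on the shell's classpath; a minimal launch sketch, assuming the default 1.8.0 client jar location under /opt/alluxio:

```
spark-shell --jars /opt/alluxio/client/alluxio-1.8.0-client.jar
```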

Could you try it on your master node (10.0.0.21) first, and then again on the node where you ran spark-submit before (maybe 10.0.0.6)?

Thanks,
Lu


On Thu, Sep 6, 2018 at 9:54 AM, Dmitry Yatsyuk <[hidden email]> wrote:
Here is also the log from /opt/alluxio/integration/checker/bin/alluxio-checker.sh spark yarn.


On Thu, Sep 6, 2018 at 7:46 PM Dmitry Yatsyuk <[hidden email]> wrote:
Hello
Maybe you have more suggestions for me.
Many thanks

On Wed, Sep 5, 2018 at 10:23 PM Dmitry Yatsyuk <[hidden email]> wrote:
And yes, I run spark-submit from the node where the Alluxio master is installed.

On Wed, Sep 5, 2018 at 10:22 PM Dmitry Yatsyuk <[hidden email]> wrote:
Hello

All workers are live

Live Workers

   wn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net 0 In Service 36.69GB 444.88MB   99%Free
   wn1-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net 0 In Service 36.69GB 593.25MB   99%Free
   wn4-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net 0 In Service 36.69GB 94.63MB     99%Free



Alluxio Summary

            Started:                                      09-05-2018 12:40:10:248
             Uptime:                        0 day(s), 6 hour(s), 41 minute(s), and 28 second(s)
            Version:                                               1.8.0
        Running Workers:                                             3
   Startup Consistency Check:                                     COMPLETE
   Server Configuration Check:                                     PASSED

Cluster Usage Summary

    Workers Capacity:         110.08GB
   Workers Free / Used: 108.97GB / 1132.76MB
    UnderFS Capacity:        1177.79GB
   UnderFS Free / Used: 1177.79GB / 192.00KB

Storage Usage Summary

   Storage Alias   Space Capacity   Space Used   Space Usage
   MEM             110.08GB         1132.76MB    99% Free



On Wed, Sep 5, 2018 at 8:07 PM, Lu Qiu <[hidden email]> wrote:
Hi,

The error 
Failed to connect to MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998 after 1 attempts

is usually caused by:

(1) The input hostname (or port) is wrong, or the system cannot resolve the hostname (especially when Spark and Alluxio are on different nodes).
Did you run spark-submit on the Alluxio master node?

(2) The Alluxio cluster is not running normally. You could visit http://hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net:19999 to see if the Alluxio master is alive, and visit the Workers page to see if the workers are alive.
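
Two quick checks for (1) and (2) from the node running spark-submit, assuming standard Linux tooling (`getent` and `nc`):

```
# can this node resolve the Alluxio master hostname?
getent hosts hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net

# is the master RPC port reachable from here?
nc -zv hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net 19998
```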


Thanks,
Lu

On Wed, Sep 5, 2018 at 4:29 AM, Dmitry Yatsyuk <[hidden email]> wrote:
I changed spark-submit to

spark-submit --conf 'spark.driver.extraJavaOptions=-Dalluxio.master.hostname=hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net' --conf 'spark.executor.extraJavaOptions=-Dalluxio.master.hostname=hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net' --master yarn --class co.bigstream.benchmark.TPCSQ3 tpcds_2.11-1.0.jar 2 alluxio://hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net:19998/avro avro

and now the error on the executor is:

18/09/05 11:25:44 ERROR ClientMasterSync: Failed to heartbeat to the metrics master: {}
alluxio.exception.status.UnavailableException: Failed to connect to MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998 after 1 attempts
	at alluxio.AbstractClient.connect(AbstractClient.java:325)
	at alluxio.client.metrics.MetricsMasterClient.heartbeat(MetricsMasterClient.java:84)
	at alluxio.client.metrics.ClientMasterSync.heartbeat(ClientMasterSync.java:63)
	at alluxio.heartbeat.HeartbeatThread.run(HeartbeatThread.java:74)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)


On Wed, Sep 5, 2018 at 2:20 PM, Dmitry Yatsyuk <[hidden email]> wrote:
Hello
I tried your suggestion but now the error is the following:
Exception in thread "main" java.lang.ExceptionInInitializerError
at alluxio.hadoop.AbstractFileSystem.initializeInternal(AbstractFileSystem.java:514)
at alluxio.hadoop.AbstractFileSystem.initialize(AbstractFileSystem.java:483)
at alluxio.hadoop.FileSystem.initialize(FileSystem.java:27)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2795)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:99)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2829)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2811)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:390)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.spark.sql.execution.datasources.DataSource.hasMetadata(DataSource.scala:301)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:344)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:135)
at co.bigstream.benchmark.TPCSQ3$.main(TPCSQ3.scala:64)
at co.bigstream.benchmark.TPCSQ3.main(TPCSQ3.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:751)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.IllegalArgumentException: port out of range:-1
at java.net.InetSocketAddress.checkPort(InetSocketAddress.java:143)
at java.net.InetSocketAddress.<init>(InetSocketAddress.java:224)
at alluxio.util.network.NetworkAddressUtils.getConnectAddress(NetworkAddressUtils.java:218)
at alluxio.util.network.NetworkAddressUtils.getConnectAddress(NetworkAddressUtils.java:204)
at alluxio.client.lineage.LineageContext.reset(LineageContext.java:64)
at alluxio.client.lineage.LineageContext.<init>(LineageContext.java:35)
at alluxio.client.lineage.LineageContext.<clinit>(LineageContext.java:27)

The command I ran is:
spark-submit --conf 'spark.driver.extraJavaOptions=-Dalluxio.master.hostname=hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net' --conf 'spark.executor.extraJavaOptions=-Dalluxio.master.hostname=hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net' --master yarn --class co.bigstream.benchmark.TPCSQ3 tpcds_2.11-1.0.jar 2 alluxio://hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/avro avro

On Wed, Sep 5, 2018 at 4:11 AM, Lu Qiu <[hidden email]> wrote:
Hi Deema,

The more common way to set Alluxio configuration through spark-submit command-line options is:

spark-submit \
--conf 'spark.driver.extraJavaOptions=-Dalluxio.user.file.writetype.default=CACHE_THROUGH' \
--conf 'spark.executor.extraJavaOptions=-Dalluxio.user.file.writetype.default=CACHE_THROUGH' \
...


In your case, you could try

spark-submit --conf 'spark.driver.extraJavaOptions=-Dalluxio.master.hostname=hn0-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net' --conf 'spark.executor.extraJavaOptions=-Dalluxio.master.hostname=hn0-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net' --master yarn --class co.bigstream.benchmark.TPCSQ3 tpcds_2.11-1.0.jar 2 alluxio:///avro avro

Hope that one of these ways works for you!

Thanks, 
Lu

On Tue, Sep 4, 2018 at 5:55 PM, Lu Qiu <[hidden email]> wrote:
Hi Deema,


If you only use Alluxio to provide the input file, you could try passing the whole Alluxio path, like:
`
spark-submit --master yarn --class co.bigstream.benchmark.TPCSQ3 tpcds_2.11-1.0.jar 2 alluxio://hn0-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net:19998/avro avro
`

If you are using `alluxio:///` in the jar, Spark needs to know the Alluxio master hostname.
You could add the master hostname to the `core-site.xml` in your Spark home conf directory:

<configuration>
  <property>
    <name>fs.alluxio.impl</name>
    <value>alluxio.hadoop.FileSystem</value>
  </property>
  <property>
    <name>alluxio.master.hostname</name>
    <value>hn0-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net</value>
  </property>
</configuration>

Or try:

`
spark-submit --conf spark.hadoop.defaultFS="alluxio://hn0-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net:19998" --master yarn --class co.bigstream.benchmark.TPCSQ3 tpcds_2.11-1.0.jar 2 alluxio://hn0-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net:19998/avro avro
`

In addition, could you share the console output of `/opt/alluxio/integration/checker/bin/alluxio-checker.sh spark yarn` so that we can better understand why the Spark checker failed?

Thanks,
Lu


On Tue, Sep 4, 2018 at 2:25 AM, Deema Yatsyuk <[hidden email]> wrote:
and tried to run spark-submit:
spark-submit --conf spark.hadoop.defaultFS="-Dalluxio://hn0-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net" --master yarn --class co.bigstream.benchmark.TPCSQ3 tpcds_2.11-1.0.jar 2 alluxio:///avro avro

but got the same issue:

Exception in thread "main" java.lang.NullPointerException: URI hostname must not be null
at alluxio.core.client.runtime.com.google.common.base.Preconditions.checkNotNull(Preconditions.java:208)
at alluxio.hadoop.AbstractFileSystem.initializeInternal(AbstractFileSystem.java:506)
at alluxio.hadoop.AbstractFileSystem.initialize(AbstractFileSystem.java:483)
at alluxio.hadoop.FileSystem.initialize(FileSystem.java:27)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2795)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:99)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2829)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2811)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:390)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$14.apply(DataSource.scala:372)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$14.apply(DataSource.scala:370)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
at scala.collection.immutable.List.flatMap(List.scala:344)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:370)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:135)
at co.bigstream.benchmark.TPCSQ3$.main(TPCSQ3.scala:64)
at co.bigstream.benchmark.TPCSQ3.main(TPCSQ3.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:751)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
18/09/04 09:23:53 INFO SparkContext: Invoking stop() from shutdown hook 


Re: issue with running spark-submit

Lu Qiu
Hi Dmitry,

The heartbeat performed by MetricsMasterClient is just for collecting metrics; it will not be the root cause of the Spark job's failure.
We need more exception messages from before and after this failure to dig out the root cause.

Your spark-submit didn't fail because of connectivity; we found a more likely failure in your exception message:

Failed to connect (1) with MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998: Peer indicated failure: Plain authentication failed: User yarn is not configured for any impersonation. impersonationUser: sshuser


One possible root cause is:
User yarn is not configured for any impersonation. impersonationUser: sshuser

It's a Spark-with-YARN issue. In Hadoop, I solve impersonation problems by modifying core-site.xml, as illustrated in https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/Superusers.html. I am not quite sure whether this method also solves the Spark issue, but I hope it helps.

Run spark-submit again; if the error "Plain authentication failed: User yarn is not configured for any impersonation. impersonationUser: sshuser" still appears, then this is likely the key issue. Otherwise, wait for the exception message to become more complete and detailed, and send us the whole exception message.
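
For reference, a minimal sketch of both sides of that fix; the Hadoop proxyuser entries follow the Superusers page above, while the Alluxio-side property name is an assumption based on Alluxio 1.8's impersonation support (set it on the master and restart it):

```
<!-- Hadoop core-site.xml: allow the yarn user to impersonate other users -->
<property>
  <name>hadoop.proxyuser.yarn.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.yarn.groups</name>
  <value>*</value>
</property>
```

```
# /etc/alluxio/alluxio-site.properties (assumed Alluxio 1.8 impersonation key):
# let the yarn user impersonate any user, e.g. sshuser
alluxio.master.security.impersonation.yarn.users=*
```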

Thanks,
Lu




On Thu, Sep 6, 2018 at 11:26 PM, Dmitry Yatsyuk <[hidden email]> wrote:
I can telnet to ports 19999 and 19998 on the master node from all machines.
Also, here is the runTests log:

2018-09-07 06:24:30,676 INFO  MetricsSystem - Starting sinks with config: {}.
2018-09-07 06:24:30,689 INFO  FileSystemContext - Created filesystem context with id app-2459436255833188551. This ID will be used for identifying info from the client, such as metrics. It can be set manually through the alluxio.user.app.id property
2018-09-07 06:24:30,749 INFO  AbstractClient - Alluxio client (version 1.8.0) is trying to bootstrap-connect with hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
2018-09-07 06:24:30,791 INFO  AbstractClient - Alluxio client has bootstrap-connected with hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
2018-09-07 06:24:30,791 INFO  AbstractClient - Alluxio client (version 1.8.0) is trying to connect with MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
2018-09-07 06:24:30,791 INFO  AbstractClient - Alluxio client (version 1.8.0) is trying to connect with FileSystemMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
2018-09-07 06:24:30,799 INFO  AbstractClient - Client registered with FileSystemMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
2018-09-07 06:24:30,799 INFO  AbstractClient - Client registered with MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
runTest BASIC CACHE_PROMOTE MUST_CACHE
2018-09-07 06:24:30,866 INFO  TieredIdentityFactory - Initialized tiered identity TieredIdentity(node=hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net, rack=null)
2018-09-07 06:24:30,954 INFO  AbstractClient - Alluxio client (version 1.8.0) is trying to connect with BlockMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
2018-09-07 06:24:30,955 INFO  AbstractClient - Client registered with BlockMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
2018-09-07 06:24:31,090 INFO  NettyChannelPool - Created netty channel with netty bootstrap Bootstrap(group: EpollEventLoopGroup, channelFactory: EpollSocketChannel.class, options: {SO_KEEPALIVE=true, TCP_NODELAY=true, ALLOCATOR=PooledByteBufAllocator(directByDefault: true), EPOLL_MODE=LEVEL_TRIGGERED}, handler: alluxio.network.netty.NettyClient$1@765d7657, remoteAddress: wn1-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.4:29999).
2018-09-07 06:24:31,167 INFO  BasicOperations - writeFile to file /default_tests_files/BASIC_CACHE_PROMOTE_MUST_CACHE took 298 ms.
2018-09-07 06:24:31,228 INFO  BasicOperations - readFile file /default_tests_files/BASIC_CACHE_PROMOTE_MUST_CACHE took 61 ms.
Passed the test!
runTest BASIC_NON_BYTE_BUFFER CACHE_PROMOTE MUST_CACHE
2018-09-07 06:24:31,243 INFO  BasicNonByteBufferOperations - writeFile to file /default_tests_files/BASIC_NON_BYTE_BUFFER_CACHE_PROMOTE_MUST_CACHE took 11 ms.
2018-09-07 06:24:31,251 INFO  BasicNonByteBufferOperations - readFile file /default_tests_files/BASIC_NON_BYTE_BUFFER_CACHE_PROMOTE_MUST_CACHE took 8 ms.
Passed the test!
runTest BASIC CACHE_PROMOTE CACHE_THROUGH
2018-09-07 06:24:31,521 INFO  NettyChannelPool - Created netty channel with netty bootstrap Bootstrap(group: EpollEventLoopGroup, channelFactory: EpollSocketChannel.class, options: {SO_KEEPALIVE=true, TCP_NODELAY=true, ALLOCATOR=PooledByteBufAllocator(directByDefault: true), EPOLL_MODE=LEVEL_TRIGGERED}, handler: alluxio.network.netty.NettyClient$1@7690781, remoteAddress: wn4-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.8:29999).
2018-09-07 06:24:32,540 INFO  BasicOperations - writeFile to file /default_tests_files/BASIC_CACHE_PROMOTE_CACHE_THROUGH took 1289 ms.
2018-09-07 06:24:32,553 INFO  BasicOperations - readFile file /default_tests_files/BASIC_CACHE_PROMOTE_CACHE_THROUGH took 13 ms.
Passed the test!
runTest BASIC_NON_BYTE_BUFFER CACHE_PROMOTE CACHE_THROUGH
2018-09-07 06:24:32,561 INFO  NettyChannelPool - Created netty channel with netty bootstrap Bootstrap(group: EpollEventLoopGroup, channelFactory: EpollSocketChannel.class, options: {SO_KEEPALIVE=true, TCP_NODELAY=true, ALLOCATOR=PooledByteBufAllocator(directByDefault: true), EPOLL_MODE=LEVEL_TRIGGERED}, handler: alluxio.network.netty.NettyClient$1@10959ece, remoteAddress: wn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.9:29999).
2018-09-07 06:24:32,578 INFO  NettyChannelPool - Created netty channel with netty bootstrap Bootstrap(group: EpollEventLoopGroup, channelFactory: EpollSocketChannel.class, options: {SO_KEEPALIVE=true, TCP_NODELAY=true, ALLOCATOR=PooledByteBufAllocator(directByDefault: true), EPOLL_MODE=LEVEL_TRIGGERED}, handler: alluxio.network.netty.NettyClient$1@10959ece, remoteAddress: wn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.9:29999).
2018-09-07 06:24:34,334 INFO  BasicNonByteBufferOperations - writeFile to file /default_tests_files/BASIC_NON_BYTE_BUFFER_CACHE_PROMOTE_CACHE_THROUGH took 1773 ms.
2018-09-07 06:24:34,349 INFO  BasicNonByteBufferOperations - readFile file /default_tests_files/BASIC_NON_BYTE_BUFFER_CACHE_PROMOTE_CACHE_THROUGH took 12 ms.
Passed the test!
runTest BASIC CACHE_PROMOTE THROUGH
2018-09-07 06:24:34,932 INFO  BasicOperations - writeFile to file /default_tests_files/BASIC_CACHE_PROMOTE_THROUGH took 583 ms.
2018-09-07 06:24:35,562 INFO  BasicOperations - readFile file /default_tests_files/BASIC_CACHE_PROMOTE_THROUGH took 630 ms.
Passed the test!
runTest BASIC_NON_BYTE_BUFFER CACHE_PROMOTE THROUGH
2018-09-07 06:24:36,153 INFO  BasicNonByteBufferOperations - writeFile to file /default_tests_files/BASIC_NON_BYTE_BUFFER_CACHE_PROMOTE_THROUGH took 585 ms.
2018-09-07 06:24:36,214 INFO  BasicNonByteBufferOperations - readFile file /default_tests_files/BASIC_NON_BYTE_BUFFER_CACHE_PROMOTE_THROUGH took 61 ms.
Passed the test!
runTest BASIC CACHE_PROMOTE ASYNC_THROUGH
2018-09-07 06:24:36,252 INFO  BasicOperations - writeFile to file /default_tests_files/BASIC_CACHE_PROMOTE_ASYNC_THROUGH took 37 ms.
2018-09-07 06:24:36,259 INFO  BasicOperations - readFile file /default_tests_files/BASIC_CACHE_PROMOTE_ASYNC_THROUGH took 7 ms.
Passed the test!
runTest BASIC_NON_BYTE_BUFFER CACHE_PROMOTE ASYNC_THROUGH
2018-09-07 06:24:36,275 INFO  BasicNonByteBufferOperations - writeFile to file /default_tests_files/BASIC_NON_BYTE_BUFFER_CACHE_PROMOTE_ASYNC_THROUGH took 12 ms.
2018-09-07 06:24:36,285 INFO  BasicNonByteBufferOperations - readFile file /default_tests_files/BASIC_NON_BYTE_BUFFER_CACHE_PROMOTE_ASYNC_THROUGH took 10 ms.
Passed the test!
runTest BASIC CACHE MUST_CACHE
2018-09-07 06:24:36,306 INFO  BasicOperations - writeFile to file /default_tests_files/BASIC_CACHE_MUST_CACHE took 21 ms.
2018-09-07 06:24:36,312 INFO  BasicOperations - readFile file /default_tests_files/BASIC_CACHE_MUST_CACHE took 6 ms.
Passed the test!
runTest BASIC_NON_BYTE_BUFFER CACHE MUST_CACHE
2018-09-07 06:24:36,333 INFO  BasicNonByteBufferOperations - writeFile to file /default_tests_files/BASIC_NON_BYTE_BUFFER_CACHE_MUST_CACHE took 10 ms.
2018-09-07 06:24:36,341 INFO  BasicNonByteBufferOperations - readFile file /default_tests_files/BASIC_NON_BYTE_BUFFER_CACHE_MUST_CACHE took 8 ms.
Passed the test!
runTest BASIC CACHE CACHE_THROUGH
2018-09-07 06:24:36,734 INFO  BasicOperations - writeFile to file /default_tests_files/BASIC_CACHE_CACHE_THROUGH took 393 ms.
2018-09-07 06:24:36,742 INFO  BasicOperations - readFile file /default_tests_files/BASIC_CACHE_CACHE_THROUGH took 8 ms.
Passed the test!
runTest BASIC_NON_BYTE_BUFFER CACHE CACHE_THROUGH
2018-09-07 06:24:37,176 INFO  BasicNonByteBufferOperations - writeFile to file /default_tests_files/BASIC_NON_BYTE_BUFFER_CACHE_CACHE_THROUGH took 429 ms.
2018-09-07 06:24:37,183 INFO  BasicNonByteBufferOperations - readFile file /default_tests_files/BASIC_NON_BYTE_BUFFER_CACHE_CACHE_THROUGH took 6 ms.
Passed the test!
runTest BASIC CACHE THROUGH
2018-09-07 06:24:37,636 INFO  BasicOperations - writeFile to file /default_tests_files/BASIC_CACHE_THROUGH took 452 ms.
2018-09-07 06:24:37,659 INFO  BasicOperations - readFile file /default_tests_files/BASIC_CACHE_THROUGH took 23 ms.
Passed the test!
runTest BASIC_NON_BYTE_BUFFER CACHE THROUGH
2018-09-07 06:24:38,528 INFO  BasicNonByteBufferOperations - writeFile to file /default_tests_files/BASIC_NON_BYTE_BUFFER_CACHE_THROUGH took 864 ms.
2018-09-07 06:24:38,546 INFO  BasicNonByteBufferOperations - readFile file /default_tests_files/BASIC_NON_BYTE_BUFFER_CACHE_THROUGH took 18 ms.
Passed the test!
runTest BASIC CACHE ASYNC_THROUGH
2018-09-07 06:24:38,563 INFO  BasicOperations - writeFile to file /default_tests_files/BASIC_CACHE_ASYNC_THROUGH took 17 ms.
2018-09-07 06:24:38,567 INFO  BasicOperations - readFile file /default_tests_files/BASIC_CACHE_ASYNC_THROUGH took 4 ms.
Passed the test!
runTest BASIC_NON_BYTE_BUFFER CACHE ASYNC_THROUGH
2018-09-07 06:24:38,586 INFO  BasicNonByteBufferOperations - writeFile to file /default_tests_files/BASIC_NON_BYTE_BUFFER_CACHE_ASYNC_THROUGH took 11 ms.
2018-09-07 06:24:38,592 INFO  BasicNonByteBufferOperations - readFile file /default_tests_files/BASIC_NON_BYTE_BUFFER_CACHE_ASYNC_THROUGH took 6 ms.
Passed the test!
runTest BASIC NO_CACHE MUST_CACHE
2018-09-07 06:24:38,606 INFO  BasicOperations - writeFile to file /default_tests_files/BASIC_NO_CACHE_MUST_CACHE took 14 ms.
2018-09-07 06:24:38,610 INFO  BasicOperations - readFile file /default_tests_files/BASIC_NO_CACHE_MUST_CACHE took 4 ms.
Passed the test!
runTest BASIC_NON_BYTE_BUFFER NO_CACHE MUST_CACHE
2018-09-07 06:24:38,623 INFO  BasicNonByteBufferOperations - writeFile to file /default_tests_files/BASIC_NON_BYTE_BUFFER_NO_CACHE_MUST_CACHE took 9 ms.
2018-09-07 06:24:38,626 INFO  BasicNonByteBufferOperations - readFile file /default_tests_files/BASIC_NON_BYTE_BUFFER_NO_CACHE_MUST_CACHE took 3 ms.
Passed the test!
runTest BASIC NO_CACHE CACHE_THROUGH
2018-09-07 06:24:39,849 INFO  BasicOperations - writeFile to file /default_tests_files/BASIC_NO_CACHE_CACHE_THROUGH took 1222 ms.
2018-09-07 06:24:39,855 INFO  BasicOperations - readFile file /default_tests_files/BASIC_NO_CACHE_CACHE_THROUGH took 6 ms.
Passed the test!
runTest BASIC_NON_BYTE_BUFFER NO_CACHE CACHE_THROUGH
2018-09-07 06:24:40,659 INFO  BasicNonByteBufferOperations - writeFile to file /default_tests_files/BASIC_NON_BYTE_BUFFER_NO_CACHE_CACHE_THROUGH took 800 ms.
2018-09-07 06:24:40,663 INFO  BasicNonByteBufferOperations - readFile file /default_tests_files/BASIC_NON_BYTE_BUFFER_NO_CACHE_CACHE_THROUGH took 4 ms.
Passed the test!
runTest BASIC NO_CACHE THROUGH
2018-09-07 06:24:41,182 INFO  BasicOperations - writeFile to file /default_tests_files/BASIC_NO_CACHE_THROUGH took 518 ms.
2018-09-07 06:24:41,196 INFO  BasicOperations - readFile file /default_tests_files/BASIC_NO_CACHE_THROUGH took 14 ms.
Passed the test!
runTest BASIC_NON_BYTE_BUFFER NO_CACHE THROUGH
2018-09-07 06:24:41,541 INFO  BasicNonByteBufferOperations - writeFile to file /default_tests_files/BASIC_NON_BYTE_BUFFER_NO_CACHE_THROUGH took 340 ms.
2018-09-07 06:24:41,557 INFO  BasicNonByteBufferOperations - readFile file /default_tests_files/BASIC_NON_BYTE_BUFFER_NO_CACHE_THROUGH took 16 ms.
Passed the test!
runTest BASIC NO_CACHE ASYNC_THROUGH
2018-09-07 06:24:41,569 INFO  BasicOperations - writeFile to file /default_tests_files/BASIC_NO_CACHE_ASYNC_THROUGH took 11 ms.
2018-09-07 06:24:41,572 INFO  BasicOperations - readFile file /default_tests_files/BASIC_NO_CACHE_ASYNC_THROUGH took 3 ms.
Passed the test!
runTest BASIC_NON_BYTE_BUFFER NO_CACHE ASYNC_THROUGH
2018-09-07 06:24:41,586 INFO  BasicNonByteBufferOperations - writeFile to file /default_tests_files/BASIC_NON_BYTE_BUFFER_NO_CACHE_ASYNC_THROUGH took 10 ms.
2018-09-07 06:24:41,590 INFO  BasicNonByteBufferOperations - readFile file /default_tests_files/BASIC_NON_BYTE_BUFFER_NO_CACHE_ASYNC_THROUGH took 4 ms.
Passed the test!

It is the same green from the master and worker nodes.
Thanks for your cooperation.


On Fri, Sep 7, 2018 at 9:07 AM Dmitry Yatsyuk <[hidden email]> wrote:
Hello
In Azure, all ports inside the same network are open, and runTests works fine without any issues, so this can't be a firewall issue.

On Fri, Sep 7, 2018 at 2:25 AM, Lu Qiu <[hidden email]> wrote:
Hi Dmitry,

We took another look at your issue and didn't find any outstanding Alluxio usage problems.
Perhaps there's a firewall issue: the firewall may make the port available only from certain addresses.

Try running `runTests` on the same node where you run `spark-submit`, and double-check the firewall;
for example, in EC2 the security group may expose some ports and not others.
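
A quick way to run that check, reusing the built-in sanity test from earlier in the thread (run it from the exact node where spark-submit is launched):

```
/opt/alluxio/bin/alluxio runTests
```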

Thanks,
Lu

On Thu, Sep 6, 2018 at 3:17 PM, Dmitry Yatsyuk <[hidden email]> wrote:
I have formatted Alluxio and now the master and workers are fine; I have copied the test files back, but the original issue is the same:

 ERROR ClientMasterSync: Failed to heartbeat to the metrics master: {}
alluxio.exception.status.UnavailableException: Failed to connect to MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998 after 1 attempts
	at alluxio.AbstractClient.connect(AbstractClient.java:325)
	at alluxio.client.metrics.MetricsMasterClient.heartbeat(MetricsMasterClient.java:84)
	at alluxio.client.metrics.ClientMasterSync.heartbeat(ClientMasterSync.java:63)
	at alluxio.heartbeat.HeartbeatThread.run(HeartbeatThread.java:74)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

If you want, I can provide you with SSH access and admin access to Ambari.

On Fri, Sep 7, 2018 at 1:03 AM Dmitry Yatsyuk <[hidden email]> wrote:
also now when i try to restart master node i have the following exception

2018-09-06 21:55:57,795 ERROR UfsJournalCheckpointThread - FileSystemMaster: Failed to run journal checkpoint thread, crashing.
java.lang.IllegalStateException: Journal entries are missing between sequence number 0 (inclusive) and 436 (exclusive).
at alluxio.master.journal.ufs.UfsJournalReader.read(UfsJournalReader.java:160)
at alluxio.master.journal.ufs.UfsJournalCheckpointThread.runInternal(UfsJournalCheckpointThread.java:141)
at alluxio.master.journal.ufs.UfsJournalCheckpointThread.run(UfsJournalCheckpointThread.java:123)
2018-09-06 21:55:57,796 ERROR UfsJournalCheckpointThread - BlockMaster: Failed to run journal checkpoint thread, crashing.
java.lang.IllegalStateException: Journal entries are missing between sequence number 0 (inclusive) and 49 (exclusive).
at alluxio.master.journal.ufs.UfsJournalReader.read(UfsJournalReader.java:160)
at alluxio.master.journal.ufs.UfsJournalCheckpointThread.runInternal(UfsJournalCheckpointThread.java:141)
at alluxio.master.journal.ufs.UfsJournalCheckpointThread.run(UfsJournalCheckpointThread.java:123)
2018-09-06 21:55:57,799 INFO  UfsJournalCheckpointThread - BlockMaster: Journal shutdown complete
2018-09-06 21:55:57,799 ERROR ProcessUtils - Uncaught exception while running Alluxio master @hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998, stopping it and exiting.
java.lang.IllegalStateException
at com.google.common.base.Preconditions.checkState(Preconditions.java:133)
at alluxio.master.journal.ufs.UfsJournalCheckpointThread.getNextSequenceNumber(UfsJournalCheckpointThread.java:116)
at alluxio.master.journal.ufs.UfsJournal.gainPrimacy(UfsJournal.java:207)
at alluxio.master.journal.ufs.UfsJournalSystem.gainPrimacy(UfsJournalSystem.java:68)
at alluxio.master.AlluxioMasterProcess.start(AlluxioMasterProcess.java:226)
at alluxio.ProcessUtils.run(ProcessUtils.java:32)
at alluxio.master.AlluxioMaster.main(AlluxioMaster.java:55)

On Fri, Sep 7, 2018 at 12:48 AM Dmitry Yatsyuk <[hidden email]> wrote:
Hello
It hangs from master node on stage

[Stage 0:>                                                          (0 + 2) / 2]

and the same issue on executor log

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/bigstream/bigstreamNSD/spark-2.1.1-BIGSTREAM-bin-bigstream-spark-yarn-h2.7.2/nsd-jars-2.1/alluxio-1.8.0-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.6.2.38-1/spark2/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.6.2.38-1/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.6.2.38-1/spark_llap/spark-llap-assembly-1.0.0.2.6.2.38-1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
18/09/06 21:42:03 INFO CoarseGrainedExecutorBackend: Started daemon with process name: 28074@wn1-nsd-bg
18/09/06 21:42:03 INFO SignalUtils: Registered signal handler for TERM
18/09/06 21:42:03 INFO SignalUtils: Registered signal handler for HUP
18/09/06 21:42:03 INFO SignalUtils: Registered signal handler for INT
18/09/06 21:42:04 INFO SecurityManager: Changing view acls to: yarn,sshuser
18/09/06 21:42:04 INFO SecurityManager: Changing modify acls to: yarn,sshuser
18/09/06 21:42:04 INFO SecurityManager: Changing view acls groups to: 
18/09/06 21:42:04 INFO SecurityManager: Changing modify acls groups to: 
18/09/06 21:42:04 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(yarn, sshuser); groups with view permissions: Set(); users  with modify permissions: Set(yarn, sshuser); groups with modify permissions: Set()
18/09/06 21:42:04 INFO TransportClientFactory: Successfully created connection to /10.0.0.21:33843 after 60 ms (0 ms spent in bootstraps)
18/09/06 21:42:04 INFO SecurityManager: Changing view acls to: yarn,sshuser
18/09/06 21:42:04 INFO SecurityManager: Changing modify acls to: yarn,sshuser
18/09/06 21:42:04 INFO SecurityManager: Changing view acls groups to: 
18/09/06 21:42:04 INFO SecurityManager: Changing modify acls groups to: 
18/09/06 21:42:04 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(yarn, sshuser); groups with view permissions: Set(); users  with modify permissions: Set(yarn, sshuser); groups with modify permissions: Set()
18/09/06 21:42:04 INFO TransportClientFactory: Successfully created connection to /10.0.0.21:33843 after 1 ms (0 ms spent in bootstraps)
18/09/06 21:42:04 INFO DiskBlockManager: Created local directory at /mnt/resource/hadoop/yarn/local/usercache/sshuser/appcache/application_1536145998851_0026/blockmgr-aa883160-f530-41f6-a683-5d13cd04113a
18/09/06 21:42:04 INFO MemoryStore: MemoryStore started with capacity 5.2 GB
18/09/06 21:42:04 INFO CoarseGrainedExecutorBackend: Connecting to driver: spark://CoarseGrainedScheduler@10.0.0.21:33843
18/09/06 21:42:04 INFO CoarseGrainedExecutorBackend: Successfully registered with driver
18/09/06 21:42:04 INFO Executor: Starting executor ID 4 on host wn1-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net
18/09/06 21:42:04 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 40203.
18/09/06 21:42:04 INFO NettyBlockTransferService: Server created on wn1-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net:40203
18/09/06 21:42:04 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
18/09/06 21:42:04 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(4, wn1-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net, 40203, None)
18/09/06 21:42:04 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(4, wn1-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net, 40203, None)
18/09/06 21:42:04 INFO BlockManager: Initialized BlockManager: BlockManagerId(4, wn1-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net, 40203, None)
18/09/06 21:42:04 INFO Executor: Using REPL class URI: spark://10.0.0.21:33843/classes
18/09/06 21:44:51 INFO CoarseGrainedExecutorBackend: Got assigned task 0
18/09/06 21:44:51 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
18/09/06 21:44:51 INFO TorrentBroadcast: Started reading broadcast variable 1
18/09/06 21:44:51 INFO TransportClientFactory: Successfully created connection to /10.0.0.21:32875 after 2 ms (0 ms spent in bootstraps)
18/09/06 21:44:51 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 37.5 KB, free 5.2 GB)
18/09/06 21:44:51 INFO TorrentBroadcast: Reading broadcast variable 1 took 133 ms
18/09/06 21:44:51 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 97.7 KB, free 5.2 GB)
18/09/06 21:44:51 INFO TransportClientFactory: Successfully created connection to /10.0.0.21:33843 after 16 ms (0 ms spent in bootstraps)
18/09/06 21:44:52 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
18/09/06 21:44:52 INFO deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
18/09/06 21:44:52 INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
18/09/06 21:44:52 INFO deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
18/09/06 21:44:52 INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
18/09/06 21:44:52 INFO HadoopRDD: Input split: alluxio://hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net:19998/Input:0+13423
18/09/06 21:44:52 INFO TorrentBroadcast: Started reading broadcast variable 0
18/09/06 21:44:52 INFO TransportClientFactory: Successfully created connection to wn1-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.4:44275 after 1 ms (0 ms spent in bootstraps)
18/09/06 21:44:52 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 34.1 KB, free 5.2 GB)
18/09/06 21:44:52 INFO TorrentBroadcast: Reading broadcast variable 0 took 70 ms
18/09/06 21:44:52 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 469.5 KB, free 5.2 GB)
18/09/06 21:44:52 INFO MetricsConfig: loaded properties from hadoop-metrics2-azure-file-system.properties
18/09/06 21:44:52 INFO WasbAzureIaasSink: Init starting.
18/09/06 21:44:52 INFO AzureIaasSink: Init starting. Initializing MdsLogger.
18/09/06 21:44:52 INFO AzureIaasSink: Init completed.
18/09/06 21:44:52 INFO WasbAzureIaasSink: Init completed.
18/09/06 21:44:52 INFO MetricsSinkAdapter: Sink azurefs2 started
18/09/06 21:44:52 INFO MetricsSystemImpl: Scheduled snapshot period at 60 second(s).
18/09/06 21:44:52 INFO MetricsSystemImpl: azure-file-system metrics system started
18/09/06 21:44:52 INFO HadoopConfigurationUtils: Loading Alluxio properties from Hadoop configuration: {fs.azure.account.key.testryui.blob.core.windows.net=MIIB/QYJKoZIhvcNAQcDoIIB7jCCAeoCAQAxggFdMIIBWQIBADBBMC0xKzApBgNVBAMTImRiZW5jcnlwdGlvbi5oZGluc2lnaHRzZXJ2aWNlcy5uZXQCEGgOgBofxf6kRbb5r8cSuagwDQYJKoZIhvcNAQEBBQAEggEAKgwuNMr9JwNtvHY7VRE+8t991LCO8n0uJUOtJc2vQET+pZnK/2jviZh8AaIDra0AhDs8eGB71G8Xjet0fSyNZXb80660ZV3BBHKaDVNybwD12HLvjwLoOEZ3O9NqGls7D7owsoWO7gw0sGgu96dnEjvDvB8uSck8UveXMSk/tZNeU9+jYjfctu8TGRP+B4YCgJWcLMeGHczhPx6NuBPdmg/0eoe3w8VVPCi6vIs3lsyzYEi1LVrMkIJ2aBWuwyG7bUKu4tLhfYHY13YDO5PtBfNtWX3QkzejxvY9lewGuPU3w9Fw3GFK/YrgzFBtBycyBFMYl6lQyIFPNLtzPv2l9jCBgwYJKoZIhvcNAQcBMBQGCCqGSIb3DQMHBAgO9GVL4NR2GoBgn8fCaRg2y303wKFrgNOukYYmLoOKq5cQXP/9EY+3WD9BSMjE9/WyQ87FIhPeeEiq6JPNdBIoTZagQpj11b+TL7uMmz+j1Jx4LDozmAoLPQItbwaNnFZlYRWrgY8OiOuE}
18/09/06 21:44:52 INFO MetricsSystem: Starting sinks with config: {}.
18/09/06 21:44:52 INFO FileSystemContext: Created filesystem context with id app-3381863313617109164. This ID will be used for identifying info from the client, such as metrics. It can be set manually through the alluxio.user.app.id property
18/09/06 21:44:52 INFO HadoopConfigurationUtils: Loading Alluxio properties from Hadoop configuration: {fs.azure.account.key.testryui.blob.core.windows.net=MIIB/QYJKoZIhvcNAQcDoIIB7jCCAeoCAQAxggFdMIIBWQIBADBBMC0xKzApBgNVBAMTImRiZW5jcnlwdGlvbi5oZGluc2lnaHRzZXJ2aWNlcy5uZXQCEGgOgBofxf6kRbb5r8cSuagwDQYJKoZIhvcNAQEBBQAEggEAKgwuNMr9JwNtvHY7VRE+8t991LCO8n0uJUOtJc2vQET+pZnK/2jviZh8AaIDra0AhDs8eGB71G8Xjet0fSyNZXb80660ZV3BBHKaDVNybwD12HLvjwLoOEZ3O9NqGls7D7owsoWO7gw0sGgu96dnEjvDvB8uSck8UveXMSk/tZNeU9+jYjfctu8TGRP+B4YCgJWcLMeGHczhPx6NuBPdmg/0eoe3w8VVPCi6vIs3lsyzYEi1LVrMkIJ2aBWuwyG7bUKu4tLhfYHY13YDO5PtBfNtWX3QkzejxvY9lewGuPU3w9Fw3GFK/YrgzFBtBycyBFMYl6lQyIFPNLtzPv2l9jCBgwYJKoZIhvcNAQcBMBQGCCqGSIb3DQMHBAgO9GVL4NR2GoBgn8fCaRg2y303wKFrgNOukYYmLoOKq5cQXP/9EY+3WD9BSMjE9/WyQ87FIhPeeEiq6JPNdBIoTZagQpj11b+TL7uMmz+j1Jx4LDozmAoLPQItbwaNnFZlYRWrgY8OiOuE}
18/09/06 21:44:52 INFO AbstractClient: Alluxio client (version 1.8.0) is trying to bootstrap-connect with hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
18/09/06 21:44:52 INFO AbstractClient: Alluxio client has bootstrap-connected with hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
18/09/06 21:44:52 INFO AbstractClient: Alluxio client (version 1.8.0) is trying to connect with MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
18/09/06 21:44:52 INFO AbstractClient: Client registered with MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
18/09/06 21:44:53 INFO HeartbeatThread: Hearbeat Master Metrics Sync is interrupted.
18/09/06 21:44:53 INFO AbstractClient: Alluxio client (version 1.8.0) is trying to connect with MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
18/09/06 21:44:53 INFO AbstractClient: Client registered with MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
18/09/06 21:44:53 INFO AbstractClient: Alluxio client (version 1.8.0) is trying to connect with FileSystemMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
18/09/06 21:44:53 INFO AbstractClient: Client registered with FileSystemMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
18/09/06 21:44:53 INFO FileSystemContext: Created filesystem context with id app-4410116450193773659. This ID will be used for identifying info from the client, such as metrics. It can be set manually through the alluxio.user.app.id property
18/09/06 21:44:53 INFO AbstractClient: Alluxio client (version 1.8.0) is trying to connect with MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
18/09/06 21:44:53 WARN AbstractClient: Failed to connect (1) with MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998: Peer indicated failure: Plain authentication failed: User yarn is not configured for any impersonation. impersonationUser: sshuser
18/09/06 21:44:53 ERROR ClientMasterSync: Failed to heartbeat to the metrics master: {}
alluxio.exception.status.UnavailableException: Failed to connect to MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998 after 1 attempts
	at alluxio.AbstractClient.connect(AbstractClient.java:325)
	at alluxio.client.metrics.MetricsMasterClient.heartbeat(MetricsMasterClient.java:84)
	at alluxio.client.metrics.ClientMasterSync.heartbeat(ClientMasterSync.java:63)
	at alluxio.heartbeat.HeartbeatThread.run(HeartbeatThread.java:74)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)


On Thu, Sep 6, 2018 at 11:21 PM Lu Qiu <[hidden email]> wrote:
Hi Dmitry,

Sorry for the late reply.

Could you try the spark shell and see if spark shell is able to connect to Alluxio?

```
> val s = sc.textFile("alluxio://hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net:19998/Input")
> val double = s.map(line => line + line)
> double.saveAsTextFile("alluxio://  hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net:19998/Output")
``` 

Could you try it in your master node(10.0.0.21) first and then try it again in the node you run spark-submit before (maybe 10.0.0.6)?

Thanks,
Lu


On Thu, Sep 6, 2018 at 9:54 AM, Dmitry Yatsyuk <[hidden email]> wrote:
here is also log from /opt/alluxio/integration/checker/bin/alluxio-checker.sh spark yarn


On Thu, Sep 6, 2018 at 7:46 PM Dmitry Yatsyuk <[hidden email]> wrote:
Hello
May be you have more suggestions to me.
Many thanks

On Wed, Sep 5, 2018 at 10:23 PM Dmitry Yatsyuk <[hidden email]> wrote:
and yes I run spark-submit from a node where alluxio master installed

On Wed, Sep 5, 2018 at 10:22 PM Dmitry Yatsyuk <[hidden email]> wrote:
Hello

All workers are live

live Workers

   wn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net 0 In Service 36.69GB 444.88MB   99%Free
   wn1-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net 0 In Service 36.69GB 593.25MB   99%Free
   wn4-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net 0 In Service 36.69GB 94.63MB     99%Free



Alluxio Summary

            Started:                                      09-05-2018 12:40:10:248
             Uptime:                        0 day(s), 6 hour(s), 41 minute(s), and 28 second(s)
            Version:                                               1.8.0
        Running Workers:                                             3
   Startup Consistency Check:                                     COMPLETE
   Server Configuration Check:                                     PASSED

Cluster Usage Summary

    Workers Capacity:         110.08GB
   Workers Free / Used: 108.97GB / 1132.76MB
    UnderFS Capacity:        1177.79GB
   UnderFS Free / Used: 1177.79GB / 192.00KB

Storage Usage Summary

   Storage Alias
   Space Capacity
   Space Used
   Space Usage
   MEM 110.08GB 1132.76MB
   99%Free



ср, 5 сент. 2018 г. в 20:07, Lu Qiu <[hidden email]>:
Hi,

The error 
Failed to connect to MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998 after 1 attempts

is usually caused by:

(1) the input hostname(or port) is wrong or the system cannot resolve the hostname(especially when spark and alluxio are on different nodes).
Did you run the spark-submit on the Alluxio master node?

(2) Alluxio cluster is not running normally. You could visit alluxio://
-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net:19999 to see if Alluxio master is alive and visit the Workers page to see if workers are alive.


Thanks,
Lu

On Wed, Sep 5, 2018 at 4:29 AM, Dmitry Yatsyuk <[hidden email]> wrote:
I changed spark-submit to 

and now error on executor is

spark-submit --conf 'spark.driver.extraJavaOptions=-Dalluxio.master.hostname=hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net' --conf 'spark.executor.extraJavaOptions=-Dalluxio.master.hostname=hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net' --master yarn --class co.bigstream.benchmark.TPCSQ3 tpcds_2.11-1.0.jar 2 alluxio://hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net:19998/avro avro

18/09/05 11:25:44 ERROR ClientMasterSync: Failed to heartbeat to the metrics master: {}
alluxio.exception.status.UnavailableException: Failed to connect to MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998 after 1 attempts
	at alluxio.AbstractClient.connect(AbstractClient.java:325)
	at alluxio.client.metrics.MetricsMasterClient.heartbeat(MetricsMasterClient.java:84)
	at alluxio.client.metrics.ClientMasterSync.heartbeat(ClientMasterSync.java:63)
	at alluxio.heartbeat.HeartbeatThread.run(HeartbeatThread.java:74)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)


On Wed, Sep 5, 2018 at 2:20 PM, Dmitry Yatsyuk <[hidden email]> wrote:
Hello
I tried your suggestion but now the error is the following:
Exception in thread "main" java.lang.ExceptionInInitializerError
at alluxio.hadoop.AbstractFileSystem.initializeInternal(AbstractFileSystem.java:514)
at alluxio.hadoop.AbstractFileSystem.initialize(AbstractFileSystem.java:483)
at alluxio.hadoop.FileSystem.initialize(FileSystem.java:27)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2795)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:99)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2829)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2811)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:390)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.spark.sql.execution.datasources.DataSource.hasMetadata(DataSource.scala:301)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:344)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:135)
at co.bigstream.benchmark.TPCSQ3$.main(TPCSQ3.scala:64)
at co.bigstream.benchmark.TPCSQ3.main(TPCSQ3.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:751)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.IllegalArgumentException: port out of range:-1
at java.net.InetSocketAddress.checkPort(InetSocketAddress.java:143)
at java.net.InetSocketAddress.<init>(InetSocketAddress.java:224)
at alluxio.util.network.NetworkAddressUtils.getConnectAddress(NetworkAddressUtils.java:218)
at alluxio.util.network.NetworkAddressUtils.getConnectAddress(NetworkAddressUtils.java:204)
at alluxio.client.lineage.LineageContext.reset(LineageContext.java:64)
at alluxio.client.lineage.LineageContext.<init>(LineageContext.java:35)
at alluxio.client.lineage.LineageContext.<clinit>(LineageContext.java:27)

The command I ran is:
spark-submit --conf 'spark.driver.extraJavaOptions=-Dalluxio.master.hostname=hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net' --conf 'spark.executor.extraJavaOptions=-Dalluxio.master.hostname=hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net' --master yarn --class co.bigstream.benchmark.TPCSQ3 tpcds_2.11-1.0.jar 2 alluxio://hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/avro avro

On Wed, Sep 5, 2018 at 4:11 AM, Lu Qiu <[hidden email]> wrote:
Hi Deema,

The more common way to set Alluxio configuration through spark-submit command-line options is:

spark-submit \
--conf 'spark.driver.extraJavaOptions=-Dalluxio.user.file.writetype.default=CACHE_THROUGH' \
--conf 'spark.executor.extraJavaOptions=-Dalluxio.user.file.writetype.default=CACHE_THROUGH' \
...


In your case, you could try

spark-submit --conf 'spark.driver.extraJavaOptions=-Dalluxio.master.hostname=hn0-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net' --conf 'spark.executor.extraJavaOptions=-Dalluxio.master.hostname=hn0-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net' --master yarn --class co.bigstream.benchmark.TPCSQ3 tpcds_2.11-1.0.jar 2 alluxio:///avro avro

Hope that one of these ways works for you!

Thanks, 
Lu

On Tue, Sep 4, 2018 at 5:55 PM, Lu Qiu <[hidden email]> wrote:
Hi Deema,


If you only use Alluxio to provide the input file, you could try passing the full Alluxio path, like
`
spark-submit --master yarn --class co.bigstream.benchmark.TPCSQ3 tpcds_2.11-1.0.jar 2 alluxio://hn0-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net:19998/avro avro
`

If you are using `alluxio:///` in the jar, Spark needs to know the Alluxio master hostname.
You could add the master hostname to the `core-site.xml` in your Spark home conf directory:

<configuration>
  <property>
    <name>fs.alluxio.impl</name>
    <value>alluxio.hadoop.FileSystem</value>
  </property>
  <property>
    <name>alluxio.master.hostname</name>
    <value>hn0-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net</value>
  </property>
</configuration>
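
(With the hostname filled in there, a job can use the scheme-only URI; for example, reusing the jar and arguments from above:)

```
spark-submit --master yarn --class co.bigstream.benchmark.TPCSQ3 tpcds_2.11-1.0.jar 2 alluxio:///avro avro
```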

Or try:

`
spark-submit --conf spark.hadoop.defaultFS="alluxio://hn0-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net:19998" --master yarn --class co.bigstream.benchmark.TPCSQ3 tpcds_2.11-1.0.jar 2 alluxio://hn0-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net:19998/avro avro
`

In addition, could you share the console output of `/opt/alluxio/integration/checker/bin/alluxio-checker.sh spark yarn` so that we can better understand why the Spark checker failed?

Thanks,
Lu


On Tue, Sep 4, 2018 at 2:25 AM, Deema Yatsyuk <[hidden email]> wrote:
and tried to run spark-submit:
spark-submit --conf spark.hadoop.defaultFS="-Dalluxio://hn0-nsd-hd.yawj1ew5rq4e1biarf4nr5ngsc.ax.internal.cloudapp.net" --master yarn --class co.bigstream.benchmark.TPCSQ3 tpcds_2.11-1.0.jar 2 alluxio:///avro avro

but got the same issue:

Exception in thread "main" java.lang.NullPointerException: URI hostname must not be null
at alluxio.core.client.runtime.com.google.common.base.Preconditions.checkNotNull(Preconditions.java:208)
at alluxio.hadoop.AbstractFileSystem.initializeInternal(AbstractFileSystem.java:506)
at alluxio.hadoop.AbstractFileSystem.initialize(AbstractFileSystem.java:483)
at alluxio.hadoop.FileSystem.initialize(FileSystem.java:27)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2795)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:99)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2829)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2811)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:390)
at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$14.apply(DataSource.scala:372)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$14.apply(DataSource.scala:370)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
at scala.collection.immutable.List.flatMap(List.scala:344)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:370)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:152)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:135)
at co.bigstream.benchmark.TPCSQ3$.main(TPCSQ3.scala:64)
at co.bigstream.benchmark.TPCSQ3.main(TPCSQ3.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:751)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:187)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:212)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
18/09/04 09:23:53 INFO SparkContext: Invoking stop() from shutdown hook 

Re: issue with running spark-submit

Deema Yatsyuk
Hello
spark-submit works fine from HDFS or WASB, but errors out from Alluxio.

Also, all proxy users are set to *.

The error is also still the same.

And here is my core-site.xml

<configuration>

    <property>
      <name>fs.AbstractFileSystem.wasb.impl</name>
      <value>org.apache.hadoop.fs.azure.Wasb</value>
    </property>

    <property>
      <name>fs.AbstractFileSystem.wasbs.impl</name>
      <value>org.apache.hadoop.fs.azure.Wasbs</value>
    </property>

    <property>
      <name>fs.alluxio.impl</name>
      <value>alluxio.hadoop.FileSystem</value>
    </property>

    <property>
      <name>fs.azure.account.key.testryui.blob.core.windows.net</name>
      <value>MIIB/QYJKoZIhvcNAQcDoIIB7jCCAeoCAQAxggFdMIIBWQIBADBBMC0xKzApBgNVBAMTImRiZW5jcnlwdGlvbi5oZGluc2lnaHRzZXJ2aWNlcy5uZXQCEGgOgBofxf6kRbb5r8cSuagwDQYJKoZIhvcNAQEBBQAEggEAKgwuNMr9JwNtvHY7VRE+8t991LCO8n0uJUOtJc2vQET+pZnK/2jviZh8AaIDra0AhDs8eGB71G8Xjet0fSyNZXb80660ZV3BBHKaDVNybwD12HLvjwLoOEZ3O9NqGls7D7owsoWO7gw0sGgu96dnEjvDvB8uSck8UveXMSk/tZNeU9+jYjfctu8TGRP+B4YCgJWcLMeGHczhPx6NuBPdmg/0eoe3w8VVPCi6vIs3lsyzYEi1LVrMkIJ2aBWuwyG7bUKu4tLhfYHY13YDO5PtBfNtWX3QkzejxvY9lewGuPU3w9Fw3GFK/YrgzFBtBycyBFMYl6lQyIFPNLtzPv2l9jCBgwYJKoZIhvcNAQcBMBQGCCqGSIb3DQMHBAgO9GVL4NR2GoBgn8fCaRg2y303wKFrgNOukYYmLoOKq5cQXP/9EY+3WD9BSMjE9/WyQ87FIhPeeEiq6JPNdBIoTZagQpj11b+TL7uMmz+j1Jx4LDozmAoLPQItbwaNnFZlYRWrgY8OiOuE</value>
    </property>

    <property>
      <name>fs.azure.account.keyprovider.testryui.blob.core.windows.net</name>
      <value>org.apache.hadoop.fs.azure.ShellDecryptionKeyProvider</value>
    </property>

    <property>
      <name>fs.azure.io.copyblob.retry.max.retries</name>
      <value>60</value>
    </property>

    <property>
      <name>fs.azure.io.read.tolerate.concurrent.append</name>
      <value>true</value>
    </property>

    <property>
      <name>fs.azure.page.blob.dir</name>
      <value>/mapreducestaging,/atshistory,/tezstaging,/ams/hbase/WALs,/ams/hbase/oldWALs,/ams/hbase/MasterProcWALs</value>
    </property>

    <property>
      <name>fs.azure.shellkeyprovider.script</name>
      <value>/usr/lib/hdinsight-common/scripts/decrypt.sh</value>
    </property>

    <property>
      <name>fs.defaultFS</name>
      <value>wasb://[hidden email]</value>
      <final>true</final>
    </property>

    <property>
      <name>fs.trash.interval</name>
      <value>360</value>
    </property>

    <property>
      <name>ha.failover-controller.active-standby-elector.zk.op.retries</name>
      <value>120</value>
    </property>

    <property>
      <name>ha.zookeeper.quorum</name>
    </property>

    <property>
      <name>hadoop.custom-extensions.root</name>
      <value>/hdp/ext/2.6/hadoop</value>
    </property>

    <property>
      <name>hadoop.http.authentication.simple.anonymous.allowed</name>
      <value>true</value>
    </property>

    <property>
      <name>hadoop.proxyuser.hcat.groups</name>
      <value>*</value>
    </property>

    <property>
      <name>hadoop.proxyuser.hcat.hosts</name>
      <value>*</value>
    </property>

    <property>
      <name>hadoop.proxyuser.hive.groups</name>
      <value>*</value>
    </property>

    <property>
      <name>hadoop.proxyuser.hive.hosts</name>
      <value>*</value>
    </property>

    <property>
      <name>hadoop.proxyuser.oozie.groups</name>
      <value>*</value>
    </property>

    <property>
      <name>hadoop.proxyuser.oozie.hosts</name>
      <value>*</value>
    </property>

    <property>
      <name>hadoop.security.auth_to_local</name>
      <value>DEFAULT</value>
    </property>

    <property>
      <name>hadoop.security.authentication</name>
      <value>simple</value>
    </property>

    <property>
      <name>hadoop.security.authorization</name>
      <value>false</value>
    </property>

    <property>
      <name>hadoop.security.key.provider.path</name>
      <value></value>
    </property>

    <property>
      <name>io.compression.codecs</name>
      <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.SnappyCodec</value>
    </property>

    <property>
      <name>io.file.buffer.size</name>
      <value>131072</value>
    </property>

    <property>
      <name>io.serializations</name>
      <value>org.apache.hadoop.io.serializer.WritableSerialization</value>
    </property>

    <property>
      <name>ipc.client.connect.max.retries</name>
      <value>50</value>
    </property>

    <property>
      <name>ipc.client.connection.maxidletime</name>
      <value>30000</value>
    </property>

    <property>
      <name>ipc.client.idlethreshold</name>
      <value>8000</value>
    </property>

    <property>
      <name>ipc.server.tcpnodelay</name>
      <value>true</value>
    </property>

    <property>
      <name>mapreduce.jobtracker.webinterface.trusted</name>
      <value>false</value>
    </property>

    <property>
      <name>net.topology.script.file.name</name>
      <value>/etc/hadoop/conf/topology_script.py</value>
    </property>

On Fri, Sep 7, 2018 at 9:42 PM Lu Qiu <[hidden email]> wrote:
Hi Dmitry,

The heartbeat performed by MetricsMasterClient is just for collecting metrics; it will not be the root cause of the failure of the Spark job.
We need more exception messages from before and after this failure to dig out the root cause.

Your spark-submit didn't fail because of connectivity, and we found a more likely failure in your exception message:

Failed to connect (1) with MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998: Peer indicated failure: Plain authentication failed: User yarn is not configured for any impersonation. impersonationUser: sshuser


One possible root cause is:
 User yarn is not configured for any impersonation. impersonationUser: sshuser

It's a Spark-on-YARN issue. In Hadoop, I solve impersonation problems by modifying core-site.xml as illustrated in https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/Superusers.html. I am not quite sure whether this method also solves the Spark issue, but I hope it helps.

Try running spark-submit again; if the Plain authentication failed: User yarn is not configured for any impersonation. impersonationUser: sshuser message still appears, then this is likely the key issue. Otherwise, wait until the exception message is more complete and detailed and send the whole exception message to us.
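
(For reference, a sketch of both sides of the impersonation setting. The core-site.xml entries follow the Superusers doc linked above; the alluxio-site.properties line is an assumption based on Alluxio 1.8's impersonation support, so verify the property name against the 1.8 docs:)

```
<!-- core-site.xml: allow the yarn user to impersonate other users -->
<property>
  <name>hadoop.proxyuser.yarn.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.yarn.groups</name>
  <value>*</value>
</property>
```

```
# alluxio-site.properties on the master (assumed property name); restart the master afterwards
alluxio.master.security.impersonation.yarn.users=*
```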

Thanks,
Lu




On Thu, Sep 6, 2018 at 11:26 PM, Dmitry Yatsyuk <[hidden email]> wrote:
I can telnet to ports 19999 and 19998 on the master node from all machines.
Also, here is the runTests log:

2018-09-07 06:24:30,676 INFO  MetricsSystem - Starting sinks with config: {}.
2018-09-07 06:24:30,689 INFO  FileSystemContext - Created filesystem context with id app-2459436255833188551. This ID will be used for identifying info from the client, such as metrics. It can be set manually through the alluxio.user.app.id property
2018-09-07 06:24:30,749 INFO  AbstractClient - Alluxio client (version 1.8.0) is trying to bootstrap-connect with hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
2018-09-07 06:24:30,791 INFO  AbstractClient - Alluxio client has bootstrap-connected with hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
2018-09-07 06:24:30,791 INFO  AbstractClient - Alluxio client (version 1.8.0) is trying to connect with MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
2018-09-07 06:24:30,791 INFO  AbstractClient - Alluxio client (version 1.8.0) is trying to connect with FileSystemMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
2018-09-07 06:24:30,799 INFO  AbstractClient - Client registered with FileSystemMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
2018-09-07 06:24:30,799 INFO  AbstractClient - Client registered with MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
runTest BASIC CACHE_PROMOTE MUST_CACHE
2018-09-07 06:24:30,866 INFO  TieredIdentityFactory - Initialized tiered identity TieredIdentity(node=hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net, rack=null)
2018-09-07 06:24:30,954 INFO  AbstractClient - Alluxio client (version 1.8.0) is trying to connect with BlockMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
2018-09-07 06:24:30,955 INFO  AbstractClient - Client registered with BlockMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
2018-09-07 06:24:31,090 INFO  NettyChannelPool - Created netty channel with netty bootstrap Bootstrap(group: EpollEventLoopGroup, channelFactory: EpollSocketChannel.class, options: {SO_KEEPALIVE=true, TCP_NODELAY=true, ALLOCATOR=PooledByteBufAllocator(directByDefault: true), EPOLL_MODE=LEVEL_TRIGGERED}, handler: alluxio.network.netty.NettyClient$1@765d7657, remoteAddress: wn1-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.4:29999).
2018-09-07 06:24:31,167 INFO  BasicOperations - writeFile to file /default_tests_files/BASIC_CACHE_PROMOTE_MUST_CACHE took 298 ms.
2018-09-07 06:24:31,228 INFO  BasicOperations - readFile file /default_tests_files/BASIC_CACHE_PROMOTE_MUST_CACHE took 61 ms.
Passed the test!
runTest BASIC_NON_BYTE_BUFFER CACHE_PROMOTE MUST_CACHE
2018-09-07 06:24:31,243 INFO  BasicNonByteBufferOperations - writeFile to file /default_tests_files/BASIC_NON_BYTE_BUFFER_CACHE_PROMOTE_MUST_CACHE took 11 ms.
2018-09-07 06:24:31,251 INFO  BasicNonByteBufferOperations - readFile file /default_tests_files/BASIC_NON_BYTE_BUFFER_CACHE_PROMOTE_MUST_CACHE took 8 ms.
Passed the test!
runTest BASIC CACHE_PROMOTE CACHE_THROUGH
2018-09-07 06:24:31,521 INFO  NettyChannelPool - Created netty channel with netty bootstrap Bootstrap(group: EpollEventLoopGroup, channelFactory: EpollSocketChannel.class, options: {SO_KEEPALIVE=true, TCP_NODELAY=true, ALLOCATOR=PooledByteBufAllocator(directByDefault: true), EPOLL_MODE=LEVEL_TRIGGERED}, handler: alluxio.network.netty.NettyClient$1@7690781, remoteAddress: wn4-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.8:29999).
2018-09-07 06:24:32,540 INFO  BasicOperations - writeFile to file /default_tests_files/BASIC_CACHE_PROMOTE_CACHE_THROUGH took 1289 ms.
2018-09-07 06:24:32,553 INFO  BasicOperations - readFile file /default_tests_files/BASIC_CACHE_PROMOTE_CACHE_THROUGH took 13 ms.
Passed the test!
runTest BASIC_NON_BYTE_BUFFER CACHE_PROMOTE CACHE_THROUGH
2018-09-07 06:24:32,561 INFO  NettyChannelPool - Created netty channel with netty bootstrap Bootstrap(group: EpollEventLoopGroup, channelFactory: EpollSocketChannel.class, options: {SO_KEEPALIVE=true, TCP_NODELAY=true, ALLOCATOR=PooledByteBufAllocator(directByDefault: true), EPOLL_MODE=LEVEL_TRIGGERED}, handler: alluxio.network.netty.NettyClient$1@10959ece, remoteAddress: wn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.9:29999).
2018-09-07 06:24:32,578 INFO  NettyChannelPool - Created netty channel with netty bootstrap Bootstrap(group: EpollEventLoopGroup, channelFactory: EpollSocketChannel.class, options: {SO_KEEPALIVE=true, TCP_NODELAY=true, ALLOCATOR=PooledByteBufAllocator(directByDefault: true), EPOLL_MODE=LEVEL_TRIGGERED}, handler: alluxio.network.netty.NettyClient$1@10959ece, remoteAddress: wn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.9:29999).
2018-09-07 06:24:34,334 INFO  BasicNonByteBufferOperations - writeFile to file /default_tests_files/BASIC_NON_BYTE_BUFFER_CACHE_PROMOTE_CACHE_THROUGH took 1773 ms.
2018-09-07 06:24:34,349 INFO  BasicNonByteBufferOperations - readFile file /default_tests_files/BASIC_NON_BYTE_BUFFER_CACHE_PROMOTE_CACHE_THROUGH took 12 ms.
Passed the test!
runTest BASIC CACHE_PROMOTE THROUGH
2018-09-07 06:24:34,932 INFO  BasicOperations - writeFile to file /default_tests_files/BASIC_CACHE_PROMOTE_THROUGH took 583 ms.
2018-09-07 06:24:35,562 INFO  BasicOperations - readFile file /default_tests_files/BASIC_CACHE_PROMOTE_THROUGH took 630 ms.
Passed the test!
runTest BASIC_NON_BYTE_BUFFER CACHE_PROMOTE THROUGH
2018-09-07 06:24:36,153 INFO  BasicNonByteBufferOperations - writeFile to file /default_tests_files/BASIC_NON_BYTE_BUFFER_CACHE_PROMOTE_THROUGH took 585 ms.
2018-09-07 06:24:36,214 INFO  BasicNonByteBufferOperations - readFile file /default_tests_files/BASIC_NON_BYTE_BUFFER_CACHE_PROMOTE_THROUGH took 61 ms.
Passed the test!
runTest BASIC CACHE_PROMOTE ASYNC_THROUGH
2018-09-07 06:24:36,252 INFO  BasicOperations - writeFile to file /default_tests_files/BASIC_CACHE_PROMOTE_ASYNC_THROUGH took 37 ms.
2018-09-07 06:24:36,259 INFO  BasicOperations - readFile file /default_tests_files/BASIC_CACHE_PROMOTE_ASYNC_THROUGH took 7 ms.
Passed the test!
runTest BASIC_NON_BYTE_BUFFER CACHE_PROMOTE ASYNC_THROUGH
2018-09-07 06:24:36,275 INFO  BasicNonByteBufferOperations - writeFile to file /default_tests_files/BASIC_NON_BYTE_BUFFER_CACHE_PROMOTE_ASYNC_THROUGH took 12 ms.
2018-09-07 06:24:36,285 INFO  BasicNonByteBufferOperations - readFile file /default_tests_files/BASIC_NON_BYTE_BUFFER_CACHE_PROMOTE_ASYNC_THROUGH took 10 ms.
Passed the test!
runTest BASIC CACHE MUST_CACHE
2018-09-07 06:24:36,306 INFO  BasicOperations - writeFile to file /default_tests_files/BASIC_CACHE_MUST_CACHE took 21 ms.
2018-09-07 06:24:36,312 INFO  BasicOperations - readFile file /default_tests_files/BASIC_CACHE_MUST_CACHE took 6 ms.
Passed the test!
runTest BASIC_NON_BYTE_BUFFER CACHE MUST_CACHE
2018-09-07 06:24:36,333 INFO  BasicNonByteBufferOperations - writeFile to file /default_tests_files/BASIC_NON_BYTE_BUFFER_CACHE_MUST_CACHE took 10 ms.
2018-09-07 06:24:36,341 INFO  BasicNonByteBufferOperations - readFile file /default_tests_files/BASIC_NON_BYTE_BUFFER_CACHE_MUST_CACHE took 8 ms.
Passed the test!
runTest BASIC CACHE CACHE_THROUGH
2018-09-07 06:24:36,734 INFO  BasicOperations - writeFile to file /default_tests_files/BASIC_CACHE_CACHE_THROUGH took 393 ms.
2018-09-07 06:24:36,742 INFO  BasicOperations - readFile file /default_tests_files/BASIC_CACHE_CACHE_THROUGH took 8 ms.
Passed the test!
runTest BASIC_NON_BYTE_BUFFER CACHE CACHE_THROUGH
2018-09-07 06:24:37,176 INFO  BasicNonByteBufferOperations - writeFile to file /default_tests_files/BASIC_NON_BYTE_BUFFER_CACHE_CACHE_THROUGH took 429 ms.
2018-09-07 06:24:37,183 INFO  BasicNonByteBufferOperations - readFile file /default_tests_files/BASIC_NON_BYTE_BUFFER_CACHE_CACHE_THROUGH took 6 ms.
Passed the test!
runTest BASIC CACHE THROUGH
2018-09-07 06:24:37,636 INFO  BasicOperations - writeFile to file /default_tests_files/BASIC_CACHE_THROUGH took 452 ms.
2018-09-07 06:24:37,659 INFO  BasicOperations - readFile file /default_tests_files/BASIC_CACHE_THROUGH took 23 ms.
Passed the test!
runTest BASIC_NON_BYTE_BUFFER CACHE THROUGH
2018-09-07 06:24:38,528 INFO  BasicNonByteBufferOperations - writeFile to file /default_tests_files/BASIC_NON_BYTE_BUFFER_CACHE_THROUGH took 864 ms.
2018-09-07 06:24:38,546 INFO  BasicNonByteBufferOperations - readFile file /default_tests_files/BASIC_NON_BYTE_BUFFER_CACHE_THROUGH took 18 ms.
Passed the test!
runTest BASIC CACHE ASYNC_THROUGH
2018-09-07 06:24:38,563 INFO  BasicOperations - writeFile to file /default_tests_files/BASIC_CACHE_ASYNC_THROUGH took 17 ms.
2018-09-07 06:24:38,567 INFO  BasicOperations - readFile file /default_tests_files/BASIC_CACHE_ASYNC_THROUGH took 4 ms.
Passed the test!
runTest BASIC_NON_BYTE_BUFFER CACHE ASYNC_THROUGH
2018-09-07 06:24:38,586 INFO  BasicNonByteBufferOperations - writeFile to file /default_tests_files/BASIC_NON_BYTE_BUFFER_CACHE_ASYNC_THROUGH took 11 ms.
2018-09-07 06:24:38,592 INFO  BasicNonByteBufferOperations - readFile file /default_tests_files/BASIC_NON_BYTE_BUFFER_CACHE_ASYNC_THROUGH took 6 ms.
Passed the test!
runTest BASIC NO_CACHE MUST_CACHE
2018-09-07 06:24:38,606 INFO  BasicOperations - writeFile to file /default_tests_files/BASIC_NO_CACHE_MUST_CACHE took 14 ms.
2018-09-07 06:24:38,610 INFO  BasicOperations - readFile file /default_tests_files/BASIC_NO_CACHE_MUST_CACHE took 4 ms.
Passed the test!
runTest BASIC_NON_BYTE_BUFFER NO_CACHE MUST_CACHE
2018-09-07 06:24:38,623 INFO  BasicNonByteBufferOperations - writeFile to file /default_tests_files/BASIC_NON_BYTE_BUFFER_NO_CACHE_MUST_CACHE took 9 ms.
2018-09-07 06:24:38,626 INFO  BasicNonByteBufferOperations - readFile file /default_tests_files/BASIC_NON_BYTE_BUFFER_NO_CACHE_MUST_CACHE took 3 ms.
Passed the test!
runTest BASIC NO_CACHE CACHE_THROUGH
2018-09-07 06:24:39,849 INFO  BasicOperations - writeFile to file /default_tests_files/BASIC_NO_CACHE_CACHE_THROUGH took 1222 ms.
2018-09-07 06:24:39,855 INFO  BasicOperations - readFile file /default_tests_files/BASIC_NO_CACHE_CACHE_THROUGH took 6 ms.
Passed the test!
runTest BASIC_NON_BYTE_BUFFER NO_CACHE CACHE_THROUGH
2018-09-07 06:24:40,659 INFO  BasicNonByteBufferOperations - writeFile to file /default_tests_files/BASIC_NON_BYTE_BUFFER_NO_CACHE_CACHE_THROUGH took 800 ms.
2018-09-07 06:24:40,663 INFO  BasicNonByteBufferOperations - readFile file /default_tests_files/BASIC_NON_BYTE_BUFFER_NO_CACHE_CACHE_THROUGH took 4 ms.
Passed the test!
runTest BASIC NO_CACHE THROUGH
2018-09-07 06:24:41,182 INFO  BasicOperations - writeFile to file /default_tests_files/BASIC_NO_CACHE_THROUGH took 518 ms.
2018-09-07 06:24:41,196 INFO  BasicOperations - readFile file /default_tests_files/BASIC_NO_CACHE_THROUGH took 14 ms.
Passed the test!
runTest BASIC_NON_BYTE_BUFFER NO_CACHE THROUGH
2018-09-07 06:24:41,541 INFO  BasicNonByteBufferOperations - writeFile to file /default_tests_files/BASIC_NON_BYTE_BUFFER_NO_CACHE_THROUGH took 340 ms.
2018-09-07 06:24:41,557 INFO  BasicNonByteBufferOperations - readFile file /default_tests_files/BASIC_NON_BYTE_BUFFER_NO_CACHE_THROUGH took 16 ms.
Passed the test!
runTest BASIC NO_CACHE ASYNC_THROUGH
2018-09-07 06:24:41,569 INFO  BasicOperations - writeFile to file /default_tests_files/BASIC_NO_CACHE_ASYNC_THROUGH took 11 ms.
2018-09-07 06:24:41,572 INFO  BasicOperations - readFile file /default_tests_files/BASIC_NO_CACHE_ASYNC_THROUGH took 3 ms.
Passed the test!
runTest BASIC_NON_BYTE_BUFFER NO_CACHE ASYNC_THROUGH
2018-09-07 06:24:41,586 INFO  BasicNonByteBufferOperations - writeFile to file /default_tests_files/BASIC_NON_BYTE_BUFFER_NO_CACHE_ASYNC_THROUGH took 10 ms.
2018-09-07 06:24:41,590 INFO  BasicNonByteBufferOperations - readFile file /default_tests_files/BASIC_NON_BYTE_BUFFER_NO_CACHE_ASYNC_THROUGH took 4 ms.
Passed the test!

It is the same green result from both the master and worker nodes.
Thanks for your cooperation.


On Fri, Sep 7, 2018 at 9:07 AM Dmitry Yatsyuk <[hidden email]> wrote:
Hello
In Azure, all ports are open inside the same network. Also, runTests works fine without any issues, so this can't be a firewall issue.

On Fri, Sep 7, 2018 at 2:25, Lu Qiu <[hidden email]> wrote:
Hi Dmitry,

We took another look at your issue and didn't find any obvious Alluxio usage problems.
Perhaps there's a firewall issue, where the firewall makes the port available only from certain addresses.

Try using `runTests` on the same node where you run `spark-submit`, and double-check the firewall.
For example, EC2 has security groups, where some ports may be exposed and some may not.

Thanks,
Lu

On Thu, Sep 6, 2018 at 3:17 PM, Dmitry Yatsyuk <[hidden email]> wrote:
I have formatted Alluxio and now the master and workers are fine; I have copied the test files back, but the original issue is the same:

 ERROR ClientMasterSync: Failed to heartbeat to the metrics master: {}
alluxio.exception.status.UnavailableException: Failed to connect to MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998 after 1 attempts
	at alluxio.AbstractClient.connect(AbstractClient.java:325)
	at alluxio.client.metrics.MetricsMasterClient.heartbeat(MetricsMasterClient.java:84)
	at alluxio.client.metrics.ClientMasterSync.heartbeat(ClientMasterSync.java:63)
	at alluxio.heartbeat.HeartbeatThread.run(HeartbeatThread.java:74)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

If you want, I can provide you SSH access and admin access to Ambari.

On Fri, Sep 7, 2018 at 1:03 AM Dmitry Yatsyuk <[hidden email]> wrote:
Also, now when I try to restart the master node I get the following exception:

2018-09-06 21:55:57,795 ERROR UfsJournalCheckpointThread - FileSystemMaster: Failed to run journal checkpoint thread, crashing.
java.lang.IllegalStateException: Journal entries are missing between sequence number 0 (inclusive) and 436 (exclusive).
at alluxio.master.journal.ufs.UfsJournalReader.read(UfsJournalReader.java:160)
at alluxio.master.journal.ufs.UfsJournalCheckpointThread.runInternal(UfsJournalCheckpointThread.java:141)
at alluxio.master.journal.ufs.UfsJournalCheckpointThread.run(UfsJournalCheckpointThread.java:123)
2018-09-06 21:55:57,796 ERROR UfsJournalCheckpointThread - BlockMaster: Failed to run journal checkpoint thread, crashing.
java.lang.IllegalStateException: Journal entries are missing between sequence number 0 (inclusive) and 49 (exclusive).
at alluxio.master.journal.ufs.UfsJournalReader.read(UfsJournalReader.java:160)
at alluxio.master.journal.ufs.UfsJournalCheckpointThread.runInternal(UfsJournalCheckpointThread.java:141)
at alluxio.master.journal.ufs.UfsJournalCheckpointThread.run(UfsJournalCheckpointThread.java:123)
2018-09-06 21:55:57,799 INFO  UfsJournalCheckpointThread - BlockMaster: Journal shutdown complete
2018-09-06 21:55:57,799 ERROR ProcessUtils - Uncaught exception while running Alluxio master @hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998, stopping it and exiting.
java.lang.IllegalStateException
at com.google.common.base.Preconditions.checkState(Preconditions.java:133)
at alluxio.master.journal.ufs.UfsJournalCheckpointThread.getNextSequenceNumber(UfsJournalCheckpointThread.java:116)
at alluxio.master.journal.ufs.UfsJournal.gainPrimacy(UfsJournal.java:207)
at alluxio.master.journal.ufs.UfsJournalSystem.gainPrimacy(UfsJournalSystem.java:68)
at alluxio.master.AlluxioMasterProcess.start(AlluxioMasterProcess.java:226)
at alluxio.ProcessUtils.run(ProcessUtils.java:32)
at alluxio.master.AlluxioMaster.main(AlluxioMaster.java:55)
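
(For the record, a recovery sketch for a journal with missing entries, assuming the data cached in Alluxio can be discarded, since formatting wipes the journal:)

```
/opt/alluxio/bin/alluxio-stop.sh all
/opt/alluxio/bin/alluxio format
/opt/alluxio/bin/alluxio-start.sh master
/opt/alluxio/bin/alluxio-start.sh workers
```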

On Fri, Sep 7, 2018 at 12:48 AM Dmitry Yatsyuk <[hidden email]> wrote:
Hello
It hangs on the master node at stage

[Stage 0:>                                                          (0 + 2) / 2]

and the same issue appears in the executor log:

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/bigstream/bigstreamNSD/spark-2.1.1-BIGSTREAM-bin-bigstream-spark-yarn-h2.7.2/nsd-jars-2.1/alluxio-1.8.0-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.6.2.38-1/spark2/jars/slf4j-log4j12-1.7.16.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.6.2.38-1/hadoop/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/hdp/2.6.2.38-1/spark_llap/spark-llap-assembly-1.0.0.2.6.2.38-1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
18/09/06 21:42:03 INFO CoarseGrainedExecutorBackend: Started daemon with process name: 28074@wn1-nsd-bg
18/09/06 21:42:03 INFO SignalUtils: Registered signal handler for TERM
18/09/06 21:42:03 INFO SignalUtils: Registered signal handler for HUP
18/09/06 21:42:03 INFO SignalUtils: Registered signal handler for INT
18/09/06 21:42:04 INFO SecurityManager: Changing view acls to: yarn,sshuser
18/09/06 21:42:04 INFO SecurityManager: Changing modify acls to: yarn,sshuser
18/09/06 21:42:04 INFO SecurityManager: Changing view acls groups to: 
18/09/06 21:42:04 INFO SecurityManager: Changing modify acls groups to: 
18/09/06 21:42:04 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(yarn, sshuser); groups with view permissions: Set(); users  with modify permissions: Set(yarn, sshuser); groups with modify permissions: Set()
18/09/06 21:42:04 INFO TransportClientFactory: Successfully created connection to /10.0.0.21:33843 after 60 ms (0 ms spent in bootstraps)
18/09/06 21:42:04 INFO SecurityManager: Changing view acls to: yarn,sshuser
18/09/06 21:42:04 INFO SecurityManager: Changing modify acls to: yarn,sshuser
18/09/06 21:42:04 INFO SecurityManager: Changing view acls groups to: 
18/09/06 21:42:04 INFO SecurityManager: Changing modify acls groups to: 
18/09/06 21:42:04 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(yarn, sshuser); groups with view permissions: Set(); users  with modify permissions: Set(yarn, sshuser); groups with modify permissions: Set()
18/09/06 21:42:04 INFO TransportClientFactory: Successfully created connection to /10.0.0.21:33843 after 1 ms (0 ms spent in bootstraps)
18/09/06 21:42:04 INFO DiskBlockManager: Created local directory at /mnt/resource/hadoop/yarn/local/usercache/sshuser/appcache/application_1536145998851_0026/blockmgr-aa883160-f530-41f6-a683-5d13cd04113a
18/09/06 21:42:04 INFO MemoryStore: MemoryStore started with capacity 5.2 GB
18/09/06 21:42:04 INFO CoarseGrainedExecutorBackend: Connecting to driver: spark://CoarseGrainedScheduler@10.0.0.21:33843
18/09/06 21:42:04 INFO CoarseGrainedExecutorBackend: Successfully registered with driver
18/09/06 21:42:04 INFO Executor: Starting executor ID 4 on host wn1-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net
18/09/06 21:42:04 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 40203.
18/09/06 21:42:04 INFO NettyBlockTransferService: Server created on wn1-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net:40203
18/09/06 21:42:04 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
18/09/06 21:42:04 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(4, wn1-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net, 40203, None)
18/09/06 21:42:04 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(4, wn1-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net, 40203, None)
18/09/06 21:42:04 INFO BlockManager: Initialized BlockManager: BlockManagerId(4, wn1-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net, 40203, None)
18/09/06 21:42:04 INFO Executor: Using REPL class URI: spark://10.0.0.21:33843/classes
18/09/06 21:44:51 INFO CoarseGrainedExecutorBackend: Got assigned task 0
18/09/06 21:44:51 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
18/09/06 21:44:51 INFO TorrentBroadcast: Started reading broadcast variable 1
18/09/06 21:44:51 INFO TransportClientFactory: Successfully created connection to /10.0.0.21:32875 after 2 ms (0 ms spent in bootstraps)
18/09/06 21:44:51 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 37.5 KB, free 5.2 GB)
18/09/06 21:44:51 INFO TorrentBroadcast: Reading broadcast variable 1 took 133 ms
18/09/06 21:44:51 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 97.7 KB, free 5.2 GB)
18/09/06 21:44:51 INFO TransportClientFactory: Successfully created connection to /10.0.0.21:33843 after 16 ms (0 ms spent in bootstraps)
18/09/06 21:44:52 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
18/09/06 21:44:52 INFO deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
18/09/06 21:44:52 INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
18/09/06 21:44:52 INFO deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
18/09/06 21:44:52 INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
18/09/06 21:44:52 INFO HadoopRDD: Input split: alluxio://hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net:19998/Input:0+13423
18/09/06 21:44:52 INFO TorrentBroadcast: Started reading broadcast variable 0
18/09/06 21:44:52 INFO TransportClientFactory: Successfully created connection to wn1-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.4:44275 after 1 ms (0 ms spent in bootstraps)
18/09/06 21:44:52 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 34.1 KB, free 5.2 GB)
18/09/06 21:44:52 INFO TorrentBroadcast: Reading broadcast variable 0 took 70 ms
18/09/06 21:44:52 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 469.5 KB, free 5.2 GB)
18/09/06 21:44:52 INFO MetricsConfig: loaded properties from hadoop-metrics2-azure-file-system.properties
18/09/06 21:44:52 INFO WasbAzureIaasSink: Init starting.
18/09/06 21:44:52 INFO AzureIaasSink: Init starting. Initializing MdsLogger.
18/09/06 21:44:52 INFO AzureIaasSink: Init completed.
18/09/06 21:44:52 INFO WasbAzureIaasSink: Init completed.
18/09/06 21:44:52 INFO MetricsSinkAdapter: Sink azurefs2 started
18/09/06 21:44:52 INFO MetricsSystemImpl: Scheduled snapshot period at 60 second(s).
18/09/06 21:44:52 INFO MetricsSystemImpl: azure-file-system metrics system started
18/09/06 21:44:52 INFO HadoopConfigurationUtils: Loading Alluxio properties from Hadoop configuration: {fs.azure.account.key.testryui.blob.core.windows.net=MIIB/QYJKoZIhvcNAQcDoIIB7jCCAeoCAQAxggFdMIIBWQIBADBBMC0xKzApBgNVBAMTImRiZW5jcnlwdGlvbi5oZGluc2lnaHRzZXJ2aWNlcy5uZXQCEGgOgBofxf6kRbb5r8cSuagwDQYJKoZIhvcNAQEBBQAEggEAKgwuNMr9JwNtvHY7VRE+8t991LCO8n0uJUOtJc2vQET+pZnK/2jviZh8AaIDra0AhDs8eGB71G8Xjet0fSyNZXb80660ZV3BBHKaDVNybwD12HLvjwLoOEZ3O9NqGls7D7owsoWO7gw0sGgu96dnEjvDvB8uSck8UveXMSk/tZNeU9+jYjfctu8TGRP+B4YCgJWcLMeGHczhPx6NuBPdmg/0eoe3w8VVPCi6vIs3lsyzYEi1LVrMkIJ2aBWuwyG7bUKu4tLhfYHY13YDO5PtBfNtWX3QkzejxvY9lewGuPU3w9Fw3GFK/YrgzFBtBycyBFMYl6lQyIFPNLtzPv2l9jCBgwYJKoZIhvcNAQcBMBQGCCqGSIb3DQMHBAgO9GVL4NR2GoBgn8fCaRg2y303wKFrgNOukYYmLoOKq5cQXP/9EY+3WD9BSMjE9/WyQ87FIhPeeEiq6JPNdBIoTZagQpj11b+TL7uMmz+j1Jx4LDozmAoLPQItbwaNnFZlYRWrgY8OiOuE}
18/09/06 21:44:52 INFO MetricsSystem: Starting sinks with config: {}.
18/09/06 21:44:52 INFO FileSystemContext: Created filesystem context with id app-3381863313617109164. This ID will be used for identifying info from the client, such as metrics. It can be set manually through the alluxio.user.app.id property
18/09/06 21:44:52 INFO HadoopConfigurationUtils: Loading Alluxio properties from Hadoop configuration: {fs.azure.account.key.testryui.blob.core.windows.net=MIIB/QYJKoZIhvcNAQcDoIIB7jCCAeoCAQAxggFdMIIBWQIBADBBMC0xKzApBgNVBAMTImRiZW5jcnlwdGlvbi5oZGluc2lnaHRzZXJ2aWNlcy5uZXQCEGgOgBofxf6kRbb5r8cSuagwDQYJKoZIhvcNAQEBBQAEggEAKgwuNMr9JwNtvHY7VRE+8t991LCO8n0uJUOtJc2vQET+pZnK/2jviZh8AaIDra0AhDs8eGB71G8Xjet0fSyNZXb80660ZV3BBHKaDVNybwD12HLvjwLoOEZ3O9NqGls7D7owsoWO7gw0sGgu96dnEjvDvB8uSck8UveXMSk/tZNeU9+jYjfctu8TGRP+B4YCgJWcLMeGHczhPx6NuBPdmg/0eoe3w8VVPCi6vIs3lsyzYEi1LVrMkIJ2aBWuwyG7bUKu4tLhfYHY13YDO5PtBfNtWX3QkzejxvY9lewGuPU3w9Fw3GFK/YrgzFBtBycyBFMYl6lQyIFPNLtzPv2l9jCBgwYJKoZIhvcNAQcBMBQGCCqGSIb3DQMHBAgO9GVL4NR2GoBgn8fCaRg2y303wKFrgNOukYYmLoOKq5cQXP/9EY+3WD9BSMjE9/WyQ87FIhPeeEiq6JPNdBIoTZagQpj11b+TL7uMmz+j1Jx4LDozmAoLPQItbwaNnFZlYRWrgY8OiOuE}
18/09/06 21:44:52 INFO AbstractClient: Alluxio client (version 1.8.0) is trying to bootstrap-connect with hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
18/09/06 21:44:52 INFO AbstractClient: Alluxio client has bootstrap-connected with hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
18/09/06 21:44:52 INFO AbstractClient: Alluxio client (version 1.8.0) is trying to connect with MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
18/09/06 21:44:52 INFO AbstractClient: Client registered with MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
18/09/06 21:44:53 INFO HeartbeatThread: Hearbeat Master Metrics Sync is interrupted.
18/09/06 21:44:53 INFO AbstractClient: Alluxio client (version 1.8.0) is trying to connect with MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
18/09/06 21:44:53 INFO AbstractClient: Client registered with MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
18/09/06 21:44:53 INFO AbstractClient: Alluxio client (version 1.8.0) is trying to connect with FileSystemMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
18/09/06 21:44:53 INFO AbstractClient: Client registered with FileSystemMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
18/09/06 21:44:53 INFO FileSystemContext: Created filesystem context with id app-4410116450193773659. This ID will be used for identifying info from the client, such as metrics. It can be set manually through the alluxio.user.app.id property
18/09/06 21:44:53 INFO AbstractClient: Alluxio client (version 1.8.0) is trying to connect with MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998
18/09/06 21:44:53 WARN AbstractClient: Failed to connect (1) with MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998: Peer indicated failure: Plain authentication failed: User yarn is not configured for any impersonation. impersonationUser: sshuser
18/09/06 21:44:53 ERROR ClientMasterSync: Failed to heartbeat to the metrics master: {}
alluxio.exception.status.UnavailableException: Failed to connect to MetricsMasterClient @ hn0-nsd-bg.gngqf3y3e5euffmn4cmwrh5lef.cx.internal.cloudapp.net/10.0.0.21:19998 after 1 attempts
	at alluxio.AbstractClient.connect(AbstractClient.java:325)
	at alluxio.client.metrics.MetricsMasterClient.heartbeat(MetricsMasterClient.java:84)
	at alluxio.client.metrics.ClientMasterSync.heartbeat(ClientMasterSync.java:63)
	at alluxio.heartbeat.HeartbeatThread.run(HeartbeatThread.java:74)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)



Re: issue with running spark-submit

Lu Qiu
Could you run the spark-submit command again, wait for more error messages, and provide all of the console messages to us?

Thanks,
Lu
