HBase with Alluxio failed, please help!


倪项菲
Hi Experts,
     I am using HBase 1.2.6 with Alluxio 1.6.0. The HBase RegionServers went down one by one until none was left alive, and errors were returned while splitting the WALs. Here is the log from the HMaster:

WARN  [main-EventThread] coordination.SplitLogManagerCoordination: Error splitting /hbase/splitWAL/WALs%2Fplat-ecloud01-bigdata-datanode10%2C16020%2C1531385011185-splitting%2Fplat-ecloud01-bigdata-datanode10%252C16020%252C1531385011185.null1.1531385548301
2018-07-12 16:52:50,498 WARN  [ProcedureExecutor-1] master.SplitLogManager: error while splitting logs in [alluxio://plat-ecloud01-bigdata-journalnode03:19998/hbase/WALs/plat-ecloud01-bigdata-datanode10,16020,1531385011185-splitting] installed = 4 but only 2 done
2018-07-12 16:52:50,498 WARN  [ProcedureExecutor-1] procedure.ServerCrashProcedure: Failed serverName=plat-ecloud01-bigdata-datanode10,16020,1531385011185, state=SERVER_CRASH_SPLIT_LOGS; retry
java.io.IOException: error or interrupted while splitting logs in [alluxio://plat-ecloud01-bigdata-journalnode03:19998/hbase/WALs/plat-ecloud01-bigdata-datanode10,16020,1531385011185-splitting] Task = installed = 4 done = 2 error = 2
        at org.apache.hadoop.hbase.master.SplitLogManager.splitLogDistributed(SplitLogManager.java:290)
        at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:403)
        at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:376)
        at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:293)
        at org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.splitLogs(ServerCrashProcedure.java:438)
        at org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.executeFromState(ServerCrashProcedure.java:251)
        at org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.executeFromState(ServerCrashProcedure.java:73)
        at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:119)
        at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:498)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1147)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop(ProcedureExecutor.java:942)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop(ProcedureExecutor.java:895)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$400(ProcedureExecutor.java:77)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$2.run(ProcedureExecutor.java:497)
2018-07-12 16:52:50,580 INFO  [main-EventThread] coordination.SplitLogManagerCoordination: task /hbase/splitWAL/WALs%2Fplat-ecloud01-bigdata-datanode03%2C16020%2C1531384966458-splitting%2Fplat-ecloud01-bigdata-datanode03%252C16020%252C1531384966458.null0.1531384969617 entered state: DONE plat-ecloud01-bigdata-datanode01,16020,1531385402185
2018-07-12 16:52:50,594 INFO  [main-EventThread] wal.WALSplitter: Archived processed log alluxio://plat-ecloud01-bigdata-journalnode03:19998/hbase/WALs/plat-ecloud01-bigdata-datanode03,16020,1531384966458-splitting/plat-ecloud01-bigdata-datanode03%2C16020%2C1531384966458.null0.1531384969617 to alluxio://plat-ecloud01-bigdata-journalnode03:19998/hbase/oldWALs/plat-ecloud01-bigdata-datanode03%2C16020%2C1531384966458.null0.1531384969617
2018-07-12 16:52:50,594 WARN  [main-EventThread] hadoop.AbstractFileSystem: delete failed: Path /hbase/splitWAL/plat-ecloud01-bigdata-datanode03%2C16020%2C1531384966458.null0.1531384969617 does not exist
2018-07-12 16:52:50,594 INFO  [main-EventThread] coordination.SplitLogManagerCoordination: Done splitting /hbase/splitWAL/WALs%2Fplat-ecloud01-bigdata-datanode03%2C16020%2C1531384966458-splitting%2Fplat-ecloud01-bigdata-datanode03%252C16020%252C1531384966458.null0.1531384969617
2018-07-12 16:52:50,600 INFO  [ProcedureExecutor-1] master.SplitLogManager: dead splitlog workers [plat-ecloud01-bigdata-datanode10,16020,1531385011185]
2018-07-12 16:52:50,602 INFO  [ProcedureExecutor-1] master.SplitLogManager: Started splitting 3 logs in [alluxio://plat-ecloud01-bigdata-journalnode03:19998/hbase/WALs/plat-ecloud01-bigdata-datanode10,16020,1531385011185-splitting] for [plat-ecloud01-bigdata-datanode10,16020,1531385011185]
2018-07-12 16:52:50,617 INFO  [main-EventThread] coordination.SplitLogManagerCoordination: task /hbase/splitWAL/WALs%2Fplat-ecloud01-bigdata-datanode10%2C16020%2C1531385011185-splitting%2Fplat-ecloud01-bigdata-datanode10%252C16020%252C1531385011185.null1.1531385548301 acquired by plat-ecloud01-bigdata-datanode05,16020,1531384763524
2018-07-12 16:52:50,617 INFO  [main-EventThread] coordination.SplitLogManagerCoordination: task /hbase/splitWAL/WALs%2Fplat-ecloud01-bigdata-datanode10%2C16020%2C1531385011185-splitting%2Fplat-ecloud01-bigdata-datanode10%252C16020%252C1531385011185.null1.1531385535824 acquired by plat-ecloud01-bigdata-datanode06,16020,1531384771665
2018-07-12 16:52:50,625 INFO  [main-EventThread] coordination.SplitLogManagerCoordination: task /hbase/splitWAL/WALs%2Fplat-ecloud01-bigdata-datanode10%2C16020%2C1531385011185-splitting%2Fplat-ecloud01-bigdata-datanode10%252C16020%252C1531385011185.null1.1531385543382 acquired by plat-ecloud01-bigdata-datanode09,16020,1531384792828
2018-07-12 16:52:50,661 INFO  [main-EventThread] coordination.SplitLogManagerCoordination: task /hbase/splitWAL/WALs%2Fplat-ecloud01-bigdata-datanode10%2C16020%2C1531385011185-splitting%2Fplat-ecloud01-bigdata-datanode10%252C16020%252C1531385011185.null1.1531385535824 entered state: DONE plat-ecloud01-bigdata-datanode06,16020,1531384771665
2018-07-12 16:52:50,666 WARN  [main-EventThread] hadoop.AbstractFileSystem: rename failed: Failed to rename hdfs://dev-cluster/underFSStorage/hbase/WALs/plat-ecloud01-bigdata-datanode10,16020,1531385011185-splitting/plat-ecloud01-bigdata-datanode10%2C16020%2C1531385011185.null1.1531385535824 to hdfs://dev-cluster/underFSStorage/hbase/oldWALs/plat-ecloud01-bigdata-datanode10%2C16020%2C1531385011185.null1.1531385535824 in the under file system
2018-07-12 16:52:50,666 WARN  [main-EventThread] wal.WALSplitter: Unable to move  alluxio://plat-ecloud01-bigdata-journalnode03:19998/hbase/WALs/plat-ecloud01-bigdata-datanode10,16020,1531385011185-splitting/plat-ecloud01-bigdata-datanode10%2C16020%2C1531385011185.null1.1531385535824 to alluxio://plat-ecloud01-bigdata-journalnode03:19998/hbase/oldWALs/plat-ecloud01-bigdata-datanode10%2C16020%2C1531385011185.null1.1531385535824
2018-07-12 16:52:50,666 WARN  [main-EventThread] hadoop.AbstractFileSystem: delete failed: Path /hbase/splitWAL/plat-ecloud01-bigdata-datanode10%2C16020%2C1531385011185.null1.1531385535824 does not exist
2018-07-12 16:52:50,666 INFO  [main-EventThread] coordination.SplitLogManagerCoordination: Done splitting /hbase/splitWAL/WALs%2Fplat-ecloud01-bigdata-datanode10%2C16020%2C1531385011185-splitting%2Fplat-ecloud01-bigdata-datanode10%252C16020%252C1531385011185.null1.1531385535824
2018-07-12 16:52:50,673 INFO  [main-EventThread] coordination.SplitLogManagerCoordination: task /hbase/splitWAL/WALs%2Fplat-ecloud01-bigdata-datanode10%2C16020%2C1531385011185-splitting%2Fplat-ecloud01-bigdata-datanode10%252C16020%252C1531385011185.null1.1531385548301 entered state: ERR plat-ecloud01-bigdata-datanode05,16020,1531384763524
2018-07-12 16:52:50,673 WARN  [main-EventThread] coordination.SplitLogManagerCoordination: Error splitting /hbase/splitWAL/WALs%2Fplat-ecloud01-bigdata-datanode10%2C16020%2C1531385011185-splitting%2Fplat-ecloud01-bigdata-datanode10%252C16020%252C1531385011185.null1.1531385548301
2018-07-12 16:52:53,423 INFO  [plat-ecloud01-bigdata-journalnode01,60000,1531384672092_splitLogManager__ChoreService_1] master.SplitLogManager: total tasks = 2 unassigned = 0 tasks={/hbase/splitWAL/WALs%2Fplat-ecloud01-bigdata-datanode03%2C16020%2C1531384966458-splitting%2Fplat-ecloud01-bigdata-datanode03%252C16020%252C1531384966458.null1.1531384969951=last_update = 1531385570479 last_version = 2 cur_worker_name = plat-ecloud01-bigdata-datanode04,16020,1531384985479 status = in_progress incarnation = 0 resubmits = 0 batch = installed = 2 done = 1 error = 0, /hbase/splitWAL/WALs%2Fplat-ecloud01-bigdata-datanode10%2C16020%2C1531385011185-splitting%2Fplat-ecloud01-bigdata-datanode10%252C16020%252C1531385011185.null1.1531385543382=last_update = 1531385570656 last_version = 2 cur_worker_name = plat-ecloud01-bigdata-datanode09,16020,1531384792828 status = in_progress incarnation = 0 resubmits = 0 batch = installed = 3 done = 1 error = 1}
2018-07-12 16:52:55,531 INFO  [main-EventThread] coordination.SplitLogManagerCoordination: task /hbase/splitWAL/WALs%2Fplat-ecloud01-bigdata-datanode10%2C16020%2C1531385011185-splitting%2Fplat-ecloud01-bigdata-datanode10%252C16020%252C1531385011185.null1.1531385543382 entered state: DONE plat-ecloud01-bigdata-datanode09,16020,1531384792828
2018-07-12 16:52:55,544 INFO  [main-EventThread] wal.WALSplitter: Archived processed log alluxio://plat-ecloud01-bigdata-journalnode03:19998/hbase/WALs/plat-ecloud01-bigdata-datanode10,16020,1531385011185-splitting/plat-ecloud01-bigdata-datanode10%2C16020%2C1531385011185.null1.1531385543382 to alluxio://plat-ecloud01-bigdata-journalnode03:19998/hbase/oldWALs/plat-ecloud01-bigdata-datanode10%2C16020%2C1531385011185.null1.1531385543382
2018-07-12 16:52:55,544 WARN  [main-EventThread] hadoop.AbstractFileSystem: delete failed: Path /hbase/splitWAL/plat-ecloud01-bigdata-datanode10%2C16020%2C1531385011185.null1.1531385543382 does not exist
2018-07-12 16:52:55,544 INFO  [main-EventThread] coordination.SplitLogManagerCoordination: Done splitting /hbase/splitWAL/WALs%2Fplat-ecloud01-bigdata-datanode10%2C16020%2C1531385011185-splitting%2Fplat-ecloud01-bigdata-datanode10%252C16020%252C1531385011185.null1.1531385543382
2018-07-12 16:52:55,544 WARN  [ProcedureExecutor-1] master.SplitLogManager: error while splitting logs in [alluxio://plat-ecloud01-bigdata-journalnode03:19998/hbase/WALs/plat-ecloud01-bigdata-datanode10,16020,1531385011185-splitting] installed = 3 but only 2 done
2018-07-12 16:52:55,545 WARN  [ProcedureExecutor-1] procedure.ServerCrashProcedure: Failed serverName=plat-ecloud01-bigdata-datanode10,16020,1531385011185, state=SERVER_CRASH_SPLIT_LOGS; retry
java.io.IOException: error or interrupted while splitting logs in [alluxio://plat-ecloud01-bigdata-journalnode03:19998/hbase/WALs/plat-ecloud01-bigdata-datanode10,16020,1531385011185-splitting] Task = installed = 3 done = 2 error = 1
        at org.apache.hadoop.hbase.master.SplitLogManager.splitLogDistributed(SplitLogManager.java:290)
        at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:403)
        at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:376)
        at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:293)
        at org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.splitLogs(ServerCrashProcedure.java:438)
        at org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.executeFromState(ServerCrashProcedure.java:251)
        at org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.executeFromState(ServerCrashProcedure.java:73)
        at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:119)
        at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:498)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1147)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop(ProcedureExecutor.java:942)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop(ProcedureExecutor.java:895)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$400(ProcedureExecutor.java:77)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$2.run(ProcedureExecutor.java:497)
2018-07-12 16:52:55,648 INFO  [ProcedureExecutor-1] master.SplitLogManager: dead splitlog workers [plat-ecloud01-bigdata-datanode10,16020,1531385011185]
2018-07-12 16:52:55,650 INFO  [ProcedureExecutor-1] master.SplitLogManager: Started splitting 2 logs in [alluxio://plat-ecloud01-bigdata-journalnode03:19998/hbase/WALs/plat-ecloud01-bigdata-datanode10,16020,1531385011185-splitting] for [plat-ecloud01-bigdata-datanode10,16020,1531385011185]
2018-07-12 16:52:55,665 INFO  [main-EventThread] coordination.SplitLogManagerCoordination: task /hbase/splitWAL/WALs%2Fplat-ecloud01-bigdata-datanode10%2C16020%2C1531385011185-splitting%2Fplat-ecloud01-bigdata-datanode10%252C16020%252C1531385011185.null1.1531385548301 acquired by plat-ecloud01-bigdata-datanode01,16020,1531385402185
2018-07-12 16:52:55,665 INFO  [main-EventThread] coordination.SplitLogManagerCoordination: task /hbase/splitWAL/WALs%2Fplat-ecloud01-bigdata-datanode10%2C16020%2C1531385011185-splitting%2Fplat-ecloud01-bigdata-datanode10%252C16020%252C1531385011185.null1.1531385535824 acquired by plat-ecloud01-bigdata-datanode06,16020,1531384771665
2018-07-12 16:52:55,707 INFO  [main-EventThread] coordination.SplitLogManagerCoordination: task /hbase/splitWAL/WALs%2Fplat-ecloud01-bigdata-datanode10%2C16020%2C1531385011185-splitting%2Fplat-ecloud01-bigdata-datanode10%252C16020%252C1531385011185.null1.1531385535824 entered state: DONE plat-ecloud01-bigdata-datanode06,16020,1531384771665
2018-07-12 16:52:55,717 WARN  [main-EventThread] hadoop.AbstractFileSystem: rename failed: Failed to rename hdfs://dev-cluster/underFSStorage/hbase/WALs/plat-ecloud01-bigdata-datanode10,16020,1531385011185-splitting/plat-ecloud01-bigdata-datanode10%2C16020%2C1531385011185.null1.1531385535824 to hdfs://dev-cluster/underFSStorage/hbase/oldWALs/plat-ecloud01-bigdata-datanode10%2C16020%2C1531385011185.null1.1531385535824 in the under file system
2018-07-12 16:52:55,718 WARN  [main-EventThread] wal.WALSplitter: Unable to move  alluxio://plat-ecloud01-bigdata-journalnode03:19998/hbase/WALs/plat-ecloud01-bigdata-datanode10,16020,1531385011185-splitting/plat-ecloud01-bigdata-datanode10%2C16020%2C1531385011185.null1.1531385535824 to alluxio://plat-ecloud01-bigdata-journalnode03:19998/hbase/oldWALs/plat-ecloud01-bigdata-datanode10%2C16020%2C1531385011185.null1.1531385535824
2018-07-12 16:52:55,718 WARN  [main-EventThread] hadoop.AbstractFileSystem: delete failed: Path /hbase/splitWAL/plat-ecloud01-bigdata-datanode10%2C16020%2C1531385011185.null1.1531385535824 does not exist
2018-07-12 16:52:55,718 INFO  [main-EventThread] coordination.SplitLogManagerCoordination: Done splitting /hbase/splitWAL/WALs%2Fplat-ecloud01-bigdata-datanode10%2C16020%2C1531385011185-splitting%2Fplat-ecloud01-bigdata-datanode10%252C16020%252C1531385011185.null1.1531385535824
2018-07-12 16:52:56,213 INFO  [main-EventThread] coordination.SplitLogManagerCoordination: task /hbase/splitWAL/WALs%2Fplat-ecloud01-bigdata-datanode03%2C16020%2C1531384966458-splitting%2Fplat-ecloud01-bigdata-datanode03%252C16020%252C1531384966458.null1.1531384969951 entered state: DONE plat-ecloud01-bigdata-datanode04,16020,1531384985479
2018-07-12 16:52:56,226 INFO  [main-EventThread] wal.WALSplitter: Archived processed log alluxio://plat-ecloud01-bigdata-journalnode03:19998/hbase/WALs/plat-ecloud01-bigdata-datanode03,16020,1531384966458-splitting/plat-ecloud01-bigdata-datanode03%2C16020%2C1531384966458.null1.1531384969951 to alluxio://plat-ecloud01-bigdata-journalnode03:19998/hbase/oldWALs/plat-ecloud01-bigdata-datanode03%2C16020%2C1531384966458.null1.1531384969951
2018-07-12 16:52:56,227 WARN  [main-EventThread] hadoop.AbstractFileSystem: delete failed: Path /hbase/splitWAL/plat-ecloud01-bigdata-datanode03%2C16020%2C1531384966458.null1.1531384969951 does not exist
2018-07-12 16:52:57,934 ERROR [B.defaultRpcServer.handler=5,queue=1,port=60000] master.MasterRpcServices: Region server plat-ecloud01-bigdata-datanode01,16020,1531385402185 reported a fatal error:
ABORTING region server plat-ecloud01-bigdata-datanode01,16020,1531385402185: Caught throwable while processing event RS_LOG_REPLAY
Cause:
java.lang.IllegalStateException: Reached EOF unexpectedly.
        at com.google.common.base.Preconditions.checkState(Preconditions.java:149)
        at alluxio.client.file.FileInStream.readCurrentBlockToPos(FileInStream.java:746)
        at alluxio.client.file.FileInStream.readCurrentBlockToEnd(FileInStream.java:755)
        at alluxio.client.file.FileInStream.close(FileInStream.java:160)
        at alluxio.hadoop.HdfsFileInputStream.close(HdfsFileInputStream.java:85)
        at java.io.FilterInputStream.close(FilterInputStream.java:181)
        at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.close(ProtobufLogReader.java:144)
        at org.apache.hadoop.hbase.wal.WALSplitter.splitLogFile(WALSplitter.java:402)
        at org.apache.hadoop.hbase.wal.WALSplitter.splitLogFile(WALSplitter.java:236)
        at org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:104)
        at org.apache.hadoop.hbase.regionserver.handler.WALSplitterHandler.process(WALSplitterHandler.java:72)
        at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:129)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
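
For anyone reading the log above: the split task names under /hbase/splitWAL appear to be URL-encoded WAL paths, and the WAL file name already contains its own %2C escapes, which would explain the %252C sequences. The following is a minimal sketch in Python (not part of HBase; the helper name decode_split_task is made up for illustration) that decodes one round of escaping so a task can be matched against the alluxio:// paths in the other log lines.

```python
from urllib.parse import unquote


def decode_split_task(znode: str) -> str:
    """Return the WAL path encoded in a split-task znode name.

    Hypothetical helper: assumes the last znode component is the
    URL-encoded WAL path relative to hbase.rootdir.
    """
    # Keep only the last component of the znode path (after /hbase/splitWAL/).
    encoded = znode.rsplit("/", 1)[-1]
    # Decode exactly once: %2F becomes "/", %2C becomes ",", and %252C
    # becomes the literal "%2C" that is part of the WAL file name.
    return unquote(encoded)


task = (
    "/hbase/splitWAL/WALs%2Fplat-ecloud01-bigdata-datanode10%2C16020%2C1531385011185-splitting"
    "%2Fplat-ecloud01-bigdata-datanode10%252C16020%252C1531385011185.null1.1531385548301"
)
print(decode_split_task(task))
```

The decoded value is relative to hbase.rootdir, so it can be compared directly with the "-splitting" directory named in the SplitLogManager warnings.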

The Alluxio-related settings in hbase-site.xml:
<property>
  <name>alluxio.zookeeper.enabled</name>
  <value>true</value>
</property>

<property>
  <name>fs.alluxio.impl</name>
  <value>alluxio.hadoop.FileSystem</value>
</property>

<property>
  <name>alluxio.zookeeper.address</name>
  <value>plat-ecloud01-bigdata-zk01:2181,plat-ecloud01-bigdata-zk02:2181,plat-ecloud01-bigdata-zk03:2181</value>
</property>

<property>
  <name>fs.AbstractFileSystem.alluxio.impl</name>
  <value>alluxio.hadoop.AlluxioFileSystem</value>
</property>

<property>
  <name>hbase.rootdir</name>
  <value>alluxio://plat-ecloud01-bigdata-journalnode03:19998/hbase</value>
</property>

<property>
<name>alluxio.user.file.writetype.default</name>
<value>CACHE_THROUGH</value>
</property>


<property>
<name>alluxio.user.network.netty.timeout</name>
<value>60000</value>
</property>

<property>
<name>alluxio.user.file.metadata.load.type</name>
<value>Always</value>
</property>

<property>
<name>alluxio.user.block.worker.client.read.retry</name>
<value>10</value>
</property>


<property>
<name>alluxio.user.file.delete.unchecked</name>
<value>true</value>
</property>


The alluxio-site.properties:
alluxio.master.hostname=10.176.141.22
alluxio.underfs.address=hdfs://dev-cluster/underFSStorage

# Security properties
# alluxio.security.authorization.permission.enabled=true
# alluxio.security.authentication.type=SIMPLE

# Worker properties
alluxio.worker.memory.size=10GB
# alluxio.worker.tieredstore.levels=1
# alluxio.worker.tieredstore.level0.alias=MEM
alluxio.worker.tieredstore.level0.dirs.path=/opt/mnt/ramdisk

# User properties
# alluxio.user.file.readtype.default=CACHE_PROMOTE
# alluxio.user.file.writetype.default=MUST_CACHE

alluxio.zookeeper.enabled=true
alluxio.zookeeper.address=plat-ecloud01-bigdata-zk01:2181,plat-ecloud01-bigdata-zk02:2181,plat-ecloud01-bigdata-zk03:2181
alluxio.master.journal.folder=hdfs://dev-cluster/user/apache/alluxio/journal

alluxio.master.keytab.file=/opt/hadoop-2.7.6/etc/hadoop/bigdata.keytab
alluxio.master.principal=apache/[hidden email]
alluxio.worker.keytab.file=/opt/hadoop-2.7.6/etc/hadoop/bigdata.keytab
alluxio.worker.principal=apache/[hidden email]
alluxio.worker.block.heartbeat.timeout.ms=60000
alluxio.network.netty.heartbeat.timeout=60000
alluxio.master.worker.threads.max=4096
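
With alluxio.underfs.address set as above, the hdfs:// paths in the "rename failed" warnings look like the alluxio:// paths with the under-storage prefix put in front. A small sketch of that mapping in Python, assuming only the root mount is in use (no nested mounts); the function name to_ufs_uri is hypothetical and is not an Alluxio API:

```python
from urllib.parse import urlsplit


def to_ufs_uri(alluxio_uri: str, underfs_address: str) -> str:
    """Map an Alluxio URI to the matching under-storage URI.

    Assumption: everything lives under the root mount, so the Alluxio
    path is simply appended to alluxio.underfs.address.
    """
    # urlsplit does not decode percent-escapes, so %2C in WAL names is kept.
    path = urlsplit(alluxio_uri).path
    return underfs_address.rstrip("/") + path


wal = (
    "alluxio://plat-ecloud01-bigdata-journalnode03:19998/hbase/WALs/"
    "plat-ecloud01-bigdata-datanode10,16020,1531385011185-splitting/"
    "plat-ecloud01-bigdata-datanode10%2C16020%2C1531385011185.null1.1531385535824"
)
print(to_ufs_uri(wal, "hdfs://dev-cluster/underFSStorage"))
```

The printed path is the one to look for in HDFS when checking whether the source of a failed rename still exists in the under file system.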


Any help would be highly appreciated.


--
You received this message because you are subscribed to the Google Groups "Alluxio Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/d/optout.
Re: HBase with Alluxio failed, please help!

倪项菲
And here is the log from the Alluxio master:
2018-07-12 16:39:16,378 WARN  FileSystemMasterClientServiceHandler - Rename: srcPath=/hbase/WALs/plat-ecloud01-bigdata-datanode07,16020,1531380493700-splitting/plat-ecloud01-bigdata-datanode07%2C16020%2C1531380493700.null0.1531383292894, dstPath=/hbase/oldWALs/plat-ecloud01-bigdata-datanode07%2C16020%2C1531380493700.null0.1531383292894, options=RenameTOptions(), Error=Failed to rename hdfs://dev-cluster/underFSStorage/hbase/WALs/plat-ecloud01-bigdata-datanode07,16020,1531380493700-splitting/plat-ecloud01-bigdata-datanode07%2C16020%2C1531380493700.null0.1531383292894 to hdfs://dev-cluster/underFSStorage/hbase/oldWALs/plat-ecloud01-bigdata-datanode07%2C16020%2C1531380493700.null0.1531383292894 in the under file system
2018-07-12 16:39:17,542 WARN  SafeUfsDeleter - The file to delete does not exist in ufs: hdfs://dev-cluster/underFSStorage/hbase/data/default/TestTable/df9cb4d1f8f2ad2cded34e34c6c457d5/recovered.edits/0000000000000005043.temp
2018-07-12 16:39:23,326 WARN  HdfsUnderFileSystem - Unable to rename hdfs://dev-cluster/underFSStorage/hbase/WALs/plat-ecloud01-bigdata-datanode07,16020,1531380493700-splitting/plat-ecloud01-bigdata-datanode07%2C16020%2C1531380493700.null0.1531383292894 to hdfs://dev-cluster/underFSStorage/hbase/oldWALs/plat-ecloud01-bigdata-datanode07%2C16020%2C1531380493700.null0.1531383292894 because source does not exist or is a directory
2018-07-12 16:39:23,328 WARN  FileSystemMasterClientServiceHandler - Rename: srcPath=/hbase/WALs/plat-ecloud01-bigdata-datanode07,16020,1531380493700-splitting/plat-ecloud01-bigdata-datanode07%2C16020%2C1531380493700.null0.1531383292894, dstPath=/hbase/oldWALs/plat-ecloud01-bigdata-datanode07%2C16020%2C1531380493700.null0.1531383292894, options=RenameTOptions(), Error=Failed to rename hdfs://dev-cluster/underFSStorage/hbase/WALs/plat-ecloud01-bigdata-datanode07,16020,1531380493700-splitting/plat-ecloud01-bigdata-datanode07%2C16020%2C1531380493700.null0.1531383292894 to hdfs://dev-cluster/underFSStorage/hbase/oldWALs/plat-ecloud01-bigdata-datanode07%2C16020%2C1531380493700.null0.1531383292894 in the under file system
2018-07-12 16:39:24,639 WARN  SafeUfsDeleter - The file to delete does not exist in ufs: hdfs://dev-cluster/underFSStorage/hbase/data/default/TestTable/df9cb4d1f8f2ad2cded34e34c6c457d5/recovered.edits/0000000000000005043.temp
2018-07-12 16:40:16,373 WARN  SafeUfsDeleter - The file to delete does not exist in ufs: hdfs://dev-cluster/underFSStorage/hbase/data/hbase/meta/1588230740/recovered.edits/130.seqid
2018-07-12 16:41:24,681 WARN  HdfsUnderFileSystem - Unable to rename hdfs://dev-cluster/underFSStorage/hbase/WALs/plat-ecloud01-bigdata-datanode07,16020,1531380493700-splitting/plat-ecloud01-bigdata-datanode07%2C16020%2C1531380493700.null0.1531383292894 to hdfs://dev-cluster/underFSStorage/hbase/oldWALs/plat-ecloud01-bigdata-datanode07%2C16020%2C1531380493700.null0.1531383292894 because source does not exist or is a directory
2018-07-12 16:41:24,681 WARN  FileSystemMasterClientServiceHandler - Rename: srcPath=/hbase/WALs/plat-ecloud01-bigdata-datanode07,16020,1531380493700-splitting/plat-ecloud01-bigdata-datanode07%2C16020%2C1531380493700.null0.1531383292894, dstPath=/hbase/oldWALs/plat-ecloud01-bigdata-datanode07%2C16020%2C1531380493700.null0.1531383292894, options=RenameTOptions(), Error=Failed to rename hdfs://dev-cluster/underFSStorage/hbase/WALs/plat-ecloud01-bigdata-datanode07,16020,1531380493700-splitting/plat-ecloud01-bigdata-datanode07%2C16020%2C1531380493700.null0.1531383292894 to hdfs://dev-cluster/underFSStorage/hbase/oldWALs/plat-ecloud01-bigdata-datanode07%2C16020%2C1531380493700.null0.1531383292894 in the under file system
2018-07-12 16:41:24,885 WARN  HdfsUnderFileSystem - Unable to rename hdfs://dev-cluster/underFSStorage/hbase/WALs/plat-ecloud01-bigdata-datanode07,16020,1531380493700-splitting/plat-ecloud01-bigdata-datanode07%2C16020%2C1531380493700.null0.1531383292894 to hdfs://dev-cluster/underFSStorage/hbase/oldWALs/plat-ecloud01-bigdata-datanode07%2C16020%2C1531380493700.null0.1531383292894 because source does not exist or is a directory
2018-07-12 16:41:24,885 WARN  FileSystemMasterClientServiceHandler - Rename: srcPath=/hbase/WALs/plat-ecloud01-bigdata-datanode07,16020,1531380493700-splitting/plat-ecloud01-bigdata-datanode07%2C16020%2C1531380493700.null0.1531383292894, dstPath=/hbase/oldWALs/plat-ecloud01-bigdata-datanode07%2C16020%2C1531380493700.null0.1531383292894, options=RenameTOptions(), Error=Failed to rename hdfs://dev-cluster/underFSStorage/hbase/WALs/plat-ecloud01-bigdata-datanode07,16020,1531380493700-splitting/plat-ecloud01-bigdata-datanode07%2C16020%2C1531380493700.null0.1531383292894 to hdfs://dev-cluster/underFSStorage/hbase/oldWALs/plat-ecloud01-bigdata-datanode07%2C16020%2C1531380493700.null0.1531383292894 in the under file system
2018-07-12 16:41:25,109 WARN  HdfsUnderFileSystem - Unable to rename hdfs://dev-cluster/underFSStorage/hbase/WALs/plat-ecloud01-bigdata-datanode07,16020,1531380493700-splitting/plat-ecloud01-bigdata-datanode07%2C16020%2C1531380493700.null0.1531383292894 to hdfs://dev-cluster/underFSStorage/hbase/oldWALs/plat-ecloud01-bigdata-datanode07%2C16020%2C1531380493700.null0.1531383292894 because source does not exist or is a directory
2018-07-12 16:41:25,109 WARN  FileSystemMasterClientServiceHandler - Rename: srcPath=/hbase/WALs/plat-ecloud01-bigdata-datanode07,16020,1531380493700-splitting/plat-ecloud01-bigdata-datanode07%2C16020%2C1531380493700.null0.1531383292894, dstPath=/hbase/oldWALs/plat-ecloud01-bigdata-datanode07%2C16020%2C1531380493700.null0.1531383292894, options=RenameTOptions(), Error=Failed to rename hdfs://dev-cluster/underFSStorage/hbase/WALs/plat-ecloud01-bigdata-datanode07,16020,1531380493700-splitting/plat-ecloud01-bigdata-datanode07%2C16020%2C1531380493700.null0.1531383292894 to hdfs://dev-cluster/underFSStorage/hbase/oldWALs/plat-ecloud01-bigdata-datanode07%2C16020%2C1531380493700.null0.1531383292894 in the under file system
2018-07-12 16:41:25,412 WARN  HdfsUnderFileSystem - Unable to rename hdfs://dev-cluster/underFSStorage/hbase/WALs/plat-ecloud01-bigdata-datanode07,16020,1531380493700-splitting/plat-ecloud01-bigdata-datanode07%2C16020%2C1531380493700.null0.1531383292894 to hdfs://dev-cluster/underFSStorage/hbase/oldWALs/plat-ecloud01-bigdata-datanode07%2C16020%2C1531380493700.null0.1531383292894 because source does not exist or is a directory
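
The same rename is retried many times in the master log above, so it can help to count failures per source file. Below is a rough Python sketch for that, under two assumptions: every record starts with a "YYYY-MM-DD HH:MM:SS,mmm" timestamp, and any line without one is the continuation of a record that was hard-wrapped mid-word when pasted (so lines are glued back without a separator). The function names are made up for this example.

```python
import re
from collections import Counter

# A new log record starts with a timestamp such as "2018-07-12 16:39:23,326 ".
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3} ")
# Captures the source path of a failed under-storage rename.
RENAME = re.compile(r"HdfsUnderFileSystem - Unable to rename (\S+) to ")


def join_records(lines):
    """Glue continuation lines back onto the record they belong to."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if TIMESTAMP.match(line) or not records:
            records.append(line)
        else:
            records[-1] += line  # wrapped mid-word, so no separator
    return records


def count_failed_renames(lines):
    """Count 'Unable to rename' warnings per source path."""
    counts = Counter()
    for record in join_records(lines):
        match = RENAME.search(record)
        if match:
            counts[match.group(1)] += 1
    return counts


sample = [
    "2018-07-12 16:39:23,326 WARN  HdfsUnderFileSystem - Unable to rename hdfs://dev-cluster/underFSStorage/hbase/WALs/plat-ecloud01-bigdata-datanode07,16020,1531380493700-splitting/plat-ecloud01-bi",
    "gdata-datanode07%2C16020%2C1531380493700.null0.1531383292894 to hdfs://dev-cluster/underFSStorage/hbase/oldWALs/plat-ecloud01-bigdata-datanode07%2C16020%2C1531380493700.null0.1531383292894 beca",
    "use source does not exist or is a directory",
    "2018-07-12 16:41:24,681 WARN  HdfsUnderFileSystem - Unable to rename hdfs://dev-cluster/underFSStorage/hbase/WALs/plat-ecloud01-bigdata-datanode07,16020,1531380493700-splitting/plat-ecloud01-bi",
    "gdata-datanode07%2C16020%2C1531380493700.null0.1531383292894 to hdfs://dev-cluster/underFSStorage/hbase/oldWALs/plat-ecloud01-bigdata-datanode07%2C16020%2C1531380493700.null0.1531383292894 beca",
    "use source does not exist or is a directory",
]
for source, count in count_failed_renames(sample).items():
    print(count, source)
```

To run it on a real file, pass the open file object instead of the sample list, for example count_failed_renames(open("master.log")), where master.log stands for wherever the Alluxio master log is kept.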

On Thursday, July 12, 2018 at 5:10:01 PM UTC+8, 倪项菲 wrote:
2018-07-12 16:52:50,673 WARN  [main-EventThread] coordination.SplitLogManagerCoordination: Error splitting /hbase/splitWAL/WALs%2Fplat-ecloud01-bigdata-datanode10%2C16020%2C1531385011185-splitting%2Fplat-ecloud01-bigdata-datanode10%252C16020%252C1531385011185.null1.1531385548301
2018-07-12 16:52:53,423 INFO  [plat-ecloud01-bigdata-journalnode01,60000,1531384672092_splitLogManager__ChoreService_1] master.SplitLogManager: total tasks = 2 unassigned = 0 tasks={/hbase/splitWAL/WALs%2Fplat-ecloud01-bigdata-datanode03%2C16020%2C1531384966458-splitting%2Fplat-ecloud01-bigdata-datanode03%252C16020%252C1531384966458.null1.1531384969951=last_update = 1531385570479 last_version = 2 cur_worker_name = plat-ecloud01-bigdata-datanode04,16020,1531384985479 status = in_progress incarnation = 0 resubmits = 0 batch = installed = 2 done = 1 error = 0, /hbase/splitWAL/WALs%2Fplat-ecloud01-bigdata-datanode10%2C16020%2C1531385011185-splitting%2Fplat-ecloud01-bigdata-datanode10%252C16020%252C1531385011185.null1.1531385543382=last_update = 1531385570656 last_version = 2 cur_worker_name = plat-ecloud01-bigdata-datanode09,16020,1531384792828 status = in_progress incarnation = 0 resubmits = 0 batch = installed = 3 done = 1 error = 1}
2018-07-12 16:52:55,531 INFO  [main-EventThread] coordination.SplitLogManagerCoordination: task /hbase/splitWAL/WALs%2Fplat-ecloud01-bigdata-datanode10%2C16020%2C1531385011185-splitting%2Fplat-ecloud01-bigdata-datanode10%252C16020%252C1531385011185.null1.1531385543382 entered state: DONE plat-ecloud01-bigdata-datanode09,16020,1531384792828
2018-07-12 16:52:55,544 INFO  [main-EventThread] wal.WALSplitter: Archived processed log alluxio://plat-ecloud01-bigdata-journalnode03:19998/hbase/WALs/plat-ecloud01-bigdata-datanode10,16020,1531385011185-splitting/plat-ecloud01-bigdata-datanode10%2C16020%2C1531385011185.null1.1531385543382 to alluxio://plat-ecloud01-bigdata-journalnode03:19998/hbase/oldWALs/plat-ecloud01-bigdata-datanode10%2C16020%2C1531385011185.null1.1531385543382
2018-07-12 16:52:55,544 WARN  [main-EventThread] hadoop.AbstractFileSystem: delete failed: Path /hbase/splitWAL/plat-ecloud01-bigdata-datanode10%2C16020%2C1531385011185.null1.1531385543382 does not exist
2018-07-12 16:52:55,544 INFO  [main-EventThread] coordination.SplitLogManagerCoordination: Done splitting /hbase/splitWAL/WALs%2Fplat-ecloud01-bigdata-datanode10%2C16020%2C1531385011185-splitting%2Fplat-ecloud01-bigdata-datanode10%252C16020%252C1531385011185.null1.1531385543382
2018-07-12 16:52:55,544 WARN  [ProcedureExecutor-1] master.SplitLogManager: error while splitting logs in [alluxio://plat-ecloud01-bigdata-journalnode03:19998/hbase/WALs/plat-ecloud01-bigdata-datanode10,16020,1531385011185-splitting] installed = 3 but only 2 done
2018-07-12 16:52:55,545 WARN  [ProcedureExecutor-1] procedure.ServerCrashProcedure: Failed serverName=plat-ecloud01-bigdata-datanode10,16020,1531385011185, state=SERVER_CRASH_SPLIT_LOGS; retry
java.io.IOException: error or interrupted while splitting logs in [alluxio://plat-ecloud01-bigdata-journalnode03:19998/hbase/WALs/plat-ecloud01-bigdata-datanode10,16020,1531385011185-splitting] Task = installed = 3 done = 2 error = 1
        at org.apache.hadoop.hbase.master.SplitLogManager.splitLogDistributed(SplitLogManager.java:290)
        at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:403)
        at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:376)
        at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:293)
        at org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.splitLogs(ServerCrashProcedure.java:438)
        at org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.executeFromState(ServerCrashProcedure.java:251)
        at org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.executeFromState(ServerCrashProcedure.java:73)
        at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:119)
        at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:498)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1147)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop(ProcedureExecutor.java:942)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop(ProcedureExecutor.java:895)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$400(ProcedureExecutor.java:77)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$2.run(ProcedureExecutor.java:497)
2018-07-12 16:52:55,648 INFO  [ProcedureExecutor-1] master.SplitLogManager: dead splitlog workers [plat-ecloud01-bigdata-datanode10,16020,1531385011185]
2018-07-12 16:52:55,650 INFO  [ProcedureExecutor-1] master.SplitLogManager: Started splitting 2 logs in [alluxio://plat-ecloud01-bigdata-journalnode03:19998/hbase/WALs/plat-ecloud01-bigdata-datanode10,16020,1531385011185-splitting] for [plat-ecloud01-bigdata-datanode10,16020,1531385011185]
2018-07-12 16:52:55,665 INFO  [main-EventThread] coordination.SplitLogManagerCoordination: task /hbase/splitWAL/WALs%2Fplat-ecloud01-bigdata-datanode10%2C16020%2C1531385011185-splitting%2Fplat-ecloud01-bigdata-datanode10%252C16020%252C1531385011185.null1.1531385548301 acquired by plat-ecloud01-bigdata-datanode01,16020,1531385402185
2018-07-12 16:52:55,665 INFO  [main-EventThread] coordination.SplitLogManagerCoordination: task /hbase/splitWAL/WALs%2Fplat-ecloud01-bigdata-datanode10%2C16020%2C1531385011185-splitting%2Fplat-ecloud01-bigdata-datanode10%252C16020%252C1531385011185.null1.1531385535824 acquired by plat-ecloud01-bigdata-datanode06,16020,1531384771665
2018-07-12 16:52:55,707 INFO  [main-EventThread] coordination.SplitLogManagerCoordination: task /hbase/splitWAL/WALs%2Fplat-ecloud01-bigdata-datanode10%2C16020%2C1531385011185-splitting%2Fplat-ecloud01-bigdata-datanode10%252C16020%252C1531385011185.null1.1531385535824 entered state: DONE plat-ecloud01-bigdata-datanode06,16020,1531384771665
2018-07-12 16:52:55,717 WARN  [main-EventThread] hadoop.AbstractFileSystem: rename failed: Failed to rename hdfs://dev-cluster/underFSStorage/hbase/WALs/plat-ecloud01-bigdata-datanode10,16020,1531385011185-splitting/plat-ecloud01-bigdata-datanode10%2C16020%2C1531385011185.null1.1531385535824 to hdfs://dev-cluster/underFSStorage/hbase/oldWALs/plat-ecloud01-bigdata-datanode10%2C16020%2C1531385011185.null1.1531385535824 in the under file system
2018-07-12 16:52:55,718 WARN  [main-EventThread] wal.WALSplitter: Unable to move  alluxio://plat-ecloud01-bigdata-journalnode03:19998/hbase/WALs/plat-ecloud01-bigdata-datanode10,16020,1531385011185-splitting/plat-ecloud01-bigdata-datanode10%2C16020%2C1531385011185.null1.1531385535824 to alluxio://plat-ecloud01-bigdata-journalnode03:19998/hbase/oldWALs/plat-ecloud01-bigdata-datanode10%2C16020%2C1531385011185.null1.1531385535824
2018-07-12 16:52:55,718 WARN  [main-EventThread] hadoop.AbstractFileSystem: delete failed: Path /hbase/splitWAL/plat-ecloud01-bigdata-datanode10%2C16020%2C1531385011185.null1.1531385535824 does not exist
2018-07-12 16:52:55,718 INFO  [main-EventThread] coordination.SplitLogManagerCoordination: Done splitting /hbase/splitWAL/WALs%2Fplat-ecloud01-bigdata-datanode10%2C16020%2C1531385011185-splitting%2Fplat-ecloud01-bigdata-datanode10%252C16020%252C1531385011185.null1.1531385535824
2018-07-12 16:52:56,213 INFO  [main-EventThread] coordination.SplitLogManagerCoordination: task /hbase/splitWAL/WALs%2Fplat-ecloud01-bigdata-datanode03%2C16020%2C1531384966458-splitting%2Fplat-ecloud01-bigdata-datanode03%252C16020%252C1531384966458.null1.1531384969951 entered state: DONE plat-ecloud01-bigdata-datanode04,16020,1531384985479
2018-07-12 16:52:56,226 INFO  [main-EventThread] wal.WALSplitter: Archived processed log alluxio://plat-ecloud01-bigdata-journalnode03:19998/hbase/WALs/plat-ecloud01-bigdata-datanode03,16020,1531384966458-splitting/plat-ecloud01-bigdata-datanode03%2C16020%2C1531384966458.null1.1531384969951 to alluxio://plat-ecloud01-bigdata-journalnode03:19998/hbase/oldWALs/plat-ecloud01-bigdata-datanode03%2C16020%2C1531384966458.null1.1531384969951
2018-07-12 16:52:56,227 WARN  [main-EventThread] hadoop.AbstractFileSystem: delete failed: Path /hbase/splitWAL/plat-ecloud01-bigdata-datanode03%2C16020%2C1531384966458.null1.1531384969951 does not exist
2018-07-12 16:52:57,934 ERROR [B.defaultRpcServer.handler=5,queue=1,port=60000] master.MasterRpcServices: Region server plat-ecloud01-bigdata-datanode01,16020,1531385402185 reported a fatal error:
ABORTING region server plat-ecloud01-bigdata-datanode01,16020,1531385402185: Caught throwable while processing event RS_LOG_REPLAY
Cause:
java.lang.IllegalStateException: Reached EOF unexpectedly.
        at com.google.common.base.Preconditions.checkState(Preconditions.java:149)
        at alluxio.client.file.FileInStream.readCurrentBlockToPos(FileInStream.java:746)
        at alluxio.client.file.FileInStream.readCurrentBlockToEnd(FileInStream.java:755)
        at alluxio.client.file.FileInStream.close(FileInStream.java:160)
        at alluxio.hadoop.HdfsFileInputStream.close(HdfsFileInputStream.java:85)
        at java.io.FilterInputStream.close(FilterInputStream.java:181)
        at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.close(ProtobufLogReader.java:144)
        at org.apache.hadoop.hbase.wal.WALSplitter.splitLogFile(WALSplitter.java:402)
        at org.apache.hadoop.hbase.wal.WALSplitter.splitLogFile(WALSplitter.java:236)
        at org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:104)
        at org.apache.hadoop.hbase.regionserver.handler.WALSplitterHandler.process(WALSplitterHandler.java:72)
        at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:129)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
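
For anyone triaging a master log like the one above, here is a minimal sketch (not part of HBase) that tallies how often each WAL split task reached `DONE` or `ERR`. The names `STATE_RE` and `tally_split_states` are hypothetical, and the sample lines are shortened stand-ins for the `entered state:` lines in the log; the task paths are abbreviated, while the trailing WAL timestamps, states and worker names are kept.

```python
import re
from collections import Counter

# Matches the "entered state" lines logged by SplitLogManagerCoordination.
STATE_RE = re.compile(r"task (\S+) entered state: (\w+) (\S+)")


def tally_split_states(lines):
    """Map each WAL (keyed by its trailing timestamp) to a Counter of task states."""
    result = {}
    for line in lines:
        match = STATE_RE.search(line)
        if match is None:
            continue
        task, state, _worker = match.groups()
        wal_id = task.rsplit(".", 1)[-1]  # trailing timestamp of the WAL file name
        result.setdefault(wal_id, Counter())[state] += 1
    return result


# Shortened sample lines (task paths abbreviated for illustration only).
sample = [
    "16:52:50,661 INFO task /hbase/splitWAL/dn10.null1.1531385535824 entered state: DONE plat-ecloud01-bigdata-datanode06,16020,1531384771665",
    "16:52:50,673 INFO task /hbase/splitWAL/dn10.null1.1531385548301 entered state: ERR plat-ecloud01-bigdata-datanode05,16020,1531384763524",
    "16:52:55,531 INFO task /hbase/splitWAL/dn10.null1.1531385543382 entered state: DONE plat-ecloud01-bigdata-datanode09,16020,1531384792828",
    "16:52:55,707 INFO task /hbase/splitWAL/dn10.null1.1531385535824 entered state: DONE plat-ecloud01-bigdata-datanode06,16020,1531384771665",
]

for wal_id, states in sorted(tally_split_states(sample).items()):
    print(wal_id, dict(states))
```

A WAL that keeps reaching `DONE` yet is resubmitted on the next retry, or one that only ever reaches `ERR`, is a reasonable place to start looking.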

Alluxio-related settings in hbase-site.xml:
<property>
  <name>alluxio.zookeeper.enabled</name>
  <value>true</value>
</property>

<property>
  <name>fs.alluxio.impl</name>
  <value>alluxio.hadoop.FileSystem</value>
</property>

<property>
  <name>alluxio.zookeeper.address</name>
  <value>plat-ecloud01-bigdata-zk01:2181,plat-ecloud01-bigdata-zk02:2181,plat-ecloud01-bigdata-zk03:2181</value>
</property>

<property>
  <name>fs.AbstractFileSystem.alluxio.impl</name>
  <value>alluxio.hadoop.AlluxioFileSystem</value>
</property>

<property>
  <name>hbase.rootdir</name>
  <value>alluxio://plat-ecloud01-bigdata-journalnode03:19998/hbase</value>
</property>

<property>
  <name>alluxio.user.file.writetype.default</name>
  <value>CACHE_THROUGH</value>
</property>

<property>
  <name>alluxio.user.network.netty.timeout</name>
  <value>60000</value>
</property>

<property>
  <name>alluxio.user.file.metadata.load.type</name>
  <value>Always</value>
</property>

<property>
  <name>alluxio.user.block.worker.client.read.retry</name>
  <value>10</value>
</property>

<property>
  <name>alluxio.user.file.delete.unchecked</name>
  <value>true</value>
</property>


The alluxio-site.properties:
alluxio.master.hostname=10.176.141.22
alluxio.underfs.address=hdfs://dev-cluster/underFSStorage

# Security properties
# alluxio.security.authorization.permission.enabled=true
# alluxio.security.authentication.type=SIMPLE

# Worker properties
alluxio.worker.memory.size=10GB
# alluxio.worker.tieredstore.levels=1
# alluxio.worker.tieredstore.level0.alias=MEM
alluxio.worker.tieredstore.level0.dirs.path=/opt/mnt/ramdisk

# User properties
# alluxio.user.file.readtype.default=CACHE_PROMOTE
# alluxio.user.file.writetype.default=MUST_CACHE

alluxio.zookeeper.enabled=true
alluxio.zookeeper.address=plat-ecloud01-bigdata-zk01:2181,plat-ecloud01-bigdata-zk02:2181,plat-ecloud01-bigdata-zk03:2181
alluxio.master.journal.folder=hdfs://dev-cluster/user/apache/alluxio/journal

alluxio.master.keytab.file=/opt/hadoop-2.7.6/etc/hadoop/bigdata.keytab
alluxio.master.principal=apache/_[hidden email]
alluxio.worker.keytab.file=/opt/hadoop-2.7.6/etc/hadoop/bigdata.keytab
alluxio.worker.principal=apache/_[hidden email]
alluxio.worker.block.heartbeat.timeout.ms=60000
alluxio.network.netty.heartbeat.timeout=60000
alluxio.master.worker.threads.max=4096
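
Since the same Alluxio keys appear in both hbase-site.xml and alluxio-site.properties, a small consistency check can rule out drift between the two files. This is only a sketch: the helper names (`load_site_xml`, `load_properties`, `mismatched_keys`) are hypothetical, the XML fragment is wrapped in a `<configuration>` root for parsing, and the inline samples copy just a few of the values shown above.

```python
import xml.etree.ElementTree as ET


def load_site_xml(xml_text):
    """Parse Hadoop-style <property> entries into a dict of name -> value."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def load_properties(text):
    """Parse key=value lines, skipping blank lines and '#' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def mismatched_keys(a, b):
    """Return the keys present in both dicts whose values differ."""
    return sorted(k for k in a.keys() & b.keys() if a[k] != b[k])


ZK = "plat-ecloud01-bigdata-zk01:2181,plat-ecloud01-bigdata-zk02:2181,plat-ecloud01-bigdata-zk03:2181"

HBASE_SITE = f"""<configuration>
<property><name>alluxio.zookeeper.enabled</name><value>true</value></property>
<property><name>alluxio.zookeeper.address</name><value>{ZK}</value></property>
<property><name>hbase.rootdir</name><value>alluxio://plat-ecloud01-bigdata-journalnode03:19998/hbase</value></property>
</configuration>"""

ALLUXIO_SITE = f"""# Worker properties
alluxio.worker.memory.size=10GB
alluxio.zookeeper.enabled=true
alluxio.zookeeper.address={ZK}
"""

hbase_props = load_site_xml(HBASE_SITE)
alluxio_props = load_properties(ALLUXIO_SITE)
print("keys with differing values:", mismatched_keys(hbase_props, alluxio_props))
```

With the values posted here the two files agree on the shared ZooKeeper keys, so this check alone would not explain the failure.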


Any help would be highly appreciated.



Re: HBase with Alluxio failed,please help!

madan
Hey Alluxio User! 
   It looks like this may be due to HBase being unable to write its WALs, possibly because of a conflict with some existing WALs that are still open. Can you try deleting the /WALs directory (via Alluxio) and then starting HBase again? (You will likely need to start all services again, as the HBase Master will most likely go down shortly after the region servers do.)
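
A rough sketch of that cleanup, assuming the WAL directory is `/hbase/WALs` (derived from the `hbase.rootdir` posted earlier) and that the `alluxio` CLI is on the PATH with its `fs ls` and `fs rm -R` subcommands. The helper names are hypothetical, and the script only prints the commands unless `dry_run=False` is passed.

```python
import shlex
import subprocess


def wal_cleanup_commands(hbase_root="/hbase", alluxio_bin="alluxio"):
    """Build the Alluxio shell commands to inspect and then remove the WAL directory."""
    wal_dir = hbase_root.rstrip("/") + "/WALs"
    return [
        [alluxio_bin, "fs", "ls", wal_dir],        # look at what is there first
        [alluxio_bin, "fs", "rm", "-R", wal_dir],  # destructive: recursive delete
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(shlex.quote(part) for part in cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


# Dry run by default, so nothing is deleted when this is executed as-is.
run(wal_cleanup_commands())
```

Removing WALs discards any edits that were not yet flushed, so stop HBase first and keep a copy of the directory if the data matters.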


Thanks,
Madan

On Thursday, July 12, 2018 at 2:15:58 AM UTC-7, 倪项菲 wrote:
And here is the log from the Alluxio master:
2018-07-12 16:39:16,378 WARN  FileSystemMasterClientServiceHandler - Rename: srcPath=/hbase/WALs/plat-ecloud01-bigdata-datanode07,16020,1531380493700-splitting/plat-ecloud01-bigdata-datanode07%2C16020%2C1531380493700.null0.1531383292894, dstPath=/hbase/oldWALs/plat-ecloud01-bigdata-datanode07%2C16020%2C1531380493700.null0.1531383292894, options=RenameTOptions(), Error=Failed to rename hdfs://dev-cluster/underFSStorage/hbase/WALs/plat-ecloud01-bigdata-datanode07,16020,1531380493700-splitting/plat-ecloud01-bigdata-datanode07%2C16020%2C1531380493700.null0.1531383292894 to hdfs://dev-cluster/underFSStorage/hbase/oldWALs/plat-ecloud01-bigdata-datanode07%2C16020%2C1531380493700.null0.1531383292894 in the under file system
2018-07-12 16:39:17,542 WARN  SafeUfsDeleter - The file to delete does not exist in ufs: hdfs://dev-cluster/underFSStorage/hbase/data/default/TestTable/df9cb4d1f8f2ad2cded34e34c6c457d5/recovered.edits/0000000000000005043.temp
2018-07-12 16:39:23,326 WARN  HdfsUnderFileSystem - Unable to rename hdfs://dev-cluster/underFSStorage/hbase/WALs/plat-ecloud01-bigdata-datanode07,16020,1531380493700-splitting/plat-ecloud01-bigdata-datanode07%2C16020%2C1531380493700.null0.1531383292894 to hdfs://dev-cluster/underFSStorage/hbase/oldWALs/plat-ecloud01-bigdata-datanode07%2C16020%2C1531380493700.null0.1531383292894 because source does not exist or is a directory
2018-07-12 16:39:23,328 WARN  FileSystemMasterClientServiceHandler - Rename: srcPath=/hbase/WALs/plat-ecloud01-bigdata-datanode07,16020,1531380493700-splitting/plat-ecloud01-bigdata-datanode07%2C16020%2C1531380493700.null0.1531383292894, dstPath=/hbase/oldWALs/plat-ecloud01-bigdata-datanode07%2C16020%2C1531380493700.null0.1531383292894, options=RenameTOptions(), Error=Failed to rename hdfs://dev-cluster/underFSStorage/hbase/WALs/plat-ecloud01-bigdata-datanode07,16020,1531380493700-splitting/plat-ecloud01-bigdata-datanode07%2C16020%2C1531380493700.null0.1531383292894 to hdfs://dev-cluster/underFSStorage/hbase/oldWALs/plat-ecloud01-bigdata-datanode07%2C16020%2C1531380493700.null0.1531383292894 in the under file system
2018-07-12 16:39:24,639 WARN  SafeUfsDeleter - The file to delete does not exist in ufs: hdfs://dev-cluster/underFSStorage/hbase/data/default/TestTable/df9cb4d1f8f2ad2cded34e34c6c457d5/recovered.edits/0000000000000005043.temp
2018-07-12 16:40:16,373 WARN  SafeUfsDeleter - The file to delete does not exist in ufs: hdfs://dev-cluster/underFSStorage/hbase/data/hbase/meta/1588230740/recovered.edits/130.seqid
2018-07-12 16:41:24,681 WARN  HdfsUnderFileSystem - Unable to rename hdfs://dev-cluster/underFSStorage/hbase/WALs/plat-ecloud01-bigdata-datanode07,16020,1531380493700-splitting/plat-ecloud01-bigdata-datanode07%2C16020%2C1531380493700.null0.1531383292894 to hdfs://dev-cluster/underFSStorage/hbase/oldWALs/plat-ecloud01-bigdata-datanode07%2C16020%2C1531380493700.null0.1531383292894 because source does not exist or is a directory
2018-07-12 16:41:24,681 WARN  FileSystemMasterClientServiceHandler - Rename: srcPath=/hbase/WALs/plat-ecloud01-bigdata-datanode07,16020,1531380493700-splitting/plat-ecloud01-bigdata-datanode07%2C16020%2C1531380493700.null0.1531383292894, dstPath=/hbase/oldWALs/plat-ecloud01-bigdata-datanode07%2C16020%2C1531380493700.null0.1531383292894, options=RenameTOptions(), Error=Failed to rename hdfs://dev-cluster/underFSStorage/hbase/WALs/plat-ecloud01-bigdata-datanode07,16020,1531380493700-splitting/plat-ecloud01-bigdata-datanode07%2C16020%2C1531380493700.null0.1531383292894 to hdfs://dev-cluster/underFSStorage/hbase/oldWALs/plat-ecloud01-bigdata-datanode07%2C16020%2C1531380493700.null0.1531383292894 in the under file system
2018-07-12 16:41:24,885 WARN  HdfsUnderFileSystem - Unable to rename hdfs://dev-cluster/underFSStorage/hbase/WALs/plat-ecloud01-bigdata-datanode07,16020,1531380493700-splitting/plat-ecloud01-bigdata-datanode07%2C16020%2C1531380493700.null0.1531383292894 to hdfs://dev-cluster/underFSStorage/hbase/oldWALs/plat-ecloud01-bigdata-datanode07%2C16020%2C1531380493700.null0.1531383292894 because source does not exist or is a directory
2018-07-12 16:41:24,885 WARN  FileSystemMasterClientServiceHandler - Rename: srcPath=/hbase/WALs/plat-ecloud01-bigdata-datanode07,16020,1531380493700-splitting/plat-ecloud01-bigdata-datanode07%2C16020%2C1531380493700.null0.1531383292894, dstPath=/hbase/oldWALs/plat-ecloud01-bigdata-datanode07%2C16020%2C1531380493700.null0.1531383292894, options=RenameTOptions(), Error=Failed to rename hdfs://dev-cluster/underFSStorage/hbase/WALs/plat-ecloud01-bigdata-datanode07,16020,1531380493700-splitting/plat-ecloud01-bigdata-datanode07%2C16020%2C1531380493700.null0.1531383292894 to hdfs://dev-cluster/underFSStorage/hbase/oldWALs/plat-ecloud01-bigdata-datanode07%2C16020%2C1531380493700.null0.1531383292894 in the under file system
2018-07-12 16:41:25,109 WARN  HdfsUnderFileSystem - Unable to rename hdfs://dev-cluster/underFSStorage/hbase/WALs/plat-ecloud01-bigdata-datanode07,16020,1531380493700-splitting/plat-ecloud01-bigdata-datanode07%2C16020%2C1531380493700.null0.1531383292894 to hdfs://dev-cluster/underFSStorage/hbase/oldWALs/plat-ecloud01-bigdata-datanode07%2C16020%2C1531380493700.null0.1531383292894 because source does not exist or is a directory
2018-07-12 16:41:25,109 WARN  FileSystemMasterClientServiceHandler - Rename: srcPath=/hbase/WALs/plat-ecloud01-bigdata-datanode07,16020,1531380493700-splitting/plat-ecloud01-bigdata-datanode07%2C16020%2C1531380493700.null0.1531383292894, dstPath=/hbase/oldWALs/plat-ecloud01-bigdata-datanode07%2C16020%2C1531380493700.null0.1531383292894, options=RenameTOptions(), Error=Failed to rename hdfs://dev-cluster/underFSStorage/hbase/WALs/plat-ecloud01-bigdata-datanode07,16020,1531380493700-splitting/plat-ecloud01-bigdata-datanode07%2C16020%2C1531380493700.null0.1531383292894 to hdfs://dev-cluster/underFSStorage/hbase/oldWALs/plat-ecloud01-bigdata-datanode07%2C16020%2C1531380493700.null0.1531383292894 in the under file system
2018-07-12 16:41:25,412 WARN  HdfsUnderFileSystem - Unable to rename hdfs://dev-cluster/underFSStorage/hbase/WALs/plat-ecloud01-bigdata-datanode07,16020,1531380493700-splitting/plat-ecloud01-bigdata-datanode07%2C16020%2C1531380493700.null0.1531383292894 to hdfs://dev-cluster/underFSStorage/hbase/oldWALs/plat-ecloud01-bigdata-datanode07%2C16020%2C1531380493700.null0.1531383292894 because source does not exist or is a directory

On Thursday, July 12, 2018 at 5:10:01 PM UTC+8, 倪项菲 wrote:
Hi Expert,
     I am using HBase 1.2.6 and Alluxio 1.6.0. The HBase region servers went down one by one until none were left alive, and errors were returned when splitting logs. Here is the log from the HMaster:

WARN  [main-EventThread] coordination.SplitLogManagerCoordination: Error splitting /hbase/splitWAL/WALs%2Fplat-ecloud01-bigdata-datanode10%2C16020%2C1531385011185-splitting%2Fplat-ecloud01-bigdata-datanode10%252C16020%252C1531385011185.null1.1531385548301
2018-07-12 16:52:50,498 WARN  [ProcedureExecutor-1] master.SplitLogManager: error while splitting logs in [alluxio://plat-ecloud01-bigdata-journalnode03:19998/hbase/WALs/plat-ecloud01-bigdata-datanode10,16020,1531385011185-splitting] installed = 4 but only 2 done
2018-07-12 16:52:50,498 WARN  [ProcedureExecutor-1] procedure.ServerCrashProcedure: Failed serverName=plat-ecloud01-bigdata-datanode10,16020,1531385011185, state=SERVER_CRASH_SPLIT_LOGS; retry
java.io.IOException: error or interrupted while splitting logs in [alluxio://plat-ecloud01-bigdata-journalnode03:19998/hbase/WALs/plat-ecloud01-bigdata-datanode10,16020,1531385011185-splitting] Task = installed = 4 done = 2 error = 2
        at org.apache.hadoop.hbase.master.SplitLogManager.splitLogDistributed(SplitLogManager.java:290)
        at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:403)
        at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:376)
        at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:293)
        at org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.splitLogs(ServerCrashProcedure.java:438)
        at org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.executeFromState(ServerCrashProcedure.java:251)
        at org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.executeFromState(ServerCrashProcedure.java:73)
        at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:119)
        at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:498)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1147)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop(ProcedureExecutor.java:942)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop(ProcedureExecutor.java:895)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$400(ProcedureExecutor.java:77)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$2.run(ProcedureExecutor.java:497)
2018-07-12 16:52:50,580 INFO  [main-EventThread] coordination.SplitLogManagerCoordination: task /hbase/splitWAL/WALs%2Fplat-ecloud01-bigdata-datanode03%2C16020%2C1531384966458-splitting%2Fplat-ecloud01-bigdata-datanode03%252C16020%252C1531384966458.null0.1531384969617 entered state: DONE plat-ecloud01-bigdata-datanode01,16020,1531385402185
2018-07-12 16:52:50,594 INFO  [main-EventThread] wal.WALSplitter: Archived processed log alluxio://plat-ecloud01-bigdata-journalnode03:19998/hbase/WALs/plat-ecloud01-bigdata-datanode03,16020,1531384966458-splitting/plat-ecloud01-bigdata-datanode03%2C16020%2C1531384966458.null0.1531384969617 to alluxio://plat-ecloud01-bigdata-journalnode03:19998/hbase/oldWALs/plat-ecloud01-bigdata-datanode03%2C16020%2C1531384966458.null0.1531384969617
2018-07-12 16:52:50,594 WARN  [main-EventThread] hadoop.AbstractFileSystem: delete failed: Path /hbase/splitWAL/plat-ecloud01-bigdata-datanode03%2C16020%2C1531384966458.null0.1531384969617 does not exist
2018-07-12 16:52:50,594 INFO  [main-EventThread] coordination.SplitLogManagerCoordination: Done splitting /hbase/splitWAL/WALs%2Fplat-ecloud01-bigdata-datanode03%2C16020%2C1531384966458-splitting%2Fplat-ecloud01-bigdata-datanode03%252C16020%252C1531384966458.null0.1531384969617
2018-07-12 16:52:50,600 INFO  [ProcedureExecutor-1] master.SplitLogManager: dead splitlog workers [plat-ecloud01-bigdata-datanode10,16020,1531385011185]
2018-07-12 16:52:50,602 INFO  [ProcedureExecutor-1] master.SplitLogManager: Started splitting 3 logs in [alluxio://plat-ecloud01-bigdata-journalnode03:19998/hbase/WALs/plat-ecloud01-bigdata-datanode10,16020,1531385011185-splitting] for [plat-ecloud01-bigdata-datanode10,16020,1531385011185]
2018-07-12 16:52:50,617 INFO  [main-EventThread] coordination.SplitLogManagerCoordination: task /hbase/splitWAL/WALs%2Fplat-ecloud01-bigdata-datanode10%2C16020%2C1531385011185-splitting%2Fplat-ecloud01-bigdata-datanode10%252C16020%252C1531385011185.null1.1531385548301 acquired by plat-ecloud01-bigdata-datanode05,16020,1531384763524
2018-07-12 16:52:50,617 INFO  [main-EventThread] coordination.SplitLogManagerCoordination: task /hbase/splitWAL/WALs%2Fplat-ecloud01-bigdata-datanode10%2C16020%2C1531385011185-splitting%2Fplat-ecloud01-bigdata-datanode10%252C16020%252C1531385011185.null1.1531385535824 acquired by plat-ecloud01-bigdata-datanode06,16020,1531384771665
2018-07-12 16:52:50,625 INFO  [main-EventThread] coordination.SplitLogManagerCoordination: task /hbase/splitWAL/WALs%2Fplat-ecloud01-bigdata-datanode10%2C16020%2C1531385011185-splitting%2Fplat-ecloud01-bigdata-datanode10%252C16020%252C1531385011185.null1.1531385543382 acquired by plat-ecloud01-bigdata-datanode09,16020,1531384792828
2018-07-12 16:52:50,661 INFO  [main-EventThread] coordination.SplitLogManagerCoordination: task /hbase/splitWAL/WALs%2Fplat-ecloud01-bigdata-datanode10%2C16020%2C1531385011185-splitting%2Fplat-ecloud01-bigdata-datanode10%252C16020%252C1531385011185.null1.1531385535824 entered state: DONE plat-ecloud01-bigdata-datanode06,16020,1531384771665
2018-07-12 16:52:50,666 WARN  [main-EventThread] hadoop.AbstractFileSystem: rename failed: Failed to rename hdfs://dev-cluster/underFSStorage/hbase/WALs/plat-ecloud01-bigdata-datanode10,16020,1531385011185-splitting/plat-ecloud01-bigdata-datanode10%2C16020%2C1531385011185.null1.1531385535824 to hdfs://dev-cluster/underFSStorage/hbase/oldWALs/plat-ecloud01-bigdata-datanode10%2C16020%2C1531385011185.null1.1531385535824 in the under file system
2018-07-12 16:52:50,666 WARN  [main-EventThread] wal.WALSplitter: Unable to move  alluxio://plat-ecloud01-bigdata-journalnode03:19998/hbase/WALs/plat-ecloud01-bigdata-datanode10,16020,1531385011185-splitting/plat-ecloud01-bigdata-datanode10%2C16020%2C1531385011185.null1.1531385535824 to alluxio://plat-ecloud01-bigdata-journalnode03:19998/hbase/oldWALs/plat-ecloud01-bigdata-datanode10%2C16020%2C1531385011185.null1.1531385535824
2018-07-12 16:52:50,666 WARN  [main-EventThread] hadoop.AbstractFileSystem: delete failed: Path /hbase/splitWAL/plat-ecloud01-bigdata-datanode10%2C16020%2C1531385011185.null1.1531385535824 does not exist
2018-07-12 16:52:50,666 INFO  [main-EventThread] coordination.SplitLogManagerCoordination: Done splitting /hbase/splitWAL/WALs%2Fplat-ecloud01-bigdata-datanode10%2C16020%2C1531385011185-splitting%2Fplat-ecloud01-bigdata-datanode10%252C16020%252C1531385011185.null1.1531385535824
2018-07-12 16:52:50,673 INFO  [main-EventThread] coordination.SplitLogManagerCoordination: task /hbase/splitWAL/WALs%2Fplat-ecloud01-bigdata-datanode10%2C16020%2C1531385011185-splitting%2Fplat-ecloud01-bigdata-datanode10%252C16020%252C1531385011185.null1.1531385548301 entered state: ERR plat-ecloud01-bigdata-datanode05,16020,1531384763524
2018-07-12 16:52:50,673 WARN  [main-EventThread] coordination.SplitLogManagerCoordination: Error splitting /hbase/splitWAL/WALs%2Fplat-ecloud01-bigdata-datanode10%2C16020%2C1531385011185-splitting%2Fplat-ecloud01-bigdata-datanode10%252C16020%252C1531385011185.null1.1531385548301
2018-07-12 16:52:53,423 INFO  [plat-ecloud01-bigdata-journalnode01,60000,1531384672092_splitLogManager__ChoreService_1] master.SplitLogManager: total tasks = 2 unassigned = 0 tasks={/hbase/splitWAL/WALs%2Fplat-ecloud01-bigdata-datanode03%2C16020%2C1531384966458-splitting%2Fplat-ecloud01-bigdata-datanode03%252C16020%252C1531384966458.null1.1531384969951=last_update = 1531385570479 last_version = 2 cur_worker_name = plat-ecloud01-bigdata-datanode04,16020,1531384985479 status = in_progress incarnation = 0 resubmits = 0 batch = installed = 2 done = 1 error = 0, /hbase/splitWAL/WALs%2Fplat-ecloud01-bigdata-datanode10%2C16020%2C1531385011185-splitting%2Fplat-ecloud01-bigdata-datanode10%252C16020%252C1531385011185.null1.1531385543382=last_update = 1531385570656 last_version = 2 cur_worker_name = plat-ecloud01-bigdata-datanode09,16020,1531384792828 status = in_progress incarnation = 0 resubmits = 0 batch = installed = 3 done = 1 error = 1}
2018-07-12 16:52:55,531 INFO  [main-EventThread] coordination.SplitLogManagerCoordination: task /hbase/splitWAL/WALs%2Fplat-ecloud01-bigdata-datanode10%2C16020%2C1531385011185-splitting%2Fplat-ecloud01-bigdata-datanode10%252C16020%252C1531385011185.null1.1531385543382 entered state: DONE plat-ecloud01-bigdata-datanode09,16020,1531384792828
2018-07-12 16:52:55,544 INFO  [main-EventThread] wal.WALSplitter: Archived processed log alluxio://plat-ecloud01-bigdata-journalnode03:19998/hbase/WALs/plat-ecloud01-bigdata-datanode10,16020,1531385011185-splitting/plat-ecloud01-bigdata-datanode10%2C16020%2C1531385011185.null1.1531385543382 to alluxio://plat-ecloud01-bigdata-journalnode03:19998/hbase/oldWALs/plat-ecloud01-bigdata-datanode10%2C16020%2C1531385011185.null1.1531385543382
2018-07-12 16:52:55,544 WARN  [main-EventThread] hadoop.AbstractFileSystem: delete failed: Path /hbase/splitWAL/plat-ecloud01-bigdata-datanode10%2C16020%2C1531385011185.null1.1531385543382 does not exist
2018-07-12 16:52:55,544 INFO  [main-EventThread] coordination.SplitLogManagerCoordination: Done splitting /hbase/splitWAL/WALs%2Fplat-ecloud01-bigdata-datanode10%2C16020%2C1531385011185-splitting%2Fplat-ecloud01-bigdata-datanode10%252C16020%252C1531385011185.null1.1531385543382
2018-07-12 16:52:55,544 WARN  [ProcedureExecutor-1] master.SplitLogManager: error while splitting logs in [alluxio://plat-ecloud01-bigdata-journalnode03:19998/hbase/WALs/plat-ecloud01-bigdata-datanode10,16020,1531385011185-splitting] installed = 3 but only 2 done
2018-07-12 16:52:55,545 WARN  [ProcedureExecutor-1] procedure.ServerCrashProcedure: Failed serverName=plat-ecloud01-bigdata-datanode10,16020,1531385011185, state=SERVER_CRASH_SPLIT_LOGS; retry
java.io.IOException: error or interrupted while splitting logs in [alluxio://plat-ecloud01-bigdata-journalnode03:19998/hbase/WALs/plat-ecloud01-bigdata-datanode10,16020,1531385011185-splitting] Task = installed = 3 done = 2 error = 1
        at org.apache.hadoop.hbase.master.SplitLogManager.splitLogDistributed(SplitLogManager.java:290)
        at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:403)
        at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:376)
        at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:293)
        at org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.splitLogs(ServerCrashProcedure.java:438)
        at org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.executeFromState(ServerCrashProcedure.java:251)
        at org.apache.hadoop.hbase.master.procedure.ServerCrashProcedure.executeFromState(ServerCrashProcedure.java:73)
        at org.apache.hadoop.hbase.procedure2.StateMachineProcedure.execute(StateMachineProcedure.java:119)
        at org.apache.hadoop.hbase.procedure2.Procedure.doExecute(Procedure.java:498)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execProcedure(ProcedureExecutor.java:1147)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop(ProcedureExecutor.java:942)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.execLoop(ProcedureExecutor.java:895)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor.access$400(ProcedureExecutor.java:77)
        at org.apache.hadoop.hbase.procedure2.ProcedureExecutor$2.run(ProcedureExecutor.java:497)
2018-07-12 16:52:55,648 INFO  [ProcedureExecutor-1] master.SplitLogManager: dead splitlog workers [plat-ecloud01-bigdata-datanode10,16020,1531385011185]
2018-07-12 16:52:55,650 INFO  [ProcedureExecutor-1] master.SplitLogManager: Started splitting 2 logs in [alluxio://plat-ecloud01-bigdata-journalnode03:19998/hbase/WALs/plat-ecloud01-bigdata-datanode10,16020,1531385011185-splitting] for [plat-ecloud01-bigdata-datanode10,16020,1531385011185]
2018-07-12 16:52:55,665 INFO  [main-EventThread] coordination.SplitLogManagerCoordination: task /hbase/splitWAL/WALs%2Fplat-ecloud01-bigdata-datanode10%2C16020%2C1531385011185-splitting%2Fplat-ecloud01-bigdata-datanode10%252C16020%252C1531385011185.null1.1531385548301 acquired by plat-ecloud01-bigdata-datanode01,16020,1531385402185
2018-07-12 16:52:55,665 INFO  [main-EventThread] coordination.SplitLogManagerCoordination: task /hbase/splitWAL/WALs%2Fplat-ecloud01-bigdata-datanode10%2C16020%2C1531385011185-splitting%2Fplat-ecloud01-bigdata-datanode10%252C16020%252C1531385011185.null1.1531385535824 acquired by plat-ecloud01-bigdata-datanode06,16020,1531384771665
2018-07-12 16:52:55,707 INFO  [main-EventThread] coordination.SplitLogManagerCoordination: task /hbase/splitWAL/WALs%2Fplat-ecloud01-bigdata-datanode10%2C16020%2C1531385011185-splitting%2Fplat-ecloud01-bigdata-datanode10%252C16020%252C1531385011185.null1.1531385535824 entered state: DONE plat-ecloud01-bigdata-datanode06,16020,1531384771665
2018-07-12 16:52:55,717 WARN  [main-EventThread] hadoop.AbstractFileSystem: rename failed: Failed to rename hdfs://dev-cluster/underFSStorage/hbase/WALs/plat-ecloud01-bigdata-datanode10,16020,1531385011185-splitting/plat-ecloud01-bigdata-datanode10%2C16020%2C1531385011185.null1.1531385535824 to hdfs://dev-cluster/underFSStorage/hbase/oldWALs/plat-ecloud01-bigdata-datanode10%2C16020%2C1531385011185.null1.1531385535824 in the under file system
2018-07-12 16:52:55,718 WARN  [main-EventThread] wal.WALSplitter: Unable to move  alluxio://plat-ecloud01-bigdata-journalnode03:19998/hbase/WALs/plat-ecloud01-bigdata-datanode10,16020,1531385011185-splitting/plat-ecloud01-bigdata-datanode10%2C16020%2C1531385011185.null1.1531385535824 to alluxio://plat-ecloud01-bigdata-journalnode03:19998/hbase/oldWALs/plat-ecloud01-bigdata-datanode10%2C16020%2C1531385011185.null1.1531385535824
2018-07-12 16:52:55,718 WARN  [main-EventThread] hadoop.AbstractFileSystem: delete failed: Path /hbase/splitWAL/plat-ecloud01-bigdata-datanode10%2C16020%2C1531385011185.null1.1531385535824 does not exist
2018-07-12 16:52:55,718 INFO  [main-EventThread] coordination.SplitLogManagerCoordination: Done splitting /hbase/splitWAL/WALs%2Fplat-ecloud01-bigdata-datanode10%2C16020%2C1531385011185-splitting%2Fplat-ecloud01-bigdata-datanode10%252C16020%252C1531385011185.null1.1531385535824
2018-07-12 16:52:56,213 INFO  [main-EventThread] coordination.SplitLogManagerCoordination: task /hbase/splitWAL/WALs%2Fplat-ecloud01-bigdata-datanode03%2C16020%2C1531384966458-splitting%2Fplat-ecloud01-bigdata-datanode03%252C16020%252C1531384966458.null1.1531384969951 entered state: DONE plat-ecloud01-bigdata-datanode04,16020,1531384985479
2018-07-12 16:52:56,226 INFO  [main-EventThread] wal.WALSplitter: Archived processed log alluxio://plat-ecloud01-bigdata-journalnode03:19998/hbase/WALs/plat-ecloud01-bigdata-datanode03,16020,1531384966458-splitting/plat-ecloud01-bigdata-datanode03%2C16020%2C1531384966458.null1.1531384969951 to alluxio://plat-ecloud01-bigdata-journalnode03:19998/hbase/oldWALs/plat-ecloud01-bigdata-datanode03%2C16020%2C1531384966458.null1.1531384969951
2018-07-12 16:52:56,227 WARN  [main-EventThread] hadoop.AbstractFileSystem: delete failed: Path /hbase/splitWAL/plat-ecloud01-bigdata-datanode03%2C16020%2C1531384966458.null1.1531384969951 does not exist
2018-07-12 16:52:57,934 ERROR [B.defaultRpcServer.handler=5,queue=1,port=60000] master.MasterRpcServices: Region server plat-ecloud01-bigdata-datanode01,16020,1531385402185 reported a fatal error:
ABORTING region server plat-ecloud01-bigdata-datanode01,16020,1531385402185: Caught throwable while processing event RS_LOG_REPLAY
Cause:
java.lang.IllegalStateException: Reached EOF unexpectedly.
        at com.google.common.base.Preconditions.checkState(Preconditions.java:149)
        at alluxio.client.file.FileInStream.readCurrentBlockToPos(FileInStream.java:746)
        at alluxio.client.file.FileInStream.readCurrentBlockToEnd(FileInStream.java:755)
        at alluxio.client.file.FileInStream.close(FileInStream.java:160)
        at alluxio.hadoop.HdfsFileInputStream.close(HdfsFileInputStream.java:85)
        at java.io.FilterInputStream.close(FilterInputStream.java:181)
        at org.apache.hadoop.hbase.regionserver.wal.ProtobufLogReader.close(ProtobufLogReader.java:144)
        at org.apache.hadoop.hbase.wal.WALSplitter.splitLogFile(WALSplitter.java:402)
        at org.apache.hadoop.hbase.wal.WALSplitter.splitLogFile(WALSplitter.java:236)
        at org.apache.hadoop.hbase.regionserver.SplitLogWorker$1.exec(SplitLogWorker.java:104)
        at org.apache.hadoop.hbase.regionserver.handler.WALSplitterHandler.process(WALSplitterHandler.java:72)
        at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:129)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
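
To make the split-task counters in the master log above easier to compare across retries, a small script along the following lines could be used. This is only a sketch: the regular expression assumes the `installed = N done = M error = K` wording seen in these log lines, and the function name `batch_counters` is hypothetical, not part of HBase or Alluxio.

```python
import re

# Matches the batch counters as they appear in the log lines above,
# e.g. "installed = 3 done = 2 error = 1" (format assumed from this log only).
BATCH_RE = re.compile(r"installed = (\d+) done = (\d+) error = (\d+)")


def batch_counters(log_text):
    """Return a list of (installed, done, error) tuples found in log_text."""
    return [tuple(int(n) for n in m.groups()) for m in BATCH_RE.finditer(log_text)]


# Sample line copied from the IOException message in the log above.
sample = (
    "java.io.IOException: error or interrupted while splitting logs in "
    "[alluxio://plat-ecloud01-bigdata-journalnode03:19998/hbase/WALs/"
    "plat-ecloud01-bigdata-datanode10,16020,1531385011185-splitting] "
    "Task = installed = 3 done = 2 error = 1"
)

for installed, done, error in batch_counters(sample):
    # Tasks that are neither done nor failed are still pending.
    pending = installed - done - error
    print(f"installed={installed} done={done} error={error} pending={pending}")
```

Running it over the whole hmaster log would show whether the `error` count changes between retries of the same `-splitting` directory.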

The Alluxio-related settings in hbase-site.xml:
<property>
  <name>alluxio.zookeeper.enabled</name>
  <value>true</value>
</property>

<property>
  <name>fs.alluxio.impl</name>
  <value>alluxio.hadoop.FileSystem</value>
</property>

<property>
  <name>alluxio.zookeeper.address</name>
  <value>plat-ecloud01-bigdata-zk01:2181,plat-ecloud01-bigdata-zk02:2181,plat-ecloud01-bigdata-zk03:2181</value>
</property>

<property>
  <name>fs.AbstractFileSystem.alluxio.impl</name>
  <value>alluxio.hadoop.AlluxioFileSystem</value>
</property>

<property>
  <name>hbase.rootdir</name>
  <value>alluxio://plat-ecloud01-bigdata-journalnode03:19998/hbase</value>
</property>

<property>
  <name>alluxio.user.file.writetype.default</name>
  <value>CACHE_THROUGH</value>
</property>

<property>
  <name>alluxio.user.network.netty.timeout</name>
  <value>60000</value>
</property>

<property>
  <name>alluxio.user.file.metadata.load.type</name>
  <value>Always</value>
</property>

<property>
  <name>alluxio.user.block.worker.client.read.retry</name>
  <value>10</value>
</property>

<property>
  <name>alluxio.user.file.delete.unchecked</name>
  <value>true</value>
</property>


The alluxio-site.properties file:
alluxio.master.hostname=10.176.141.22
alluxio.underfs.address=hdfs://dev-cluster/underFSStorage

# Security properties
# alluxio.security.authorization.permission.enabled=true
# alluxio.security.authentication.type=SIMPLE

# Worker properties
alluxio.worker.memory.size=10GB
# alluxio.worker.tieredstore.levels=1
# alluxio.worker.tieredstore.level0.alias=MEM
alluxio.worker.tieredstore.level0.dirs.path=/opt/mnt/ramdisk

# User properties
# alluxio.user.file.readtype.default=CACHE_PROMOTE
# alluxio.user.file.writetype.default=MUST_CACHE

alluxio.zookeeper.enabled=true
alluxio.zookeeper.address=plat-ecloud01-bigdata-zk01:2181,plat-ecloud01-bigdata-zk02:2181,plat-ecloud01-bigdata-zk03:2181
alluxio.master.journal.folder=hdfs://dev-cluster/user/apache/alluxio/journal

alluxio.master.keytab.file=/opt/hadoop-2.7.6/etc/hadoop/bigdata.keytab
alluxio.master.principal=apache/_HO...@...
alluxio.worker.keytab.file=/opt/hadoop-2.7.6/etc/hadoop/bigdata.keytab
alluxio.worker.principal=apache/_HO...@...
alluxio.worker.block.heartbeat.timeout.ms=60000
alluxio.network.netty.heartbeat.timeout=60000
alluxio.master.worker.threads.max=4096


Any help would be highly appreciated.


--
You received this message because you are subscribed to the Google Groups "Alluxio Users" group.