Alluxio master registers itself as a worker

Kun Li
I'm running Alluxio 1.8.0 with Spark 2.3.1 in a Kubernetes cluster.

The master is running as a StatefulSet, so it has the domain name alluxio-master-0.alluxio-master.cdev.svc.cluster.local.

These are the Alluxio pods that I have:
```
$ kubectl get po -ncdev
NAME                             READY     STATUS    RESTARTS   AGE
alluxio-master-0                 2/2       Running   0          6m
alluxio-worker-121418788-5fsfk   2/2       Running   0          17h
alluxio-worker-121418788-m906g   2/2       Running   0          17h
alluxio-worker-121418788-zdsrf   2/2       Running   0          17h
...
```

The following shows up in the logs of alluxio-master-0:

```
2018-09-14 12:05:51,558 INFO  WebServer - Alluxio Master Web service started @ /0.0.0.0:19999
2018-09-14 12:05:51,563 INFO  AlluxioMasterProcess - Alluxio master version 1.8.0 started (gained leadership). bindHost=/0.0.0.0:19998, connectHost=ac04.rinc.com/10.1.1.27:19998, rpcPort=19998, webPort=19999
2018-09-14 12:05:51,598 INFO  DefaultSafeModeManager - Rpc server started, waiting 5000ms for workers to register
2018-09-14 12:05:51,600 INFO  FaultTolerantAlluxioMasterProcess - Primary started
2018-09-14 12:05:51,849 WARN  DefaultBlockMaster - Could not find worker id: 6064340727611749855 for heartbeat.
2018-09-14 12:05:51,871 INFO  DefaultBlockMaster - getWorkerId(): WorkerNetAddress: WorkerNetAddress{host=alluxio-master-0.alluxio-master.cdev.svc.cluster.local, rpcPort=29998, dataPort=29999, webPort=29996, domainSocketPath=, tieredIdentity=TieredIdentity(node=alluxio-master-0.alluxio-master.cdev.svc.cluster.local, rack=null)} id: 7545956817790988295
2018-09-14 12:05:51,887 INFO  DefaultBlockMaster - registerWorker(): MasterWorkerInfo{id=7545956817790988295, workerAddress=WorkerNetAddress{host=alluxio-master-0.alluxio-master.cdev.svc.cluster.local, rpcPort=29998, dataPort=29999, webPort=29996, domainSocketPath=, tieredIdentity=TieredIdentity(node=alluxio-master-0.alluxio-master.cdev.svc.cluster.local, rack=null)}, capacityBytes=37580963840, usedBytes=0, lastUpdatedTimeMs=1536897951886, blocks=[]}
2018-09-14 12:05:53,057 WARN  DefaultBlockMaster - Could not find worker id: 5141368466075040629 for heartbeat.
2018-09-14 12:05:53,058 INFO  DefaultBlockMaster - getWorkerId(): WorkerNetAddress: WorkerNetAddress{host=ac04.rinc.com, rpcPort=29998, dataPort=29999, webPort=29996, domainSocketPath=, tieredIdentity=TieredIdentity(node=ac04.rinc.com, rack=null)} id: 8231393212956172642
2018-09-14 12:05:53,061 WARN  DefaultBlockMaster - Invalid block: 151095607296 from worker ac04.rinc.com.
2018-09-14 12:05:53,061 WARN  DefaultBlockMaster - Invalid block: 152051908608 from worker ac04.rinc.com.
2018-09-14 12:05:53,061 INFO  DefaultBlockMaster - Requesting delete for orphaned block: 151095607296 from worker ac04.rinc.com.
2018-09-14 12:05:53,061 INFO  DefaultBlockMaster - Requesting delete for orphaned block: 152051908608 from worker ac04.rinc.com.
2018-09-14 12:05:53,061 INFO  DefaultBlockMaster - registerWorker(): MasterWorkerInfo{id=8231393212956172642, workerAddress=WorkerNetAddress{host=ac04.rinc.com, rpcPort=29998, dataPort=29999, webPort=29996, domainSocketPath=, tieredIdentity=TieredIdentity(node=ac04.rinc.com, rack=null)}, capacityBytes=37580963840, usedBytes=21308605, lastUpdatedTimeMs=1536897953061, blocks=[151095607296, 152051908608]}
2018-09-14 12:05:53,848 WARN  DefaultBlockMaster - Could not find worker id: 5786180206808424635 for heartbeat.
2018-09-14 12:05:53,849 INFO  DefaultBlockMaster - getWorkerId(): WorkerNetAddress: WorkerNetAddress{host=aa04.rinc.com, rpcPort=29998, dataPort=29999, webPort=29996, domainSocketPath=, tieredIdentity=TieredIdentity(node=aa04.rinc.com, rack=null)} id: 3677615251792051127
2018-09-14 12:05:53,853 WARN  DefaultBlockMaster - Invalid block: 151280156672 from worker aa04.rinc.com.
2018-09-14 12:05:53,853 WARN  DefaultBlockMaster - Invalid block: 151112384512 from worker aa04.rinc.com.
2018-09-14 12:05:53,853 INFO  DefaultBlockMaster - Requesting delete for orphaned block: 151280156672 from worker aa04.rinc.com.
2018-09-14 12:05:53,853 INFO  DefaultBlockMaster - Requesting delete for orphaned block: 151112384512 from worker aa04.rinc.com.
2018-09-14 12:05:53,853 INFO  DefaultBlockMaster - registerWorker(): MasterWorkerInfo{id=3677615251792051127, workerAddress=WorkerNetAddress{host=aa04.rinc.com, rpcPort=29998, dataPort=29999, webPort=29996, domainSocketPath=, tieredIdentity=TieredIdentity(node=aa04.rinc.com, rack=null)}, capacityBytes=37580963840, usedBytes=12492046, lastUpdatedTimeMs=1536897953853, blocks=[151280156672, 151112384512]}
```
 
So three workers are registered with the master at this point, one of them under the master's own hostname, and writes to Alluxio fail:
aa04.rinc.com
ac04.rinc.com
alluxio-master-0.alluxio-master.cdev.svc.cluster.local
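
For anyone who wants to reproduce this list from their own master log, here is a minimal sketch of how the registered hosts can be extracted. The function name `registered_workers` and the regular expression are only my illustration, not part of Alluxio, and the sample lines are shortened copies of the `registerWorker()` entries from the log above.

```python
import re

# Hypothetical helper, not an Alluxio API: matches the registerWorker() lines
# that DefaultBlockMaster writes, capturing the worker id and its host.
REGISTER_RE = re.compile(
    r"registerWorker\(\): MasterWorkerInfo\{id=(\d+), "
    r"workerAddress=WorkerNetAddress\{host=([^,]+),"
)


def registered_workers(log_lines):
    """Return a dict mapping worker host -> worker id for each registration found."""
    workers = {}
    for line in log_lines:
        match = REGISTER_RE.search(line)
        if match:
            workers[match.group(2)] = int(match.group(1))
    return workers


# Sample lines shortened from the master log in this post (trailing fields cut).
SAMPLE_LOG = [
    "2018-09-14 12:05:51,887 INFO  DefaultBlockMaster - registerWorker(): "
    "MasterWorkerInfo{id=7545956817790988295, workerAddress=WorkerNetAddress"
    "{host=alluxio-master-0.alluxio-master.cdev.svc.cluster.local, rpcPort=29998,",
    "2018-09-14 12:05:53,057 WARN  DefaultBlockMaster - Could not find worker id: "
    "5141368466075040629 for heartbeat.",
    "2018-09-14 12:05:53,061 INFO  DefaultBlockMaster - registerWorker(): "
    "MasterWorkerInfo{id=8231393212956172642, workerAddress=WorkerNetAddress"
    "{host=ac04.rinc.com, rpcPort=29998,",
    "2018-09-14 12:05:53,853 INFO  DefaultBlockMaster - registerWorker(): "
    "MasterWorkerInfo{id=3677615251792051127, workerAddress=WorkerNetAddress"
    "{host=aa04.rinc.com, rpcPort=29998,",
]

if __name__ == "__main__":
    for host, worker_id in sorted(registered_workers(SAMPLE_LOG).items()):
        print(host, worker_id)
```

Running it against the full log should make it easy to spot any worker that registered under a pod's DNS name instead of a node hostname.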

Has anyone seen this before?

likun
