It takes too long to restore state after Master crash. Is there any method to recover more quickly

Xiao Bang
Hi,

My journal log keeps growing by about 2 GB per day. After a restart, the Alluxio Master takes a long time to replay the journal and recover its state: more than 2 hours with a 100 GB journal.

[Attachment: QQ图片20181107184517.png]

My Alluxio version is 1.6.1. In alluxio-site.properties, all journal-related configuration items are left at their default values.

Can I improve the recovery speed by changing some configuration?

Best,
Bang Xiao

--
You received this message because you are subscribed to the Google Groups "Alluxio Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/d/optout.

Re: It takes too long to restore state after Master crash. Is there any method to recover more quickly

Andrew Audibert
Hi Bang,

You can run backup masters so that when the leading master crashes, a backup master can quickly take its place. Backup masters will keep up to date with the journal. See https://www.alluxio.org/docs/1.6/en/Running-Alluxio-Fault-Tolerant.html for more details.
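
As a rough illustration of the setup described above, a minimal alluxio-site.properties sketch for ZooKeeper-based fault tolerance in the 1.x line might look like the following. The ZooKeeper hosts and the HDFS journal path are placeholders (assumptions, not values from this thread), and the linked documentation remains the authoritative reference for your version.

```properties
# Sketch only: host names and paths below are placeholders, not values from this thread.
# Enable ZooKeeper-based leader election among the masters.
alluxio.zookeeper.enabled=true
alluxio.zookeeper.address=zk1:2181,zk2:2181,zk3:2181

# All masters (leader and backups) must point at the same journal
# on shared storage so that backups can follow it.
alluxio.master.journal.folder=hdfs://namenode:9000/alluxio/journal
```

The same settings would need to be present on every master node, with an Alluxio master process started on each.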

Another option is to take more frequent master checkpoints. By default a checkpoint is taken every 2 million journal entries, but this can be controlled by the property alluxio.master.journal.checkpoint.period.entries (https://www.alluxio.org/docs/1.8/en/reference/Properties-List.html#alluxio.master.journal.checkpoint.period.entries). Checkpoints are created by backup masters.
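
For example, lowering the checkpoint period could be sketched in alluxio-site.properties as below. The value 500000 is purely illustrative (an assumption, not a recommendation from this thread), and since the linked property reference is for 1.8, it is worth confirming that the property is available in the version you run.

```properties
# Sketch only: 500000 is an illustrative value, not a tested recommendation.
# The default noted above is 2000000 (2 million) journal entries per checkpoint.
alluxio.master.journal.checkpoint.period.entries=500000
```

Because checkpoints are created by backup masters, this setting would presumably only help when at least one backup master is running alongside the leader.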

Best,
Andrew

Re: It takes too long to restore state after Master crash. Is there any method to recover more quickly

Xiao Bang
Hi Andrew,

Thanks a lot for the tip! I used backup masters before but ran into some problems. For example, it took a long time for the backup master to recover after the leader crashed.

I will set up the fault-tolerant mode again, reduce the value of alluxio.master.journal.checkpoint.period.entries, and see if it works.

Best,
Bang Xiao


Re: It takes too long to restore state after Master crash. Is there any method to recover more quickly

Andrew Audibert
Unless the backup master was just started, it shouldn't take more than a minute for the backup master to take over and serve requests after the original leading master goes down. Let us know if you see it taking longer than that.
