A thread is possible to be stuck when MEM is full

A thread is possible to be stuck when MEM is full

Sean Shi
Hi,

I am posting this to discuss an issue I have encountered and to ask for suggestions.
Imagine that the system is very busy and the data set is larger than the capacity of MEM. A thread that reads a block from HDD must then move some blocks out of MEM before the block can be promoted into MEM.
For example, thread_A wants to read block A and must first move block B out to free some space. thread_A must acquire a write lock on block B before it can move it out. If other threads acquire read locks on block B before thread_A gets the write lock, thread_A waits until all of those read locks are released. However, because the system is busy, there are always some threads requesting a read lock on block B, and the lock is unfair. These threads get the read lock immediately, and thread_A starves.
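
The starvation pattern described above can be reproduced with a toy read/write lock built on java.util.concurrent.Semaphore. This is only a sketch under stated assumptions: it is not Alluxio's lock manager, and the class name StarvationSketch, the method lateReaderBarges, and the limit MAX_READERS are hypothetical. A read lock takes one permit and a write lock takes all of them, so a writer can only proceed when no reader holds a permit.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

public class StarvationSketch {
  // Hypothetical limit: a read lock takes 1 permit, a write lock takes all of them.
  private static final int MAX_READERS = 100;

  /** Returns true if a late reader gets its permit while a writer is already waiting. */
  public static boolean lateReaderBarges(boolean fair) {
    Semaphore permits = new Semaphore(MAX_READERS, fair);
    try {
      permits.acquire(); // an early reader already holds a read lock

      Thread writer = new Thread(() -> {
        try {
          permits.acquire(MAX_READERS); // write lock: needs every permit
          permits.release(MAX_READERS);
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      });
      writer.start();
      while (!permits.hasQueuedThreads()) {
        Thread.sleep(1); // wait until the writer is parked in the wait queue
      }

      // A late reader arrives. The timed tryAcquire honors the fairness setting.
      boolean barged = permits.tryAcquire(1, 200, TimeUnit.MILLISECONDS);
      if (barged) {
        permits.release(); // the late reader finishes
      }
      permits.release(); // the early reader finishes, so the writer can proceed
      writer.join();
      return barged;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("unfair, late reader barges: " + lateReaderBarges(false));
    System.out.println("fair, late reader barges: " + lateReaderBarges(true));
  }
}
```

With the unfair semaphore the late reader takes its permit even though the writer was waiting first. If such readers keep arriving, the writer never sees all permits free at once. With the fair semaphore the late reader queues behind the writer, so its timed attempt gives up and the writer runs as soon as the early reader releases.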

The related code:

  private MoveBlockResult moveBlockInternal(long sessionId, long blockId,
      BlockStoreLocation oldLocation, BlockStoreLocation newLocation)
      throws BlockDoesNotExistException, BlockAlreadyExistsException,
          InvalidWorkerStateException, IOException {
    long lockId = mLockManager.lockBlock(sessionId, blockId, BlockLockType.WRITE);

This code is from TieredBlockStore#moveBlockInternal. The call
mLockManager.lockBlock(sessionId, blockId, BlockLockType.WRITE)
can block indefinitely, because under this load the write lock is never granted.

Related PR:
https://github.com/Alluxio/alluxio/pull/7296

- Sean

--
You received this message because you are subscribed to the Google Groups "Alluxio Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/d/optout.

Re: A thread is possible to be stuck when MEM is full

Bin Fan
Hi Sean

I think the lock starvation you describe is possible.
We have found that the corner cases can get complicated with synchronous evictors.
Here are two suggestions:

(1) If you are on Alluxio 1.7 or later, we have made the async evictor the default to address this issue
(alluxio.worker.tieredstore.reserver.enabled=true).
With the async evictor, moving data down to other tiers can be triggered earlier.

(2) Alternatively, the latest documentation on Alluxio storage
recommends using a single tier (doc) instead of a multi-tier setup.
You can still specify multiple data directories for this single tier, e.g.:

alluxio.worker.tieredstore.level0.dirs.path=/mnt/ramdisk,/mnt/ssd1,/mnt/ssd2

This way, no data is moved among directories within the same tier.
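
To illustrate why suggestion (1) helps, here is a minimal sketch of the idea behind asynchronous space reservation: a background task frees space once usage crosses a high watermark, so a reading thread rarely has to evict inline while holding up its own request. This is not Alluxio's evictor. The class ReserverSketch, its methods, and the watermark percentages are hypothetical and chosen only for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class ReserverSketch {
  private final long capacityBytes;
  private final int highPercent; // start freeing space above this usage
  private final int lowPercent;  // stop freeing space at or below this usage
  private final Deque<Long> blockSizesOldestFirst = new ArrayDeque<>();
  private long usedBytes;

  public ReserverSketch(long capacityBytes, int highPercent, int lowPercent) {
    this.capacityBytes = capacityBytes;
    this.highPercent = highPercent;
    this.lowPercent = lowPercent;
  }

  /** Records a newly cached block of the given size. */
  public synchronized void addBlock(long sizeBytes) {
    blockSizesOldestFirst.addLast(sizeBytes);
    usedBytes += sizeBytes;
  }

  /** One pass of the background task; returns the number of bytes freed. */
  public synchronized long reserveOnce() {
    if (usedBytes * 100 <= capacityBytes * highPercent) {
      return 0; // below the high watermark: nothing to do
    }
    long freed = 0;
    // Drop the oldest blocks until usage is back at the low watermark.
    while (usedBytes * 100 > capacityBytes * lowPercent
        && !blockSizesOldestFirst.isEmpty()) {
      long size = blockSizesOldestFirst.removeFirst();
      usedBytes -= size;
      freed += size;
    }
    return freed;
  }

  public synchronized long usedBytes() {
    return usedBytes;
  }

  public static void main(String[] args) {
    ReserverSketch tier = new ReserverSketch(100, 90, 70);
    for (int i = 0; i < 10; i++) {
      tier.addBlock(10);
    }
    // Prints "freed=30 used=70".
    System.out.println("freed=" + tier.reserveOnce() + " used=" + tier.usedBytes());
  }
}
```

In a real worker, a pass like reserveOnce would be run periodically by a background thread (for example via a ScheduledExecutorService), and each removal would still need the block's write lock. The point of the sketch is only that space is freed ahead of demand rather than on the read path.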

Let me know if either suggestion works for you.

- Bin

On Monday, October 29, 2018 at 12:13:45 AM UTC-7, Sean Shi wrote: