Question about using multiple Alluxio clusters pointing to same UFS bucket

伍鑫
Hi Masters,

I have a question about using multiple Alluxio clusters pointing to the same UFS (OSS) bucket.

Our product is an MPP database that stores data on an Object Storage Service (OSS). Currently we use Alluxio as the caching layer to accelerate OSS I/O; MPP segments and Alluxio workers are co-located so that reads can use the local short-circuit read feature. For one copy of data in an OSS bucket, there will be multiple MPP/Alluxio clusters (e.g. Cluster A and Cluster B) pointing to it, and both clusters will write new objects into this bucket.

[question 1] If cluster A writes some new objects, how can cluster B recognize them in time? We observed errors where cluster B does not know about these new UFS objects because its metadata is not updated. If cluster B then reads those objects, it creates new zero-size objects in memory. This leaves cluster B's metadata in an inconsistent state that is hard to recover from.

Manually issuing "alluxio fs ls -f " might overcome this problem if we knew that cluster A had written something, but we never know when cluster A writes... [question 2] So is there any Alluxio mechanism or configuration to pick up UFS object updates in time? At the very least, before reading a file, shouldn't Alluxio check whether the object already exists in the UFS?

We extended the UFS interface to support some more OSS services in China. [question 3] Could the error be caused by our UFS implementation?

My environment:
Alluxio version: 1.5.0
Computation framework: an MPP database
UFS: several Chinese public cloud providers; we extended the UFS interface to support their OSS services (QingStor, Tencent, Kingsoft, etc.)

--
You received this message because you are subscribed to the Google Groups "Alluxio Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/d/optout.
Re: Question about using multiple Alluxio clusters pointing to same UFS bucket

Bin Fan
Hi Wuxin,

My replies are inline.

On Tue, Jul 17, 2018 at 1:29 AM 伍鑫 <[hidden email]> wrote:
Hi Masters,

I have a question about using multiple Alluxio clusters pointing to the same UFS (OSS) bucket.

Our product is an MPP database that stores data on an Object Storage Service (OSS). Currently we use Alluxio as the caching layer to accelerate OSS I/O; MPP segments and Alluxio workers are co-located so that reads can use the local short-circuit read feature. For one copy of data in an OSS bucket, there will be multiple MPP/Alluxio clusters (e.g. Cluster A and Cluster B) pointing to it, and both clusters will write new objects into this bucket.

[question 1] If cluster A writes some new objects, how can cluster B recognize them in time? We observed errors where cluster B does not know about these new UFS objects because its metadata is not updated. If cluster B then reads those objects, it creates new zero-size objects in memory. This leaves cluster B's metadata in an inconsistent state that is hard to recover from.

Manually issuing "alluxio fs ls -f " might overcome this problem if we knew that cluster A had written something, but we never know when cluster A writes...

In Alluxio 1.5, you can set `alluxio.user.file.metadata.load.type=Always` to force the other cluster (e.g., Cluster B) to discover files newly written to the UFS.
Note that this is a client-side configuration, meaning the metadata load is triggered by the reads issued by your MPP database.
So you should set it the same way as any other Alluxio client-side configuration; see the Alluxio documentation on configuring clients.

Since Alluxio 1.7, the setting `alluxio.user.file.metadata.load.type=Always` is deprecated in favor of `alluxio.user.file.metadata.sync.interval=0`.
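To make the version difference concrete, here is a minimal Python sketch that picks the property named above for a given Alluxio version and renders it either as a properties-file line or as a JVM `-D` flag. The helper names (`client_metadata_property`, `as_site_properties`, `as_jvm_flags`) are hypothetical and not part of Alluxio, and treating every version below 1.7 like 1.5 is an assumption; only the two property names and values come from the reply.

```python
# Hypothetical helpers; only the property names and values come from the thread.

def client_metadata_property(alluxio_version: str) -> dict:
    """Return the client-side property that forces a UFS metadata check."""
    major, minor = (int(part) for part in alluxio_version.split(".")[:2])
    if (major, minor) >= (1, 7):
        # From 1.7 on, the sync interval replaces the load type.
        return {"alluxio.user.file.metadata.sync.interval": "0"}
    # Assumption: anything older than 1.7 is handled like 1.5.
    return {"alluxio.user.file.metadata.load.type": "Always"}


def as_site_properties(props: dict) -> str:
    """Render properties as key=value lines for a client properties file."""
    return "\n".join(f"{key}={value}" for key, value in sorted(props.items()))


def as_jvm_flags(props: dict) -> str:
    """Render properties as -D flags for the client JVM command line."""
    return " ".join(f"-D{key}={value}" for key, value in sorted(props.items()))


if __name__ == "__main__":
    for version in ("1.5.0", "1.7.0"):
        props = client_metadata_property(version)
        print(version, "->", as_site_properties(props), "|", as_jvm_flags(props))
```

Whichever form is used, the setting has to reach the process that embeds the Alluxio client (here, the MPP segments), since it is the client's read that triggers the metadata load.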
 
[question 2] So is there any Alluxio mechanism or configuration to pick up UFS object updates in time? At the very least, before reading a file, shouldn't Alluxio check whether the object already exists in the UFS?
I think the setting `alluxio.user.file.metadata.load.type=Always` should work here as well?
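As a rough illustration of why the load type matters with two clusters, the toy Python model below contrasts a cluster that lists the UFS only on first access ("Once") with one that lists it on every access ("Always"). This is a simplified sketch, not Alluxio's implementation: `ToyCluster`, its methods, and the dict standing in for the shared bucket are all hypothetical, and deletes are not modeled.

```python
# Toy model only: a dict plays the shared UFS bucket, and ToyCluster is a
# hypothetical stand-in for one Alluxio cluster's metadata view.

class ToyCluster:
    def __init__(self, ufs: dict, load_type: str):
        self.ufs = ufs              # shared bucket: path -> object size
        self.load_type = load_type  # "Once" or "Always"
        self.metadata = {}          # this cluster's cached view of the bucket
        self.loaded_once = False

    def list_files(self) -> list:
        """List known paths, consulting the UFS according to the load type."""
        if self.load_type == "Always" or not self.loaded_once:
            self.metadata.update(self.ufs)  # pick up objects written by others
            self.loaded_once = True
        return sorted(self.metadata)

    def write(self, path: str, size: int) -> None:
        """Write through: record the object locally and in the shared UFS."""
        self.metadata[path] = size
        self.ufs[path] = size


if __name__ == "__main__":
    bucket = {"/data/a": 10}
    cluster_a = ToyCluster(bucket, "Always")
    b_once = ToyCluster(bucket, "Once")
    b_always = ToyCluster(bucket, "Always")

    # Both views of cluster B see the initial object.
    b_once.list_files()
    b_always.list_files()

    # Cluster A writes a new object after B's first listing.
    cluster_a.write("/data/b", 20)

    print(b_once.list_files())    # the "Once" view misses /data/b
    print(b_always.list_files())  # the "Always" view includes /data/b
```

The trade-off in this model is the same one the setting implies: "Always" pays for a UFS lookup on every access in exchange for seeing writes made by the other cluster.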
 
Re: Question about using multiple Alluxio clusters pointing to same UFS bucket

伍鑫
Hi Bin,

Thanks so much for your reply! I just modified our client logic as you suggested. I will post an update once we know whether it solves the problem.

Thanks so much,
Eric