Alluxio Cache control while read and write

Jais Sebastian
Hi,

Can I control the write type (CACHE_THROUGH or THROUGH) for writes, and a SKIP CACHE option for reads, in the Spark Dataset API based on the path? Say I have /cachedpath and /nocachepath. Whenever I write to /cachedpath, the file should be persisted through the Alluxio cache. When I use /nocachepath, the file should not be cached. I want to control this within the same Spark context. Let me know if this approach is possible.

Regards,
Jais

--
You received this message because you are subscribed to the Google Groups "Alluxio Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/d/optout.

Re: Alluxio Cache control while read and write

abellina
Could you use the alluxio:// path for the things that are cacheable, and use the direct path (hdfs://) to your underfs otherwise?
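
A minimal sketch of that routing idea, in Python. The host names, ports, and the `resolve_uri` helper are hypothetical, and it assumes the Alluxio root is mounted on the HDFS root so that the same path is valid under both schemes:

```python
# Hypothetical endpoints; replace with the real Alluxio master and HDFS namenode.
ALLUXIO_ROOT = "alluxio://alluxio-master:19998"
HDFS_ROOT = "hdfs://namenode:8020"

# Path prefixes that should go through Alluxio (taken from the question's example).
CACHED_PREFIXES = ("/cachedpath",)


def resolve_uri(path: str) -> str:
    """Return an alluxio:// URI for cacheable paths and an hdfs:// URI otherwise."""
    cacheable = any(
        path == prefix or path.startswith(prefix + "/")
        for prefix in CACHED_PREFIXES
    )
    root = ALLUXIO_ROOT if cacheable else HDFS_ROOT
    return root + path


print(resolve_uri("/cachedpath/events.parquet"))
print(resolve_uri("/nocachepath/events.parquet"))
```

In a PySpark job the result could be passed straight to the reader or writer, for example `df.write.parquet(resolve_uri("/cachedpath/out"))`, so both kinds of path can be used in the same Spark context.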

In reply to Jais Sebastian's post of Wednesday, July 11, 2018 at 11:12:37 AM UTC-5.


Re: Alluxio Cache control while read and write

Bin Fan
In reply to this post by Jais Sebastian
Hi Jais,

Here is one perhaps hacky workaround: use `alluxio.master.whitelist`, which specifies the set of Alluxio paths that are cacheable.

Search for alluxio.master.whitelist in https://www.alluxio.org/docs/1.8/en/Configuration-Properties.html
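
For the example paths in this thread, the setting would presumably look like `alluxio.master.whitelist=/cachedpath` in alluxio-site.properties. The Python sketch below only models the intended prefix semantics. It is not Alluxio's implementation, the `is_cacheable` name is made up, and matching on whole path components is an assumption:

```python
from typing import Iterable


def is_cacheable(path: str, whitelist: Iterable[str]) -> bool:
    """Return True when path falls under one of the whitelisted prefixes."""
    for prefix in whitelist:
        normalized = prefix.rstrip("/")
        if normalized == "":
            # A whitelist entry of "/" covers every path.
            return True
        if path == normalized or path.startswith(normalized + "/"):
            return True
    return False


whitelist = ["/cachedpath"]
print(is_cacheable("/cachedpath/part-0000", whitelist))   # True
print(is_cacheable("/nocachepath/part-0000", whitelist))  # False
```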



Re: Alluxio Cache control while read and write

Jais Sebastian
Thanks, Fan. The alluxio.master.whitelist option looks interesting. I will try it.

Regards,
Jais

In reply to Bin Fan's post of Thursday, July 12, 2018 at 12:09:45 AM UTC+5:30, which pointed to https://www.alluxio.org/docs/1.8/en/Configuration-Properties.html for alluxio.master.whitelist.



Re: Alluxio Cache control while read and write

Jais Sebastian
Hi Fan,
I tested the whitelist option. It does not seem to work.

Here is what I did:
1. Added /cachedpath to the whitelist and verified the configuration in the Alluxio UI
2. Copied a file from my Alluxio client (command line) to /cachedpath
3. Copied another file to /noncachedpath
4. Checked In-Alluxio Data: both files appear to be present in Alluxio memory
5. Freed all the files from memory using the free command
6. Used the tail command to read these files
7. In Alluxio memory, both files are loaded

My expectation is that step 3 should not load the file into the cache, and that in step 7 only the file under /cachedpath should be loaded into memory. That is, I want to use Alluxio for file system unification, with caching enabled only for certain paths.
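
For anyone who wants to repeat the test, steps 2 to 7 can be scripted roughly as follows. This is a sketch: the local file name is hypothetical, the `build_repro_commands` helper is made up, and `alluxio fs ls` stands in for the In-Alluxio Data check in the web UI. The commands only run when the `alluxio` binary is on the PATH; otherwise they are just printed.

```python
import shutil
import subprocess
from typing import List


def build_repro_commands(local_file: str, cached_dir: str, uncached_dir: str) -> List[List[str]]:
    """Build the Alluxio shell commands for steps 2 to 7 as argv lists."""
    name = local_file.rsplit("/", 1)[-1]
    targets = [f"{cached_dir}/{name}", f"{uncached_dir}/{name}"]
    commands: List[List[str]] = []
    # Steps 2 and 3: copy the same local file under both directories.
    for target in targets:
        commands.append(["alluxio", "fs", "copyFromLocal", local_file, target])
    # Step 4: list both files to inspect their state after the write.
    for target in targets:
        commands.append(["alluxio", "fs", "ls", target])
    # Step 5: free both files from Alluxio storage.
    for target in targets:
        commands.append(["alluxio", "fs", "free", target])
    # Step 6: read both files with tail.
    for target in targets:
        commands.append(["alluxio", "fs", "tail", target])
    # Step 7: list again to see which files were loaded by the read.
    for target in targets:
        commands.append(["alluxio", "fs", "ls", target])
    return commands


# Hypothetical local file; the directories are the ones used in the steps above.
commands = build_repro_commands("/tmp/sample.txt", "/cachedpath", "/noncachedpath")
for argv in commands:
    if shutil.which("alluxio"):
        subprocess.run(argv, check=True)
    else:
        print(" ".join(argv))
```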

Regards,
Jais


In reply to Jais Sebastian's post of Thursday, July 12, 2018 at 10:32:01 AM UTC+5:30.
