can anyone explain the situation?

can anyone explain the situation?

wgcris



My cluster environment:
one master, 6 workers, worker_mem=5GB; readType=NO_CACHE, writeType=MUST_CACHE.
I loaded 8.5GB of data from local disk into Alluxio and checked the web UI; the block distribution is shown:

I then ran wordcount, which succeeded. I checked the block distribution again; it is shown in fig2.

I have already set readType to NO_CACHE, so I expected Alluxio to keep only one replica of each block. Why do I get this result? Can anyone explain it? Thanks.


--
You received this message because you are subscribed to the Google Groups "Alluxio Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/d/optout.
Re: can anyone explain the situation?

Bin Fan
Alluxio writes a single replica when data is written to Alluxio.
On the read path, however, clients on multiple servers may request the same data block, and each of them caches the block on its local Alluxio worker. That's why you may see multiple copies.
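
To make the read-path behaviour concrete, here is a toy model in Python. It is only a sketch of the idea described above, not Alluxio code: the function name `simulate_reads` and the worker names are made up for illustration, and the only assumption about Alluxio is that a caching read type stores a copy on the reader's local worker while NO_CACHE does not.

```python
def simulate_reads(initial_holder, reading_workers, read_type):
    """Toy model: return the set of workers holding one block after all reads.

    initial_holder  -- worker that received the single replica on write
    reading_workers -- workers whose local clients read the block, in order
    read_type       -- "NO_CACHE", "CACHE" or "CACHE_PROMOTE"
    """
    holders = {initial_holder}
    caching = read_type in ("CACHE", "CACHE_PROMOTE")
    for worker in reading_workers:
        # Assumption: with a caching read type, a client that reads the
        # block also stores a copy on its own local worker.
        if caching:
            holders.add(worker)
    return holders


readers = ["worker2", "worker3", "worker1"]
cached = simulate_reads("worker1", readers, "CACHE")
uncached = simulate_reads("worker1", readers, "NO_CACHE")
print("CACHE    ->", sorted(cached))
print("NO_CACHE ->", sorted(uncached))
```

In this model the block ends up on every worker that read it when the read type caches, and stays on the original worker only when the read type is NO_CACHE. If you see the first pattern, the reading clients are most likely not using NO_CACHE.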

As for NO_CACHE, how did you set it? It looks to me as if reads are still being cached.

On Thu, Aug 18, 2016 at 2:10 AM, wgcris <[hidden email]> wrote:



Re: Private email regarding "can anyone explain the situation?"

Bin Fan
(cc'ing back Alluxio-users in case other people can benefit from the discussion)

From what I understand, you set `NO_CACHE` in `alluxio-site.xml` (did you actually mean `alluxio-site.properties`?), and the web UI reflects that setting.

Here is what I guess is happening:

You modified the `alluxio-site.properties` file in `${ALLUXIO_HOME}/conf`, which is recognized by the Alluxio master and workers and by the Alluxio shell commands. Your MapReduce jobs, which use Alluxio as a client, know nothing about this properties file. You need to pass the Alluxio read type parameter to those applications so that they pass it down to the Alluxio client.
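
As a rough illustration of passing the read type to the job, the Python sketch below assembles a `hadoop jar` command line with a `-D` option. Several things here are assumptions rather than facts from this thread: the property key `alluxio.user.file.readtype.default` is what I believe the client read type setting is called (please check the configuration docs for your Alluxio version), the jar name, host and paths are placeholders, and `-D` options only take effect if the job parses generic Hadoop options. The helper `build_hadoop_command` is hypothetical.

```python
import shlex


def build_hadoop_command(jar, program, job_args, client_props):
    """Build an argument list for `hadoop jar` with -D client properties.

    The -D options are placed right after the program name, before the
    job's own arguments, which is where Hadoop's generic option parsing
    expects them.
    """
    cmd = ["hadoop", "jar", jar, program]
    for key, value in sorted(client_props.items()):
        cmd.append(f"-D{key}={value}")
    cmd.extend(job_args)
    return cmd


cmd = build_hadoop_command(
    jar="hadoop-mapreduce-examples.jar",  # placeholder jar name
    program="wordcount",
    job_args=[
        "alluxio://master-host:19998/input",   # placeholder input path
        "alluxio://master-host:19998/output",  # placeholder output path
    ],
    # Assumed property key for the Alluxio client read type.
    client_props={"alluxio.user.file.readtype.default": "NO_CACHE"},
)
print(" ".join(shlex.quote(part) for part in cmd))
```

The sketch leaves out whatever is needed to put the Alluxio client jar on the job's classpath. The point is only that the property travels with the job submission instead of living in the server-side `conf` directory.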

Please read 
and

Hope this can help

- Bin
 

On Thu, Aug 18, 2016 at 7:50 PM, wgcris <[hidden email]> wrote:
I am sure I set readType to NO_CACHE; I checked it in the web UI. I don't know why Alluxio creates so many replicas of the same block. I also have a question: when I run wordcount with Alluxio as the data source, how does the JobTracker achieve data locality without creating many block replicas? Whether I set readType to NO_CACHE or CACHE_PROMOTE, Alluxio always creates many replicas.

On Friday, August 19, 2016 at 1:02:58 AM UTC+8, Bin Fan wrote: