Problem with Alluxio Local File System

Problem with Alluxio Local File System

Md Mahbub Alam
Hi,

I am currently running Alluxio 1.2.0 on a 4-node cluster (1 master, 3 workers). Whenever I copy a file into Alluxio, I see that it creates two copies of the file. One (the full file) is inside /underFSStorage on the master, and the other is inside /mnt/ramdisk/alluxioworker on the workers. My question is: if I read this file from Alluxio, which copy will it read from?

Also, today I read the following lines in the installation guide for the latest version of Alluxio. As I am currently using the local file system with Alluxio on my cluster, does that mean distributed operation does not work on the local file system?

"You cannot use local file system as Alluxio’s under storage system if there are multiple nodes in the cluster. Instead, you need to set up a shared storage to which all Alluxio servers have access. The shared storage can be network file system (NFS), HDFS, S3, and so on."

Thanks

--
You received this message because you are subscribed to the Google Groups "Alluxio Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/d/optout.

Re: Problem with Alluxio Local File System

Bin Fan
Setting the Alluxio UFS to the local file system (e.g., /underFSStorage) only works in the simplest case: one master and one worker, colocated on the same node.

Otherwise, the master and the workers will each see a different view of their own "/underFSStorage".

Essentially, the Alluxio UFS needs to be accessible from all Alluxio masters and workers, and it must provide the same, consistent view to all of them.

- Bin

(In reply to the message from Md Mahbub Alam sent on Wed, May 2, 2018 at 5:59 AM.)

Re: Problem with Alluxio Local File System

Md Mahbub Alam
Yes, I already mentioned that. But my question is a bit different. Suppose I run the following command:

    $ alluxio fs copyFromLocal /input/in.csv /test/in.csv

This command creates an "in.csv" file under /underFSStorage on the master node, and it also creates file(s) inside the /mnt/ramdisk/alluxioworker folder on the worker node(s). So my question is: if I read "in.csv" from a program, which of these files does Alluxio actually read?

My second question is based on the following quote. Why do I need HDFS or another storage system? Does Alluxio's distributed operation not work with the local filesystem?

"You cannot use local file system as Alluxio’s under storage system if there are multiple nodes in the cluster. Instead, you need to set up a shared storage to which all Alluxio servers have access. The shared storage can be network file system (NFS), HDFS, S3, and so on."

thanks.
Mahbub

(In reply to the message from Bin Fan sent on Wednesday, May 2, 2018 at 9:07:02 PM UTC-3.)

Re: Problem with Alluxio Local File System

Gene Pang
Hi,

When a file is created in Alluxio, the file contents can (depending on configuration) be stored in two "logical" places: 1) the Alluxio storage layer, and 2) the under filesystem.

The Alluxio storage layer is the tiered storage that all the Alluxio workers manage on their local machines. In your example, it is the /mnt/ramdisk/alluxioworker folder on the machine.

The Under FileSystem (UFS) is another storage system which is mounted to the Alluxio namespace. The UFS can be any number of systems, like HDFS, S3, etc. In your example, it happens to be the local filesystem on the machine, at the path /underFSStorage.

This is why you see file data at both locations (the worker managed storage, and the UFS). If you read the file, Alluxio prioritizes reading from the Alluxio storage first, so it will read from the worker managed storage (ramdisk).
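
To make the read order concrete, here is a minimal toy model in Python. It is not Alluxio's API: the class `ToyAlluxio` and its methods are hypothetical names, the two dicts merely stand in for the worker-managed storage and the UFS, and the fallback to the UFS when the data is no longer in worker storage is an assumption based on the "Alluxio storage first" wording above.

```python
# Toy model of the read order described above. NOT Alluxio's API:
# every name here is hypothetical and exists only for illustration.

class ToyAlluxio:
    def __init__(self):
        self.worker_storage = {}  # stands in for /mnt/ramdisk/alluxioworker
        self.ufs = {}             # stands in for /underFSStorage

    def write(self, path, data):
        # Mirrors what was observed in this thread: data lands in both places.
        self.worker_storage[path] = data
        self.ufs[path] = data

    def evict(self, path):
        # Simulates the data no longer being held in worker storage.
        self.worker_storage.pop(path, None)

    def read(self, path):
        # Alluxio storage is checked first; the UFS is the (assumed) fallback.
        if path in self.worker_storage:
            return self.worker_storage[path], "worker storage"
        return self.ufs[path], "UFS"


fs = ToyAlluxio()
fs.write("/test/in.csv", b"a,b\n1,2\n")
first = fs.read("/test/in.csv")    # served from worker storage
fs.evict("/test/in.csv")
second = fs.read("/test/in.csv")   # served from the UFS copy
print(first[1])
print(second[1])
```

In this sketch the first read is answered from the worker-managed storage, and only after the eviction does the read reach the UFS copy.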

The UFS must be a storage system that is accessible from all masters and workers. If you have a single-machine deployment (all masters and workers on one machine), then a local filesystem UFS would work. However, most Alluxio deployments use multiple machines, so a local filesystem UFS cannot be used in those scenarios. Instead, the UFS must be an externally accessible storage system (like HDFS, S3, etc.), so that all master machines and worker machines can access it.
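
The difference between a local and a shared UFS can be sketched with another toy model in Python. Again, this is not Alluxio code: `Node`, `persist`, `visible_on`, and the choice of `worker1` as the node that writes the file are all assumptions made purely for illustration. Each node's local disk is a separate dict, while the shared store is a single dict that every node sees.

```python
# Toy model, NOT Alluxio code: shows why a per-node local path gives each
# machine a different view, while shared storage gives all of them the same one.

class Node:
    def __init__(self, name):
        self.name = name
        self.local_disk = {}  # this machine's own /underFSStorage


def persist(ufs_for, writer, path, data):
    # The writing node stores the file in whatever the UFS resolves to for it.
    ufs_for(writer)[path] = data


def visible_on(ufs_for, nodes, path):
    # Names of the nodes that can see the file through their view of the UFS.
    return [n.name for n in nodes if path in ufs_for(n)]


nodes = [Node("master"), Node("worker1"), Node("worker2"), Node("worker3")]
path = "/underFSStorage/in.csv"

# Case 1: local filesystem as UFS. The same path points at a different
# disk on every machine.
def local_ufs(node):
    return node.local_disk

persist(local_ufs, nodes[1], path, b"data")
local_view = visible_on(local_ufs, nodes, path)

# Case 2: shared storage (NFS, HDFS, S3, ...). Every node resolves the
# path to the same store.
shared_store = {}

def shared_ufs(node):
    return shared_store

persist(shared_ufs, nodes[1], path, b"data")
shared_view = visible_on(shared_ufs, nodes, path)

print(local_view)
print(shared_view)
```

With the local UFS, only the node that wrote the file can see it; with the shared store, all four nodes see the same file, which is the consistent view the UFS has to provide.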

Hope this helps,
Gene
