Re: Something confused about the alluxio document


Haoyuan Li
Jie,

No, the code is correct as written. It uses an Alluxio URI to interact with data stored in HDFS.

Hope this helps,

Haoyuan



On Tue, May 24, 2016 at 7:28 AM, jie xu <[hidden email]> wrote:
Hello,
http://www.alluxio.org/documentation/en/Running-Spark-on-Alluxio.html
I suspect there is a mistake in the code shown under "Use Data from HDFS":

> val s = sc.textFile("alluxio://localhost:19998/LICENSE")
> val double = s.map(line => line + line)
> double.saveAsTextFile("alluxio://localhost:19998/LICENSE2")

I think it should be:

> val s = sc.textFile("hdfs://localhost:9000/LICENSE")
> val double = s.map(line => line + line)
> double.saveAsTextFile("alluxio://localhost:19998/LICENSE2")

--
You received this message because you are subscribed to the Google Groups "Alluxio Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/d/optout.

Haoyuan Li
Hi Jie,

Thanks for the question. It is recommended to ask questions on the Alluxio User Mailing list so that more people can help you in a timely manner.

To your question: the memory cost will NOT be doubled if the program is written correctly. For example, it is recommended to use Alluxio as both the input source and the output sink of the Spark program. In this case, when Spark runs, it processes the data one record at a time rather than holding a second full copy in the JVM.
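As a rough illustration of that pattern, here is a minimal sketch of a Spark job that reads from Alluxio and writes back to Alluxio without calling cache() or persist(), so Spark has no reason to keep the whole dataset on the JVM heap. The paths are the ones from this thread. The object name AlluxioPassThrough and the helper doubleLine are made up for this example, and it assumes the Alluxio client jar is on the Spark classpath and the Alluxio master is at localhost:19998.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical example object; the name is not from the Alluxio docs.
object AlluxioPassThrough {

  // Per-record transformation, same as the map() in the documentation snippet.
  def doubleLine(line: String): String = line + line

  def main(args: Array[String]): Unit = {
    // The master URL is expected to come from spark-submit.
    val conf = new SparkConf().setAppName("AlluxioPassThrough")
    val sc = new SparkContext(conf)
    try {
      // Input source: a file served by Alluxio (assumed master address).
      val s = sc.textFile("alluxio://localhost:19998/LICENSE")

      // No cache()/persist() here: records are pulled through an iterator
      // and transformed as they are read, not materialized as a whole RDD.
      val doubled = s.map(doubleLine)

      // Output sink: write the result back to Alluxio.
      doubled.saveAsTextFile("alluxio://localhost:19998/LICENSE2")
    } finally {
      sc.stop()
    }
  }
}
```

If the job did call cache() on the RDD, Spark would keep its own copy of the data in executor memory, and that is the situation where the same data could end up in memory twice.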

Hope this helps.

Best,

Haoyuan


On Wed, May 25, 2016 at 7:01 PM, jie xu <[hidden email]> wrote:
Haoyuan,
Thanks, but I have another question. Alluxio stores data in RamFS, Spark then reads that data from Alluxio, and the data in Spark is stored in the JVM. So will the memory cost be doubled?
