Reading incomplete file

Hector Zhang
I found that a file's content cannot be read in Alluxio until the file is closed, whereas on HDFS the content of an incomplete file can be read. Is it possible to fix this?

--
You received this message because you are subscribed to the Google Groups "Alluxio Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [hidden email].
For more options, visit https://groups.google.com/d/optout.

Re: Reading incomplete file

Bin Fan
Hi Hector,

This is actually not a bug but by design (at least in the early days, this assumption simplified the design and implementation quite a bit).
You may want to give more details on why support for these semantics is needed; we can definitely discuss and review the possible benefits.
My estimate is that this is not a simple fix: it requires a non-trivial amount of change in the worker-master communication and in how the worker handles I/O (and session cleanup).

- Bin

On Fri, Sep 28, 2018 at 4:50 AM Hector Zhang <[hidden email]> wrote:
I found that a file's content cannot be read in Alluxio until the file is closed, whereas on HDFS the content of an incomplete file can be read. Is it possible to fix this?

Re: Reading incomplete file

Hector Zhang
Thank you for your reply.

First, where the requirement comes from: we are using Alluxio as an interface layer over a distributed filesystem, and the Spark event log path is configured to Alluxio FS. We now want to parse the event logs of running Spark jobs, but the event log file of a running job is incomplete and cannot be read.

Second, as a matter of general design, a distributed filesystem interface layer is a very common requirement for big data platform products, and Alluxio is a good choice for it. To meet that requirement, however, covering all HDFS features is absolutely necessary.



On Saturday, September 29, 2018 at 6:04:49 AM UTC+8, Bin Fan wrote:
Hi Hector,

This is actually not a bug but by design (at least in the early days, this assumption simplified the design and implementation quite a bit).
You may want to give more details on why support for these semantics is needed; we can definitely discuss and review the possible benefits.
My estimate is that this is not a simple fix: it requires a non-trivial amount of change in the worker-master communication and in how the worker handles I/O (and session cleanup).

- Bin

Re: Reading incomplete file

Bin Fan
Thanks for the elaboration.
I think support for reading incomplete files could be a good feature for Alluxio 2.x.
Can you create a JIRA at https://alluxio.atlassian.net?

Thanks


On Saturday, September 29, 2018 at 1:33:17 AM UTC-7, Hector Zhang wrote:
Thank you for your reply.

First, where the requirement comes from: we are using Alluxio as an interface layer over a distributed filesystem, and the Spark event log path is configured to Alluxio FS. We now want to parse the event logs of running Spark jobs, but the event log file of a running job is incomplete and cannot be read.

Second, as a matter of general design, a distributed filesystem interface layer is a very common requirement for big data platform products, and Alluxio is a good choice for it. To meet that requirement, however, covering all HDFS features is absolutely necessary.


