Alluxio with presto - file system api or s3 storage apis

Alluxio with presto - file system api or s3 storage apis

Arup Malakar
Hi,

I am prototyping Alluxio as a caching layer for Presto in front of S3. I have set up Alluxio and pointed it at our S3 bucket. Coming to the Presto configuration, it is not clear to me:
- whether Presto talks to Alluxio using the Hadoop file system API, or
- whether Alluxio implements the S3 storage APIs and I need to set hive.s3.endpoint in Presto to point to Alluxio (hive.s3.endpoint=http://localhost:19999/) in Presto's catalog/hive.properties.

--
Arup Malakar


Re: Alluxio with presto - file system api or s3 storage apis

Bill Graham
Arup and I are working on this together and are struggling to understand how to configure Presto with S3 using Alluxio. Any clarification would be great. We have an Alluxio cluster running that is co-located with our Presto nodes, and that seems to work OK.

How to get Presto to communicate with it is still unclear from the docs. Do we need to run the Alluxio S3 proxy process on all the workers and then point Presto at it via hive.s3.endpoint=http://localhost:39999/ in Presto's catalog/hive.properties?

Or could the Hadoop file system API be used when accessing S3 locations from Presto, even though we're not using HDFS?


Re: Alluxio with presto - file system api or s3 storage apis

Yupeng Fu
Hi Bill,

Do you want to connect Presto to Alluxio via the HDFS API or the S3 API?

If the former, then you do not need to set hive.s3.endpoint. Alluxio abstracts away the under file system, so all you need to do is connect Presto to Alluxio through the HDFS API as described in this doc, and configure Alluxio with S3 as the under store as instructed in this doc.
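For example (a minimal sketch; the master hostname, port, and paths are placeholders, and exact steps may differ across Presto/Alluxio versions), the HDFS-API route means putting the Alluxio client jar on the Hive connector's classpath, registering the alluxio:// scheme in a core-site.xml referenced from the catalog, and pointing table locations at Alluxio:

    # etc/catalog/hive.properties on the Presto nodes
    hive.config.resources=/etc/presto/core-site.xml

    <!-- /etc/presto/core-site.xml -->
    <property>
      <name>fs.alluxio.impl</name>
      <value>alluxio.hadoop.FileSystem</value>
    </property>

Table and partition locations then use the Alluxio scheme, e.g. alluxio://alluxio-master:19998/example/.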

If you want to use the S3 API to communicate with Alluxio, then yes, you need to set hive.s3.endpoint; Alluxio exposes an S3-compatible client API through its proxy.
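If you go that way, a sketch of the Presto side might be (this assumes the Alluxio proxy, which serves the S3-compatible REST API, runs next to each Presto node on its default port 39999; 19999 is the Alluxio web UI port, not an S3 endpoint):

    # etc/catalog/hive.properties on the Presto nodes
    hive.s3.endpoint=http://localhost:39999/
    hive.s3.path-style-access=true
    hive.s3.ssl.enabled=false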

Hope this helps.

Best,



Re: Alluxio with presto - file system api or s3 storage apis

Arup Malakar
Yupeng,

## HDFS API
Our preference is the HDFS API, as I expect it to be more efficient than the S3 API. But our tables use the s3:// scheme, and looking at the Presto code it doesn't let us configure the file system implementation for s3. It supports only s3 and emrfs as types, and it overrides the setting with PrestoS3FileSystem or the EMRFS file system irrespective of what we provide in core-site.xml:

        config.set("fs.s3.impl", PrestoS3FileSystem.class.getName());
        config.set("fs.s3a.impl", PrestoS3FileSystem.class.getName());
        config.set("fs.s3n.impl", PrestoS3FileSystem.class.getName());

So it looks like we may have to patch Presto for it to work via the HDFS API.
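To illustrate the kind of patch I mean (a hypothetical sketch, not an actual Presto change), the override could be made to yield to a user-supplied value, for example with Hadoop's Configuration.setIfUnset, so core-site.xml could map the s3 schemes to another FileSystem:

        // hypothetical: only install Presto's S3 implementation when the user
        // has not already configured one of these keys in core-site.xml
        config.setIfUnset("fs.s3.impl", PrestoS3FileSystem.class.getName());
        config.setIfUnset("fs.s3a.impl", PrestoS3FileSystem.class.getName());
        config.setIfUnset("fs.s3n.impl", PrestoS3FileSystem.class.getName());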

## S3 API
Using the S3 API, I am getting:

Query 20180727_183745_00003_qvu3u failed: Unsupported Media Type (Service: Amazon S3; Status Code: 415; Error Code: 415 Unsupported Media Type; Request ID: null; S3 Extended Request ID: null)

I suspect I may have configured Alluxio incorrectly. Here is what I have:

In Presto:
hive.s3.path-style-access=true

In Alluxio:
alluxio.underfs.address=s3a://<our-bucket>/

We are running Presto version 0.198, so a version incompatibility could be another reason.





--
Arup Malakar


Re: Alluxio with presto - file system api or s3 storage apis

Arup Malakar

With a small change I was able to register Alluxio as an alternative file system implementation for s3 in Presto. But I noticed that Alluxio requires all URLs to use the alluxio:// scheme, whereas our files reside in S3, so the URIs Alluxio receives from Presto look like s3://<bucketname>/example/0.parquet. The file system class errors out saying the file scheme doesn't match. I am able to ls the file directly using the Alluxio CLI (alluxio fs ls /example/0.parquet), so I am thinking it should be possible for the Alluxio file system implementation to handle the s3 URI.

Would appreciate it if anyone can point out any configuration or clever way to make Alluxio accept these URIs. I will poke around the Alluxio file system classes a bit to understand more.
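For concreteness (the master host/port are placeholders; our bucket is mounted at the Alluxio root), both of these refer to the same object, but Presto hands Alluxio the first form and the Alluxio file system only accepts the second:

    s3://<bucketname>/example/0.parquet
    alluxio://alluxio-master:19998/example/0.parquet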





--
Arup Malakar


Re: Alluxio with presto - file system api or s3 storage apis

Andrew Audibert
Hi Arup,

The S3 API has some limitations; I'm not sure whether it supports all of the operations Presto uses. I think it will be easier to get Presto working through the HDFS API. Can you change the schemes in Presto to use alluxio:// instead of s3://?
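If the table definitions live in the Hive metastore, one way to do that (a sketch; the master host/port and table name are placeholders) is to update the locations, for example:

    ALTER TABLE example SET LOCATION 'alluxio://alluxio-master:19998/example/';

and similarly for each partition of a partitioned table.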

- Andrew


Re: Alluxio with presto - file system api or s3 storage apis

Arup Malakar

Andrew, thanks for the suggestion; using the HDFS API makes sense. I also think it would be nice if Alluxio had a file system implementation that could handle any scheme, reading and writing files under their existing schemes with Alluxio caching in between. That would make it really easy to try out or POC Alluxio as a drop-in replacement in existing systems without having to change code in those systems or rewrite URIs. Do you think such a caching file system implementation would make sense in Alluxio?
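Roughly, I am imagining something like the sketch below (purely hypothetical, nothing that exists in Alluxio today; the class name, the fs.caching.alluxio.root property, and the master address are made up, and a real implementation would need the rest of the FileSystem contract plus proper error handling):

    import java.io.IOException;
    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.permission.FsPermission;
    import org.apache.hadoop.util.Progressable;

    // Hypothetical "caching" file system: registered for an existing scheme
    // (e.g. fs.s3.impl=CachingSchemeFileSystem), it rewrites every path into
    // the Alluxio namespace and delegates to the Alluxio Hadoop client, so
    // callers keep their s3:// URIs and get Alluxio caching in between.
    public class CachingSchemeFileSystem extends FileSystem {
      private FileSystem alluxio; // delegate that talks to the Alluxio master
      private URI root;           // e.g. alluxio://alluxio-master:19998/ (bucket mounted at /)

      @Override
      public void initialize(URI name, Configuration conf) throws IOException {
        super.initialize(name, conf);
        // "fs.caching.alluxio.root" is a made-up property name for this sketch.
        root = URI.create(conf.get("fs.caching.alluxio.root", "alluxio://localhost:19998/"));
        alluxio = FileSystem.get(root, conf);
      }

      @Override public String getScheme() { return "s3"; }
      @Override public URI getUri() { return URI.create("s3:///"); }

      // s3://bucket/example/0.parquet -> alluxio://<master>/example/0.parquet
      private Path map(Path p) { return new Path(root.resolve(p.toUri().getPath())); }

      @Override public FSDataInputStream open(Path f, int bufferSize) throws IOException {
        return alluxio.open(map(f), bufferSize);
      }
      @Override public FileStatus getFileStatus(Path f) throws IOException {
        return alluxio.getFileStatus(map(f));
      }
      @Override public FileStatus[] listStatus(Path f) throws IOException {
        return alluxio.listStatus(map(f));
      }
      @Override public FSDataOutputStream create(Path f, FsPermission perm, boolean overwrite,
          int bufferSize, short replication, long blockSize, Progressable progress) throws IOException {
        return alluxio.create(map(f), perm, overwrite, bufferSize, replication, blockSize, progress);
      }
      @Override public FSDataOutputStream append(Path f, int bufferSize, Progressable progress)
          throws IOException {
        return alluxio.append(map(f), bufferSize, progress);
      }
      @Override public boolean rename(Path src, Path dst) throws IOException {
        return alluxio.rename(map(src), map(dst));
      }
      @Override public boolean delete(Path f, boolean recursive) throws IOException {
        return alluxio.delete(map(f), recursive);
      }
      @Override public boolean mkdirs(Path f, FsPermission perm) throws IOException {
        return alluxio.mkdirs(map(f), perm);
      }
      @Override public void setWorkingDirectory(Path dir) { alluxio.setWorkingDirectory(map(dir)); }
      @Override public Path getWorkingDirectory() { return alluxio.getWorkingDirectory(); }
    }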


--
Arup Malakar


Re: Alluxio with presto - file system api or s3 storage apis

Bin Fan
Currently, Alluxio's HDFS interface is better supported, in the sense that it has been exercised by more users and compute applications.
Like Andrew, I would suggest using the HDFS interface to start the POC.

On the other hand, it is a good observation that Alluxio could be more plug-and-play.
I believe we will make the Alluxio S3 interface more complete in future releases.

- Bin


