I have a Greenplum database up and running, and Parquet files stored in HDFS at /user/hadoopuser/raw/.
I installed and launched PXF and created an external table with:
create external table requests(id bigint, full_name text, req_date timestamp)
location('pxf://user/hadoopuser/raw?PROFILE=hdfs:parquet') format 'CUSTOM' (formatter='pxfwritable_import')
But when I try to access data with select * from requests
I get the following error:
[08000] ERROR: PXF server error : invalid configuration for server 'default' (seg0 slice1 10.0.2.20:6000 pid=18636) Hint: Configure a valid value for 'pxf.fs.basePath' property for server 'default' to access the filesystem.
pxf-service.log only contains
java.io.IOException: org.greenplum.pxf.api.error.PxfRuntimeException: invalid configuration for server 'default'
What is a valid value for pxf.fs.basePath, where do I set it, and why is this error happening?
CodePudding user response:
PXF stores the configuration for external data sources (called "servers") in either $PXF_HOME/servers/ (the default) or $PXF_BASE/servers. Unless you have relocated $PXF_BASE (see "Relocating $PXF_BASE" in the docs), the configuration is stored under $PXF_HOME, which is /usr/local/pxf-gp<GPDB-major-version>.
In the $PXF_HOME/servers directory, there should be one directory per external data source; there is typically one called default/. To access HDFS, this directory should contain:
- a copy of hdfs-site.xml
- a copy of core-site.xml
- a copy of pxf-site.xml (see $PXF_HOME/templates)
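The three files listed above can be put in place with a few copy commands. The following is only a minimal sketch: the function name setup_pxf_server is hypothetical, and the Hadoop client config directory you pass in is an assumption that depends on your installation.

```shell
# Hypothetical helper: populate a PXF server directory with the Hadoop
# client configs and the pxf-site.xml template.
# Usage: setup_pxf_server <pxf_home> <hadoop_conf_dir> [server_name]
setup_pxf_server() {
    pxf_home="$1"
    hadoop_conf="$2"
    server_dir="$pxf_home/servers/${3:-default}"

    mkdir -p "$server_dir" || return 1

    # Copy the Hadoop client configuration files into the server directory.
    for f in core-site.xml hdfs-site.xml; do
        cp "$hadoop_conf/$f" "$server_dir/" || return 1
    done

    # Copy the template only if no pxf-site.xml exists yet,
    # so that local edits are not overwritten.
    [ -f "$server_dir/pxf-site.xml" ] ||
        cp "$pxf_home/templates/pxf-site.xml" "$server_dir/" || return 1
}
```

For example, setup_pxf_server /usr/local/pxf-gp6 /etc/hadoop/conf would fill $PXF_HOME/servers/default, assuming a Greenplum 6 install and that /etc/hadoop/conf holds your Hadoop client configs (both are assumptions). On a multi-host cluster the files also need to reach every segment host; pxf cluster sync is the PXF command meant for that, and PXF has to be restarted afterwards. Check the docs for your PXF version.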