I have a file called test.txt with the records as below (disregard the dots):
"1" "a" "x"
"2" "b" "y"
"3" "c" "z"
(tab as field separator)
My pig script (test.pig):
a=LOAD '/Analytics/warehouse/SF/test.txt' as (fullrecord:chararray);
b=FOREACH a generate REPLACE($0,'\t',',');
STORE b INTO 'hdfs://localhost:9000/Analytics/warehouse/SF/sf.out' USING PigStorage(',');
I run the script with: pig -x mapreduce test.pig
the output:
.../warehouse/SF/sf.out
part-m-0000
And the content is only:
"1"
"2"
"3"
Q1- What happened with the other fields?
Q2- Why the tab characters wasn't changed by ","?
Q3- How can i achieve the next result?
"1","a","x"
"2","b","y"
"3","c","z"
Q4- How can i query that result with HIVE?
CodePudding user response:
What happened with the other fields
LOAD defaults to using tab delimiter. Your GENERATE
only grabbed the first column. You need USING PigStorage('\n')
to load the whole line. But you can also not do this, and just remove the FOREACH
, then STORE with PigStorage(',')
Why the tab characters wasn't changed
Related to above, there is nothing to replace when you only have one element.
How can i query that result with HIVE?
Use HCatalog, not PigStorage -
STORE data_alias INTO 'tablename' USING org.apache.hcatalog.pig.HCatStorer();
Then query the table.
Or you would need to define a Hive External table over the HDFS data.
You can also skip using Pig altogether; Hive can query files that have tabs - STORED AS TEXT FIELDS DELIMITED BY '\t'