I have this set of data:
dump data;
This is a sample output: (this dataset is almost a million rows long).
("0",60,0,1,"Fri")
("1",47,0,1,"Mon")
("1",23,1,0,"Tue")
("1",60,0,0,"Sat")
("1",50,1,1,"Fri")
I want to replace the values: Sat, Fri, Mon to numbers of week, I know how to use REPLACE for change just 1 value at a time, but I have to repeat it multiple times in order to change all days of the week:
data_day_of_week = FOREACH data GENERATE $0,$1,$2,$3,REPLACE($4, 'Mon', '1');
Is there any way to do this in only one statement?
CodePudding user response:
You should use a CASE WHEN THEN statement
data_day_of_week = FOREACH data GENERATE
CASE
WHEN $4 == 'Mon' THEN '1'
WHEN $4 == 'Tue' THEN '2'
...
WHEN $4 == 'Sun' THEN '7'
END AS day_number;
You should also name your relations so not to use $1, $2 etc. If you name $4 as day_number then when you declare the variable as day_number from the CASE statement it'll "overwrite" your prior data.
CodePudding user response:
You can use JOIN with a reference to a mapping like this:
(Mon,1)
(Tue,2)
(Wed,3)
(Thu,4)
(Fri,5)
(Sat,6)
(Sun,7)
Join statement:
outer_left = join your_data by $4 left outer, day_mapping by day;