Home > Software engineering >  Use REPLACE in multicondition RegExp in Pig Latin
Use REPLACE in multicondition RegExp in Pig Latin

Time:04-17

I have this set of data:

dump data;

This is a sample output: (this dataset is almost a million rows long).

("0",60,0,1,"Fri")
("1",47,0,1,"Mon")
("1",23,1,0,"Tue")
("1",60,0,0,"Sat")
("1",50,1,1,"Fri")

I want to replace the values: Sat, Fri, Mon to numbers of week, I know how to use REPLACE for change just 1 value at a time, but I have to repeat it multiple times in order to change all days of the week:

data_day_of_week = FOREACH data GENERATE $0,$1,$2,$3,REPLACE($4, 'Mon', '1');

Is there any way to do this in only one statement?

CodePudding user response:

You should use a CASE WHEN THEN statement

data_day_of_week = FOREACH data GENERATE
    CASE
         WHEN $4 == 'Mon' THEN '1'
         WHEN $4 == 'Tue' THEN '2'
         ...
         WHEN $4 == 'Sun' THEN '7'
    END AS day_number;

You should also name your relations so not to use $1, $2 etc. If you name $4 as day_number then when you declare the variable as day_number from the CASE statement it'll "overwrite" your prior data.

CodePudding user response:

You can use JOIN with a reference to a mapping like this:

(Mon,1)
(Tue,2)
(Wed,3)
(Thu,4)
(Fri,5)
(Sat,6)
(Sun,7)

Join statement:

outer_left = join your_data by $4 left outer, day_mapping by day;
  • Related