I have a requirement in hive werin based on input column "lang".I am here looking for languages english(eng) and french(fra),this column is split by delimiter ';'.It should pick from col1. and place it in col_eng or col_fra based on the language.
For ex: lang is bel;fra;eng ,it should pick respective french and english values from col1 delimited from ';' ABC/WSC;ASA/SAS;ASA/SD ie ASA/SAS in col_fra and ASA/SD to col_eng.
I have listed some scenarios below. I tried with split regular expression and putting it in case statements but I cant capture all these scenarios.
for ex:
case when lang like ('eng;fra%') then split(mnemonic,';')[0] as mnemonic_eng,split(mnemonic,';')[1] as mnemonic_fr
Is there any better way to resolve this.Below are all the scenarios and desired output.Could you please help.
lang col1 col_eng col_fra
1 fra;eng ABC/DEF;EFH/ASD EFH/ASD ABC/DEF
2 eng;fra ABC/DEF;EFH/ASD ABC/DEF EFH/ASD
3 bel;fra;eng ABC/WSC;ASA/SAS;ASA/SD ASA/SD ASA/SAS
4 eng;eng;fra WSX/ASD;WSX/ASD;SEF/DFF WSX/ASD SEF/DFF
5 eng;fra;rus QAZ/WER;AST/RS;ASD/DFG QAZ/WER AST/RS
6 rum;eng AQW/WER;WER/DF WER/DF null
7 rum;fra;eng ABC/WSC;ASA/SAS;ASA/SD ASA/SD ASA/SAS
8 spa;fra ASD/AWE;WER/ERT null WER/ERT
9 eng asd/ert asd/ert null
10 fra AWE/ERR AWE/ERR
11 fra;eng;rus ;data_eng;dara_rus data_eng null
12 fra;eng data_fra; null data_fra
CodePudding user response:
You can calculate position of 'eng' and 'fra' using find_in_set (starts with 1), then apply it to the array produced by split:
select lang, col1,
split(col1,'\073')[find_in_set('eng',regexp_replace(lang,'\073',','))-1] as col_eng,
split(col1,'\073')[find_in_set('fra',regexp_replace(lang,'\073',','))-1] as col_fra