How to split the column with a delimiter based on input column in hive-CodePudding

I have a requirement in hive werin based on input column "lang".I am here looking for languages english(eng) and french(fra),this column is split by delimiter ';'.It should pick from col1. and place it in col_eng or col_fra based on the language.

For ex: lang is bel;fra;eng ,it should pick respective french and english values from col1 delimited from ';' ABC/WSC;ASA/SAS;ASA/SD ie ASA/SAS in col_fra and ASA/SD to col_eng.

I have listed some scenarios below. I tried with split regular expression and putting it in case statements but I cant capture all these scenarios.

for ex:

case when lang like ('eng;fra%') then split(mnemonic,';')[0] as mnemonic_eng,split(mnemonic,';')[1] as mnemonic_fr

Is there any better way to resolve this.Below are all the scenarios and desired output.Could you please help.

     lang                 col1                  col_eng     col_fra

1   fra;eng             ABC/DEF;EFH/ASD         EFH/ASD      ABC/DEF
2   eng;fra             ABC/DEF;EFH/ASD         ABC/DEF      EFH/ASD
3   bel;fra;eng         ABC/WSC;ASA/SAS;ASA/SD  ASA/SD       ASA/SAS
4   eng;eng;fra         WSX/ASD;WSX/ASD;SEF/DFF WSX/ASD      SEF/DFF
5   eng;fra;rus         QAZ/WER;AST/RS;ASD/DFG  QAZ/WER      AST/RS
6   rum;eng             AQW/WER;WER/DF          WER/DF       null
7   rum;fra;eng         ABC/WSC;ASA/SAS;ASA/SD  ASA/SD      ASA/SAS
8   spa;fra             ASD/AWE;WER/ERT          null       WER/ERT
9   eng                 asd/ert                 asd/ert      null
10  fra                 AWE/ERR                             AWE/ERR
11  fra;eng;rus         ;data_eng;dara_rus     data_eng      null
12  fra;eng             data_fra;                null      data_fra

CodePudding user response：

You can calculate position of 'eng' and 'fra' using find_in_set (starts with 1), then apply it to the array produced by split:

select lang, col1, 
       split(col1,'\073')[find_in_set('eng',regexp_replace(lang,'\073',','))-1] as col_eng,
       split(col1,'\073')[find_in_set('fra',regexp_replace(lang,'\073',','))-1] as col_fra