I am working on 2 impala tables like these.
table1:
full_alert |
---|
xyz_visitArizona_xyz |
klmthak_visitLondon |
visit_Istanbul |
tzylpa_visit_Paris |
table2:
only_name | segment |
---|---|
Arizona | 1 |
London | 2 |
Istanbul | 2 |
Paris | 2 |
What I want to do is to check only_name column in table2, and if the name appears on the full_alert column in table1, then create new column as destination. Table1 contains different structures for full_alert that's why I couldn't use SUBSTR() (I tried and saw many exceptions)
Is it possible to do this in Hive or Impala?
I tried to do this parsing in Jupyter notebook but data is too big and I got either null table or system error whenever I tried to run the query.
desired_table:
full_alert | destination |
---|---|
xyz_visitArizona_xyz | Arizona |
klmthak_visitLondon | London |
visit_Istanbul | Istanbul |
tzylpa_visit_Paris | Paris |
Thank you very much for your support.
CodePudding user response:
SELECT table1.full_alert, table2.only_name destination
FROM table1
JOIN table2 ON LOCATE(table2.only_name, table1.full_alert)
I tried both these queries but I got an syntax error. – Merve
Apache Impala implements LOCATE() function.
It is possible that it simply does not accept numeric output as logical value. Try this:
SELECT table1.full_alert, table2.only_name destination
FROM table1
JOIN table2 ON LOCATE(table2.only_name, table1.full_alert) > 0