Getting error in Spark SQL when trying to concat '}' character

Time:10-20

I have a use case where I need to concat a '}' to a string using Spark SQL. The sample dataset is as below:

+-------------------------------------+-----+
|col_1                                |col_2|
+-------------------------------------+-----+
|{"key_1" : "val_1","key_2" : "val_2"}|abcd |
+-------------------------------------+-----+

root
 |-- col_1: string (nullable = true)
 |-- col_2: string (nullable = true)

I want to check the length of col_1 and based on that add value of col_2 into the JSON-formatted string of col_1. I have written a Spark SQL query as below:

select *, case when length(col_1) = 2 then
concat(substring(col_1, 0, length(col_1) - 1), '"col_2":"',cast(col_2 as STRING), '"}')
else concat(substring(col_1, 0, length(col_1) - 1), ',"col_2":"', cast(col_2 as STRING), '"}')
end as mod_col_1
from df

Query parsing fails when it encounters the '}' character. Is there a way to escape this character in the query, or another way to generate the desired string?

Expected output when col_1 = "{}":

+-----+-----+
|col_1|col_2|
+-----+-----+
|{}   |abcd |
+-----+-----+

output:

+-----+-----+------------------+
|col_1|col_2|mod_col_1         |
+-----+-----+------------------+
|{}   |abcd |{'col_2' : 'abcd'}|
+-----+-----+------------------+

When col_1 = {"key_1" : "val_1","key_2" : "val_2"}:

+-------------------------------------+-----+
|col_1                                |col_2|
+-------------------------------------+-----+
|{"key_1" : "val_1","key_2" : "val_2"}|abcd |
+-------------------------------------+-----+

output:

+-------------------------------------+-----+----------------------------------------------------+
|col_1                                |col_2|mod_col_1                                           |
+-------------------------------------+-----+----------------------------------------------------+
|{"key_1" : "val_1","key_2" : "val_2"}|abcd |{"key_1" : "val_1","key_2" : "val_2","col_2":"abcd"}|
+-------------------------------------+-----+----------------------------------------------------+

Happy to share more details if required.

CodePudding user response:

You can use the regexp_replace() function to swap the closing '}' for a separator before appending the new key. For example:

spark.sql(s"""
with t1 ( select '{"key_1" : "val_1","key_2" : "val_2"}' col_1, 'abcd' col_2
         union all
         select '{}', 'defg' )
        select *, case
          -- empty object: build the whole string directly
          when col_1 = '{}' then "{ 'col_2' : '"||col_2|| "'}"
          -- otherwise replace the closing brace with a comma and append the new key
          else regexp_replace(col_1,"[}]",",")||"'col_2' : '"|| col_2 || "'}"
           end x from t1
""").show(50,false)

Output:

+-------------------------------------+-----+------------------------------------------------------+
|col_1                                |col_2|x                                                     |
+-------------------------------------+-----+------------------------------------------------------+
|{"key_1" : "val_1","key_2" : "val_2"}|abcd |{"key_1" : "val_1","key_2" : "val_2",'col_2' : 'abcd'}|
|{}                                   |defg |{ 'col_2' : 'defg'}                                   |
+-------------------------------------+-----+------------------------------------------------------+

Update: to get double quotes in the output, delimit the SQL string literals with single quotes instead:

spark.sql(s"""
with t1 ( select '{"key_1" : "val_1","key_2" : "val_2"}' col_1, 'abcd' col_2
         union all
         select '{}', 'defg' )
        select *, case
          -- empty object: build the whole string directly
          when col_1 = '{}' then '{ "col_2" : "' ||col_2|| '"}'
          -- otherwise replace the closing brace with a comma and append the new key
          else regexp_replace(col_1,"[}]",",")||'"col_2" : "'|| col_2 || '"}'
           end x from t1
""").show(50,false)

Output:

+-------------------------------------+-----+------------------------------------------------------+
|col_1                                |col_2|x                                                     |
+-------------------------------------+-----+------------------------------------------------------+
|{"key_1" : "val_1","key_2" : "val_2"}|abcd |{"key_1" : "val_1","key_2" : "val_2","col_2" : "abcd"}|
|{}                                   |defg |{ "col_2" : "defg"}                                   |
+-------------------------------------+-----+------------------------------------------------------+
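
As an alternative to editing the string by hand, the JSON can be parsed, extended, and serialized again, so no brace ever needs to be concatenated or escaped. The following is only a sketch under some assumptions that are not in the question: col_1 is taken to be a flat JSON object whose values are all strings, col_1 is assumed not to already contain a "col_2" key, and the object name AppendJsonKey and the local SparkSession settings are made up for illustration. It needs Spark 2.4 or later for map_concat.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, from_json, lit, map, map_concat, to_json}
import org.apache.spark.sql.types.{MapType, StringType}

// Hypothetical driver object, used only to make the sketch self-contained.
object AppendJsonKey {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("append-json-key")
      .getOrCreate()
    import spark.implicits._

    // Same two sample rows as in the answer above.
    val df = Seq(
      ("""{"key_1" : "val_1","key_2" : "val_2"}""", "abcd"),
      ("{}", "defg")
    ).toDF("col_1", "col_2")

    // Assumption: col_1 is a flat JSON object with string values,
    // so it can be read as a map<string,string>.
    val asMap = from_json(col("col_1"), MapType(StringType, StringType))

    // Merge in a one-entry map holding col_2, then serialize back to JSON.
    // The empty object "{}" needs no special case: it parses to an empty map.
    val result = df.withColumn(
      "mod_col_1",
      to_json(map_concat(asMap, map(lit("col_2"), col("col_2"))))
    )

    result.show(50, false)
    spark.stop()
  }
}
```

Note that to_json emits compact JSON, so the spacing around ':' in the original col_1 is not preserved. If col_1 cannot be parsed as JSON, from_json returns null and mod_col_1 will be null for that row, so malformed input may need separate handling.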