Create Multiple Columns from a Single Column - Scala spark

I am working on a data frame that has columns like this

| col_a    | col_b            |
| -------- | -----------------|
| First    | name=x, address=Y|
| Second   | name=x, address=Y|

I want to convert it into a data frame with col_b split into two columns, like this

| col_a    | name | address |
| -------- | -----|---------|
| First    | x    | Y       |
| Second   | x    | Y       |

Is it possible to do this?

Thanks

CodePudding user response:

You can use split, since your column of interest is of StringType, and then extract the relevant fields with getItem -

Data Preparation

input_str = """
First name=x,address=Y
Second name=x,address=Y
""".split()


input_values = list(map(lambda x: x.strip() if x.strip() != 'null' else None, input_str))

cols = list(map(lambda x: x.strip() if x.strip() != 'null' else None, "col_a     col_b".split()))
            
n = len(input_values)
n_col = 2

input_list = [tuple(input_values[i:i n_col]) for i in range(0,n,n_col)]

sparkDF = sql.createDataFrame(input_list, cols)

sparkDF.show()

+------+----------------+
| col_a|           col_b|
+------+----------------+
| First|name=x,address=Y|
|Second|name=x,address=Y|
+------+----------------+
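Before applying the extraction on the DataFrame, the per-row logic can be sanity-checked on a single value in plain Python - two nested splits, mirroring what F.split plus getItem will do column-wise below:

```python
value = "name=x,address=Y"

# First split on ',' to separate the key=value pairs ...
pairs = value.split(",")          # ["name=x", "address=Y"]

# ... then split each pair on '=' and keep index 1 (the value)
name = pairs[0].split("=")[1]
address = pairs[1].split("=")[1]

print(name, address)  # x Y
```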

Split

# Split "name=x,address=Y" on ',' into ["name=x", "address=Y"]
first_split_col = F.split(sparkDF['col_b'], ',')

# For each pair, split on '=' and keep index 1 (the value)
sparkDF = sparkDF.withColumn('name', F.split(first_split_col.getItem(0), '=').getItem(1))
sparkDF = sparkDF.withColumn('address', F.split(first_split_col.getItem(1), '=').getItem(1))

sparkDF.show()

+------+----------------+----+-------+
| col_a|           col_b|name|address|
+------+----------------+----+-------+
| First|name=x,address=Y|   x|      Y|
|Second|name=x,address=Y|   x|      Y|
+------+----------------+----+-------+
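If the number of key=value pairs can vary between rows, hard-coded getItem indexes become brittle. Spark SQL's built-in str_to_map can parse the whole string into a MapType column in one step, e.g. sparkDF.withColumn('kv', F.expr("str_to_map(col_b, ',', '=')")) and then F.col('kv').getItem('name'). The per-row parsing it performs is equivalent to this plain-Python sketch (the function here is only an illustrative model, not the Spark implementation):

```python
def str_to_map(text, pair_delim=",", kv_delim="="):
    """Plain-Python model of Spark SQL's str_to_map(text, pairDelim, keyValueDelim)."""
    return dict(pair.split(kv_delim, 1) for pair in text.split(pair_delim))

print(str_to_map("name=x,address=Y"))  # {'name': 'x', 'address': 'Y'}
```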