add double quotes at the start and end of each string of column pyspark


Hello guys, I'm using PySpark 2.3.

I have a dataframe with a string column named "code_lei". I want to add double quotes at the start and end of each word in the column, without deleting or changing the blank spaces between the words.

Input dataframe:
+----+--------------+------+------------+-------+--------------------+
|vide|       integer|double|xx_dt_arrete|vide_de|            code_lei|
+----+--------------+------+------------+-------+--------------------+
|null|10000000000000|   1.1|  2021-06-30|   null|  code_lei et chorba|
|null|10000000000000|   1.1|  2021-06-30|   null|                null|
|null|10000000000000|   1.1|  2021-06-30|   null| code_lei et chorba |
|null|10000000000000|   1.1|  2021-06-30|   null|     code_lei ee    |
|null|             2|   2.2|        null|   null|            code_lei|
|null|             2|   2.2|        null|   null|            code_lei|
|null|             2|   2.2|        null|   null|            code_lei|
|null|             2|   2.2|        null|   null|            code_lei|
+----+--------------+------+------------+-------+--------------------+

Output dataframe:
+----+--------------+------+------------+-------+--------------------------+
|vide|       integer|double|xx_dt_arrete|vide_de|                  code_lei|
+----+--------------+------+------------+-------+--------------------------+
|null|10000000000000|   1.1|  2021-06-30|   null|  "code_lei" "et" "chorba"|
|null|10000000000000|   1.1|  2021-06-30|   null|                      null|
|null|10000000000000|   1.1|  2021-06-30|   null| "code_lei" "et" "chorba" |
|null|10000000000000|   1.1|  2021-06-30|   null|      "code_lei" "ee"     |
|null|             2|   2.2|        null|   null|                "code_lei"|
|null|             2|   2.2|        null|   null|                "code_lei"|
|null|             2|   2.2|        null|   null|                "code_lei"|
|null|             2|   2.2|        null|   null|                "code_lei"|
+----+--------------+------+------------+-------+--------------------------+

CodePudding user response:

You can use the lit and concat functions for this purpose.

import pyspark.sql.functions as F

# Prepend and append a double quote to the whole string
df.withColumn("code_lei", F.concat(F.lit('"'), F.col('code_lei'), F.lit('"'))).show()
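
Note that this puts a single pair of quotes around the whole string, while the expected output quotes each word separately. If every space-separated word needs to be wrapped individually, with all blank spaces (including leading and trailing ones) left exactly as they are, a minimal sketch using regexp_replace works on Spark 2.3 as well (the column name code_lei comes from the question; $1 is the Java-regex back-reference to the captured word):

import pyspark.sql.functions as F

# Wrap every run of non-space characters in double quotes;
# whitespace stays exactly where it was, and null rows stay null.
df.withColumn("code_lei", F.regexp_replace(F.col('code_lei'), r'(\S+)', '"$1"')).show(truncate=False)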

CodePudding user response:

Assuming you have a dataframe like this:

df = spark.createDataFrame([
    ("hello there",),
    ("hello world",),
], ['text'])

+-----------+
|       text|
+-----------+
|hello there|
|hello world|
+-----------+

You can then apply a chain of transformations like this:

from pyspark.sql import functions as F

(df
    # split the string into an array of words
    .withColumn('splitted', F.split('text', ' '))
    # join the words back together with '" "' between them
    .withColumn('joined', F.array_join(F.col('splitted'), '" "'))
    # add the outer quotes at both ends
    .withColumn('wrapped', F.concat(F.lit('"'), F.col('joined'), F.lit('"')))
    .show()
)

+-----------+--------------+-------------+---------------+
|       text|      splitted|       joined|        wrapped|
+-----------+--------------+-------------+---------------+
|hello there|[hello, there]|hello" "there|"hello" "there"|
|hello world|[hello, world]|hello" "world|"hello" "world"|
+-----------+--------------+-------------+---------------+

Or you can combine them all into a single expression like this:

from pyspark.sql import functions as F

(df
    .withColumn('wrapped', F.concat(F.lit('"'), F.array_join(F.split('text', ' '), '" "'), F.lit('"')))
    .show()
)
+-----------+---------------+
|       text|        wrapped|
+-----------+---------------+
|hello there|"hello" "there"|
|hello world|"hello" "world"|
+-----------+---------------+
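
A caveat for the original question: array_join was only added in Spark 2.4, so this second approach will not run on PySpark 2.3; the regexp_replace sketch after the first answer is a 2.3-friendly alternative. On Spark 2.4+, applying the combined expression to the question's code_lei column might look like this (null rows stay null, because concat returns null as soon as any argument is null):

from pyspark.sql import functions as F

# Quote each word of code_lei; nulls propagate through
# split / array_join / concat, so null rows remain null.
df.withColumn(
    'code_lei',
    F.concat(F.lit('"'), F.array_join(F.split('code_lei', ' '), '" "'), F.lit('"')),
).show(truncate=False)

One difference from the expected output: splitting on a single space turns leading, trailing, or repeated spaces into empty tokens, which show up as empty quoted strings ("" "") in the result, whereas the question's output keeps those blanks as plain spaces; the regexp_replace variant preserves them as-is.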