Split a string column on ";" and remove the trailing ";" (if present) to get an array

Time:10-01

I want to create a new column from a string column whose values use ";" as a separator, removing the trailing ";" if it exists, using Python/PySpark:

Inputs:

"511;520;611;"
"322;620"  
"3;321;"
"334;344"

Expected output:

 Column        |  new column
"511;520;611;" | [511,520,611]
"322;620"      | [322,620]
"3;321;"       | [3,321]
"334;344"      | [334,344]

I tried:

from pyspark.sql.functions import col, split

# Split the string column on ";" into an array column
data = data.withColumn(
    "newcolumn",
    split(col("column"), ";"))

But when a value ends with ";", I get an empty string at the end of the array, as shown below, and I want to remove it if it exists:

 Column        |  new column
"511;520;611;" | [511,520,611,empty string]
"322;620"      | [322,620]
"3;321;"       | [3,321,empty string]
"334;344"      | [334,344]
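The trailing empty string appears because the text after the last ";" is empty, and splitting keeps that empty piece. Plain Python's str.split (used here only as an illustration, not Spark itself) behaves the same way on these inputs, which makes the cause easy to see:

```python
# Plain-Python illustration (not Spark): str.split keeps the empty piece
# that follows a trailing ";", just like the Spark result in the question.
values = ["511;520;611;", "322;620", "3;321;", "334;344"]

split_values = [v.split(";") for v in values]
for original, pieces in zip(values, split_values):
    print(f"{original!r} -> {pieces}")
```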

Answer 1:

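If the data is in a pandas DataFrame (this code does not work on a PySpark DataFrame), use str.strip(";"), which removes ";" from both the start and the end of each string, and then split: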

df.column.str.strip(";").str.split(";")
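The strip-then-split idea can be checked without pandas. The sketch below applies the same two steps to plain Python strings; the helper name strip_then_split is hypothetical and only mirrors what the .str accessor does per row:

```python
from typing import List


def strip_then_split(s: str, sep: str = ";") -> List[str]:
    # strip(sep) removes every leading and trailing sep character,
    # so "3;321;" becomes "3;321" before it is split.
    return s.strip(sep).split(sep)


values = ["511;520;611;", "322;620", "3;321;", "334;344"]
result = [strip_then_split(v) for v in values]
print(result)
```

Two edge cases to keep in mind: an empty input string still yields [''], and empty fields in the middle of a value (e.g. "3;;4") are kept.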

Or split first and drop the empty strings using apply with a lambda:

df.column.str.split(';').apply(lambda x: [e for e in x if e!=""])
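The apply/lambda version keeps only the pieces for which e != "" holds. Here is a plain-Python sketch of that filtering step (the helper name split_drop_empty is hypothetical):

```python
from typing import List


def split_drop_empty(s: str, sep: str = ";") -> List[str]:
    # Keep only non-empty pieces. This removes the trailing "" and also any
    # empty piece produced by consecutive separators such as "3;;4".
    return [e for e in s.split(sep) if e != ""]


values = ["511;520;611;", "322;620", "3;321;", "334;344"]
result = [split_drop_empty(v) for v in values]
print(result)
```

Unlike the strip approach, this drops empty fields in the middle of a value as well, and an empty input string gives an empty list rather than [''].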

Answer 2:

For Spark 2.4 and later, use the filter higher-order function with an x != '' condition to remove empty strings from the array:

from pyspark.sql.functions import expr

data = data.withColumn("newcolumn", expr("filter(split(column, ';'), x -> x != '')"))
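On Spark versions older than 2.4, where the SQL filter function is not available, one alternative (not taken from the answers above) is to remove a single trailing ";" with regexp_replace before splitting, e.g. split(regexp_replace(col("column"), ";$", ""), ";"). The sketch below only checks that regex step in plain Python with re.sub rather than running Spark, and assumes the pattern ";$" behaves the same in Java and Python for single-line values like these; the helper name is hypothetical:

```python
import re
from typing import List

# Hypothetical helper mirroring the Spark expression (assumed, not run here):
#   split(regexp_replace(col("column"), ";$", ""), ";")
TRAILING_SEMICOLON = re.compile(r";$")


def drop_trailing_then_split(s: str) -> List[str]:
    # Remove at most one ";" at the very end of the string, then split on ";".
    return TRAILING_SEMICOLON.sub("", s).split(";")


values = ["511;520;611;", "322;620", "3;321;", "334;344"]
result = [drop_trailing_then_split(v) for v in values]
print(result)
```

Because the pattern only matches the final ";", a value ending in ";;" would still leave one empty string at the end.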