Split pyspark dataframe column

Time:10-29

I have the following PySpark DataFrame column.

Column_1
daily_trend_navigator
weekly_trend_navigator
day_of_week_trend_display
day_of_month_trend_notifier
empty_navigator
unique_notifier

I need to split this column. If a value contains "trend", I want to extract everything up to and including "trend"; otherwise, I want to extract whatever comes before the first occurrence of "_".

Expected output:

Column_1
daily_trend
weekly_trend
day_of_week_trend
day_of_month_trend
empty
unique

CodePudding user response:

This probably does not cover every case, but it works with your example.

  1. Handle the "trend" case: if the value contains "trend", split on "trend", keep the part before it, and append "trend" back.
  2. Otherwise, split on "_" and keep the first element.
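from pyspark.sql import functions as F

df.withColumn(
    "Column_1",
    F.when(
        # Case 1: keep everything up to and including the first "trend"
        F.col("Column_1").contains("trend"),
        F.concat(F.split("Column_1", "trend").getItem(0), F.lit("trend")),
    # Case 2: keep everything before the first "_"
    ).otherwise(F.split("Column_1", "_").getItem(0)),
).show()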
+------------------+
|          Column_1|
+------------------+
|       daily_trend|
|      weekly_trend|
| day_of_week_trend|
|day_of_month_trend|
|             empty|
|            unique|
+------------------+
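To sanity-check the two rules without a Spark session, here is a minimal plain-Python sketch that mirrors them with a single regular expression. The pattern and the helper name `extract_prefix` are my own assumptions for illustration and are not part of the answer above.

```python
import re

# Hypothetical pattern (an assumption, not taken from the answer above):
#   .*?trend  -> shortest prefix that ends with the first "trend"
#   [^_]*     -> fallback: everything before the first underscore
PATTERN = r"^(.*?trend|[^_]*)"


def extract_prefix(value: str) -> str:
    """Return the prefix up to 'trend', or else the text before the first '_'."""
    match = re.match(PATTERN, value)
    # The fallback alternative can match an empty string, so a match always exists;
    # the guard is kept only as a defensive measure.
    return match.group(1) if match else value


samples = [
    "daily_trend_navigator",
    "weekly_trend_navigator",
    "day_of_week_trend_display",
    "day_of_month_trend_notifier",
    "empty_navigator",
    "unique_notifier",
]

results = [extract_prefix(s) for s in samples]

for original, prefix in zip(samples, results):
    print(f"{original} -> {prefix}")
```

The same pattern could probably be passed to `F.regexp_extract("Column_1", PATTERN, 1)` to replace the `when`/`otherwise` expression, but I have only checked it against the sample values here, so treat it as a sketch rather than a drop-in replacement.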