I have the following PySpark DataFrame column:
Column_1
daily_trend_navigator
weekly_trend_navigator
day_of_week_trend_display
day_of_month_trend_notifier
empty_navigator
unique_notifier
I need to shorten each value in this column: if the value contains "trend", keep everything up to and including "trend"; otherwise, keep whatever comes before the first occurrence of "_".
Expected output:
Column_1
daily_trend
weekly_trend
day_of_week_trend
day_of_month_trend
empty
unique
CodePudding user response:
This may not cover every case, but it works for your example:
- if the value contains "trend", split on "trend", take the first part, and append "trend" back
- otherwise, split on "_" and take the first part
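from pyspark.sql import functions as F

df.withColumn(
    "Column_1",
    F.when(
        # Value contains "trend": keep the text before it and re-append "trend".
        F.col("Column_1").contains("trend"),
        F.concat(F.split("Column_1", "trend").getItem(0), F.lit("trend")),
    ).otherwise(F.split("Column_1", "_").getItem(0)),  # else: text before the first "_"
).show()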
+------------------+
|          Column_1|
+------------------+
|       daily_trend|
|      weekly_trend|
| day_of_week_trend|
|day_of_month_trend|
|             empty|
|            unique|
+------------------+
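
If it helps to check the rule outside Spark, here is a minimal plain-Python sketch of the same logic as a single regular expression. The helper name `extract_prefix` and the pattern are my own assumptions, not part of the code above. The alternation is ordered: it first tries the shortest prefix ending in "trend", and only if that fails does it fall back to everything before the first "_".

```python
import re

# Hypothetical pattern, not taken from the answer above.
# Branch 1: shortest prefix that ends in "trend".
# Branch 2 (fallback): everything before the first underscore.
PATTERN = re.compile(r"^(.*?trend|[^_]*)")


def extract_prefix(value: str) -> str:
    """Return the part of value up to 'trend', or up to the first '_'."""
    # The fallback branch can match an empty string, so match() never returns None.
    return PATTERN.match(value).group(1)


samples = [
    "daily_trend_navigator",
    "weekly_trend_navigator",
    "day_of_week_trend_display",
    "day_of_month_trend_notifier",
    "empty_navigator",
    "unique_notifier",
]

for s in samples:
    print(s, "->", extract_prefix(s))
```

The same pattern could probably be passed to F.regexp_extract("Column_1", pattern, 1) to replace the when/otherwise expression with a single call, but I have not run that against a Spark session, so treat it as an untested suggestion.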