Home > Net >  Splitting a column to create new columns
Splitting a column to create new columns

Time:08-10

I have a dataframe df and a column LongColumn that that has rows like this:

ABC.DEF.GHI.Address.Street.Phone.Email.Category1.Beta.MyResults.New
ABC.DEG.GGI.Address.Postcode.Phone.Email.Category2.Alpha.MyResults.Old
ABC.DEG.GGI.JFK.Address.Postcode.Phone.Email.Category3.Alpha.MyResults.Old
DEG.III.JFK.Address.Postcode.Phone.Email.Category2.Beta.MyResults.Old

I am only interested in the rows that contain MyResults I want to take the three parts Category1.Beta.MyResults, Category2.Alpha.MyResults etc. and make three columns out of them, but since there is a different number of "words" in every string, I want to take them bottom-up:

from pyspark.sql import functions as F
out_df = df.withColumn('class', split(df['LongColumn'], '\.').getItem(-3)).withColumn('object', split(df['LongColumn'], '\.').getItem(-2)).withColumn('title', split(df['LongColumn'], '\.').getItem(-1))

But this leads to empty new columns, so the syntax might be wrong. How do I take them bottom up?

CodePudding user response:

Use expr() to select the elements within the array dynamically - (length of array - 4), (length of array - 3), (length of array - 2).

data_sdf. \
    filter(func.upper(func.col('long_column')).like('%MYRESULT%')). \
    withColumn('long_col_arr', func.split('long_column', '\.')). \
    withColumn('categories', func.expr('long_col_arr[size(long_col_arr) - 4]')). \
    withColumn('letters', func.expr('long_col_arr[size(long_col_arr) - 3]')). \
    withColumn('results', func.expr('long_col_arr[size(long_col_arr) - 2]')). \
    show(truncate=False)

#  ---------------------------------------------------------------------- ---------------------------------------------------------------------------------- ---------- ------- --------- 
# |long_column                                                           |long_col_arr                                                                      |categories|letters|results  |
#  ---------------------------------------------------------------------- ---------------------------------------------------------------------------------- ---------- ------- --------- 
# |ABC.DEF.GHI.Address.Street.Phone.Email.Category1.Beta.MyResults.New   |[ABC, DEF, GHI, Address, Street, Phone, Email, Category1, Beta, MyResults, New]   |Category1 |Beta   |MyResults|
# |ABC.DEG.GGI.Address.Postcode.Phone.Email.Category2.Alpha.MyResults.Old|[ABC, DEG, GGI, Address, Postcode, Phone, Email, Category2, Alpha, MyResults, Old]|Category2 |Alpha  |MyResults|
#  ---------------------------------------------------------------------- ---------------------------------------------------------------------------------- ---------- ------- --------- 
  • Related