Home > Software engineering >  split one column into multiple columns usining delimiter
split one column into multiple columns usining delimiter

Time:02-27

i have a dataframe that have one of its filed contains multiple values separated by " ". What i want is to split each value into a new column.

import pandas as pd

df = {'ID': [3009, 129, 119, 120, 121],
  'date': ['2016 2017', '2015', '2014 2019 2020', '2020', 'NULL']
  }

df = pd.DataFrame(df)

LOOK like this:

df    

Out[25]: 
      ID       date
0    3009    2016 2017
1     129    2015
2     119    2014 2019 2020
3     120    2020
4     121    NULL

I want to split the column 'date' by the ' ' delimiter and create columns based on the number of existing delimiters then create a columns that display the number of dates for each record.

      ID     date   date2    date3  number of dates
0     3009   2016   2017     NULL     2
1     129    2015   NULL     NULL     1
2     119    2014   2019     2020     3
3     120    2020   NULL     NULL     1
4     121    NULL   NULL     NULL     0

I tried this code:

df["date"] = df.date.apply(lambda x: pd.Series(str(x).split(" ")))

but it crush and display the below error:

ValueError: Wrong number of items passed 4, placement implies 1

CodePudding user response:

Use str.split to split

df[['date', 'date2', 'date3']] = df['date'].replace('NULL', np.nan).str.split(' ', expand=True)

and count to count

df['number of dates'] = df[['date', 'date2', 'date3']].count(axis=1)

print(df)

     ID  date date2 date3  number of dates
0  3009  2016  2017  None                2
1   129  2015  None  None                1
2   119  2014  2019  2020                3
3   120  2020  None  None                1
4   121   NaN   NaN   NaN                0
  • Related