i have a dataframe that have one of its filed contains multiple values separated by " ". What i want is to split each value into a new column.
import pandas as pd
df = {'ID': [3009, 129, 119, 120, 121],
'date': ['2016 2017', '2015', '2014 2019 2020', '2020', 'NULL']
}
df = pd.DataFrame(df)
LOOK like this:
df
Out[25]:
ID date
0 3009 2016 2017
1 129 2015
2 119 2014 2019 2020
3 120 2020
4 121 NULL
I want to split the column 'date' by the ' ' delimiter and create columns based on the number of existing delimiters then create a columns that display the number of dates for each record.
ID date date2 date3 number of dates
0 3009 2016 2017 NULL 2
1 129 2015 NULL NULL 1
2 119 2014 2019 2020 3
3 120 2020 NULL NULL 1
4 121 NULL NULL NULL 0
I tried this code:
df["date"] = df.date.apply(lambda x: pd.Series(str(x).split(" ")))
but it crush and display the below error:
ValueError: Wrong number of items passed 4, placement implies 1
CodePudding user response:
Use str.split
to split
df[['date', 'date2', 'date3']] = df['date'].replace('NULL', np.nan).str.split(' ', expand=True)
and count
to count
df['number of dates'] = df[['date', 'date2', 'date3']].count(axis=1)
print(df)
ID date date2 date3 number of dates
0 3009 2016 2017 None 2
1 129 2015 None None 1
2 119 2014 2019 2020 3
3 120 2020 None None 1
4 121 NaN NaN NaN 0