I have a dataframe containing a pandas series (column 2) as below:
column 1 | column 2 | column 3 |
---|---|---|
1123 | Requested By = John Doe 1\n Requested On = 12 October 2021\n Comments = This is a generic request | INC29192 |
1251 | NaN | INC18217 |
1918 | Requested By = John Doe 2\n Requested On = 2 September 2021\n Comments = This is another generic request | INC19281 |
I'm struggling to extract, split and map column 2 data to a series of new column names with the appropriate data for that record (where possible, that is where there is data available as I have NaNs).
The Desired output is something like (where Ive dropped the column 3 data for legibility):
column 1 | column 3 | Requested By | Requested On | Comments |
---|---|---|---|---|
1123 | INC29192 | John Doe 1 | 12 October 2021 | This is a generic request |
1251 | INC18217 | NaN | NaN | NaN |
1918 | INC19281 | John Doe 2 | 2 September 2021 | This is another generic request |
I have spent quite some time, trying various approaches, from lambda functions to comprehensions to explode methods but havent quite found a solution that provides the desired output - any thoughts greatly appreciated!
CodePudding user response:
First I would convert column 2
values to dictionaries and then convert them to Dataframes and join them to your df:
df['column 2'] = df['column 2'].apply(lambda x:
{y.split(' = ',1)[0]:y.split(' = ',1)[1]
for y in x.split(r'\n ')}
if not pd.isna(x) else {})
df = df.join(pd.DataFrame(df['column 2'].values.tolist())).drop('column 2', axis=1)
print(df)
Output:
column 1 column 3 Requested By Requested On Comments
0 1123 INC29192 John Doe 1 12 October 2021 This is a generic request
1 1251 INC18217 NaN NaN NaN
2 1918 INC19281 John Doe 2 2 September 2021 This is another generic request