If I have a dataframe like this:
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3],
    'date': ['2021-09-08', '2021-07-06', '2021-03-04'],
    'finding': ['Yes', 'No', 'Yes'],
    'unecessary_col': [1, 1, 1]
})
   id        date finding  unecessary_col
0   1  2021-09-08     Yes               1
1   2  2021-07-06      No               1
2   3  2021-03-04     Yes               1
How can I add a 'summary' column that combines labels and values from several columns of the same row? Not all columns should be included in the summary. Ideal output below:
   id        date finding  unecessary_col                                  summary
0   1  2021-09-08     Yes               1  "ID: 1; Date: 2021-09-08; Finding: Yes"
1   2  2021-07-06      No               1   "ID: 2; Date: 2021-07-06; Finding: No"
2   3  2021-03-04     Yes               1  "ID: 3; Date: 2021-03-04; Finding: Yes"
Thanks in advance
CodePudding user response:
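If you want the summary as a string, you will need some form of row-wise iteration: either an explicit for loop or an implicit one via df.apply(…, axis=1). Note that both versions below iterate over every column currently in df, so select the columns you want first (for example df[['id', 'date', 'finding']]) if some should be left out.
Explicit for loop with df.iterrows():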
df['summary'] = ['; '.join(f'{k}: {v}' for k, v in row.items()) for _, row in df.iterrows()]
print(df)
   id        date finding                                summary
0   1  2021-09-08     Yes  id: 1; date: 2021-09-08; finding: Yes
1   2  2021-07-06      No   id: 2; date: 2021-07-06; finding: No
2   3  2021-03-04     Yes  id: 3; date: 2021-03-04; finding: Yes
Implicit for loop with df.apply(…, axis=1):
df['summary'] = df.apply(
    lambda row: '; '.join(f'{k}: {v}' for k, v in row.items()),
    axis=1
)
print(df)
   id        date finding                                summary
0   1  2021-09-08     Yes  id: 1; date: 2021-09-08; finding: Yes
1   2  2021-07-06      No   id: 2; date: 2021-07-06; finding: No
2   3  2021-03-04     Yes  id: 3; date: 2021-03-04; finding: Yes
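The question asks for only some of the columns and for custom labels ("ID", "Date", "Finding") rather than the raw column names. A minimal sketch of one way to adapt the apply version follows. The labels dict is a hypothetical helper introduced here for illustration, not something from the original answer; its keys pick the columns and its values give the text shown in the summary.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3],
    'date': ['2021-09-08', '2021-07-06', '2021-03-04'],
    'finding': ['Yes', 'No', 'Yes'],
    'unecessary_col': [1, 1, 1],
})

# Hypothetical mapping (an assumption for this sketch):
# column name -> label to display in the summary.
labels = {'id': 'ID', 'date': 'Date', 'finding': 'Finding'}

# Restrict to the chosen columns first, then build one string per row.
df['summary'] = df[list(labels)].apply(
    lambda row: '; '.join(f'{labels[k]}: {v}' for k, v in row.items()),
    axis=1,
)

print(df['summary'].tolist())
```

Because the columns are selected before apply runs, unecessary_col never reaches the lambda, and the order of the labels dict determines the order of the fields in the string.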
CodePudding user response:
If a dict per row is acceptable instead of a string, you can use:
df['summary'] = df.to_dict(orient='records')
Result:
   id        date finding                                            summary
0   1  2021-09-08     Yes  {'id': 1, 'date': '2021-09-08', 'finding': 'Yes'}
1   2  2021-07-06      No   {'id': 2, 'date': '2021-07-06', 'finding': 'No'}
2   3  2021-03-04     Yes  {'id': 3, 'date': '2021-03-04', 'finding': 'Yes'}
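With the dataframe from the question, to_dict on the whole frame would also pick up unecessary_col. A small sketch of restricting it to chosen columns is below; the cols list is an assumed name used only for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3],
    'date': ['2021-09-08', '2021-07-06', '2021-03-04'],
    'finding': ['Yes', 'No', 'Yes'],
    'unecessary_col': [1, 1, 1],
})

# Assumed for this sketch: the columns that belong in the summary.
cols = ['id', 'date', 'finding']

# One dict per row, built only from the selected columns.
df['summary'] = df[cols].to_dict(orient='records')

print(df['summary'].iloc[0])
```

The summary column then holds Python dicts rather than strings, so it suits later programmatic access more than display.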