I have a CSV file and I want to merge the data of some columns from some rows.
For example I have :
Name | ID | Data | Parent |
---|---|---|---|
Name1 | 100 | Blue | 001 |
Name1 | 101 | Grey | 001 |
Name1 | 102 | Grey | 001 |
Name1 | 103 | Black | 001 |
Name2 | 200 | Red | 002 |
Name2 | 201 | Green | 002 |
And I would like to have :
Name | ID | Data | Parent |
---|---|---|---|
Name1 | 100 | Blue | 001 |
Name1 | 101 | Grey | 001 |
Name1 | 102 | Grey | 001 |
Name1 | 103 | Black | 001 |
Name2 | 200 | Red | 002 |
Name2 | 201 | Green | 002 |
Name1 | 001 | Blue,Grey,Black | |
Name2 | 002 | Red,Green |
My file have 62,000 rows. I can change some headers, add/delete/modify columns and all in less than 5 seconds. But on this part I'm getting a bit stuck. Do you have any idea how to do this?
Thank you for your help !
CodePudding user response:
You can get the list of unique colors per parent with groupby.unique
, and then make them a single value with .str.join
:
>>> parents = df.groupby(['Name', 'Parent'])['Data'].unique().str.join(',')
>>> parents
Name Parent
Name1 001 Blue,Grey,Black
Name2 002 Red,Green
Name: Data, dtype: object
Then some tweaking to make it fit the initial dataframe’s format, and then we can simply append it:
>>> parents = parents.reset_index().rename(columns={'Parent': 'ID'}).assign(Parent='')
>>> parents
Name ID Data Parent
0 Name1 001 Blue,Grey,Black
1 Name2 002 Red,Green
>>> df.append(parents)
Name ID Data Parent
0 Name1 100 Blue 001
1 Name1 101 Grey 001
2 Name1 102 Grey 001
3 Name1 103 Black 001
4 Name2 200 Red 002
5 Name2 201 Green 002
0 Name1 001 Blue,Grey,Black
1 Name2 002 Red,Green
CodePudding user response:
You can use groupby
agg
to compute the new rows, then pandas.concat
to merge it with the original dataframe:
pd.concat([df,
(df.groupby('Parent')
.agg({'Name': 'first', 'Parent': 'first', 'Data': ','.join})
.rename(columns={'Parent': 'ID'})
.reset_index(drop=True)
)
]).fillna('') # optional fillna to remove the NaNs
Name ID Data Parent
0 Name1 100 Blue 001
1 Name1 101 Grey 001
2 Name1 102 Grey 001
3 Name1 103 Black 001
4 Name2 200 Red 002
5 Name2 201 Green 002
0 Name1 001 Blue,Grey,Grey,Black
1 Name2 002 Red,Green
CodePudding user response:
please use merge command in pandas
df1.merge(df2, left_on='lkey', right_on='rkey')
In pandas merge these merge types are available
left’, ‘right’, ‘outer’, ‘inner’, ‘cross’
Read this link for further details