Home > Software design >  Merge column data from some rows with Pandas
Merge column data from some rows with Pandas

Time:09-29

I have a CSV file and I want to merge the data of some columns from some rows.

For example I have :

Name ID Data Parent
Name1 100 Blue 001
Name1 101 Grey 001
Name1 102 Grey 001
Name1 103 Black 001
Name2 200 Red 002
Name2 201 Green 002

And I would like to have :

Name ID Data Parent
Name1 100 Blue 001
Name1 101 Grey 001
Name1 102 Grey 001
Name1 103 Black 001
Name2 200 Red 002
Name2 201 Green 002
Name1 001 Blue,Grey,Black
Name2 002 Red,Green

My file have 62,000 rows. I can change some headers, add/delete/modify columns and all in less than 5 seconds. But on this part I'm getting a bit stuck. Do you have any idea how to do this?

Thank you for your help !

CodePudding user response:

You can get the list of unique colors per parent with groupby.unique, and then make them a single value with .str.join:

>>> parents = df.groupby(['Name', 'Parent'])['Data'].unique().str.join(',')
>>> parents
Name    Parent
Name1   001       Blue,Grey,Black
Name2   002             Red,Green
Name: Data, dtype: object

Then some tweaking to make it fit the initial dataframe’s format, and then we can simply append it:

>>> parents = parents.reset_index().rename(columns={'Parent': 'ID'}).assign(Parent='')
>>> parents
     Name   ID             Data Parent
0  Name1   001  Blue,Grey,Black       
1  Name2   002        Red,Green       
>>> df.append(parents)
     Name   ID             Data Parent
0  Name1   100             Blue    001
1  Name1   101             Grey    001
2  Name1   102             Grey    001
3  Name1   103            Black    001
4  Name2   200              Red    002
5  Name2   201            Green    002
0  Name1   001  Blue,Grey,Black       
1  Name2   002        Red,Green       

CodePudding user response:

You can use groupby agg to compute the new rows, then pandas.concat to merge it with the original dataframe:

pd.concat([df,
           (df.groupby('Parent')
              .agg({'Name': 'first', 'Parent': 'first', 'Data': ','.join})
              .rename(columns={'Parent': 'ID'})
              .reset_index(drop=True)
           )
          ]).fillna('') # optional fillna to remove the NaNs
    Name   ID                  Data Parent
0  Name1  100                  Blue    001
1  Name1  101                  Grey    001
2  Name1  102                  Grey    001
3  Name1  103                 Black    001
4  Name2  200                   Red    002
5  Name2  201                 Green    002
0  Name1  001  Blue,Grey,Grey,Black       
1  Name2  002             Red,Green       

CodePudding user response:

please use merge command in pandas

df1.merge(df2, left_on='lkey', right_on='rkey')

In pandas merge these merge types are available

left’, ‘right’, ‘outer’, ‘inner’, ‘cross’

Read this link for further details

  • Related