Home > Enterprise >  Pandas merging rows on two unique column values
Pandas merging rows on two unique column values

Time:11-17

I have problem that I have been trying to find a solution for. You would think that it wouldn't be that hard to figure out.

I have a pandas DataFrame with the below format:

   Id               Name  Now   Then There   Sold      Needed
0   1              Caden  8.1   3.40  3.95    NaN         NaN
1   7            Bankist  NaN   2.45  2.20    NaN         NaN
2   1           Artistes  8.1   3.40  3.95    NaN         NaN
0   1                NaN  NaN    NaN   NaN  33.75     670,904
1   7                NaN  NaN    NaN   NaN  33.75     670,904

I would like to have the DataFrame merge its rows based on the 'Id' column so that it looks like this:

   Id               Name  Now   Then There   Sold      Needed
0   1              Caden  8.1   3.40  3.95  33.75     670,904
1   7            Bankist  NaN   2.45  2.20  33.75     670,904
2   1           Artistes  8.1   3.40  3.95  33.75     670,904

As you can see, the 'Id' column has two Id# 1 that each have a unique 'Name'. I have not been able to figure out how to ask the question that might provide some sample code. So far I have tried different methods, and have failed, including different combinations of merge, join, and concat. The best result has lead to the current DataFrame with NaN values.

I am trying to accomplish having the 'Sold' and 'Needed' columns (which have only one value) aligned with the appropriate 'Id' row when there are repeating Ids.

CodePudding user response:

here is one way to do it

# using groupby on Id, backfill the Sold and Needed where values are null
df[['Sold','Needed']] = df.groupby(['Id'], as_index=False)[['Sold','Needed']].bfill()

# drop the rows that has Null in a name
out=df.dropna(subset='Name')

out

Id  Name    Now     Then    There   Sold    Needed
0   1   Caden   8.1     3.40    3.95    33.75   670,904
1   7   Bankist     NaN     2.45    2.20    33.75   670,904
2   1   Artistes    8.1     3.40    3.95    33.75   670,904

CodePudding user response:

Make a copy, separate the columns, drop rows where all values are nan, then merge:

*Assuming your dataframe is df1
df2=df1.copy()
df1.drop(['Sold', 'Needed'],axis=1,inplace=True)
df2.drop(['Name', 'Now', 'Then', 'There'],axis=1,inplace=True)
df1.dropna(subset="Name","Now","Then","There"],inplace=True,how='all',axis='rows')
df2.dropna(subset=["Sold","Needed"], inplace=True, how='all',axis='rows')
newdf=df1.merge(df2,how='left',left_on='Id',right_on='Id')
  • Related