Pandas merging rows on two unique column values-CodePudding

I have problem that I have been trying to find a solution for. You would think that it wouldn't be that hard to figure out.

I have a pandas DataFrame with the below format:

   Id               Name  Now   Then There   Sold      Needed
0   1              Caden  8.1   3.40  3.95    NaN         NaN
1   7            Bankist  NaN   2.45  2.20    NaN         NaN
2   1           Artistes  8.1   3.40  3.95    NaN         NaN
0   1                NaN  NaN    NaN   NaN  33.75     670,904
1   7                NaN  NaN    NaN   NaN  33.75     670,904

I would like to have the DataFrame merge its rows based on the 'Id' column so that it looks like this:

   Id               Name  Now   Then There   Sold      Needed
0   1              Caden  8.1   3.40  3.95  33.75     670,904
1   7            Bankist  NaN   2.45  2.20  33.75     670,904
2   1           Artistes  8.1   3.40  3.95  33.75     670,904

As you can see, the 'Id' column has two Id# 1 that each have a unique 'Name'. I have not been able to figure out how to ask the question that might provide some sample code. So far I have tried different methods, and have failed, including different combinations of merge, join, and concat. The best result has lead to the current DataFrame with NaN values.

I am trying to accomplish having the 'Sold' and 'Needed' columns (which have only one value) aligned with the appropriate 'Id' row when there are repeating Ids.

CodePudding user response：

here is one way to do it

# using groupby on Id, backfill the Sold and Needed where values are null
df[['Sold','Needed']] = df.groupby(['Id'], as_index=False)[['Sold','Needed']].bfill()

# drop the rows that has Null in a name
out=df.dropna(subset='Name')

out

Id  Name    Now     Then    There   Sold    Needed
0   1   Caden   8.1     3.40    3.95    33.75   670,904
1   7   Bankist     NaN     2.45    2.20    33.75   670,904
2   1   Artistes    8.1     3.40    3.95    33.75   670,904

CodePudding user response：

Make a copy, separate the columns, drop rows where all values are nan, then merge:

*Assuming your dataframe is df1

df2=df1.copy()
df1.drop(['Sold', 'Needed'],axis=1,inplace=True)
df2.drop(['Name', 'Now', 'Then', 'There'],axis=1,inplace=True)
df1.dropna(subset="Name","Now","Then","There"],inplace=True,how='all',axis='rows')
df2.dropna(subset=["Sold","Needed"], inplace=True, how='all',axis='rows')
newdf=df1.merge(df2,how='left',left_on='Id',right_on='Id')