How to merge rows of a DataFrame that share the same value

Time:07-07

I am having a problem with pandas in Python that I can't solve. I would like to merge (combine/regroup) the rows that have the same url.

EDIT: I have a DataFrame that looks like this:

url col1 col2 col3 col4
aaa xx yy
bbb zz
aaa ee
AA

I would like something like this:

url col1 col2 col3 col4
aaa ee xx yy
bbb zz cc
AA

I've tried using groupby, but my df contains rows that have no URL, and I want to keep them. I've also tried an inner merge, which gives fairly good results, but I don't know why it multiplies the number of rows in my df tenfold.

Thank you.

CodePudding user response:

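You can use groupby with first, which takes the first non-null value in each column for every url. Note that groupby drops rows whose url is NaN by default.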

df = df.groupby('url', as_index=False).first()
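Since the question asks to keep the rows that have no url, one possible approach is to group only the rows that do have one and append the others afterwards. The following is a minimal sketch; the sample data and its column layout are assumptions, because the alignment of the table in the question was lost.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data (column placement is an assumption)
df = pd.DataFrame({
    "url":  ["aaa", "bbb", "aaa", np.nan],
    "col1": [np.nan, "zz", "ee", np.nan],
    "col2": ["xx", np.nan, np.nan, np.nan],
    "col3": ["yy", np.nan, np.nan, np.nan],
    "col4": [np.nan, np.nan, np.nan, "AA"],
})

# Boolean mask of the rows that actually have a url
has_url = df["url"].notna()

# first() keeps the first non-null value per column within each url group
merged = df[has_url].groupby("url", as_index=False).first()

# Append the rows without a url unchanged
result = pd.concat([merged, df[~has_url]], ignore_index=True)
print(result)
```

If two rows with the same url hold different non-null values in the same column, first() silently keeps only the earlier one.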

CodePudding user response:

I think you should use groupby, nunique, and np.where to solve this. See the related discussion: pandas-dataframe-check-if-multiple-rows-have-the-same-value
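The linked discussion is not reproduced here, so the following is only one reading of this suggestion: use nunique to count the distinct values per url, and np.where to label the urls whose rows disagree. The sample data and the "conflict"/"ok" labels are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "url":  ["aaa", "aaa", "bbb", "bbb"],
    "col1": ["xx", np.nan, "zz", "qq"],
    "col2": [np.nan, "yy", "ww", "ww"],
})

# nunique() counts the distinct non-null values per url and column
distinct = df.groupby("url").nunique()

# A url is in conflict if any column holds more than one distinct value
conflict = distinct.gt(1).any(axis=1)
status = pd.Series(np.where(conflict, "conflict", "ok"), index=distinct.index)
print(status)
```

Urls labelled "ok" can be collapsed into one row without losing information; the others need a rule for choosing which value to keep.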

CodePudding user response:

import pandas as pd
import numpy as np

# Two frames indexed by url, each holding values the other is missing
df1 = pd.DataFrame({'url': ['url1', 'url2'], 'col1':['A', np.nan], 'col2':[np.nan, 'B']}).set_index('url')
df2 = pd.DataFrame({'url': ['url1', 'url2'], 'col1':[np.nan, 'C'], 'col2':['D', np.nan]}).set_index('url')
# Fill the NaN cells of df1 with the df2 values at the same url and column
df1.fillna(df2, inplace=True)
print(df1)

Result:

     col1 col2
url           
url1    A    D
url2    C    B
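As a possible alternative to the in-place fillna, DataFrame.combine_first returns a new frame and leaves df1 untouched. This sketch reuses the same two frames as above.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'url': ['url1', 'url2'], 'col1': ['A', np.nan], 'col2': [np.nan, 'B']}).set_index('url')
df2 = pd.DataFrame({'url': ['url1', 'url2'], 'col1': [np.nan, 'C'], 'col2': ['D', np.nan]}).set_index('url')

# Values from df1 take priority; its NaN cells are filled from df2
combined = df1.combine_first(df2)
print(combined)
```

Unlike fillna, combine_first also keeps urls and columns that exist only in df2.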