How to merge rows of a df that share the same value

I am facing a problem using pandas in Python that I can't solve. I would like to merge/combine/regroup the rows that have the same url.

EDIT: I have a dataframe that looks like this:

url	col1	col2	col3	col4
aaa		xx	yy
bbb	zz
aaa	ee
				AA

I would like something like this:

url	col1	col2	col3	col4
aaa	ee	xx	yy
bbb	zz			cc
				AA

I've tried using groupby, but my df contains rows that don't have a URL, and I want to keep them. I've also tried merge with inner, which gives fairly good results, but for a reason I don't understand it multiplies the number of rows in my df tenfold.

Thank you.

CodePudding user response：

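You can use groupby and first, which keeps the first non-null value of each column for every url. Note that by default groupby drops the rows whose url is missing.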

df = df.groupby('url', as_index=False).first()
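Since the question asks to keep the rows that have no URL, here is a minimal sketch of one way to do that: group only the rows that have a url, then append the others unchanged. The sample frame is rebuilt from the table in the question, and it assumes the empty cells are NaN (if they are empty strings, they would need to be replaced with NaN first). The names has_url, merged and result are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question; empty cells are assumed to be NaN.
df = pd.DataFrame({
    "url": ["aaa", "bbb", "aaa", np.nan],
    "col1": [np.nan, "zz", "ee", np.nan],
    "col2": ["xx", np.nan, np.nan, np.nan],
    "col3": ["yy", np.nan, np.nan, np.nan],
    "col4": [np.nan, np.nan, np.nan, "AA"],
})

# Boolean mask: True for rows that have a url.
has_url = df["url"].notna()

# Collapse rows sharing a url, keeping the first non-null value per column.
merged = df[has_url].groupby("url", as_index=False).first()

# Append the rows without a url, untouched.
result = pd.concat([merged, df[~has_url]], ignore_index=True)
print(result)
```

If two rows with the same url hold different non-null values in the same column, first() silently keeps the one that appears earlier, so this only suits data where the rows complement each other.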

CodePudding user response：

I think you should use groupby, nunique, and np.where to solve this issue. See the following discussion of this problem: pandas-dataframe-check-if-multiple-rows-have-the-same-value
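The answer does not show code, so the following is only one possible reading of it: use groupby with nunique to count the distinct non-null values each url has in a column, and np.where to flag the urls whose rows disagree. The sample data and the col1_status column are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: "bbb" has two different col1 values, "aaa" has only one.
df = pd.DataFrame({
    "url": ["aaa", "bbb", "aaa", "bbb"],
    "col1": ["ee", "zz", np.nan, "qq"],
})

# Number of distinct non-null col1 values among the rows sharing each url,
# broadcast back to the original rows.
distinct = df.groupby("url")["col1"].transform("nunique")

# Flag rows whose url has more than one distinct value in col1.
df["col1_status"] = np.where(distinct > 1, "conflict", "ok")
print(df)
```

A check like this can be run before collapsing the rows, to see whether merging would discard any values.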

CodePudding user response：

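import pandas as pd
import numpy as np

# Two frames indexed by url, each holding values the other one is missing
df1 = pd.DataFrame({'url': ['url1', 'url2'], 'col1':['A', np.nan], 'col2':[np.nan, 'B']}).set_index('url')
df2 = pd.DataFrame({'url': ['url1', 'url2'], 'col1':[np.nan, 'C'], 'col2':['D', np.nan]}).set_index('url')
# Fill the NaN cells of df1 with the df2 values at the same url and column
df1.fillna(df2, inplace=True)
print(df1)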

Result:

     col1 col2
url           
url1    A    D
url2    C    B