I am facing a problem with pandas in Python that I can't solve. I would like to merge (combine) the rows that have the same url.
EDIT: I have a DataFrame that looks like this:
url | col1 | col2 | col3 | col4 |
---|---|---|---|---|
aaa | xx | yy | | |
bbb | zz | | | |
aaa | ee | | | |
AA |
I would like something like this:
url | col1 | col2 | col3 | col4 |
---|---|---|---|---|
aaa | ee | xx | yy | |
bbb | zz | cc | | |
AA |
I've tried using groupby, but my df contains rows that have no URL, and I want to keep them. I've also tried an inner merge, which gives fairly good results, but I don't know why it multiplies the number of rows in my df by about ten.
Thank you.
CodePudding user response:
You can use groupby and first.
df = df.groupby('url', as_index=False).first()
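One caveat: by default groupby drops rows whose key is NaN, so the rows without a url would disappear. A minimal sketch of one way around that, assuming the missing urls are stored as NaN (the sample data and the names has_url, merged and result are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, loosely modelled on the question
df = pd.DataFrame({
    "url": ["aaa", "bbb", "aaa", np.nan],
    "col1": ["xx", "zz", np.nan, "AA"],
    "col2": [np.nan, np.nan, "ee", np.nan],
})

# Split the frame: only rows that actually have a url are grouped
has_url = df["url"].notna()

# first() keeps the first non-null value of each column within each url group
merged = df[has_url].groupby("url", as_index=False).first()

# Append the rows without a url unchanged
result = pd.concat([merged, df[~has_url]], ignore_index=True)
print(result)
```

If two rows with the same url hold different non-null values in the same column, first() keeps only the first one, so this only fits data where the rows complement each other.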
CodePudding user response:
I think you should use groupby, nunique, and np.where to solve this issue.
See the following discussion regarding this problem.
pandas-dataframe-check-if-multiple-rows-have-the-same-value
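The answer does not spell out how these pieces fit together. One possible reading, as a rough sketch only: count the distinct values per url with nunique and use np.where to flag urls whose rows disagree. The sample data and the column name status are assumptions, not taken from the linked discussion.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data
df = pd.DataFrame({
    "url": ["aaa", "bbb", "aaa"],
    "col1": ["xx", "zz", "ee"],
})

# Number of distinct non-null col1 values per url, broadcast back to each row
n_distinct = df.groupby("url")["col1"].transform("nunique")

# Flag rows whose url carries more than one distinct value in col1
df["status"] = np.where(n_distinct > 1, "conflict", "ok")
print(df)
```

Rows flagged as conflict are the ones where a simple merge of duplicates would have to discard or relocate a value.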
CodePudding user response:
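import pandas as pd
import numpy as np
# Two frames indexed by url, each holding the values the other one is missing
df1 = pd.DataFrame({'url': ['url1', 'url2'], 'col1':['A', np.nan], 'col2':[np.nan, 'B']}).set_index('url')
df2 = pd.DataFrame({'url': ['url1', 'url2'], 'col1':[np.nan, 'C'], 'col2':['D', np.nan]}).set_index('url')
# Fill the NaN cells of df1 with the df2 values at the same url and column
df1.fillna(df2, inplace=True)
print(df1)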
Result:
     col1 col2
url
url1    A    D
url2    C    B
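A variant of the same idea, sketched under the same assumptions as the example above: combine_first returns a new frame instead of modifying df1 in place, and also keeps urls that appear in only one of the two frames.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'url': ['url1', 'url2'], 'col1': ['A', np.nan], 'col2': [np.nan, 'B']}).set_index('url')
df2 = pd.DataFrame({'url': ['url1', 'url2'], 'col1': [np.nan, 'C'], 'col2': ['D', np.nan]}).set_index('url')

# Values from df1 take priority; its NaN cells are filled from df2
combined = df1.combine_first(df2)
print(combined)
```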