I have a dataframe, and I would like to merge the rows that have the same values with the columns reversed. An example is below:
Column1 Column2
A B
B A
C D
D C
E F
Expected results:
Column1 Column2
A B
C D
E F
As each file has fewer than 50 lines (though I have 1000 files), I tried some code using iterrows, as follows:
for index, row in df.iterrows():
    output = []
    row_rev = df[(df['Column1'] == row['Column2']) & (df['Column2'] == row['Column1'])]
    row_rev_index = df[(df['Column1'] == row['Column2']) & (df['Column2'] == row['Column1'])].index()
    if row_rev.any():
        print(df[min([index, row_rev_index])])
        output.append(df[min([index, row_rev_index])]) # always print out the first line of the reciprocal lines
but it fails on this line:
row_rev_index = df[(df['Column1'] == row['Column2']) & (df['Column2'] == row['Column1'])].index()
TypeError: 'Int64Index' object is not callable
CodePudding user response:
index is an attribute, not a method, so drop the parentheses. Change
row_rev_index = df[(df['Column1'] == row['Column2']) & (df['Column2'] == row['Column1'])].index()
to
row_rev_index = df[(df['Column1'] == row['Column2']) & (df['Column2'] == row['Column1'])].index
or, even shorter, since row_rev already holds that selection:
row_rev_index = row_rev.index
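Beyond the .index() call, the loop in the question seems to have a few other problems: output is reset on every iteration, row_rev.any() returns a Series rather than a single True/False, and df[...] with a row label selects a column, not a row. A minimal sketch of a corrected loop follows, assuming the default integer index and the column names from the question. The sample data is the example from the question; the names rev_mask, rev_index and result are only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"Column1": ["A", "B", "C", "D", "E"],
                   "Column2": ["B", "A", "D", "C", "F"]})

output = []  # created once, outside the loop, so it is not reset each time
for index, row in df.iterrows():
    # Boolean mask of rows whose two columns are this row's columns swapped.
    rev_mask = (df["Column1"] == row["Column2"]) & (df["Column2"] == row["Column1"])
    rev_index = df.index[rev_mask]  # .index is an attribute: no parentheses

    # Keep the row if it has no reversed partner, or if it comes
    # no later than every partner (so the first of each pair survives).
    if rev_index.empty or index <= rev_index.min():
        output.append(index)

result = df.loc[output]
print(result)
```

Note that this only removes reversed pairs; exact duplicate rows such as A B appearing twice would both be kept.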
CodePudding user response:
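This may be what you are looking for (the key is built with sorted, because the iteration order of a set is not guaranteed to be the same for A, B and B, A):
df = df.groupby(df.apply(lambda x: tuple(sorted(x)), axis=1)).first()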
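If you would rather keep the original row index and row order, another option is to sort the two values within each row and drop the rows whose sorted pair has already been seen. This is only a sketch under the same assumptions as the question (two columns named Column1 and Column2 holding comparable values); drop_reversed_pairs is a hypothetical helper name, not a pandas function.

```python
import numpy as np
import pandas as pd


def drop_reversed_pairs(df, cols=("Column1", "Column2")):
    """Keep the first row of every pair of rows that are reverses of each other."""
    # Sort the values inside each row so that (A, B) and (B, A) give the same key.
    key = pd.DataFrame(np.sort(df[list(cols)].to_numpy(), axis=1), index=df.index)
    # duplicated() marks every occurrence of a key after the first one.
    return df[~key.duplicated()]


# Sample data copied from the question.
df = pd.DataFrame({"Column1": ["A", "B", "C", "D", "E"],
                   "Column2": ["B", "A", "D", "C", "F"]})

result = drop_reversed_pairs(df)
print(result)
```

Since you have around 1000 small files, the same function could be called on each dataframe after reading it. Unlike the loop approach, this also collapses exact duplicate rows.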