I have to dataframes with different indexes and some shard coulmns (coulmn name all values locations). e.g.:
data1 = {"col1": ["A", "B", "C", "D"], "col2": ["E", "F", "G", "H"], "col3": ["Z", "Z", "Z", "Z"], "col4": ["Q", "Q", "Q", "Q"]}
df1 = pd.DataFrame(index=[0,1,2,3], data=data1)
data2 = {"col1": ["A", "B", "C", "D"], "col2": ["E", "F", "G", "H"], "col3": ["Z", "Z", "Z", "X"]}
df2 = pd.DataFrame(index=[4,5,6,7], data=data2)
df1
col1 col2 col3 col4
0 A E Z Q
1 B F Z Q
2 C G Z Q
3 D H Z Q
df2
col1 col2 col3
4 A E Z
5 B F Z
6 C G Z
7 D H X
I want the output to be only the identical coulmns (name values location), even though indexing is different. In my example, It should be col1
, col2
.
CodePudding user response:
df1.loc[:, df1.eq(df2.set_axis(df1.index)).all()]
output
:
col1 col2
0 A E
1 B F
2 C G
3 D H
CodePudding user response:
First you get columns that are common by using intersection
and then see if their values are all equal.
cols = df1.columns.intersection(df2.columns)
out = [col for col in cols if (df1[col].values == df2[col].values).all()]
print(out)
['col1', 'col2']