let's say we have the following code example where we create two basic dataframes:
import pandas as pd
# Creating Dataframes
a = [{'Name': 'abc', 'Age': 8, 'Grade': 3},
{'Name': 'xyz', 'Age': 9, 'Grade': 3}]
df1 = pd.DataFrame(a)
b = [{'ID': 1,'Name': 'abc', 'Age': 8},
{'ID': 2,'Name': 'xyz', 'Age': 9}]
df2 = pd.DataFrame(b)
# Printing Dataframes
display(df1)
display(df2)
We get the following datasets:
Name Age Grade
0 abc 8 3
1 xyz 9 3
ID Name Age
0 1 abc 8
1 2 xyz 9
How can I find the list of columns that are not repeated in these frames when they are intersected? That is, as a result, I want to get the names of the following columns: ['Grade', 'ID']
CodePudding user response:
res = df2.columns.symmetric_difference(df1.columns)
print(res)
Output
Index(['Grade', 'ID'], dtype='object')
Or as an alternative, use set.symmetric_difference
res = set(df2.columns).symmetric_difference(df1.columns)
print(res)
Output
{'Grade', 'ID'}
A third alternative, suggested by @SashSinha, is to use the shortcut:
res = df2.columns ^ df1.columns
but as of pandas 1.4.3 this issue a warning:
FutureWarning: Index.xor operating as a set operation is deprecated, in the future this will be a logical operation matching Series.xor. Use index.symmetric_difference(other) instead. res = df2.columns ^ df1.columns