I have two dataframes with hundreds of columns. Some have the same name, some do not. I want the two dataframes to have the columns with same name listed in the same order. Typically, if those were the only columns, I would do:
df2 = df2.filter(df1.columns)
However, because there are columns with different names, this would eliminate all columns in df2 that do not exists in df1.
How do I order all common columns with same order without losing the columns that are not in common? Those not in common must be kept in the original order. Because I have hundreds of columns I cannot do it manually but need a quick solution like "filter". Please, note that though there are similar questions, they do not deal with the case of "some columns are in common and some are different".
Example:
df1.columns = A,B,C,...,Z,1,2,...,1000
df2.columns = Z,K,P,T,...,01,02,...,01000
I want to reorder the columns for df2 to be:
df2.columns = A,B,C,...,Z,01,02,...,01000
CodePudding user response:
Assume you want to also keep columns that are not in common in the same place:
# make a copy of df2 column names
new_cols = df2.columns.values.copy()
# reorder common column names in df2 to be same order as df1
new_cols[df2.columns.isin(df1.columns)] = df1.columns[df1.columns.isin(df2.columns)]
# reorder columns using new_cols
df2[new_cols]
Example:
df1 = pd.DataFrame([[1,2,3,4,5]], columns=list('badfe'))
df2 = pd.DataFrame([[1,2,3,4,5]], columns=list('fsxad'))
df1
b a d f e
0 1 2 3 4 5
df2
f s x a d
0 1 2 3 4 5
new_cols = df2.columns.values.copy()
new_cols[df2.columns.isin(df1.columns)] = df1.columns[df1.columns.isin(df2.columns)]
df2[new_cols]
a s x d f
0 4 2 3 5 1
CodePudding user response:
Try sets
operations on column names like intersection
and difference
:
Setup a MRE
>>> df1
A B C D
0 2 7 7 5
1 6 8 4 2
>>> df2
C B E F
0 8 7 3 2
1 8 6 5 8
c0 = df1.columns.intersection(df2.columns)
c1 = df1.columns.difference(df2.columns)
c2 = df2.columns.difference(df1.columns)
df1 = df1[c0.tolist() c1.tolist()]
df2 = df2[c0.tolist() c2.tolist()]
Output:
>>> df1
B C A D
0 7 7 2 5
1 8 4 6 2
>>> df2
B C E F
0 7 8 3 2
1 6 8 5 8
CodePudding user response:
You can do using pd.Index.difference
and pd.index.union
i = df1.columns.intersection(df2.columns,sort=False).union(
df2.columns.difference(df1.columns),sort=False
)
out = df2.loc[:,i]
df1 = pd.DataFrame(columns=list("ABCEFG"))
df2 = pd.DataFrame(columns=list("ECDAFGHI"))
print(df1)
print(df2)
i = df2.columns.intersection(df1.columns,sort=False).union(
df2.columns.difference(df1.columns),sort=False
)
print(df2.loc[:,i])
Empty DataFrame
Columns: [A, B, C, E, F, G]
Index: []
Empty DataFrame
Columns: [E, C, D, A, F, G, H, I]
Index: []
Empty DataFrame
Columns: [A, C, E, F, G, D, H, I]
Index: []