Reordering common columns but not others in pandas dataframe-CodePudding

I have two dataframes with hundreds of columns. Some have the same name, some do not. I want the two dataframes to have the columns with same name listed in the same order. Typically, if those were the only columns, I would do:

df2 = df2.filter(df1.columns)

However, because there are columns with different names, this would eliminate all columns in df2 that do not exists in df1.

How do I order all common columns with same order without losing the columns that are not in common? Those not in common must be kept in the original order. Because I have hundreds of columns I cannot do it manually but need a quick solution like "filter". Please, note that though there are similar questions, they do not deal with the case of "some columns are in common and some are different".

Example:

df1.columns = A,B,C,...,Z,1,2,...,1000
df2.columns = Z,K,P,T,...,01,02,...,01000

I want to reorder the columns for df2 to be:

df2.columns = A,B,C,...,Z,01,02,...,01000

CodePudding user response：

Assume you want to also keep columns that are not in common in the same place:

# make a copy of df2 column names
new_cols = df2.columns.values.copy()

# reorder common column names in df2 to be same order as df1
new_cols[df2.columns.isin(df1.columns)] = df1.columns[df1.columns.isin(df2.columns)]

# reorder columns using new_cols
df2[new_cols]

Example:

df1 = pd.DataFrame([[1,2,3,4,5]], columns=list('badfe'))
df2 = pd.DataFrame([[1,2,3,4,5]], columns=list('fsxad'))

df1
   b  a  d  f  e
0  1  2  3  4  5

df2
   f  s  x  a  d
0  1  2  3  4  5

new_cols = df2.columns.values.copy()
new_cols[df2.columns.isin(df1.columns)] = df1.columns[df1.columns.isin(df2.columns)]

df2[new_cols]
   a  s  x  d  f
0  4  2  3  5  1

CodePudding user response：

Try sets operations on column names like intersection and difference:

Setup a MRE

>>> df1
   A  B  C  D
0  2  7  7  5
1  6  8  4  2

>>> df2
   C  B  E  F
0  8  7  3  2
1  8  6  5  8

c0 = df1.columns.intersection(df2.columns)
c1 = df1.columns.difference(df2.columns)
c2 = df2.columns.difference(df1.columns)

df1 = df1[c0.tolist()   c1.tolist()]
df2 = df2[c0.tolist()   c2.tolist()]

Output:

>>> df1
   B  C  A  D
0  7  7  2  5
1  8  4  6  2

>>> df2
   B  C  E  F
0  7  8  3  2
1  6  8  5  8

CodePudding user response：

You can do using pd.Index.difference and pd.index.union

i = df1.columns.intersection(df2.columns,sort=False).union(
    df2.columns.difference(df1.columns),sort=False
     )
out = df2.loc[:,i]

df1 = pd.DataFrame(columns=list("ABCEFG"))
df2 = pd.DataFrame(columns=list("ECDAFGHI"))
print(df1)
print(df2)
i = df2.columns.intersection(df1.columns,sort=False).union(
    df2.columns.difference(df1.columns),sort=False
     )
print(df2.loc[:,i])

Empty DataFrame
Columns: [A, B, C, E, F, G]
Index: []
Empty DataFrame
Columns: [E, C, D, A, F, G, H, I]
Index: []
Empty DataFrame
Columns: [A, C, E, F, G, D, H, I]
Index: []