Reorder all columns with pattern in pandas-CodePudding

I have a dataframe with the following structure:

df = pd.DataFrame(
    {
        "user": ["A"],
        "country": ["US"],
        "a1": [1],
        "b1": [2],
        "a2": [2],
        "b2": [3],
        "a3": [3],
        "b3": [4],
    }
)

I want a user that pushes all the b columns to the end, obtaining something like this:

pd.DataFrame(
    {
        "user": ["A"],
        "country": ["US"],
        "a1": [1],
        "a2": [2],
        "a3": [3],
        "b1": [2],
        "b2": [3],
        "b3": [4],
    }
)

The amount of b columns can be arbitrary, and their positions too, what I want to do is have them all at the end and keep the other columns in the previous order.

Is there an easy way to do this in pandas? (and all that the b columns have in common is that they share a regex pattern, in this case containing the letter b)

CodePudding user response：

I think you can use str.match to find columns that start with 'b' and push them to end by appending them to the columns that don't start with 'b'

m = df.columns.str.match('^b')
df = df[df.columns[~m].append(df.columns[m])]

CodePudding user response：

here is one way to do it

#slice the dataframe, sort and then concat it back
pd.concat([df.loc[:,df.columns[:2]],
           df.loc[:,df.columns[2:]].sort_index(axis=1)],
          axis=1)


   user     country     a1  a2  a3  b1  b2  b3
0   A            US     1   2   3   2   3   4

CodePudding user response：

Use:

#selected columns for sorted by start b
c = df.filter(regex=r'^b').columns
#all another columns
another = df.columns.difference(c, sort=False)
#join columns and change ordering
df = df[another.union(c, sort=False)]
print (df)
  user country  a1  a2  a3   b1  b2  b3  
0    A      US   1   2   3    2   3   4

If pattern is more complicated is possible use natural sorting:

from natsort import natsorted

c = df.filter(regex=r'^b').columns

another = df.columns.difference(c, sort=False)

sorted_ = natsorted(c)

df = df[another.union(sorted_, sort=False)]
print (df)
  user country  a1  a2  a3   b1  b2  b3  
0    A      US   1   2   3    2   3   4