Home > database >  Reorder columns in pandas datafram
Reorder columns in pandas datafram

Time:09-03

I have the following dataframe:

df = pd.DataFrame({
   'tmp': ['A', 'A', 'B', 'Z', 'D', 'C'],
   'F1': [2, 1, 9, 8, 7, 4],
   'F20': [0, 1, 9, 4, 2, 3],
   'F3': ['a', 'B', 'c', 'D', 'e', 'F'],
   'aabb': ['a', 'B', 'c', 'D', 'e', 'F']
})
---
  tmp  F1  F20 F3 aabb
0   A   2    0  a    a
1   A   1    1  B    B
2   B   9    9  c    c
3   Z   8    4  D    D
4   D   7    2  e    e
5   C   4    3  F    F

and I would like to sort only the columns with the F in this way:

   tmp  F1  F3  F20 aabb
0   A   2    a   0    a
1   A   1    B   1    B
2   B   9    c   9    c
3   Z   8    D   4    D
4   D   7    e   2    e
5   C   4    F   3    F

How could I do?

(edit) The columns with the "F" can vary both in quantity and in the values that follow the F (in my case I have about 100 columns like those) The columns with F are always grouped but the number before and after is variable

CodePudding user response:

You can use natsort for natural sorting and a mask to handle only the F columns:

# pip install natsort
from natsort import natsorted

cols = df.columns.to_numpy(copy=True)
m = df.columns.str.fullmatch('F\d ')
cols[m] = natsorted(cols[m])

df_sorted = df[cols]

Alternative without natsort:

num = df.columns.str.extract('F(\d )', expand=False).astype(float)
cols = df.columns.to_numpy(copy=True)
m = num.notna()
order = np.argsort(num[m])
cols[m] = cols[m][order]

df_sorted = df[cols]

output:

  tmp  F1 F3  F20 aabb
0   A   2  a    0    a
1   A   1  B    1    B
2   B   9  c    9    c
3   Z   8  D    4    D
4   D   7  e    2    e
5   C   4  F    3    F

CodePudding user response:

You can simply do:

df = df[["tmp", "F1", "F3", "F20", "aabb"]]

Update: sort the F columns.

# find all F columns
f_cols = df.columns[df.columns.str.startswith("F")].tolist()
# ["F1", "F20", "F3"]

# sort them using a lambda key
f_cols_sorted = sorted(f_cols, key=lambda s: int(s[1:]))
# ['F1', 'F3', 'F20']

# sort
df = df[["tmp"]   f_cols_sorted   ["aabb"]]

Update: arbitrary column name/number of columns before/after the F group.

# find cols of pattern `F\d `
m = [re.match(r"F\d ", s) is not None for s in cols]
# -> [False, True, True, True, False]

# find first/last index of F column
first_f = m.index(True)                    # 1
last_f = len(m) - m[::-1].index(True) - 1  # 3

# sort column names
sorted_cols = cols[:first_f]   sorted(cols[first_f:last_f 1], key=lambda s: int(s[1:]))   cols[last_f 1:]
# -> ['tmp', 'F1', 'F3', 'F20', 'aabb']

# finally
df = df[sorted_cols]

CodePudding user response:

This solves the edit issue:

df=df[["tmp"] [f"F{x}" for x in sorted([int(y[1:]) for y in  df.columns if y[0]=="F"]) ] ["aabb"]]

Output:

  tmp  F1 F3  F20 aabb
0   A   2  a    0    a
1   A   1  B    1    B
2   B   9  c    9    c
3   Z   8  D    4    D
4   D   7  e    2    e
5   C   4  F    3    F
  • Related