I have the following dataframe:
df = pd.DataFrame({
'tmp': ['A', 'A', 'B', 'Z', 'D', 'C'],
'F1': [2, 1, 9, 8, 7, 4],
'F20': [0, 1, 9, 4, 2, 3],
'F3': ['a', 'B', 'c', 'D', 'e', 'F'],
'aabb': ['a', 'B', 'c', 'D', 'e', 'F']
})
---
tmp F1 F20 F3 aabb
0 A 2 0 a a
1 A 1 1 B B
2 B 9 9 c c
3 Z 8 4 D D
4 D 7 2 e e
5 C 4 3 F F
and I would like to sort only the columns with the F in this way:
tmp F1 F3 F20 aabb
0 A 2 a 0 a
1 A 1 B 1 B
2 B 9 c 9 c
3 Z 8 D 4 D
4 D 7 e 2 e
5 C 4 F 3 F
How could I do?
(edit) The columns with the "F" can vary both in quantity and in the values that follow the F (in my case I have about 100 columns like those) The columns with F are always grouped but the number before and after is variable
CodePudding user response:
You can use natsort
for natural sorting and a mask to handle only the F columns:
# pip install natsort
from natsort import natsorted
cols = df.columns.to_numpy(copy=True)
m = df.columns.str.fullmatch('F\d ')
cols[m] = natsorted(cols[m])
df_sorted = df[cols]
Alternative without natsort
:
num = df.columns.str.extract('F(\d )', expand=False).astype(float)
cols = df.columns.to_numpy(copy=True)
m = num.notna()
order = np.argsort(num[m])
cols[m] = cols[m][order]
df_sorted = df[cols]
output:
tmp F1 F3 F20 aabb
0 A 2 a 0 a
1 A 1 B 1 B
2 B 9 c 9 c
3 Z 8 D 4 D
4 D 7 e 2 e
5 C 4 F 3 F
CodePudding user response:
You can simply do:
df = df[["tmp", "F1", "F3", "F20", "aabb"]]
Update: sort the F columns.
# find all F columns
f_cols = df.columns[df.columns.str.startswith("F")].tolist()
# ["F1", "F20", "F3"]
# sort them using a lambda key
f_cols_sorted = sorted(f_cols, key=lambda s: int(s[1:]))
# ['F1', 'F3', 'F20']
# sort
df = df[["tmp"] f_cols_sorted ["aabb"]]
Update: arbitrary column name/number of columns before/after the F group.
# find cols of pattern `F\d `
m = [re.match(r"F\d ", s) is not None for s in cols]
# -> [False, True, True, True, False]
# find first/last index of F column
first_f = m.index(True) # 1
last_f = len(m) - m[::-1].index(True) - 1 # 3
# sort column names
sorted_cols = cols[:first_f] sorted(cols[first_f:last_f 1], key=lambda s: int(s[1:])) cols[last_f 1:]
# -> ['tmp', 'F1', 'F3', 'F20', 'aabb']
# finally
df = df[sorted_cols]
CodePudding user response:
This solves the edit issue:
df=df[["tmp"] [f"F{x}" for x in sorted([int(y[1:]) for y in df.columns if y[0]=="F"]) ] ["aabb"]]
Output:
tmp F1 F3 F20 aabb
0 A 2 a 0 a
1 A 1 B 1 B
2 B 9 c 9 c
3 Z 8 D 4 D
4 D 7 e 2 e
5 C 4 F 3 F