How to reformat a DataFrame column into a new column and sort it

Time:10-01

I have a DataFrame column of type str that holds alphanumeric strings. Each string always starts with a letter and may or may not end with a letter. The parts are separated by ".", and each numeric part between the separators is one or two digits long. I want to transform the input by replacing "." with "_", padding every numeric part to two digits, and converting every letter to uppercase. The output column should also be sorted, first by the leading letter and then by the numbers. Could someone please help me get the desired output?

Input column:

Col
H.14.01.2
H.14.01.11
H.14.2
H.14.01.12
H.14.01.20
H.14.02.02
H.14.02.J
H.14.01.1
H.14.01.A
H.14.01.11.1
H.14.01.12.b

Required output:

Col Required
H_14_01_01
H_14_01_02
H_14_01_11
H_14_01_11_01
H_14_01_12
H_14_01_12_B
H_14_01_20
H_14_01_A
H_14_02
H_14_02_02
H_14_02_J

Answer:

You can use string operations and natsort:

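# pip install natsort
from natsort import natsort_key

out = (df
 .assign(Col=df['Col'].str.replace('.', '_', regex=False)  # "." -> "_"
                      .str.upper()                          # letters to uppercase
                      # zero-pad any lone digit; \1 re-inserts the captured digit
                      .str.replace(r'(?<=\D)(\d)(?=\D|$)', r'0\1', regex=True))
 .sort_values(by='Col', key=natsort_key)  # natural sort on the transformed values
 )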

Output:

              Col
7      H_14_01_01
0      H_14_01_02
1      H_14_01_11
9   H_14_01_11_01
3      H_14_01_12
10   H_14_01_12_B
4      H_14_01_20
8       H_14_01_A
2         H_14_02
5      H_14_02_02
6       H_14_02_J
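
If installing natsort is not an option, the same transformation and ordering can be sketched with the standard library alone. This is only an illustration under the assumptions stated in the question (parts separated by ".", numeric parts of one or two digits); the helper names `normalize_code` and `natural_key` are made up for this example and are not part of pandas or natsort.

```python
def normalize_code(value: str) -> str:
    """Upper-case the string, pad numeric parts to two digits, join with '_'."""
    parts = value.upper().split(".")
    return "_".join(p.zfill(2) if p.isdigit() else p for p in parts)


def natural_key(code: str) -> tuple:
    """Sort key: numeric parts compare as integers and come before letters."""
    return tuple(
        (0, int(p)) if p.isdigit() else (1, p)
        for p in code.split("_")
    )


# Sample values copied from the question's input column.
codes = [
    "H.14.01.2", "H.14.01.11", "H.14.2", "H.14.01.12", "H.14.01.20",
    "H.14.02.02", "H.14.02.J", "H.14.01.1", "H.14.01.A",
    "H.14.01.11.1", "H.14.01.12.b",
]

result = sorted((normalize_code(c) for c in codes), key=natural_key)
for code in result:
    print(code)
```

With a DataFrame, the first helper could be applied as `df['Col'].map(normalize_code)`, and the second could be mapped over the column inside the `key` callable of `sort_values`.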