Add characters to a string conditionally in Python

I have a DataFrame containing strings. Each string should be 6 characters long, but because of some mistakes a few are shorter. I want to add "0" characters where needed to reach the full length, at a very specific position: right after the letters.

Existing:

index  t0      t1      t2
0      0E315   0E16    0E17
1      BA1601
2      0A911   0A910
3      BA872   0A832   0A831

Wanted:

index  t0      t1      t2
0      0E0315  0E0016  0E0017
1      BA1601
2      0A0911  0A0910
3      BA0872  0A0832  0A0831

So far I have tried many pandas functions without success. Can anyone help?

CodePudding user response:

We can use str.replace along with a lambda callback function:

df["t0"] = df["t0"].str.replace(r'^(.*?[A-Z]+)(\d+)', lambda m: m.group(1) + '0'*(6-len(m.group())) + m.group(2), regex=True)

The logic here, taking e.g. 0E315 as an input value, is to match 0E in the first capture group and the digits 315 in the second capture group. We then insert, before the digits, however many zeroes are needed to pad the input to a width of 6 characters.
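To see the padding logic in isolation, here is a minimal sketch using Python's built-in re module on plain strings. The helper name pad_code and its width parameter are illustrative assumptions and not part of the answer above.

```python
import re

# Hypothetical helper (not from the answer): pads the digit run of a code
# such as "0E315" with zeros so that the matched text reaches `width`.
def pad_code(value: str, width: int = 6) -> str:
    def _insert_zeros(m: re.Match) -> str:
        # A zero or negative count yields an empty string, so codes that
        # are already long enough are returned unchanged.
        missing = width - len(m.group())
        return m.group(1) + "0" * missing + m.group(2)

    # Group 1: shortest prefix ending in uppercase letters.
    # Group 2: the digits that follow it.
    return re.sub(r"^(.*?[A-Z]+)(\d+)", _insert_zeros, value)


for code in ["0E315", "0E16", "BA1601", ""]:
    print(repr(code), "->", repr(pad_code(code)))
```

Multiplying "0" by a non-positive number gives an empty string, which is why no explicit length check is needed.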

You may use the above logic on all three of your columns.
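One way to do that is a simple loop over the column names. This is only a sketch: the sample frame is rebuilt from the question, empty cells are assumed to be empty strings, and the function name pad_match is made up for illustration.

```python
import pandas as pd

# Sample data mirroring the question; empty cells are assumed to be "".
df = pd.DataFrame({
    "t0": ["0E315", "BA1601", "0A911", "BA872"],
    "t1": ["0E16", "", "0A910", "0A832"],
    "t2": ["0E17", "", "", "0A831"],
})

def pad_match(m):
    # Insert as many zeros as the matched text is short of 6 characters.
    return m.group(1) + "0" * (6 - len(m.group())) + m.group(2)

for col in ["t0", "t1", "t2"]:
    # regex=True is passed explicitly, since newer pandas versions treat
    # the pattern as a literal string by default.
    df[col] = df[col].str.replace(r"^(.*?[A-Z]+)(\d+)", pad_match, regex=True)

print(df)
```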

CodePudding user response:

Assuming 'index' is the index, use str.replace with capturing groups and a function. The first group matches any characters (as few as possible), the second one only digits. We then zfill the second group to a length of 6-len(m.group(1)):

out = (df
 .apply(lambda c: c.str.replace(r'(.+?)(\d+)',
        lambda m: f'{m.group(1)}{m.group(2).zfill(6-len(m.group(1)))}',
        regex=True))
)

output:

           t0      t1      t2
index                        
0      0E0315  0E0016  0E0017
1      BA1601                
2      0A0911  0A0910        
3      BA0872  0A0832  0A0831
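The zfill step can also be checked on plain strings, outside pandas. The sketch below follows the pattern described in this answer (a shortest-possible prefix followed by digits); the function name zfill_digits and its width parameter are hypothetical.

```python
import re

# Hypothetical helper illustrating the zfill-based variant on one string.
def zfill_digits(value: str, width: int = 6) -> str:
    def _pad(m: re.Match) -> str:
        prefix, digits = m.group(1), m.group(2)
        # Left-pad the digits with zeros to fill whatever width remains
        # after the prefix; zfill leaves longer strings untouched.
        return prefix + digits.zfill(width - len(prefix))

    return re.sub(r"(.+?)(\d+)", _pad, value)


for code in ["0E16", "BA872", "BA1601"]:
    print(code, "->", zfill_digits(code))
```

Here the padding width is derived from the prefix length rather than from the length of the whole match, so zfill does the counting.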