I have a DataFrame containing a varying number of strings per row. Each string should be 6 characters long, but because of data-entry mistakes some are shorter. I want to pad them with "0" characters up to the full length, at a very specific position: right after the letters.
Existing:
index t0 t1 t2
0 0E315 0E16 0E17
1 BA1601
2 0A911 0A910
3 BA872 0A832 0A831
Wanted:
index t0 t1 t2
0 0E0315 0E0016 0E0017
1 BA1601
2 0A0911 0A0910
3 BA0872 0A0832 0A0831
So far I have tried many pandas functions without success. Can anyone help?
CodePudding user response:
We can use str.replace along with a lambda callback function:
# regex=True is required for a callable replacement in recent pandas versions
df["t0"] = df["t0"].str.replace(
    r'^(.*?[A-Z]+)(\d+)',
    lambda m: m.group(1) + '0'*(6-len(m.group())) + m.group(2),
    regex=True)
The logic here, taking e.g. 0E315 as an input value, is to match 0E in the first capture group and the digits 315 in the second capture group. We then insert, before the digits, however many zeroes are needed to pad the input to a width of 6 characters.
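To see the two capture groups at work on a single value, here is a minimal sketch using only the standard re module. The helper name pad_code and the width parameter are made up for illustration; the pattern and the padding arithmetic are the ones described above.

```python
import re

# Group 1: everything up to and including the letters; group 2: the digits after them.
PATTERN = re.compile(r'^(.*?[A-Z]+)(\d+)')

def pad_code(code, width=6):
    """Insert zeros between the letters and the digits until the match is `width` long."""
    def _fill(m):
        missing = width - len(m.group())  # zero or negative means nothing is added
        return m.group(1) + '0' * missing + m.group(2)
    return PATTERN.sub(_fill, code)

print(pad_code('0E315'))   # 0E0315
print(pad_code('0E16'))    # 0E0016
print(pad_code('BA1601'))  # BA1601 (already 6 characters)
```

Multiplying a string by a non-positive number gives an empty string, so values that are already 6 characters long pass through unchanged.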
You may use the above logic on all three of your columns.
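A rough sketch of that, looping over the three columns. The DataFrame below is rebuilt from the question's sample, and it assumes the empty cells are missing values (None/NaN) rather than empty strings; pad_match is just a named version of the lambda.

```python
import pandas as pd

# Sample data from the question; empty cells are assumed to be missing values.
df = pd.DataFrame(
    {
        "t0": ["0E315", "BA1601", "0A911", "BA872"],
        "t1": ["0E16", None, "0A910", "0A832"],
        "t2": ["0E17", None, None, "0A831"],
    }
)

def pad_match(m):
    # Insert the missing zeros between the letters (group 1) and the digits (group 2).
    return m.group(1) + "0" * (6 - len(m.group())) + m.group(2)

for col in ["t0", "t1", "t2"]:
    # Missing values are left untouched by the .str accessor.
    df[col] = df[col].str.replace(r"^(.*?[A-Z]+)(\d+)", pad_match, regex=True)

print(df)
```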
CodePudding user response:
Assuming 'index' is the index, use str.replace with capturing groups and a function. The first group is any set of characters (as short as possible), the second one exclusively digits. Then we zfill the second group to a length of 6-len(m.group(1)):
out = (df
       .apply(lambda c: c.str.replace(r'(.+?)(\d+)',
                                      lambda m: f'{m.group(1)}{m.group(2).zfill(6-len(m.group(1)))}',
                                      regex=True))
      )
output:
t0 t1 t2
index
0 0E0315 0E0016 0E0017
1 BA1601
2 0A0911 0A0910
3 BA0872 0A0832 0A0831
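The zfill step of this second approach can also be checked on single values without pandas. This is only a sketch: pad_with_zfill and its width parameter are hypothetical names, and the pattern is the same as in the answer.

```python
import re

def pad_with_zfill(code, width=6):
    # Group 1: shortest possible prefix; group 2: the digits that follow it.
    # The digits are left-padded with zeros to fill whatever width the prefix leaves.
    return re.sub(
        r'(.+?)(\d+)',
        lambda m: m.group(1) + m.group(2).zfill(width - len(m.group(1))),
        code,
    )

print(pad_with_zfill('0E16'))   # 0E0016
print(pad_with_zfill('BA872'))  # BA0872
print(pad_with_zfill('0A910'))  # 0A0910
```

str.zfill never truncates, so digit groups that already fill the remaining width are returned as they are.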