I have a DataFrame df where Names is a column name, with its data below it:
Names
------
23James
0Sania
4124Thomas
101Craig
8Rick
How can I convert it to this:
Names
------
James
Sania
Thomas
Craig
Rick
I tried with df.strip but there are certain numbers that are still in the DataFrame.
CodePudding user response:
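We can use str.replace here with the regex pattern ^\d+, which targets one or more leading digits. Pass regex=True explicitly, since recent pandas versions treat the pattern as a literal string by default.
df["Names"] = df["Names"].str.replace(r'^\d+', '', regex=True)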
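To see what the pattern does on its own, here is a minimal sketch using only Python's standard re module on the sample names from the question. The list `names` and the variable `leading_digits` are illustrative, not part of the original answer; the pandas call above applies the same pattern to every row of the column.

```python
import re

# Sample values copied from the question.
names = ["23James", "0Sania", "4124Thomas", "101Craig", "8Rick"]

# ^ anchors the match at the start of the string; \d+ matches one or more digits.
leading_digits = re.compile(r"^\d+")

# Replace the leading run of digits with an empty string.
cleaned = [leading_digits.sub("", name) for name in names]
print(cleaned)

# Digits that are not at the start are left alone because of the ^ anchor.
print(leading_digits.sub("", "R2D2"))
```

Because of the anchor, only the leading run of digits is removed; digits later in the string are kept.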
CodePudding user response:
The answer by Tim certainly solves this, but I usually feel uncomfortable using regex as I'm not proficient with it, so I would approach it like this:
def removeStartingNums(s):
    # Count how many leading characters are numeric.
    count = 0
    for i in s:
        if i.isnumeric():
            count += 1
        else:
            break
    # Slice off the leading numeric characters.
    return s[count:]

df["Names"] = df["Names"].apply(removeStartingNums)
What the function essentially does is count the number of leading characters that are numeric and then return the string with those leading characters sliced off.
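Another regex-free option, offered here as a sketch and not as part of the answer above, is the built-in str.lstrip, which accepts a set of characters to remove from the left side of a string. The example assumes the leading characters are plain ASCII digits (isnumeric also accepts other Unicode numerals, which this does not strip). The `names` list is illustrative.

```python
import string

# Sample values copied from the question.
names = ["23James", "0Sania", "4124Thomas", "101Craig", "8Rick"]

# string.digits is "0123456789"; lstrip removes any of these characters
# from the left until it reaches a character that is not in the set.
stripped = [name.lstrip(string.digits) for name in names]
print(stripped)

# The pandas equivalent on the column would be:
#   df["Names"] = df["Names"].str.lstrip("0123456789")
```

This keeps the operation vectorized in pandas and avoids a Python-level loop over each string.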
CodePudding user response:
You can also extract all characters after the leading digits using a capture group:
df['Names'] = df['Names'].str.extract(r'^\d+(.*)', expand=False)
print(df)
# Output
    Names
0   James
1   Sania
2  Thomas
3   Craig
4    Rick
Details on Regex101
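One caveat with the capture-group approach: a pattern that requires at least one leading digit does not match a name that has none, and str.extract yields NaN for rows where the pattern does not match. The sketch below uses the standard re module to show the difference between requiring digits (\d+) and allowing zero digits (\d*). The helper `capture` and the names `strict` and `lenient` are hypothetical, introduced only for illustration.

```python
import re

strict = re.compile(r"^\d+(.*)")   # requires at least one leading digit
lenient = re.compile(r"^\d*(.*)")  # also matches names with no leading digits

def capture(pattern, text):
    """Return the first capture group, or None when the pattern does not match."""
    match = pattern.match(text)
    return match.group(1) if match else None

print(capture(strict, "23James"))   # James
print(capture(strict, "James"))     # None (no leading digit, so no match)
print(capture(lenient, "James"))    # James
print(capture(lenient, "23James"))  # James
```

If the column may contain names without a numeric prefix, the lenient pattern avoids losing those rows.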