Remove leading words pandas

Time:02-23

I have a DataFrame df with a column named Names that contains the following data:

Names
------
23James
0Sania
4124Thomas
101Craig
8Rick

How can I remove the leading digits so that it looks like this:

Names
------
James
Sania
Thomas
Craig
Rick

I tried df.strip, but some of the numbers are still in the DataFrame.

CodePudding user response:

We can use str.replace here with the regex pattern ^\d+, which matches one or more digits at the start of the string.

df["Names"] = df["Names"].str.replace(r'^\d+', '', regex=True)
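
For reference, here is a minimal self-contained sketch of this approach on the sample data from the question. The regex=True argument is spelled out because pandas 2.0 and later treat the pattern as a literal string by default. The variable name cleaned is only illustrative.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({"Names": ["23James", "0Sania", "4124Thomas", "101Craig", "8Rick"]})

# ^ anchors the match at the start of each string; \d+ matches one or more digits
cleaned = df["Names"].str.replace(r"^\d+", "", regex=True)

print(cleaned.tolist())  # ['James', 'Sania', 'Thomas', 'Craig', 'Rick']
```

Because the pattern is anchored with ^, digits that appear later in a name are left alone.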

CodePudding user response:

The answer by Tim certainly solves this, but I'm not proficient with regex and usually feel uncomfortable using it, so I would approach it like this:

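def removeStartingNums(s):
  # Count how many characters at the start of s are numeric
  count = 0
  for i in s:
    if i.isnumeric():
      count += 1
    else:
      # Stop at the first non-numeric character
      break
  # Slice off the leading numeric characters
  return s[count:]

df["Names"] = df["Names"].apply(removeStartingNums)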

The function counts how many characters at the start of the string are numeric and then returns the string with those leading characters sliced off.
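
As a quick sanity check, the same function can be tried on plain strings without pandas. The sample inputs below are assumptions added for illustration (only names like "4124Thomas" and "Rick" come from the question); they cover a name with no leading digits, an all-digit string, digits in the middle of a string, and an empty string.

```python
def removeStartingNums(s):
    # Count the numeric characters at the start of s
    count = 0
    for ch in s:
        if ch.isnumeric():
            count += 1
        else:
            break
    # Drop the counted prefix
    return s[count:]

# Hypothetical edge cases, not part of the original data
samples = ["4124Thomas", "Rick", "2024", "R2D2", ""]
results = [removeStartingNums(s) for s in samples]

print(results)  # ['Thomas', 'Rick', '', 'R2D2', '']
```

Only the leading run of digits is removed, so "R2D2" comes back unchanged and an all-digit string becomes empty.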

CodePudding user response:

You can also extract all characters after the leading digits using a capture group:

df['Names'] = df['Names'].str.extract(r'^\d+(.*)')
print(df)

# Output
    Names
0   James
1   Sania
2  Thomas
3   Craig
4    Rick
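
One caveat worth sketching: a pattern that requires at least one leading digit does not match a value that has none, and str.extract returns NaN for rows that do not match. The example below assumes a hypothetical row "Rick" with no digits (not present in the original data) and compares \d+ with \d*. The names strict and lenient are illustrative only.

```python
import pandas as pd

# "Rick" has no leading digits; this row is an assumption added for illustration
names = pd.Series(["23James", "Rick"], name="Names")

# \d+ needs at least one digit, so "Rick" does not match and becomes NaN
strict = names.str.extract(r"^\d+(.*)", expand=False)

# \d* also accepts zero digits, so "Rick" is kept unchanged
lenient = names.str.extract(r"^\d*(.*)", expand=False)

print(strict.tolist())
print(lenient.tolist())
```

If every value is known to start with digits, as in the question, both patterns give the same result.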

