Home > Back-end >  how to remove string after integer in a dataframe python
how to remove string after integer in a dataframe python

Time:07-04

enter image description hereI want to remove string(subject to approval) after integer(90) in a dataframe.

import pandas as pd
name_dict = {
            'Name': ['a','b','c','d'],
            'Score': ['90(subject to approval)',80,95,20]
          }
df = pd.DataFrame(name_dict)
print (df)

df.set_index('Name').loc['a', 'Score']

CodePudding user response:

You can use .str.extract to extract the integer part

df['Score'] = df['Score'].astype(str).str.extract('(\d )')
print(df)

  Name Score
0    a    90
1    b    80
2    c    95
3    d    20

CodePudding user response:

You can use regex to replace all non numeric characters and then cast to int.

import pandas as pd
name_dict = {
            'Name': ['a','b','c','d'],
            'Score': ['90(subject to approval)',80,95,20]
          }
df = pd.DataFrame(name_dict)

df['Score'] = df['Score'].replace(r'[^0-9] ', '', regex=True)
print(df)

Output:

  Name  Score
0    a     90
1    b     80
2    c     95
3    d     20

If you want to only remove the extra string for rows where "Name" == "a" you can use:

import pandas as pd
name_dict = {
            'Name': ['a','b','c','d'],
            'Score': ['90(subject to approval)',80,95,20]
          }
df = pd.DataFrame(name_dict)

df.loc[df['Name'] == 'a', 'Score'] = (
    df
    .loc[df['Name'] == 'a', 'Score']
    .replace(r'[^0-9] ', '', regex=True)
)

CodePudding user response:

name_dict = {'Name': ['a','b','c','d'], 'Score': ['90(subject to approval)',80,95,20]}
df = pd.DataFrame(name_dict)
df["Score"] = df["Score"].apply(lambda x: x if not isinstance(x, str) or "90" not in x else x[x.index("90"): x.index("90") 2])
print(df)

Output:

  Name  Score
0    a     90
1    b     80
2    c     95
3    d     20
  • Related