I want to remove string(subject to approval) after integer(90) in a dataframe.
import pandas as pd
name_dict = {
'Name': ['a','b','c','d'],
'Score': ['90(subject to approval)',80,95,20]
}
df = pd.DataFrame(name_dict)
print (df)
df.set_index('Name').loc['a', 'Score']
CodePudding user response:
You can use .str.extract
to extract the integer part
df['Score'] = df['Score'].astype(str).str.extract('(\d )')
print(df)
Name Score
0 a 90
1 b 80
2 c 95
3 d 20
CodePudding user response:
You can use regex to replace all non numeric characters and then cast to int.
import pandas as pd
name_dict = {
'Name': ['a','b','c','d'],
'Score': ['90(subject to approval)',80,95,20]
}
df = pd.DataFrame(name_dict)
df['Score'] = df['Score'].replace(r'[^0-9] ', '', regex=True)
print(df)
Output:
Name Score
0 a 90
1 b 80
2 c 95
3 d 20
If you want to only remove the extra string for rows where "Name" == "a" you can use:
import pandas as pd
name_dict = {
'Name': ['a','b','c','d'],
'Score': ['90(subject to approval)',80,95,20]
}
df = pd.DataFrame(name_dict)
df.loc[df['Name'] == 'a', 'Score'] = (
df
.loc[df['Name'] == 'a', 'Score']
.replace(r'[^0-9] ', '', regex=True)
)
CodePudding user response:
name_dict = {'Name': ['a','b','c','d'], 'Score': ['90(subject to approval)',80,95,20]}
df = pd.DataFrame(name_dict)
df["Score"] = df["Score"].apply(lambda x: x if not isinstance(x, str) or "90" not in x else x[x.index("90"): x.index("90") 2])
print(df)
Output:
Name Score
0 a 90
1 b 80
2 c 95
3 d 20