Example data
id | Gender | Age |
---|---|---|
1 | F | 22 |
2 | Fem | 18 |
3 | male | 45 |
4 | She/Her | 30 |
5 | Male | 25 |
6 | Non-bianary | 26 |
7 | M | 18 |
8 | female | 20 |
9 | Male | 56 |
I want to be able to standardise this somewhat by replacing all cells with an 'F' in them with 'Female', and all cells with 'M' in them with 'Male'. I know the first step is to cast the whole column into capitals
df.Gender = df.Gender.str.capitalize()
and I know that I can do it value-by-value with
df['Gender'] = df['Gender'].replace(['F', 'Fem', 'Female'], 'Female')
but is there a way to do this somewhat programmatically?
such as
df.Gender = df.Gender.str.capitalise()
for i in df.Gender:
if 'F' in str(i):
#pd.replace call something like...
df[df.Gender == i] = 'Female'
#I know that line is very wrong
elif 'M' in str(i)...
Any help would be much appreciated.
CodePudding user response:
Try using regex:
import re
df["Gender"] = df["Gender"].str.replace(
r"^F\S*$", "Female", flags=re.I, regex=True
)
print(df)
Prints:
id Gender Age
0 1 Female 22
1 2 Female 18
2 3 male 45
3 4 She/Her 30
4 5 Male 25
5 6 Non-bianary 26
6 7 M 18
7 8 Female 20
8 9 Male 56
CodePudding user response:
Yes, you can loop through the df like that:
for indx, row in df.iterrows:
if row["Gender"] == "F": #Or other conditions
df.loc[index,"Gender"] = "Female"
else:
pass #or whatever condition u want to add
Is this what u asked for ? Although its more efficient to do like that @Andrej Kesely Answered
CodePudding user response:
df['Gender'][df['Gender'].isin(['F', 'Fem', 'Female'])] = 'Female'