Pandas - Replace cell values using a conditional (normalising string input for gender)

Time:11-02

Example data

id Gender Age
1 F 22
2 Fem 18
3 male 45
4 She/Her 30
5 Male 25
6 Non-bianary 26
7 M 18
8 female 20
9 Male 56

I want to standardise this somewhat by replacing all cells with an 'F' in them with 'Female', and all cells with an 'M' in them with 'Male'. I know the first step is to normalise the capitalisation of the whole column (first letter upper case, the rest lower case)

df.Gender = df.Gender.str.capitalize()

and I know that I can do it value-by-value with

df['Gender'] = df['Gender'].replace(['F', 'Fem', 'Female'], 'Female')

but is there a way to do this somewhat programmatically?

such as

df.Gender = df.Gender.str.capitalize()

for i in df.Gender:
    if 'F' in str(i):
        #pd.replace call something like...
        df[df.Gender == i] = 'Female'
        #I know that line is very wrong
    elif 'M' in str(i)...

Any help would be much appreciated.

CodePudding user response:

Try using regex:

import re

df["Gender"] = df["Gender"].str.replace(
    r"^F\S*$", "Female", flags=re.I, regex=True
)
print(df)

Prints:

   id       Gender  Age
0   1       Female   22
1   2       Female   18
2   3         male   45
3   4      She/Her   30
4   5         Male   25
5   6  Non-bianary   26
6   7            M   18
7   8       Female   20
8   9         Male   56
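
The same idea can be extended to cover the 'M' values as well. This is only a sketch, assuming the sample data from the question; the `patterns` dict is an illustrative name, not something pandas provides. Each pattern is anchored at the start of the string, so "female" is not caught by the 'M' rule.

```python
import re

import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "id": range(1, 10),
    "Gender": ["F", "Fem", "male", "She/Her", "Male",
               "Non-bianary", "M", "female", "Male"],
    "Age": [22, 18, 45, 30, 25, 26, 18, 20, 56],
})

# Hypothetical mapping of anchored patterns to the label they should become
patterns = {r"^F\S*$": "Female", r"^M\S*$": "Male"}

for pattern, label in patterns.items():
    # flags=re.I makes the match case-insensitive, so "fem" and "MALE" also match
    df["Gender"] = df["Gender"].str.replace(pattern, label, flags=re.I, regex=True)

print(df)
```

Values that start with neither letter ("She/Her", "Non-bianary") are left untouched.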

CodePudding user response:

Yes, you can loop through the df like this:

for index, row in df.iterrows():
    if row["Gender"] == "F":  # or other conditions
        df.loc[index, "Gender"] = "Female"
    else:
        pass  # or whatever condition you want to add

Is this what you asked for? That said, the approach in @Andrej Kesely's answer is more efficient.
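
For comparison, the same kind of condition can be written without an explicit loop by using boolean masks with df.loc. A rough sketch, assuming the sample data from the question and that "starts with F or M" is the rule you want; `first` is just an illustrative variable name.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "id": range(1, 10),
    "Gender": ["F", "Fem", "male", "She/Her", "Male",
               "Non-bianary", "M", "female", "Male"],
    "Age": [22, 18, 45, 30, 25, 26, 18, 20, 56],
})

# First character of each value, upper-cased, so "fem" and "Fem" compare the same
first = df["Gender"].str.strip().str[0].str.upper()

# Assign through .loc with a boolean mask; only the Gender column is changed
df.loc[first == "F", "Gender"] = "Female"
df.loc[first == "M", "Gender"] = "Male"

print(df)
```

Rows whose value starts with any other letter keep their original text.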

CodePudding user response:

df.loc[df['Gender'].isin(['F', 'Fem', 'Female']), 'Gender'] = 'Female'
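
If you would rather list the accepted spellings explicitly, one option is a lookup dict applied with Series.map. This is a sketch under the assumption that matching should ignore case and surrounding whitespace; `aliases` and `key` are hypothetical names chosen for the example.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "id": range(1, 10),
    "Gender": ["F", "Fem", "male", "She/Her", "Male",
               "Non-bianary", "M", "female", "Male"],
    "Age": [22, 18, 45, 30, 25, 26, 18, 20, 56],
})

# Hypothetical table of known spellings (lower case) and their standard label
aliases = {
    "f": "Female", "fem": "Female", "female": "Female",
    "m": "Male", "male": "Male",
}

# Normalise case and whitespace only for the lookup, not in the stored data
key = df["Gender"].str.strip().str.lower()

# map() gives NaN for unknown spellings; fillna() restores the original value there
df["Gender"] = key.map(aliases).fillna(df["Gender"])

print(df)
```

Unlike the pattern-based versions, this only changes spellings you have listed, so an unexpected value such as "Mx" would be left as it is.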