How can I write an IF statement in Python Pandas to populate a blank column in my dataframe?

I have a column of Results (A, B, C, D or E). Some entries in that column are missing and some are "Not Applicable". I have already inserted a new column, and I want to populate it as follows: if the Result column holds a grade between A and E, copy that grade into the new column; if it does not, impute a randomly chosen grade in the new column. I want to maintain the integrity of the original Result column. I think I can do the imputation, but writing the IF statement beforehand is the challenge. I am brand new to this (and an accountant, so I really have no clue); any help would be much appreciated :)

CodePudding user response：

import random

# Pick one grade at random from the non-missing values in 'A', then fill every NaN with it
df['B'] = df['A'].fillna(random.choice(df['A'].dropna().unique().tolist()))

If you only wish to replace NaN values with a random result, then this will work. Basically, we pick one result at random from the first column and fill all the NaN values with it, so every missing entry receives the same grade.

'A' is the results column, and 'B' is the new column.
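
If each missing entry should get its own independent draw instead, something along these lines may work. This is only a sketch: the sample data and the seed are made up for illustration, and the column names 'A' and 'B' follow the answer above.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; 'A' is the results column as in the answer above
df = pd.DataFrame({"A": ["A", np.nan, "B", "C", np.nan, "E"]})

rng = np.random.default_rng(0)  # seed chosen only to make the sketch repeatable
observed = df["A"].dropna().unique()  # grades that actually occur in the column
missing = df["A"].isna()  # True where a grade has to be imputed

df["B"] = df["A"].copy()  # copy, so the original column is left untouched
# One independent draw per missing row
df.loc[missing, "B"] = rng.choice(observed, size=int(missing.sum()))
```

Rows that already had a grade keep it, and only the rows flagged in `missing` are overwritten in 'B'.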

CodePudding user response：

You can use fillna:

import numpy as np
import pandas as pd
import random

choices = ["A", "B", "C", "D", "E"]

df = pd.DataFrame()
df['Result'] = ["A", np.nan, "B", "C", np.nan]
# random.choice runs once, so every NaN is filled with the same grade
df['NewColumn'] = df['Result'].fillna(random.choice(choices))
df

Output:

  Result NewColumn
0      A         A
1    NaN         E
2      B         B
3      C         C
4    NaN         E

CodePudding user response：

@Nicholas's answer covers that. I would add that you can also use a mapping when your missing values are something other than real NaNs, for example a string such as "NotAGrade".

Generally, you could use map like so:

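import numpy

# Map the placeholder string to a replacement; the random integer is only a stand-in value
grade_mapping = {"NotAGrade": numpy.random.randint(100)}
# map() returns NaN for values that are not keys in the dict, so fall back to the original value
df['result_formated'] = df['result'].map(grade_mapping).fillna(df['result'])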

Substitute your own keys and replacement values in the mapping.
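
One thing worth knowing when mapping with a dict: values that are not keys in the dict come back from map as NaN, whereas Series.replace leaves them alone. A small sketch with made-up data (the series `s` and the replacement "B" are purely illustrative) shows the difference:

```python
import pandas as pd

# Hypothetical data: "NotAGrade" stands in for whatever placeholder the column uses
s = pd.Series(["A", "NotAGrade", "C"])
mapping = {"NotAGrade": "B"}  # illustrative replacement value

mapped = s.map(mapping)        # unmatched values "A" and "C" become NaN
replaced = s.replace(mapping)  # unmatched values are kept as they are
```

If the valid grades need to survive, replace (or map followed by a fallback to the original values) is probably the safer choice.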

Also check out this question on how you can use pandas.where and numpy.where in other general situations.
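
For the case in the original post, the IF logic could be sketched with isin and numpy.where roughly as follows. The sample data, the seed, and the name 'NewColumn' are assumptions for illustration, not the asker's real dataframe.

```python
import numpy as np
import pandas as pd

valid = ["A", "B", "C", "D", "E"]

# Hypothetical sample data with both kinds of invalid entry from the question
df = pd.DataFrame({"Result": ["A", np.nan, "Not Applicable", "C", "E"]})

rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable
random_grades = rng.choice(valid, size=len(df))  # one candidate grade per row

# The "IF": keep the grade when it is A-E, otherwise take the random candidate
is_valid = df["Result"].isin(valid)
df["NewColumn"] = np.where(is_valid, df["Result"], random_grades)
```

The Result column is never modified, and isin treats both NaN and "Not Applicable" as not valid, so both get an imputed grade. The pandas-only form df["Result"].where(is_valid, random_grades) should behave the same way.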