I have a column of results (A, B, C, D or E). The column also contains missing values and "Not Applicable" entries. I have inserted a new column, and in it I want to copy the grade from the Result column whenever it is A-E; if it is not A-E, I want to impute a grade at random in the new column. The original Result column must stay unchanged. I think I can handle the imputation, but getting the IF statement part working first is the challenge. I am brand new to this (and an accountant, so I really have no clue), so any help would be much appreciated :)
Answer:
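import random
import pandas as pd
grades = df['A'].dropna().unique().tolist()  # grades that actually occur in column 'A'
df['B'] = df['A'].apply(lambda g: random.choice(grades) if pd.isna(g) else g)  # fresh draw for each NaN
If you only want to replace NaN values with a random result, this will work: each NaN in the results column gets its own grade, drawn at random from the grades already present in that column. (Passing a single random.choice(...) to fillna would draw only once, so every NaN would receive the same grade.)
'A' is the results column and 'B' is the new column, so the original column 'A' is left unchanged.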
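Note that fillna and pd.isna only catch real missing values, so text entries such as "Not Applicable" from the question would be copied into the new column unchanged. A minimal sketch of one way to cover both cases, using Series.where with isin to blank out anything that is not A-E before imputing (the sample data below is hypothetical):

```python
import random
import numpy as np
import pandas as pd

# Hypothetical sample data: real grades, a NaN and a "Not Applicable" entry
df = pd.DataFrame({"A": ["A", np.nan, "Not Applicable", "C", "E"]})
valid = ["A", "B", "C", "D", "E"]

# Treat anything that is not a grade A-E (NaN or text) as missing
cleaned = df["A"].where(df["A"].isin(valid))
observed = cleaned.dropna().unique().tolist()  # grades actually present in the data

# Fill each missing row with its own random draw; column 'A' itself is never modified
df["B"] = cleaned.apply(lambda g: random.choice(observed) if pd.isna(g) else g)
print(df)
```

Drawing from the observed grades follows the answer above; use the full A-E list instead if every grade should be possible.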
Answer:
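You can use fillna. Give it a Series holding one random grade per row, so that each missing value gets its own draw (a single random.choices(choices)[0] is evaluated only once, and every NaN would then receive the same grade):
import random
import numpy as np
import pandas as pd
choices = ["A", "B", "C", "D", "E"]
df = pd.DataFrame()
df['Result'] = ["A", np.nan, "B", "C", np.nan]
# one random grade per row, aligned on df's index; fillna uses it only where Result is NaN
random_grades = pd.Series([random.choice(choices) for _ in range(len(df))], index=df.index)
df['NewColumn'] = df['Result'].fillna(random_grades)
df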
Output (the imputed grades are random, so they will vary between runs):
  Result NewColumn
0      A         A
1    NaN         E
2      B         B
3      C         C
4    NaN         E
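Because the imputed grades are random, rerunning the code gives different results. If you need the same imputed grades every time, one option is to draw them from a seeded NumPy generator (np.random.default_rng). A sketch, where the helper name impute and the seed 42 are hypothetical choices:

```python
import numpy as np
import pandas as pd

choices = ["A", "B", "C", "D", "E"]
df = pd.DataFrame({"Result": ["A", np.nan, "B", "C", np.nan]})

def impute(result, seed):
    # impute is a hypothetical helper name used only for this illustration
    rng = np.random.default_rng(seed)  # seeded generator: same seed gives the same draws
    draws = pd.Series(rng.choice(choices, size=len(result)), index=result.index)
    return result.fillna(draws)  # only NaN rows take a drawn grade

first = impute(df["Result"], seed=42)
second = impute(df["Result"], seed=42)
print(first.equals(second))  # True: the same seed repeats the imputation
```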
Answer:
@Nicholas's answer covers that. I would add that you can also use a mapping when your missing values are something other than real NaNs, for example the string "NotAGrade".
Generally, you could use map like so:
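import random
import pandas as pd
grades = ["A", "B", "C", "D", "E"]
# a real grade maps to itself; anything else ("NotAGrade", NaN, ...) maps to a fresh random grade
df['result_formatted'] = df['result'].map(lambda g: g if g in grades else random.choice(grades))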
Substitute the grades and placeholder values with the ones in your own data.
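When map is given a dictionary, any value that is not one of its keys comes back as NaN, so a dictionary containing only {"NotAGrade": ...} would also wipe out the real grades. A small illustration (the sample Series and the placeholder grade "B" are hypothetical) contrasting it with Series.replace, which leaves unmatched values alone:

```python
import pandas as pd

result = pd.Series(["A", "NotAGrade", "C"])

# map() with a dict: values missing from the dict come back as NaN
mapped = result.map({"NotAGrade": "B"})
# replace() with a dict: values missing from the dict are left unchanged
replaced = result.replace({"NotAGrade": "B"})

print(mapped.tolist())    # [nan, 'B', nan]
print(replaced.tolist())  # ['A', 'B', 'C']
```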
Also check out this question for how to use pandas.where and numpy.where in more general situations.
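Since the question asks for an IF, numpy.where is the closest analogue of a spreadsheet IF(condition, value_if_true, value_if_false). A minimal sketch, assuming the A-E grade list from the question and a hypothetical sample column mixing grades, NaN and "Not Applicable":

```python
import numpy as np
import pandas as pd

grades = ["A", "B", "C", "D", "E"]
# Hypothetical sample data
df = pd.DataFrame({"Result": ["A", "Not Applicable", np.nan, "D"]})

rng = np.random.default_rng()
random_grades = rng.choice(grades, size=len(df))  # one candidate grade per row

# IF Result is a grade A-E THEN keep it ELSE use that row's random grade
df["NewColumn"] = np.where(df["Result"].isin(grades), df["Result"], random_grades)
print(df)
```

The Result column is only read, never assigned to, so the original data stays intact.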