I have a dataframe that has some missing values. I want to replace those missing values with a value from another cell in the dataframe based on a condition. So the dataframe looks like this:
x | a |
---|---|
xyz | A |
lmn | B |
None | A |
xyz | A |
qrs | C |
None | B |
What I want to do is fill each "None" cell in column x with the value that x has in another row where column a matches. The result should look like this:
x | a |
---|---|
xyz | A |
lmn | B |
xyz | A |
xyz | A |
qrs | C |
lmn | B |
The index is just sequential numbers starting from 0, and the positions of the cells with missing information will change from dataset to dataset.
CodePudding user response:
You can use ffill() after grouping by column a to forward-fill the missing values within each group:
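import numpy as np

# Treat the string 'None' as missing, then forward-fill x within each group of a
df['x'] = df.replace('None', np.nan).groupby('a')['x'].ffill()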
print(df)
# Output:
     x  a
0  xyz  A
1  lmn  B
2  xyz  A
3  xyz  A
4  qrs  C
5  lmn  B
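One caveat with forward fill: it can only copy from an earlier row in the same group, so a gap that comes before the first known value for its letter stays missing. A minimal sketch of one way around that, assuming every known x within a given a is the same, is to broadcast each group's first non-null value with transform("first"). The sample frame below is the question's data reordered (my assumption, for illustration) so that the gap in group B comes first.

```python
import numpy as np
import pandas as pd

# Question's data, reordered so group B starts with a missing value (illustrative assumption)
df = pd.DataFrame({
    "x": ["xyz", "None", "None", "xyz", "qrs", "lmn"],
    "a": ["A", "B", "A", "A", "C", "B"],
})

# Treat the string 'None' as missing
x = df["x"].replace("None", np.nan)

# "first" picks the first non-null x in each group; transform broadcasts it to every row
first_known = x.groupby(df["a"]).transform("first")

# Fill only the gaps, leaving existing values untouched
df["x"] = x.fillna(first_known)
print(df)
```

If a letter has no known x at all, its rows simply stay missing.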
CodePudding user response:
for i in range(len(df)):
    if df.loc[i, 'a'] == 'A':
        # .loc avoids chained assignment, which may not write back to df
        df.loc[i, 'x'] = 'xyz'
This worked for me. If you want to handle the other letters, just add an elif for each one.
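If you would rather not hard-code every letter, a rough sketch of the same idea is to build the letter-to-value lookup from the rows where x is already known and apply it with map. This assumes each letter in a has a single known x (the first one found is used), and the name lookup is just something I made up for the example.

```python
import numpy as np
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({
    "x": ["xyz", "lmn", "None", "xyz", "qrs", "None"],
    "a": ["A", "B", "A", "A", "C", "B"],
})

# Treat the string 'None' as missing
x = df["x"].replace("None", np.nan)

# Build an a -> x dictionary from rows where x is known (first known value per letter)
known = df.assign(x=x).dropna(subset=["x"]).drop_duplicates("a")
lookup = known.set_index("a")["x"].to_dict()

# Fill only the missing cells; rows that already have a value are left alone
df["x"] = x.fillna(df["a"].map(lookup))
print(df)
```

Unlike the loop, this only touches the missing cells, and it picks up new letters automatically when the dataset changes.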