I'm trying to find the most efficient way to fill in cells with a value of 'None' in a dataframe based on a rule.
If column 'A' has a specific value, such as 'Apple', and column 'B' is empty, I want to find all other instances of column 'A' with value 'Apple' and fill it with whatever is in column 'B's if it is not empty.
For example, if the input is this:
Column A | Column B |
---|---|
Apple | None |
Apple | None |
Orange | Soda |
Banana | None |
Apple | Pie |
Banana | Bread |
Orange | None |
Then it should output this:
Column A | Column B |
---|---|
Apple | Pie |
Apple | Pie |
Orange | Soda |
Banana | Bread |
Apple | Pie |
Banana | Bread |
Orange | Soda |
You can assume that for a particular Column A & B pair, it will always be the same (e.g. for every 'Apple' in Column A, there will either be 'None' or 'Pie' in Column B).
I tried the below, and it seems to work on this small test dataset, but I'm wondering if someone could suggest a more efficient method that I could use on my actual dataset (~100K rows).
for ind in data.index:
if data['Column B'][ind]=='None':
temp_A = data['Column A'][ind]
for ind2 in data.index:
if (data['Column A'][ind2]==temp_A) & (data['Column B'][ind2]!='None'):
data['Column B'][ind] = data['Column B'][ind2]
break
CodePudding user response:
Filter rows with no None
values, remove duplicates and convert to Series
, last mapping None
values by Series.map
:
m = df['Column B'].ne('None')
s = df[m].drop_duplicates('Column A').set_index('Column A')['Column B']
df.loc[~m, 'Column B'] = df.loc[~m, 'Column A'].map(s)
print (df)
Column A Column B
0 Apple Pie
1 Apple Pie
2 Orange Soda
3 Banana Bread
4 Apple Pie
5 Banana Bread
6 Orange Soda