I recently needed to fill blank string values in one column of a pandas dataframe with the value from an adjacent column in the same row.
I tried df.apply(lambda x: x['A'].replace(...) and also np.where. Neither worked. There were anomalies in how the "blank string values" were stored: I couldn't match them with '', with df['A'].replace(r'^\s$',df['B'],regex=True), or by replacing df['B'] with a placeholder such as -. The only two things that worked were .isnull() and iterrows, where the blanks appeared as nan.
So iterrows worked, but I never explicitly save the changes.
How is pandas saving the changes?
import pandas as pd

mylist = {'A':['fe','fi', 'fo', ''], 'B':['fe1,','fi2', 'fi3', 'thum']}
coffee = pd.DataFrame(mylist)
print("output1\n", coffee.head())
# Try to overwrite blank values in A with the value from B in the same row
for index, row in coffee.iterrows():
    if str(row['A']) == '':
        row['A'] = row['B']
print("output2\n", coffee.head())
output1
     A     B
0   fe  fe1,
1   fi   fi2
2   fo   fi3
3       thum
output2
      A     B
0    fe  fe1,
1    fi   fi2
2    fo   fi3
3  thum  thum
Note: the dataframe is of object dtype, BTW.
Answer:
About pandas.DataFrame.iterrows, the documentation says:
You should never modify something you are iterating over. This is not guaranteed to work in all cases. Depending on the data types, the iterator returns a copy and not a view, and writing to it will have no effect.
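To illustrate the quoted caveat, here is a minimal sketch on a hypothetical frame named mixed, with an extra integer column n added so that the dtypes differ (neither name is from the question). Under that assumption, each row yielded by iterrows is a temporary Series, so writing to it leaves the frame untouched. The second half shows one way to use the loop only for reading and to write back through the frame itself.

```python
import pandas as pd

# Hypothetical frame with mixed dtypes (string columns plus an integer column).
mixed = pd.DataFrame({"A": ["fe", ""], "B": ["fe1", "thum"], "n": [1, 2]})

# Writing to `row` only changes a temporary Series, not `mixed`.
for index, row in mixed.iterrows():
    if row["A"] == "":
        row["A"] = row["B"]
unchanged = mixed["A"].tolist()

# Use the loop only to collect labels, then assign through the frame.
blank_labels = [index for index, row in mixed.iterrows() if row["A"] == ""]
mixed.loc[blank_labels, "A"] = mixed.loc[blank_labels, "B"]
```

Collecting the labels first keeps the iteration read-only, which follows the documentation's advice not to modify something while iterating over it.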
In your case, you can use one of these solutions (they should work with your real dataset as well):
coffee.loc[coffee["A"].eq(""), "A"] = coffee["B"]
Or:
coffee["A"] = coffee["B"].where(coffee["A"].eq(""), coffee["A"])
Or:
coffee["A"] = coffee["A"].replace("", None).fillna(coffee["B"])
It is still strange, though, that your original dataframe got updated within the loop without any reassignment, given that the row Series is supposed to be a copy and not a view.