I want to create augmented data in a new DataFrame for every row of an original DataFrame.
So I've defined an `augment` method, which I want to use with `apply`, as follows:
```python
import numpy as np
import pandas as pd
import tensorflow as tf

# convert_image_to_binary_image() and data_augmentation0() are defined elsewhere.

def augment(row: pd.Series, column_name: str, target_df: pd.DataFrame, num_samples: int):
    # print(type(row))
    target_df_start_index = target_df.shape[0]
    raw_img = row[column_name].astype('uint8')
    bin_image = convert_image_to_binary_image(raw_img)
    bin_3dimg = tf.expand_dims(input=bin_image, axis=2)
    bin_img_reshaped = tf.image.resize_with_pad(image=bin_3dimg, target_width=128, target_height=128, method="bilinear")
    for i in range(num_samples + 1):
        new_row = row.copy(deep=True)
        if i == 0:
            # First copy: the binarized, padded original image
            new_row[column_name] = np.squeeze(bin_img_reshaped, axis=2)
        else:
            # Remaining copies: augmented versions of that image
            aug_image = data_augmentation0(bin_img_reshaped)
            new_row[column_name] = np.squeeze(aug_image, axis=2)
        # display.display(new_row)
        # Append the new row to target_df (setting with enlargement)
        target_df.loc[target_df_start_index + i] = new_row
    # print(target_df.shape)
    # display.display(target_df)
```
When I call it directly, as follows, everything works:
```python
tmp_df = pd.DataFrame(None, columns=testDF.columns)
augment(testDF.iloc[0], column_name='binMap', target_df=tmp_df, num_samples=4)
augment(testDF.iloc[1], column_name='binMap', target_df=tmp_df, num_samples=4)
```
However, when I call it through the `apply` method, the prints and displays inside `augment` work fine, but the resulting DataFrame shows errors:
```python
tmp_df = pd.DataFrame(None, columns=testDF.columns)
testDF.apply(augment, args=('binMap', tmp_df, 4), axis=1)
```
This is what the output data looks like after the `apply` call:
```
,data
<Error>, <Error>
<Error>, <Error>
```
What am I doing wrong?
Answer:
Your test is very nice; thank you for the clear exposition. I'm happy to be your rubber duck.
In test A, you (successfully) mess with `testDF.iloc[0]` and `[1]`, using a kind of Fortran-style API for `augment()` that leaves its result in `tmp_df` as a side effect.
Test B is carefully constructed to be "the same" except for the `.apply()` call.
So let's see: what's different? Hard to say. Let's go examine the docs.
Oh, right! We're using the `.apply()` API, so we'd better follow its contract. Down at the end, the `DataFrame.apply` documentation explains:
Returns: Series or DataFrame
Result of applying func along the given axis of the DataFrame.
But `augment()` has no `return` statement, so it offers an implicit `return None` for every row instead.
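To make this concrete, here is a minimal sketch that needs no TensorFlow. The tiny frame `df`, the list `collected` (playing the role of `tmp_df`) and the function `side_effect_only` are hypothetical stand-ins, not code from the question. The side effect runs for every row, but what `apply` itself hands back is one `None` per row:

```python
import pandas as pd

# Hypothetical stand-in for testDF; the real frame holds images in 'binMap'.
df = pd.DataFrame({"id": [1, 2, 3], "binMap": [10, 20, 30]})
collected = []  # plays the role of tmp_df: filled purely as a side effect

def side_effect_only(row: pd.Series) -> None:
    # Like the original augment(): does its work, then falls off the end.
    collected.append(row["binMap"] * 2)

result = df.apply(side_effect_only, axis=1)

print(len(collected))       # 3: the side effect ran once per row
print(result.isna().all())  # True: apply's own result holds only None
```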
Now, I'm not here to pass judgment on whether it's best to have side effects on a target df; that's up to you.
But `.apply()` will be bent out of shape until you give it something nice, such as a Series, to store as its own result.
Happy hunting!
Answer:
This change worked for me: `augment()` now returns the row it was given.
```python
import numpy as np
import pandas as pd
import tensorflow as tf

# convert_image_to_binary_image() and data_augmentation0() are defined elsewhere.

def augment(row: pd.Series, column_name: str, target_df: pd.DataFrame, num_samples: int) -> pd.Series:
    # print(type(row))
    target_df_start_index = target_df.shape[0]
    raw_img = row[column_name].astype('uint8')
    bin_image = convert_image_to_binary_image(raw_img)
    bin_3dimg = tf.expand_dims(input=bin_image, axis=2)
    bin_img_reshaped = tf.image.resize_with_pad(image=bin_3dimg, target_width=128, target_height=128, method="bilinear")
    for i in range(num_samples + 1):
        new_row = row.copy(deep=True)
        if i == 0:
            new_row[column_name] = np.squeeze(bin_img_reshaped, axis=2)
        else:
            aug_image = data_augmentation0(bin_img_reshaped)
            new_row[column_name] = np.squeeze(aug_image, axis=2)
        # display.display(new_row)
        target_df.loc[target_df_start_index + i] = new_row
    # print(target_df.shape)
    # display.display(target_df)
    # Return the unchanged input row so that apply() has a result to store
    return row
```
And I updated the call to `apply` as follows:
```python
testDF = testDF.apply(augment, args=('binMap', tmp_df, 4), result_type='broadcast', axis=1)
```
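For reference, `result_type='broadcast'` asks `apply` to broadcast the function's results back to the original shape of the frame, keeping the original index and columns. The sketch below uses a hypothetical numeric frame; `df`, `collected` and `augment_and_return` are illustrative names, not the question's code. It shows that returning the row unchanged yields a DataFrame shaped like the input, while the side effect still fills the separate target:

```python
import pandas as pd

# Hypothetical numeric stand-in for testDF.
df = pd.DataFrame({"id": [1, 2, 3], "binMap": [10, 20, 30]})
collected = []  # stands in for tmp_df

def augment_and_return(row: pd.Series) -> pd.Series:
    collected.append(row.copy())  # side effect, as in augment()
    return row                    # give apply() something to store

out = df.apply(augment_and_return, axis=1, result_type="broadcast")

print(out.shape)       # (3, 2): same shape as df
print(len(collected))  # 3
```

Without `result_type`, pandas by default also expands a returned Series into columns; `'broadcast'` additionally guarantees that the original index and columns are retained.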
Thank you, @J_H. If there are better ways to achieve what I'm doing, please feel free to suggest improvements.
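One possible alternative that avoids side effects altogether, offered as a sketch rather than a tested drop-in replacement: let the per-row function return its new rows, collect them in a plain list, and build the augmented DataFrame once at the end. Growing a DataFrame one `.loc` assignment at a time is usually much slower than building it in one go. Here `make_variants`, `test_df` and the random bit-flip "augmentation" are hypothetical stand-ins for the question's `augment`, `testDF` and the `convert_image_to_binary_image` / `data_augmentation0` pipeline:

```python
import numpy as np
import pandas as pd

def make_variants(record: dict, column_name: str, num_samples: int,
                  rng: np.random.Generator) -> list:
    """Return the original record plus `num_samples` augmented copies.

    The random bit flips are a hypothetical placeholder for the question's
    convert_image_to_binary_image / data_augmentation0 pipeline.
    """
    base = np.asarray(record[column_name], dtype=np.uint8)
    variants = [{**record, column_name: base}]  # un-augmented image first
    for _ in range(num_samples):
        # Flip a random subset of pixels in a new copy of the binary image.
        flips = rng.integers(0, 2, size=base.shape, dtype=np.uint8)
        variants.append({**record, column_name: base ^ flips})
    return variants

# Hypothetical stand-in for testDF: one small binary image per row.
test_df = pd.DataFrame([
    {"label": "a", "binMap": np.zeros((4, 4), dtype=np.uint8)},
    {"label": "b", "binMap": np.ones((4, 4), dtype=np.uint8)},
])

rng = np.random.default_rng(seed=0)
records = [
    variant
    for record in test_df.to_dict("records")
    for variant in make_variants(record, "binMap", num_samples=4, rng=rng)
]
# Build the augmented frame once, with a fresh 0..n-1 index.
augmented_df = pd.DataFrame(records, columns=test_df.columns)
print(augmented_df.shape)  # (10, 2)
```

With the real pipeline, `make_variants` would call the TensorFlow steps from `augment` in place of the bit flips; the structure (return new rows, build the frame once) stays the same.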