I'm finding several answers to this question, but none that address or solve the error that pops up when I apply them. Following, e.g., this answer, I have a dataframe df and a function my_func(string_1, string_2), and I'm attempting to create a new column with the following:
df['new_column'] = df.apply(lambda x: my_func(x['old_col_1'], x['old_col_2']), axis=1)
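For reference, here is a minimal self-contained version of that pattern. The real my_func isn't shown, so the one below is a hypothetical stand-in that lowercases and joins its two arguments, and the data is made up. On columns that contain only strings it runs without error:

```python
import pandas as pd


def my_func(string_1, string_2):
    # Hypothetical stand-in for the real my_func: lowercase both inputs and join them
    string_1 = string_1.lower()
    string_2 = string_2.lower()
    return f"{string_1}-{string_2}"


# Illustrative data: every cell is a Python str
df = pd.DataFrame({
    'old_col_1': ['Apple', 'Banana'],
    'old_col_2': ['Red', 'Yellow'],
})

# axis=1 passes each row to the lambda as a Series indexed by column name
df['new_column'] = df.apply(lambda x: my_func(x['old_col_1'], x['old_col_2']), axis=1)
print(df['new_column'].tolist())  # ['apple-red', 'banana-yellow']
```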
I'm getting an error originating inside my_func, telling me that old_col_1 is of type float rather than a string as expected. In particular, the first line of my_func is
old_col_1 = old_col_1.lower()
and the error is
AttributeError: 'float' object has no attribute 'lower'
By adding debug statements that print the dataframe, I've verified that old_col_1 and old_col_2 are indeed both strings. If I explicitly cast them to strings when passing them as arguments, my_func behaves as you would expect if it were being fed numeric data cast as strings, even though the column values are decidedly not numeric.
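One possible explanation for the "numeric data cast as strings" behaviour (an assumption, since the actual data isn't shown): a missing cell is stored as np.nan, which is an ordinary Python float, and str() turns it into the literal text 'nan'. In a dataframe printout such a cell just shows up as NaN, which is easy to overlook. A small sketch:

```python
import numpy as np

value = np.nan
# np.nan is a plain Python float, so it has no .lower() method
print(type(value).__name__)     # float
print(hasattr(value, 'lower'))  # False

# Casting to str "works", but yields the text 'nan' instead of real data
as_text = str(value)
print(repr(as_text.lower()))    # 'nan'
```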
Per this answer I've even explicitly ensured these columns are not being "intelligently" cast incorrectly when creating the dataframe:
df = pd.read_excel(file_name, sheetname, header=0, converters={'old_col_1': str, 'old_col_2': str})
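Even with converters, it may be worth checking what actually ended up in each column after loading, since a printout doesn't show each cell's Python type. The sketch below counts the types per column and lists the rows that hold anything other than a str. It uses a small in-memory frame as a stand-in for the read_excel result, with illustrative values:

```python
import numpy as np
import pandas as pd

# Stand-in for the frame returned by pd.read_excel(...); values are illustrative
df = pd.DataFrame({
    'old_col_1': ['Alpha', np.nan, 'Gamma'],
    'old_col_2': ['x', 'y', None],
})

for col in ['old_col_1', 'old_col_2']:
    # Count the Python type of every cell, e.g. {'str': 2, 'float': 1} for old_col_1
    type_counts = df[col].map(lambda v: type(v).__name__).value_counts()
    print(col, type_counts.to_dict())

# Rows where either column holds something other than a str (NaN, None, numbers, ...)
col1_is_str = df['old_col_1'].map(lambda v: isinstance(v, str))
col2_is_str = df['old_col_2'].map(lambda v: isinstance(v, str))
bad_rows = df[~(col1_is_str & col2_is_str)]
print(bad_rows.index.tolist())  # [1, 2]
```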
The function my_func works very well when it's called on its own. All this makes me suspect that the index or some other numeric data from the dataframe is being passed, rather than (only) the column values.
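One way to test that suspicion is to record what the lambda actually receives. With axis=1, each row arrives as a Series indexed by column name, so x['old_col_1'] is just that cell's value; the row's index label is available separately as the Series' name and isn't passed to my_func unless you pass it yourself. A sketch with illustrative data and a deliberately non-default index:

```python
import numpy as np
import pandas as pd

# Illustrative data; index labels 10 and 20 are easy to tell apart from cell values
df = pd.DataFrame(
    {'old_col_1': ['Alpha', np.nan], 'old_col_2': ['x', 'y']},
    index=[10, 20],
)

# Type of the value that x['old_col_1'] would hand to my_func, row by row
cell_types = df.apply(lambda row: type(row['old_col_1']).__name__, axis=1)
print(cell_types.tolist())  # ['str', 'float']

# The index label travels separately as row.name
labels = df.apply(lambda row: row.name, axis=1)
print(labels.tolist())      # [10, 20]
```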
Other implementations fail the same way. For instance,
df['new_column'] = np.vectorize(my_func)(df['old_col_1'], df['old_col_2'])
produces the same error. Variations (e.g. using df['old_col_1'].to_numpy() or df['old_col_1'].values in place of df['old_col_1']) don't change this.
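That np.vectorize fails too is consistent with a data problem rather than an apply problem: np.vectorize still calls the function once per element, and a missing value in an object array is still the float nan. The sketch below (illustrative data) reports the type of each element as np.vectorize hands it over. otypes=[object] is set explicitly so numpy doesn't infer the output type from a trial call on the first element:

```python
import numpy as np
import pandas as pd

s = pd.Series(['Alpha', np.nan, 'Gamma'])

# Report the type of each element exactly as np.vectorize passes it to the function
type_name = np.vectorize(lambda v: type(v).__name__, otypes=[object])
print(type_name(s).tolist())             # ['str', 'float', 'str']
print(type_name(s.to_numpy()).tolist())  # ['str', 'float', 'str']
```

Converting to a NumPy array first changes the container, not the values, so the NaN is still there.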
Answer:
Is it possible that you have missing (NaN) values in your columns? Pandas represents a missing value as np.nan, which is a float, so calling .lower() on it raises exactly this AttributeError. You can reproduce it with this data:
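import numpy as np
import pandas as pd

data = {
    'Column1': ['1', '2', np.nan, '3']
}
df = pd.DataFrame(data)
# Raises AttributeError: 'float' object has no attribute 'lower',
# because the missing value is stored as np.nan, which is a float
df['Column1'] = df['Column1'].apply(lambda x: x.lower())
df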
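If missing values do turn out to be the cause, there are a few ways to handle them. Which one is right depends on what my_func should produce for a missing input, so treat these as options rather than the definitive fix. The sketch uses a variant of the data above with one letter value, so the lowercasing is visible:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Column1': ['1', '2', np.nan, 'A']})

# Option 1: the .str accessor skips missing values and leaves them missing
lowered = df['Column1'].str.lower()

# Option 2: replace missing values with an empty string before calling a custom function
filled = df['Column1'].fillna('').apply(lambda x: x.lower())

# Option 3: guard inside the function and pass non-strings through unchanged
guarded = df['Column1'].apply(lambda x: x.lower() if isinstance(x, str) else x)

print(filled.tolist())  # ['1', '2', '', 'a']
```

For the two-column call in the question, the same idea would be calling fillna('') on df[['old_col_1', 'old_col_2']] before the apply, or adding an isinstance check at the top of my_func.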