Apply function to two columns of a Pandas dataframe

Time:05-28

I'm finding several answers to this question, but none that seem to address or solve the error that pops up when I apply them. Per e.g. this answer I have a dataframe df and a function my_func(string_1,string_2) and I'm attempting to create a new column with the following:

df['new_column'] = df.apply(lambda x: my_func(x['old_col_1'], x['old_col_2']), axis=1)

I'm getting an error originating inside my_func telling me that old_col_1 is type float and not a string as expected. In particular, the first line of my_func is old_col_1 = old_col_1.lower(), and the error is

AttributeError: 'float' object has no attribute 'lower'

By including debug statements using dataframe printouts I've verified old_col_1 and old_col_2 are indeed both strings. If I explicitly cast them to strings when passing as arguments, then my_func behaves as you would expect if it were being fed numeric data cast as strings, though the column values are decidedly not numeric.

Per this answer I've even explicitly ensured these columns are not being "intelligently" cast incorrectly when creating the dataframe:

df = pd.read_excel(file_name, sheetname,header=0,converters={'old_col_1':str,'old_col_2':str})

The function my_func works very well when it's called on its own. All this is making me suspect that the indices or some other numeric data from the dataframe is being passed, and not (exclusively) the column values.

Other implementations seem to give the same problem. For instance,

df['new_column'] = np.vectorize(my_func)(df['old_col_1'],df['old_col_2'])

produces the same error. Variations (e.g. using df['old_col_1'].to_numpy() or df['old_col_1'].values in place of df['old_col_1']) don't change this.

CodePudding user response:

Is it possible that you have np.nan/None/null values in your columns? np.nan is a float, so calling a string method on it fails. If so, you might be getting an error similar to the one this data produces:

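import numpy as np
import pandas as pd

# One missing value (np.nan is a float) among otherwise string entries
data = {
    'Column1' : ['1', '2', np.nan, '3']
}
df = pd.DataFrame(data)
# Fails on the np.nan entry: 'float' object has no attribute 'lower'
df['Column1'] = df['Column1'].apply(lambda x : x.lower())
df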
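
If missing values do turn out to be the cause, one way to confirm and work around it might look like the sketch below. The body of my_func, the sample data, and the helper safe_func are hypothetical stand-ins; only the column names and the .lower() call come from the question. It would also explain why casting the arguments with str() seemed to produce "numeric data cast as strings": str(np.nan) is the literal string 'nan'.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's my_func; only the .lower() call is from the question.
def my_func(string_1, string_2):
    return string_1.lower() + "_" + string_2.lower()

# Hypothetical data with one missing value in each column.
df = pd.DataFrame({
    "old_col_1": ["Foo", np.nan, "Baz"],
    "old_col_2": ["A", "B", None],
})

# 1. Diagnose: count missing values and list the Python types actually stored.
missing_counts = df[["old_col_1", "old_col_2"]].isna().sum()
types_seen = set(df["old_col_1"].map(type))  # contains float if a NaN is present

# Casting a missing value with str() hides the problem instead of fixing it.
as_text = str(np.nan)  # the literal string "nan"

# 2. Option A: skip rows with a missing input and leave NaN in the result.
def safe_func(row):
    if pd.isna(row["old_col_1"]) or pd.isna(row["old_col_2"]):
        return np.nan
    return my_func(row["old_col_1"], row["old_col_2"])

df["new_column"] = df.apply(safe_func, axis=1)

# 3. Option B: replace missing values with empty strings before applying.
filled = df[["old_col_1", "old_col_2"]].fillna("")
df["new_column_filled"] = filled.apply(
    lambda row: my_func(row["old_col_1"], row["old_col_2"]), axis=1
)
```

Option A keeps the gaps visible in the new column, while option B silently treats them as empty strings; which one is appropriate depends on what my_func is supposed to mean for incomplete rows.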