I have a DataFrame, which you can create by running the following code:
import pandas as pd
from io import StringIO
from functools import lru_cache
df = """
contract EndDate
A00118 123456
A00118 12345
"""
df = pd.read_csv(StringIO(df.strip()), sep=r'\s+')
The output is:
  contract  EndDate
0   A00118   123456
1   A00118    12345
Then I applied some logic to each row:
def var_func(row, n):
    res = row['EndDate']*100*n
    return res
df['annfact'] = df.apply(lambda row: var_func(row,10), axis=1)
The output is:
  contract  EndDate    annfact
0   A00118   123456  123456000
1   A00118    12345   12345000
However, if I apply Python's lru_cache to this function:
@lru_cache(maxsize=None)
def var_func(row, n):
    res = row['EndDate']*100*n
    return res
df['annfact'] = df.apply(lambda row: var_func(row,10), axis=1)
I get this error:
TypeError: ("'Series' objects are mutable, thus they cannot be hashed", 'occurred at index 0')
Can anyone help? I want to use Python's lru_cache on a function that I pass to DataFrame.apply. For certain reasons I have to use apply and cannot use a vectorized NumPy approach.
CodePudding user response:
From the docs:
Since a dictionary is used to cache results, the positional and keyword arguments to the function must be hashable.
With df.apply(..., axis=1), you're passing a row (a Series object), which is not hashable, so you get the error.
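To make the hashability point concrete, here is a minimal sketch. The helper name is_hashable and the sample row are made up for illustration; the sketch only checks which of the candidate cache keys Python is willing to hash.

```python
import pandas as pd

# Hypothetical sample row, shaped like one row of the question's DataFrame.
row = pd.Series({"contract": "A00118", "EndDate": 123456})

def is_hashable(obj):
    """Return True if hash(obj) succeeds, False if it raises TypeError."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True

series_hashable = is_hashable(row)             # the Series itself
tuple_hashable = is_hashable(tuple(row))       # the row's values as a plain tuple
scalar_hashable = is_hashable(row["EndDate"])  # a single cell value

print(series_hashable, tuple_hashable, scalar_hashable)
```

Only the Series fails the check, which is why the workarounds pass lru_cache either a scalar or a tuple-like object instead of the row itself.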
One way to get around the issue is to apply var_func on a column:
@lru_cache(maxsize = None)
def var_func(value, n):
    # value is a single EndDate cell (a hashable scalar), not a whole row
    return value*100*n
df['annfact'] = df['EndDate'].apply(var_func, n=10)
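If you want to check that the cache is actually being used, lru_cache exposes cache_info() on the decorated function. The following is a small sketch, assuming a sample frame with a repeated EndDate value (the data is invented for illustration).

```python
from functools import lru_cache

import pandas as pd

@lru_cache(maxsize=None)
def var_func(value, n):
    # value is one EndDate cell, so it can be used as a cache key
    return value * 100 * n

# Assumed sample data: the first and last rows share the same EndDate.
df = pd.DataFrame({
    "contract": ["A00118", "A00118", "A00118"],
    "EndDate": [123456, 12345, 123456],
})

df["annfact"] = df["EndDate"].apply(var_func, n=10)

# Two distinct EndDate values were seen, so two cache misses are expected;
# the repeated value should be served from the cache.
info = var_func.cache_info()
print(info)
```

Caching only pays off when the same arguments recur and the function is expensive; for cheap arithmetic like this, the lookup can cost more than the computation.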
For your specific example, it's better to use vectorized operations:
df['annfact'] = df['EndDate']*100*10
We could also convert each row to something hashable. Since you want to keep referencing the column names, we could use collections.namedtuple:
@lru_cache(maxsize = None)
def var_func(row, n):
    res = row.EndDate*100*n
    return res
from collections import namedtuple
df_as_ntup = namedtuple('df_as_ntup', df.columns)
df['annfact'] = df.apply(lambda row: var_func(df_as_ntup(*row), 10), axis=1)
Output:
  contract  EndDate    annfact
0   A00118   123456  123456000
1   A00118    12345   12345000
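One thing to keep in mind with the namedtuple approach is that the cache key then includes every column, so two rows with the same EndDate but different contract values are cached separately. If the function only needs a few fields, a variation is to stay with df.apply(..., axis=1) but hand the cached function just those values. This is a sketch under that assumption, with invented sample data:

```python
from functools import lru_cache

import pandas as pd

@lru_cache(maxsize=None)
def var_func(end_date, n):
    # Only the values the calculation depends on are part of the cache key.
    return end_date * 100 * n

# Assumed sample data: two different contracts share one EndDate.
df = pd.DataFrame({
    "contract": ["A00118", "B00231", "A00118"],
    "EndDate": [123456, 123456, 12345],
})

# Still a row-wise apply; the lambda extracts the hashable cell value.
df["annfact"] = df.apply(lambda row: var_func(row["EndDate"], 10), axis=1)

info = var_func.cache_info()
print(df)
print(info)
```

Here the first two rows map to the same cache entry even though their contract differs, which would not happen if the whole row were the key.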