When writing pipelines in pandas, I find myself writing functions like this:
def replace(df, column, *args, **kwargs):
    df[column] = df[column].str.replace(*args, **kwargs)
    return df

def split(df, column, *args, **kwargs):
    df[column] = df[column].str.split(*args, **kwargs)
    return df
>>> df = pd.DataFrame(["C:\\path1", "C:\\path2", "C:\\path3"], columns=["Path"])
>>> df
       Path
0  C:\path1
1  C:\path2
2  C:\path3
>>> (
...     df
...     .pipe(replace, "Path", "C:\\", "D:\\", regex=False)
...     .pipe(split, "Path", "\\")
... )
          Path
0  [D:, path1]
1  [D:, path2]
2  [D:, path3]
There is a clear pattern here, so to avoid repeating code I wrote a function factory:
def make_pipe(func):
    def wrapper(df, column, *args, **kwargs):
        df[column] = func(df[column], *args, **kwargs)
        return df
    return wrapper
This works great for methods of Series objects, e.g.:
>>> isnull = make_pipe(pd.Series.isnull)
>>> isnull(df, "Path")
    Path
0  False
1  False
2  False
But for methods accessed through the str namespace, it fails:
>>> replace = make_pipe(pd.Series.str.replace)
>>> replace(df, "Path", "C:\\", "D:\\", regex=False)
AttributeError: 'Series' object has no attribute '_inferred_dtype'
How can I get the factory to work in this case?
Answer:
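.str is an accessor that is only available on Series holding string-like data (such as object dtype); it is not an ordinary method of the Series class. pd.Series.str.replace is therefore a method of the accessor, and it does not accept a Series as its first argument. You can wrap the call in an inline lambda instead: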
replace = make_pipe(lambda x, *arg, **kwargs: x.str.replace(*arg, **kwargs))
replace(df, "Path", "C:\\", "D:\\", regex=False)
Output:
       Path
0  D:\path1
1  D:\path2
2  D:\path3
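If you need many such wrappers, one possible variation is a second factory that takes the name of the .str method rather than a callable. The sketch below is only an illustration: make_str_pipe is a hypothetical helper, not part of pandas. The two assertions at the top reflect how the accessor appears to behave in current pandas versions, which would explain the AttributeError in the question.

```python
import pandas as pd

# Accessed on the class, `pd.Series.str` is the accessor class itself, so its
# methods expect an accessor instance (such as `s.str`), not a Series.
s = pd.Series(["C:\\path1"])
assert isinstance(pd.Series.str, type)
assert isinstance(s.str, pd.Series.str)


def make_str_pipe(method_name):
    """Hypothetical helper: wrap the `.str` method called `method_name`."""
    def wrapper(df, column, *args, **kwargs):
        # Look the method up on the column's own accessor at call time.
        method = getattr(df[column].str, method_name)
        # Like make_pipe in the question, this overwrites the column in place.
        df[column] = method(*args, **kwargs)
        return df
    return wrapper


replace = make_str_pipe("replace")
split = make_str_pipe("split")

df = pd.DataFrame(["C:\\path1", "C:\\path2", "C:\\path3"], columns=["Path"])
result = (
    df
    .pipe(replace, "Path", "C:\\", "D:\\", regex=False)
    .pipe(split, "Path", "\\")
)
print(result)
```

Passing the method name as a string avoids one lambda per method, at the cost of resolving the name only when the wrapper is called, so a misspelled name fails late.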