I would like to create a function that transforms some specific features in a df with the pandas method .diff, using the periods indicated for each feature.
I got it working in two steps, but I am sure this can be a one-liner; in other words, it can be simpler.
Given the following df:
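import numpy as np
import pandas as pd
# 10 + 50 + 15 + 100 = 175 rows, matching the length of "foo" and "bar"
df = pd.DataFrame({"Category": ["A"]*10 + ["B"]*50 + ["C"]*15 + ["D"]*100,
                   "foo": [np.random.random_sample() for i in range(175)],
                   "bar": [np.random.random_sample() for i in range(175)]})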
So by using a dict I can set the column names and define the diff periods I want to apply to each one:
dict_diff={"foo":[1,2],"bar":[3,4]}
To set the names and apply the transformation at the same time, the code I developed is:
pd.concat(map(lambda dict_items: df[dict_items[0]].diff(periods=dict_items[1]).rename(f"{dict_items[0]}_diff{dict_items[1]}"), dict_diff.items()),axis=1)
What's missing/wrong?
I cannot iterate over the list of values. Since dict_items[1] is a list, it cannot be passed directly to periods, so there is an extra step I need to do.
As a result, I would get a df with the new additional columns foo_diff1, ..., as indicated in the dict.
CodePudding user response:
You just need a list comprehension and a reduce function that flattens the nested lists, so that the result can be passed to pd.concat:
import functools
import operator
def functools_reduce(a):
    # Flatten an iterable of lists into a single list by concatenating them
    return functools.reduce(operator.concat, a)
pd.concat(functools_reduce(map(lambda dict_items: [df[dict_items[0]].diff(periods=diff_value).rename(f"{dict_items[0]}_diff{diff_value}") for diff_value in dict_items[1]], {"foo":[1,2],"bar":[3,4]}.items())),axis=1)
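If a one-liner is not a hard requirement, the same result can probably be reached without functools or operator by using a dict comprehension with two for clauses. The sketch below is only one possible variant, not the answer's original method: the helper name add_diffs and the seeded random generator are assumptions made for illustration, and it uses DataFrame.assign so the new columns are appended to the original df instead of being returned on their own.

```python
import numpy as np
import pandas as pd

# Sample data shaped like the df in the question (seeded so runs are repeatable;
# the seed is an assumption, the question uses unseeded random values)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Category": ["A"] * 10 + ["B"] * 50 + ["C"] * 15 + ["D"] * 100,
    "foo": rng.random(175),
    "bar": rng.random(175),
})
dict_diff = {"foo": [1, 2], "bar": [3, 4]}


def add_diffs(frame, periods_by_column):
    """Hypothetical helper: append one <column>_diff<period> column per requested period."""
    new_cols = {
        f"{col}_diff{p}": frame[col].diff(periods=p)
        for col, periods in periods_by_column.items()  # outer loop: each column
        for p in periods                               # inner loop: each period
    }
    # assign returns a new DataFrame and leaves the input frame unchanged
    return frame.assign(**new_cols)


result = add_diffs(df, dict_diff)
print(result.columns.tolist())
```

The inner for clause is what iterates over the list stored in each dict value, which is the step the original attempt was missing. Note that assign with keyword unpacking only works when the generated column names are strings.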