diff in different periods and variable

Time:10-21

I would like to create a function that applies the pandas method .diff to specific columns of a df, each column with its own set of periods.

I got it working in two steps, but I am sure it can be a one-liner; in other words, it can be simpler.

Given the following df:

import numpy as np
import pandas as pd

df = pd.DataFrame({"Category":["A"]*10 + ["B"]*50 + ["C"]*15 + ["D"]*100,
             "foo":[np.random.random_sample() for i in range(175)],
             "bar":[np.random.random_sample() for i in range(175)]})

Using a dict, I can name the columns and list the diff periods I want to apply to each one:

dict_diff={"foo":[1,2],"bar":[3,4]}

To rename and transform at the same time, the code I wrote is:

pd.concat(map(lambda dict_items: df[dict_items[0]].diff(periods=dict_items[1]).rename(f"{dict_items[0]}_diff{dict_items[1]}"), dict_diff.items()),axis=1)

What's missing/wrong?

I cannot iterate over the list of values. dict_items[1] is a list, while .diff expects a single integer for periods, so I need some way to loop over it.

Expected result:

A df with the new additional columns foo_diff1, foo_diff2, bar_diff3 and bar_diff4, as indicated in the dict.
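To make the target concrete, here is a small sketch on toy data that can be checked by hand. The frame `toy` and its values are made up for illustration and are not part of the df above; only `Series.diff` and its `periods` argument come from pandas.

```python
import pandas as pd

# Hypothetical toy data, small enough to verify by hand.
toy = pd.DataFrame({"foo": [1.0, 3.0, 6.0, 10.0]})

# diff(periods=k) subtracts the value k rows earlier,
# so the first k rows of each result are NaN.
expected = pd.DataFrame({
    "foo_diff1": toy["foo"].diff(periods=1),  # NaN, 2.0, 3.0, 4.0
    "foo_diff2": toy["foo"].diff(periods=2),  # NaN, NaN, 5.0, 7.0
})
print(expected)
```

Each entry in the dict's value list should produce one such column, named after the source column and the period.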

CodePudding user response:

You just need a list comprehension over the periods, plus a reduce function to flatten the resulting lists so that you can concat them with pandas:

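import functools
import operator

def functools_reduce(a):
    # Flatten a sequence of lists into one list by concatenating them pairwise.
    return functools.reduce(operator.concat, a)

# For each column, build one renamed diff Series per period, then flatten and concat.
pd.concat(functools_reduce(map(
    lambda dict_items: [df[dict_items[0]].diff(periods=diff_value).rename(f"{dict_items[0]}_diff{diff_value}")
                        for diff_value in dict_items[1]],
    {"foo":[1,2],"bar":[3,4]}.items())), axis=1)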
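If the reduce step feels heavy, a comprehension with two for clauses should give the same columns, since it walks every (column, period) pair directly and needs no flattening. This is only a sketch under the question's assumptions: the helper name `diff_columns` and the seeded random generator are made up here for illustration, and the Category column is left out because it plays no role in the diff.

```python
import numpy as np
import pandas as pd

def diff_columns(df, periods_by_column):
    """Return one <column>_diff<period> Series per (column, period) pair, side by side."""
    return pd.concat(
        [
            df[col].diff(periods=p).rename(f"{col}_diff{p}")
            for col, periods in periods_by_column.items()  # outer loop: columns
            for p in periods                               # inner loop: periods of that column
        ],
        axis=1,
    )

# Same shape as the question's data, with a seeded generator for reproducibility.
rng = np.random.default_rng(0)
df = pd.DataFrame({"foo": rng.random(175), "bar": rng.random(175)})
dict_diff = {"foo": [1, 2], "bar": [3, 4]}

diffs = diff_columns(df, dict_diff)
result = df.join(diffs)  # original columns plus the new diff columns
print(result.columns.tolist())
```

Joining on the index keeps the original columns next to the new ones, which matches the "additional columns" wording in the question.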