I have a df
:
# create generic df with 1 date column and 2 value columns
df = pd.DataFrame({'date': pd.date_range('2020-01-01', '2020-01-31', freq='D'), \
'value1': np.random.randint(0, 10, 31), \
'value2': np.random.randint(0, 100, 31),\
'value3': np.random.randint(0, 1000, 31)})
I want to group by this df
by date
in W
intervals, take the average of value2
, count
of value3
and distinct values of value1
& the count of those values in this or similar format :
{9:2, 4:1, 6:2, 5:1, 3:1}
[(9, 2), (4,1), (6,2), (5,1), (3,1)]
Basically this represent that in the first week there were 2
counts of value 9
in the column value1
and so on, similar to what df.groupby(pd.Grouper(key='date', freq='W')).value1.value_counts()
returns, but trying
df.groupby(pd.Grouper(key='date', freq='W'))\
.agg({'value1': 'mean', 'value2': 'mean', 'value3': pd.Series.value_counts()})\
.reset_index()
Returns an error:
TypeError: value_counts() missing 1 required positional argument: 'self'
My desired output should look like this:
date value2 value3 value_1
2020-01-05 62.600000 5 {1:5, 3:2}
2020-01-12 30.000000 7 {2:2, 3:3, 6:1}
2020-01-19 34.428571 7 {2:2, 3:3, 6:1}
2020-01-26 51.428571 7 {2:1, 4:3, 8:1}
2020-02-02 48.000000 5 {2:1, 3:5, 7:1}
The column value1
as mentioned above can have a different format, such as a list with tuples of values.
CodePudding user response:
Convert values to dictionaries with lambda function:
df = df.groupby(pd.Grouper(key='date', freq='W'))\
.agg({'value1': 'mean', 'value2': 'mean',
'value3': lambda x: x.value_counts().to_dict()})\
.reset_index()
print (df)
date value1 value2 \
0 2020-01-05 3.200000 41.000000
1 2020-01-12 4.714286 58.714286
2 2020-01-19 4.285714 65.285714
3 2020-01-26 6.428571 68.857143
4 2020-02-02 4.000000 36.600000
value3
0 {984: 1, 920: 1, 853: 1, 660: 1, 101: 1}
1 {421: 1, 726: 1, 23: 1, 408: 1, 398: 1, 493: 1...
2 {176: 1, 209: 1, 180: 1, 566: 1, 280: 1, 570: ...
3 {49: 1, 113: 1, 327: 1, 777: 1, 59: 1, 301: 1,...
4 {113: 1, 983: 1, 181: 1, 239: 1, 839: 1}
Or use collections.Counter
:
from collections import Counter
df = df.groupby(pd.Grouper(key='date', freq='W'))\
.agg({'value1': 'mean', 'value2': 'mean', 'value3': Counter})\
.reset_index()