Home > Net >  Can you set ValueCount() to output the corresponding entry as well as the value count?
Can you set ValueCount() to output the corresponding entry as well as the value count?

Time:06-13

Say you have the following data frame and you needed to know how many Assays were done per month.

type,"Date Tested"
Assay,2022/01/28
Assay,2022/01/31
Assay,2022/02/02
Assay,2022/03/31
Assay,2022/04/21
Assay,2022/05/12
Assay,2022/06/02
Assay,2022/02/03
Assay,2022/06/03

You can use value_counts() from Pandas to easily do this.

data['Date Tested']=pd.to_datetime(data['Date Tested'], format = "%Y/%m/%d")
months = data['Date Tested'].dt.month.value_counts(sort=False)
print(months)

Which outputs:

1    2
2    2
3    1
4    1
5    1
6    2
Name: Date Tested, dtype: int64

The 'numbers' in the first column are each month (i.e 01 - Jan, 02 - Feb etc..) but this isn't great. What if the dataset started at March? Then March = 01. Or what if I needed to do the same thing but by weeks? How could you workout what, say 12 was in terms of a week?

How can you modify the output of value_count to include the corresponding month/week? This information is present in the dataframe, shown by:

print(data['Date Tested'])

Which gives:

0   2022-01-28
1   2022-01-31
2   2022-02-02
3   2022-03-31
4   2022-04-21
5   2022-05-12
6   2022-06-02
7   2022-02-03
8   2022-06-03
Name: Date Tested, dtype: datetime64[ns]

Ideally, my count output would be something like this:

2022-01   2
2022-02   2
2022-03   1
2022-04   1
2022-05   1
2022-06   2
Name: Date Tested, dtype: datetime64[ns]

CodePudding user response:

You can convert to_period (M = monthly) and count:

data.groupby(data['Date Tested'].dt.to_period('M'))['type'].count()

output:

Date Tested
2022-01    2
2022-02    2
2022-03    1
2022-04    1
2022-05    1
2022-06    2
Freq: M, Name: type, dtype: int64

By week:

data.groupby(data['Date Tested'].dt.to_period('W'))['type'].count()

output:

Date Tested
2022-01-24/2022-01-30    1
2022-01-31/2022-02-06    3
2022-03-28/2022-04-03    1
2022-04-18/2022-04-24    1
2022-05-09/2022-05-15    1
2022-05-30/2022-06-05    2
Freq: W-SUN, Name: type, dtype: int64

CodePudding user response:

Make use of indexes

>>> df.set_index(df['Date Tested'].dt.month)['Date Tested']\
      .groupby(level=0)\
      .count()

Date Tested
1    2
2    2
3    1
4    1
5    1
6    2

where here, the indexes are guaranteed to be the corresponding month of the year. For example, if you remove the first two rows of your df where the month is January, the output is

Date Tested
2    2
3    1
4    1
5    1
6    2

CodePudding user response:

grouupby date as strftime format Year-Month and count or find size

df.groupby(df['DateTested'].dt.strftime('%Y-%m')).agg('size').to_frame('count')

    

               count
DateTested       
2022-01         2
2022-02         2
2022-03         1
2022-04         1
2022-05         1
2022-06         2
  • Related