Counting the frequency in Python: size vs count

I have a dataframe that looks like this:

Item       Year     
I1         2015
I2         2016
I1         2017
I2         2014

For example, item I2 was sold in 2016 and 2014.

I want to group by Item and Year and then do what this R code does:

top_items <- data %>% select(Item, Year) %>%
  group_by(Year, Item) %>%
  summarize(sales_trend = n()) %>%
  arrange(desc(sales_trend))

That is, I need the output to list the most-bought items, sorted by count in descending order.

I am trying this Python code:

b_data = pd.DataFrame(data[["Item", "Year"]].groupby(["Item", "Year"]).size()).sort_values(by=[0], ascending=False)

But I get an additional column named 0. I want to sort by it, but I don't want it to be called 0. How can I name it sales_trend, as in the R code?

Also, what is the Python equivalent of the following R code, which continues the previous pipeline?

...
  arrange(desc(sales_trend)) %>%
  slice_head(n = 5) %>%
  mutate(Year = as.integer(Year), rank = 1:5) %>%
  select(-sales_trend)

CodePudding user response：

df.groupby("Year")["Item"].value_counts().sort_values(ascending=False)
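
To also get the sales_trend column name from the question, one option is to count rows with size() and name the result in reset_index(). This is only a sketch: the sample frame below is an assumption built from the rows in the question plus one duplicate, so that one group has a count above 1. Note that size() counts rows per group, whereas count() counts non-null values in each remaining column, which is why size() fits when Item and Year are the only columns.

```python
import pandas as pd

# Assumed sample data: the question's rows plus one extra (I2, 2014) row.
data = pd.DataFrame({
    "Item": ["I1", "I2", "I1", "I2", "I2"],
    "Year": [2015, 2016, 2017, 2014, 2014],
})

top_items = (
    data.groupby(["Year", "Item"])
        .size()                            # rows per (Year, Item) group
        .reset_index(name="sales_trend")   # name the count column instead of 0
        .sort_values("sales_trend", ascending=False)
)
print(top_items)
```

The name= argument of reset_index() is what replaces the default 0 column label.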

CodePudding user response：

With datar, a pandas wrapper that reimagines pandas APIs, we can translate your R code into Python:

>>> from datar.all import c, f, tibble, select, group_by, summarize, arrange, desc, n
>>>
>>> data = tibble(Item=c("I1", "I2", "I1", "I2", "I2"), Year=c(2015, 2016, 2017, 2014, 2014))
>>> data
      Item    Year
  <object> <int64>
0       I1    2015
1       I2    2016
2       I1    2017
3       I2    2014
4       I2    2014  # add one more item to see if it pops up at the top
>>> top_items = (
...     data
...     >> select(f.Item, f.Year)
...     >> group_by(f.Year, f.Item)
...     >> summarize(sales_trend=n())
...     >> arrange(desc(f.sales_trend))
... )
[2022-03-17 10:15:54][datar][   INFO] `summarise()` has grouped output by ['Year'] (override with `_groups` argument)
>>> top_items
     Year     Item  sales_trend
  <int64> <object>      <int64>
0    2014       I2            2
1    2015       I1            1
2    2016       I2            1
3    2017       I1            1
[TibbleGrouped: Year (n=4)]
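
For the second part of the question (slice_head, mutate, select), a plain pandas version might look like the sketch below. It assumes the intent is the top five rows overall rather than five per group, and it reuses the same made-up sample data as the datar example. The rank length follows the number of rows actually kept, since this sample has fewer than five groups.

```python
import pandas as pd

# Assumed sample data, same rows as the datar example above.
data = pd.DataFrame({
    "Item": ["I1", "I2", "I1", "I2", "I2"],
    "Year": [2015, 2016, 2017, 2014, 2014],
})

top5 = (
    data.groupby(["Year", "Item"])
        .size()
        .reset_index(name="sales_trend")
        # A stable sort keeps tied groups in their original (Year, Item) order.
        .sort_values("sales_trend", ascending=False, kind="stable")
        .head(5)                                    # slice_head(n = 5)
        .assign(                                    # mutate(...)
            Year=lambda d: d["Year"].astype(int),
            rank=lambda d: list(range(1, len(d) + 1)),
        )
        .drop(columns="sales_trend")                # select(-sales_trend)
        .reset_index(drop=True)
)
print(top5)
```

Because rank is also the name of a DataFrame method, the new column is best accessed as top5["rank"] rather than top5.rank.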