Count all occurrences of a given text in grouped data

Time:01-01

Part of my data looks as follows:

> q[,c(1,3)]
           Year       Language
1             1            C  
2             1              C
3             1            C  
4             1              C
5             1            C  
6             1     JavaScript
7             1            C  
8             2            C  
9             2           inny
10            2            C  
11            2           Java
12            3           Java
13            3           Java
14            3     JavaScript
15            3           Java
16            3     JavaScript
17            3           .NET
18            3           inny
19            3              R
20            3         Python
21            3           .NET
22            3         Python
23            3           Java
24            3           Java
25            3           Java
26            3           Java
27            3           Java
28            3           Java
29            3             C#
30            3            C  
31            3     JavaScript
32            3            C  
33            3     JavaScript
34            3           Java
35            3           Java
36            3         Python
37            3             C#
38            4              R
39            4              C
40            4           Java
41            4         Python
42            4            C  
43            4           .NET
44            4             C#
45            5           inny
46            5     JavaScript
47            5             C#
48            5         Python
49            5              R
50            2              C

The full dataset, named q, also has other columns that are not relevant here. For each year, I want to find the language that occurred most often. Sometimes several languages tie for the highest count in a year, in which case I want to list every one of them.

Expected output:

    Year Language     
 1     1 C         
 2     2 C         
 3     3 Java      
 4     4 .NET      
 5     4 C         
 6     4 C#        
 7     4 C         
 8     4 Java      
 9     4 Python    
10     4 R         
11     5 C#        
12     5 inny      
13     5 JavaScript
14     5 Python    
15     5 R   

CodePudding user response:

I included an "amount" column that shows how often each language occurs in each year, in case that is needed.

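library(tidyverse)

# df stands for the question's data frame (named q there)
df %>%
  count(Year, Language, name = "amount") %>%  # occurrences per Year/Language pair
  group_by(Year) %>%
  slice_max(amount)                           # top count per year, ties kept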

# A tibble: 15 × 3
# Groups:   Year [5]
    Year Language   amount
   <dbl> <chr>       <int>
 1     1 C               4
 2     2 C               2
 3     3 Java           11
 4     4 .NET            1
 5     4 C               1
 6     4 C#              1
 7     4 C               1
 8     4 Java            1
 9     4 Python          1
10     4 R               1
11     5 C#              1
12     5 inny            1
13     5 JavaScript      1
14     5 Python          1
15     5 R               1
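
The pipeline above keeps every tied language because slice_max() defaults to with_ties = TRUE. The following is a minimal sketch on a made-up sample (q_small is hypothetical, not the asker's data) that spells out that default. It also adds an optional trimws() step, on the assumption that entries such as "C  " and "C" are meant to be the same language; drop that line if they are genuinely distinct values.

```r
library(dplyr)

# Hypothetical sample for illustration only
q_small <- data.frame(
  Year     = c(1, 1, 1, 2, 2, 3, 3, 3),
  Language = c("C  ", "C", "Java", "R", "Python", "Java", "Java", "C#")
)

top_langs <- q_small %>%
  mutate(Language = trimws(Language)) %>%        # assumption: stray whitespace is noise
  count(Year, Language, name = "amount") %>%     # occurrences per Year/Language pair
  group_by(Year) %>%
  slice_max(amount, n = 1, with_ties = TRUE) %>% # keep all languages tied for the top count
  ungroup()

print(top_langs)
```

Setting with_ties = FALSE instead would return exactly one row per year, which is not what the question asks for.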

CodePudding user response:

Using dplyr:

q %>%
  group_by(Year) %>%
  summarise(language = names(which(table(Language) == max(table(Language)))))

output:

    Year language  
   <int> <chr>     
 1     1 C         
 2     2 C         
 3     3 Java      
 4     4 .NET      
 5     4 C         
 6     4 C#        
 7     4 C         
 8     4 Java      
 9     4 Python    
10     4 R         
11     5 C#        
12     5 inny      
13     5 JavaScript
14     5 Python    
15     5 R   
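
Note that in dplyr 1.1.0 and later, returning more than one row per group from summarise() is deprecated in favour of reframe(). Here is a sketch of the same idea with reframe(), assuming dplyr >= 1.1.0 and a made-up sample (q_small is hypothetical, not the asker's data):

```r
library(dplyr)

# Hypothetical sample for illustration only
q_small <- data.frame(
  Year     = c(1, 1, 1, 2, 2, 3, 3, 3),
  Language = c("C", "C", "Java", "R", "Python", "Java", "Java", "C#")
)

res <- q_small %>%
  group_by(Year) %>%
  reframe(language = {
    counts <- table(Language)            # tabulate once per year
    names(counts)[counts == max(counts)] # every language tied for the top count
  })

print(res)
```

Unlike summarise(), reframe() always returns an ungrouped result, so no ungroup() is needed afterwards.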

CodePudding user response:

Here is a base R variation:

apply(table(df$Language, df$Year), 2, 
      \(x) names(which(x == max(x)))) |>
  stack() |>
  `colnames<-`(c("Language", "Year"))
#>      Language Year
#> 1         C      1
#> 2         C      2
#> 3        Java    3
#> 4        .NET    4
#> 5           C    4
#> 6          C#    4
#> 7         C      4
#> 8        Java    4
#> 9      Python    4
#> 10          R    4
#> 11         C#    5
#> 12       inny    5
#> 13 JavaScript    5
#> 14     Python    5
#> 15          R    5
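
One thing to watch with apply(): it returns a list only when the years yield different numbers of top languages. If every year yields the same number, it simplifies the result to a vector or matrix, and the stack() step then receives a different shape. Here is a sketch that sidesteps this with split() and lapply(), which always produce a list (q_small is a hypothetical sample, not the asker's data):

```r
# Hypothetical sample for illustration only
q_small <- data.frame(
  Year     = c(1, 1, 1, 2, 2, 3, 3, 3),
  Language = c("C", "C", "Java", "R", "Python", "Java", "Java", "C#")
)

# One character vector of top languages per year, always as a named list
top_by_year <- lapply(
  split(q_small$Language, q_small$Year),
  function(x) {
    counts <- table(x)
    names(counts)[counts == max(counts)]
  }
)

res <- stack(top_by_year)            # columns: values, ind
names(res) <- c("Language", "Year")  # Year comes back as a factor

print(res)
```

If a numeric Year is needed afterwards, as.numeric(as.character(res$Year)) converts the factor back.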