Home > database >  R: Loop through all unique values and count them
R: Loop through all unique values and count them

Time:01-16

I have a dataset with staff information. I have a column that lists their current age and a column that lists their salary. I want to create an R data frame that has 3 columns: one to show all the unique ages, one to count the number of people who are that age and one to give me the median salary for each particular age. On top of this, I would like to group those who are under 21 and over 65. Ideally it would look like this:

age number of people median salary
Under 21 36 26,300
22 15 26,300
23 30 27,020
24 41 26,300
etc
Over65 47 39,100

The current dataset has hundreds of columns and thousands of rows but the columns that are of interest are like this:

ageyears sal22
46 28,250
32 26,300
19 27,020
24 26,300
53 36,105
47 39,100
47 26,200
70 69,500
68 75,310

I'm a bit lost on the best way to do this but assume some sort of loop would work best? Thanks so much for any direction or help.

CodePudding user response:

library(tidyverse) 

sample_data <- tibble(
  age = sample(17:70, 100, replace = TRUE) %>% as.character(), 
  salary = sample(20000:90000, 100, replace = TRUE)
)

# A tibble: 100 × 2
   age   salary
   <chr>  <int>
 1 56     35130
 2 56     44203
 3 20     28701
 4 47     66564
 5 66     60823
 6 54     36755
 7 66     30731
 8 68     21338
 9 19     80875
10 61     44547
# … with 90 more rows
# ℹ Use `print(n = ...)` to see more rows

sample_data %>%  
  mutate(age = case_when(age <= 21 ~ "Under 21", 
                         age >= 65 ~ "Over 65", 
                         TRUE ~ age)) %>% 
  group_by(age) %>%  
  summarise(count = n(), 
            median_salary = median(salary)) 

# A tibble: 38 × 3
   age   count median_salary
   <chr> <int>         <dbl>
 1 22        4        46284.
 2 23        3        55171 
 3 25        3        74545 
 4 27        1        37052 
 5 28        3        66006 
 6 29        1        82877 
 7 30        2        40342.
 8 31        2        27815 
 9 32        1        32282 
10 33        3        64523 
# … with 28 more rows
# ℹ Use `print(n = ...)` to see more rows
  • Related