Home > database >  Mutate Frequency table using dplyr- R
Mutate Frequency table using dplyr- R

Time:07-26

I am new to using dplyr and need some help. I have a frequency (number) table (long format). Here is some sample data:

dput(test_data)
structure(list(ref = c(5L, 5L, 5L, 10L, 10L, 10L, 15L, 15L, 15L
), Sensor_Status = c("status1", "status2", "status3", "status1", 
"status2", "status3", "status1", "status2", "status3"), value = c(1718L, 
0L, 0L, 576L, 24L, 0L, 224L, 104L, 3L)), row.names = c(NA, -9L
), class = c("tbl_df", "tbl", "data.frame"))

I want to create a frequency distribution column and replace "value" column. For example, I want to sum all the "Sensor_Status" values when "Sensor_Status=sensor1" and divide the values column that correspond to "Sensor_Status=sensor1" . In this test data, sum(sensor1-values) will be 2518 and row1-column3 value will be 1718/2518. Sum of all frequencies of sensor1 should add to 1. Any help is appreciated. I am looking for a output table like this:

structure(list(ref = c(5L, 5L, 5L, 10L, 10L, 10L, 15L, 15L, 15L
), Sensor_Status = c("status1", "status2", "status3", "status1", 
"status2", "status3", "status1", "status2", "status3"), value = c(1718L, 
0L, 0L, 576L, 24L, 0L, 224L, 104L, 3L), prop = c(0.682287529785544, 
0, 0, 0.228752978554408, 0.1875, 0, 0.0889594916600477, 0.8125, 
1)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-9L))

CodePudding user response:

Do you want:

df %>%
  group_by(Sensor_Status) %>%
  mutate(prop = value/sum(value)) %>%
  ungroup()

# A tibble: 9 x 4
    ref Sensor_Status value   prop
  <int> <chr>         <int>  <dbl>
1     5 status1        1718 0.682 
2     5 status2           0 0     
3     5 status3           0 0     
4    10 status1         576 0.229 
5    10 status2          24 0.188 
6    10 status3           0 0     
7    15 status1         224 0.0890
8    15 status2         104 0.812 
9    15 status3           3 1     

CodePudding user response:

I'm pretty confused that you keep talking about "sensor1" but I don't see that in the data--I assume you mean Sensor_Status == "status1".

My best guess is that you want to add two columns: 1 is the sum(value) grouped by Sensor_Status, and the other is each individual value divided by the grouped sum. (If you name one of these added columns value, it will overwrite the existing value column. I'll keep the old column around to make it easier to see what's happening.)

library(dplyr)
test_data %>%
  group_by(Sensor_Status) %>%
  mutate(sum = sum(value),
         proportion = value / sum) %>%
  ungroup()
# A tibble: 9 × 5
    ref Sensor_Status value   sum proportion
  <int> <chr>         <int> <int>      <dbl>
1     5 status1        1718  2518     0.682 
2     5 status2           0   128     0     
3     5 status3           0     3     0     
4    10 status1         576  2518     0.229 
5    10 status2          24   128     0.188 
6    10 status3           0     3     0     
7    15 status1         224  2518     0.0890
8    15 status2         104   128     0.812 
9    15 status3           3     3     1  
  • Related