I am new to using dplyr and need some help. I have a frequency (number) table (long format). Here is some sample data:
dput(test_data)
structure(list(ref = c(5L, 5L, 5L, 10L, 10L, 10L, 15L, 15L, 15L
), Sensor_Status = c("status1", "status2", "status3", "status1",
"status2", "status3", "status1", "status2", "status3"), value = c(1718L,
0L, 0L, 576L, 24L, 0L, 224L, 104L, 3L)), row.names = c(NA, -9L
), class = c("tbl_df", "tbl", "data.frame"))
I want to create a frequency distribution column and replace "value" column. For example, I want to sum all the "Sensor_Status" values when "Sensor_Status=sensor1" and divide the values column that correspond to "Sensor_Status=sensor1" . In this test data, sum(sensor1-values) will be 2518 and row1-column3 value will be 1718/2518. Sum of all frequencies of sensor1 should add to 1. Any help is appreciated. I am looking for a output table like this:
structure(list(ref = c(5L, 5L, 5L, 10L, 10L, 10L, 15L, 15L, 15L
), Sensor_Status = c("status1", "status2", "status3", "status1",
"status2", "status3", "status1", "status2", "status3"), value = c(1718L,
0L, 0L, 576L, 24L, 0L, 224L, 104L, 3L), prop = c(0.682287529785544,
0, 0, 0.228752978554408, 0.1875, 0, 0.0889594916600477, 0.8125,
1)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-9L))
CodePudding user response:
Do you want:
df %>%
group_by(Sensor_Status) %>%
mutate(prop = value/sum(value)) %>%
ungroup()
# A tibble: 9 x 4
ref Sensor_Status value prop
<int> <chr> <int> <dbl>
1 5 status1 1718 0.682
2 5 status2 0 0
3 5 status3 0 0
4 10 status1 576 0.229
5 10 status2 24 0.188
6 10 status3 0 0
7 15 status1 224 0.0890
8 15 status2 104 0.812
9 15 status3 3 1
CodePudding user response:
I'm pretty confused that you keep talking about "sensor1" but I don't see that in the data--I assume you mean Sensor_Status == "status1"
.
My best guess is that you want to add two columns: 1 is the sum(value)
grouped by Sensor_Status
, and the other is each individual value
divided by the grouped sum. (If you name one of these added columns value
, it will overwrite the existing value
column. I'll keep the old column around to make it easier to see what's happening.)
library(dplyr)
test_data %>%
group_by(Sensor_Status) %>%
mutate(sum = sum(value),
proportion = value / sum) %>%
ungroup()
# A tibble: 9 × 5
ref Sensor_Status value sum proportion
<int> <chr> <int> <int> <dbl>
1 5 status1 1718 2518 0.682
2 5 status2 0 128 0
3 5 status3 0 3 0
4 10 status1 576 2518 0.229
5 10 status2 24 128 0.188
6 10 status3 0 3 0
7 15 status1 224 2518 0.0890
8 15 status2 104 128 0.812
9 15 status3 3 3 1