Creating new variable using piping in R

Time:05-29

I'm trying to create a new variable, confirmed_delta_perc, in a chain of piped commands, but I get an error saying that the variable active_delta is not found. I have confirmed that it is in the data frame, but it is not being read, and the new variable is not added either.

COVID %>%
  select(county, confirmed, confirmed_delta) %>%
  mutate(confirmed_delta_perc = active_delta/active * 100) %>%
  filter(confirmed_delta_perc == 32)

Error:

Error in `mutate()`:
! Problem while computing `confirmed_delta_perc =
  active_delta/active`.
Caused by error:
! object 'active_delta' not found

These are the full directions for what to include in the pipe: Using piping, create a link of commands that selects the county, confirmed, and confirmed_delta variables. Create a new variable called confirmed_delta_perc using the mutate() function. The values in this column should be the percentage of active delta cases of all active cases. Filter for all observation(s) that have a confirmed_delta_perc value of 32. Print out all observation(s).

I've tried modifying the mutate() step by reassigning the data frame so that it "redoes" it and adds the new variable, but that doesn't work either.

There aren't any observations that actually equal 32, but it should still add the variable, and it does not.

Does anyone have any ideas?

dput(head(COVID))

structure(list(county = c("Washington", "Fountain", "Jay", "Wabash", 
"Fayette", "Washington"), confirmed = c(620L, 737L, 930L, 1530L, 
1336L, 675L), confirmed_delta = c(18L, 12L, 11L, 49L, 19L, 29L
), deaths = c(5L, 8L, 14L, 25L, 33L, 6L), deaths_delta = c(0L, 
1L, 0L, 1L, 0L, 1L), recovered = c(0L, 0L, 0L, 0L, 0L, 0L), recovered_delta = c(0L, 
0L, 0L, 0L, 0L, 0L), active = c(615L, 729L, 918L, 1512L, 1305L, 
669L), active_delta = c(18L, 11L, 11L, 49L, 19L, 28L), active_delta_perc = c(0.0292682926829268, 
0.0150891632373114, 0.0119825708061002, 0.0324074074074074, 0.0145593869731801, 
0.0418535127055306)), row.names = c(NA, 6L), class = "data.frame")

CodePudding user response:

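The "object not found" error occurs because select() keeps only county, confirmed, and confirmed_delta, so active and active_delta no longer exist by the time mutate() runs; the code below leaves the select() step out. Separately, for most case counts it is impossible for any portion of them to be exactly 32%. For instance, we would report 29 of 90 cases as "32%", but that is really 32.2222..., which is not strictly equal to 32. So you will need to specify what range around 32 counts as a match. Here, I treat anything within 0.5 of 32 on either side, from 31.5 to 32.5, as close enough.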

COVID <- COVID %>%
  mutate(confirmed_delta_perc = active_delta/active * 100) %>%
  filter(abs(confirmed_delta_perc - 32) <= 0.5)
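If the assignment still requires the result to contain only county, confirmed, and confirmed_delta (plus the new column), one option is to run mutate() first and select() afterwards, so that active and active_delta are still available when the percentage is computed. The following is a minimal sketch under that assumption; `covid_sample`, `with_perc`, and `result` are names made up for illustration, and the data are just the six rows from the dput() output in the question.

```r
library(dplyr)

# Six rows copied from the dput() output in the question
# (only the columns used here; covid_sample is an illustrative name)
covid_sample <- data.frame(
  county = c("Washington", "Fountain", "Jay", "Wabash", "Fayette", "Washington"),
  confirmed = c(620L, 737L, 930L, 1530L, 1336L, 675L),
  confirmed_delta = c(18L, 12L, 11L, 49L, 19L, 29L),
  active = c(615L, 729L, 918L, 1512L, 1305L, 669L),
  active_delta = c(18L, 11L, 11L, 49L, 19L, 28L)
)

# Compute the percentage while active and active_delta still exist,
# then keep only the requested columns plus the new one
with_perc <- covid_sample %>%
  mutate(confirmed_delta_perc = active_delta / active * 100) %>%
  select(county, confirmed, confirmed_delta, confirmed_delta_perc)

# Keep rows whose percentage is within 0.5 of 32
result <- with_perc %>%
  filter(abs(confirmed_delta_perc - 32) <= 0.5)

print(with_perc)
print(result)
```

On these six rows the computed percentages are all below 5, so `result` comes back empty; the full data set may behave differently. An alternative that keeps the original step order is to add active and active_delta to the select() call.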

CodePudding user response:

Try this:

COVID <- COVID %>%
  mutate(confirmed_delta_perc = active_delta/active * 100) %>%
  filter( round(confirmed_delta_perc, 0) == 32)

Filtering with the abs() function, as suggested by @JonSpring in the comments, is better though.
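To see how the three comparisons differ, here is a small base R sketch using the 29-of-90 example from the other answer. The variable names are illustrative only. With a tolerance of 0.5 the abs() and round() checks agree on this value; the practical advantage of abs() is that the tolerance can be changed.

```r
# 29 of 90 cases, expressed as a percentage (32.2222...)
perc <- 29 / 90 * 100

exact_match   <- perc == 32               # FALSE: not exactly 32
within_half   <- abs(perc - 32) <= 0.5    # TRUE: inside 31.5 to 32.5
rounded_match <- round(perc, 0) == 32     # TRUE: rounds to 32
within_tenth  <- abs(perc - 32) <= 0.1    # FALSE: outside 31.9 to 32.1

print(c(exact_match, within_half, rounded_match, within_tenth))
```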
