Home > Enterprise >  Is there a method to check total counts of a species matches the respective number of length measure
Is there a method to check total counts of a species matches the respective number of length measure

Time:07-20

I have a table that has for each survey site and survey date, a total of the number organisms counted, and measurements for each organism found. I would like to make sure that the data is correct by making sure the total organism counted match the total number of measurements taken.

I initially tried to gather the table, changed the values to 1 or 0 if a measurement was taken, and then group_by and summarise. This method didnt work, and I am sure there is a nicer method so any help would be appreciated.

Ideally I would like a table that has site, survey data, total counts and a count column derived from summing the number of measurements taken. The idea would be that the two count columns should have the same values, and hence not be missing data.

Sample data -

structure(list(Date.of.Survey = c("12/04/2022", "16/04/2022", 
"12/04/2022", "13/04/2022", "14/04/2022", "15/04/2022"), Location = c("Wandle - Merton Abbey Mills", 
"Wandle - Merton Abbey Mills", "Medway - Allington Weir", "Medway - Allington Weir", 
"Medway - Allington Weir", "Medway - Allington Weir"), Was.the.trap.working.when.you.checked.it. = c("Yes", 
"Yes", "Yes", "Yes", "Yes", "Yes"), Number.of.eels = c(0L, 1L, 
0L, 0L, 0L, 20L), X1..Length..mm. = c("", "180", "", "", "", 
"72"), X2..Length..mm. = c("", "", "", "", "", "69"), X3..Length..mm. = c("", 
"", "", "", "", "76"), X4..Length..mm. = c("", "", "", "", "", 
"72"), X5..Length..mm. = c("", "", "", "", "", "72"), X6..Length..mm. = c("", 
"", "", "", "", "73"), X7..Length..mm. = c(NA, NA, NA, NA, NA, 
77L), X8..Length..mm. = c(NA, NA, NA, NA, NA, 78L), X9..Length..mm. = c(NA, 
NA, NA, NA, NA, 75L), X10..Length..mm. = c(NA, NA, NA, NA, NA, 
72L), X11..Length..mm. = c(NA, NA, NA, NA, NA, 75L), X12..Length..mm. = c(NA, 
NA, NA, NA, NA, 78L), X13..Length..mm. = c(NA, NA, NA, NA, NA, 
74L), X14..Length..mm. = c(NA, NA, NA, NA, NA, 70L), X15..Length..mm. = c(NA, 
NA, NA, NA, NA, 75L), X16..Length..mm. = c(NA, NA, NA, NA, NA, 
75L), X17..Length..mm. = c(NA, NA, NA, NA, NA, 73L), X18..Length..mm. = c(NA, 
NA, NA, NA, NA, 72L), X19..Length..mm. = c(NA, NA, NA, NA, NA, 
75L), X20..Length..mm. = c(NA, NA, NA, NA, NA, 71L), X21..Length..mm. = c(NA, 
NA, NA, NA, NA, NA), X22..Length..mm. = c(NA, NA, NA, NA, NA, 
NA), X23..Length..mm. = c(NA, NA, NA, NA, NA, NA), X24..Length..mm. = c(NA, 
NA, NA, NA, NA, NA), X25..Length..mm. = c(NA, NA, NA, NA, NA, 
NA), X26..Length..mm. = c(NA, NA, NA, NA, NA, NA), X27..Length..mm. = c(NA, 
NA, NA, NA, NA, NA), X28..Length..mm. = c(NA, NA, NA, NA, NA, 
NA), X29..Length..mm. = c(NA, NA, NA, NA, NA, NA), X30..Length..mm. = c(NA, 
NA, NA, NA, NA, NA), X31..Length..mm. = c(NA, NA, NA, NA, NA, 
NA), X32..Length..mm. = c(NA, NA, NA, NA, NA, NA), X33..Length..mm. = c(NA, 
NA, NA, NA, NA, NA), X34..Length..mm. = c(NA, NA, NA, NA, NA, 
NA), X35..Length..mm. = c(NA, NA, NA, NA, NA, NA), X36..Length..mm. = c(NA, 
NA, NA, NA, NA, NA), X37..Length..mm. = c(NA, NA, NA, NA, NA, 
NA), X38..Length..mm. = c(NA, NA, NA, NA, NA, NA), X39..Length..mm. = c(NA, 
NA, NA, NA, NA, NA), X40..Length..mm. = c(NA, NA, NA, NA, NA, 
NA), X41..Length..mm. = c(NA, NA, NA, NA, NA, NA), X42..Length..mm. = c(NA, 
NA, NA, NA, NA, NA), X43..Length..mm. = c(NA, NA, NA, NA, NA, 
NA), X44..Length..mm. = c(NA, NA, NA, NA, NA, NA), X45..Length..mm. = c(NA, 
NA, NA, NA, NA, NA), X46..Length..mm. = c(NA, NA, NA, NA, NA, 
NA), X47..Length..mm. = c(NA, NA, NA, NA, NA, NA), X48..Length..mm. = c(NA, 
NA, NA, NA, NA, NA), X49..Length..mm. = c(NA, NA, NA, NA, NA, 
NA), X50..Length..mm. = c(NA, NA, NA, NA, NA, NA)), row.names = c(NA, 
-6L), class = c("tbl_df", "tbl", "data.frame"))```

Thanks in advance

CodePudding user response:

You want to first make sure that your blanks are NAs. Then you can use rowSums to count the number of non-NA columns, and finally use case_when to create a variable to identify whether the count matches the number of measurements. I also recommend using janitor's clean_names function to make it a little easier to work with your variable names.

library(dplyr)
library(janitor)

df <- df %>% 
  mutate_all(na_if,"") %>% 
  mutate(count = rowSums(!is.na(select(., 5:50)))) %>% 
  mutate(count_match = case_when(number_of_eels == count ~1,
                                 TRUE ~0))


  •  Tags:  
  • r
  • Related