I am trying to check if multiple columns of a data frame have valid percentages. That is, no negative numbers or numbers greater than one.
I have provided an example of my data below using the dput() function.
structure(list(fightName = c("UFC Fight Night: Makhachev vs. Moises",
"UFC Fight Night: Makhachev vs. Moises", "UFC Fight Night: Makhachev vs. Moises",
"UFC Fight Night: Makhachev vs. Moises", "UFC Fight Night: Makhachev vs. Moises",
"UFC Fight Night: Makhachev vs. Moises"), redFighterName = c("Alan Baudot",
"Francisco Figueiredo", "Amanda Lemos", "Daniel Rodriguez", "Khalid Taha",
"Gabriel Benitez"), redFighterHead = c(0.75, 0.57, 0.85, 0.8,
0.27, 0.66), redFighterBody = c(0.16, 0.25, 0.14, 0.04, 0.36,
0.22), redFighterLeg = c(0.08, 0.17, 0, 0.15, 0.36, 0.1), redFighterDistance = c(0.6,
0.64, 0.85, 0.84, 0.9, 0.77), redFighterClinch = c(0.31, 0.14,
0, 0.02, 0.09, 0.1), redFighterGround = c(0.08, 0.21, 0.14, 0.13,
0, 0.12), redFighterResult = c("W", "W", "W", "W", "W", "W"),
blueFighterName = c("Rodrigo Nascimento", "Malcolm Gordon",
"Montserrat Conejo", "Preston Parsons", "Sergey Morozov",
"Billy Quarantillo"), blueFighterHead = c(0.83, 0.86, 0.66,
0.6, 0.9, 0.73), blueFighterBody = c(0.12, 0.04, 0.33, 0.17,
0.04, 0.2), blueFighterLeg = c(0.04, 0.08, 0, 0.21, 0.06,
0.07), blueFighterDistance = c(0.91, 0.47, 1, 1, 0.66, 0.61
), blueFighterClinch = c(0.08, 0.1, 0, 0, 0.12, 0.11), blueFighterGround = c(0,
0.41, 0, 0, 0.22, 0.28), blueFighterResult = c("L", "L",
"L", "L", "L", "L")), row.names = c(NA, 6L), class = "data.frame")
I want to check if redFighterHead, redFighterBody etc (which all contain percentage data) have valid percentages. That is, no occurrences of negative numbers or numbers greater than 1.
Can anyone think of a way to do this?
CodePudding user response:
Update after the OP provided a reprex:
I would do something like this:
library(tidyverse)
df %>%
select(where(is.numeric)) %>%
summarize(across(everything(), ~all(. >= 0 & . <= 1)))
This tells you which columns satisfy your condition and which don't. Also note that I used the conditions >= 0 and <= 1 instead of > 0 and < 1, because 0 and 1 are valid percentages!
Another note: I only checked the numeric columns against your condition and left out the character columns.
redFighterHead redFighterBody redFighterLeg redFighterDistance redFighterClinch redFighterGround blueFighterHead blueFighterBody blueFighterLeg blueFighterDistance
1 TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
blueFighterClinch blueFighterGround
1 TRUE TRUE
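In case you would rather not load the tidyverse, the same per-column check can be sketched in base R. This is only a minimal sketch: the toy data frame and its column names (name, head, body) are made up for illustration and are not your data, and one value is deliberately out of range so that the check has something to flag.

```r
# Toy data frame (hypothetical values, not the poster's data)
df <- data.frame(
  name = c("a", "b", "c"),
  head = c(0.75, 0.57, 0.85),
  body = c(0.16, 1.20, 0.14),  # 1.20 is deliberately out of range
  stringsAsFactors = FALSE
)

# Keep only the numeric columns (the character column is dropped)
num_cols <- df[vapply(df, is.numeric, logical(1))]

# One logical per column: TRUE if every value lies in [0, 1]
valid <- vapply(num_cols, function(x) all(x >= 0 & x <= 1), logical(1))
print(valid)
```

Here valid is a named logical vector, so valid["body"] is FALSE for the toy data while valid["head"] is TRUE.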
If you just want a single overall answer for your data, you can wrap this code in an if condition using all():
if (all(df %>%
  select(where(is.numeric)) %>%
  summarize(across(everything(), ~all(. >= 0 & . <= 1))) == TRUE)) {
  print("df has no values less than 0 or greater than 1")
} else {
  print("df has values less than 0 or greater than 1")
}
Additional note: the two print statements in your original code said essentially the same thing (both claimed that all values lie between 0 and 1), so I changed the else branch to report the failure instead.
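If the overall check fails, it can help to see where the invalid values are. The following is a rough base R sketch, again on a made-up toy data frame with hypothetical column names. It treats NA as invalid, which is my assumption and not something the question specifies. Keep in mind that all() returns NA for a column that contains missing values but no out-of-range values, and if (NA) throws an error, so missing values need some explicit handling either way.

```r
# Toy data frame (hypothetical values, not the poster's data)
df <- data.frame(
  head = c(0.75, 0.57, 0.85),
  body = c(0.16, 1.20, 0.14),  # 1.20 is greater than 1
  leg  = c(0.08, NA, -0.10)    # one missing and one negative value
)

# Keep only the numeric columns
num_cols <- df[vapply(df, is.numeric, logical(1))]

# Logical matrix: TRUE marks a value that is missing or outside [0, 1]
# (assumption: NA counts as invalid)
bad <- is.na(num_cols) | num_cols < 0 | num_cols > 1

# Names of the columns that contain at least one invalid value
bad_columns <- names(which(colSums(bad) > 0))

# Row indices that contain at least one invalid value
bad_rows <- unname(which(rowSums(bad) > 0))

print(bad_columns)
print(bad_rows)
```

For the toy data this flags the columns body and leg and the rows 2 and 3. If you want missing values to be ignored instead, drop the is.na() term and pass na.rm = TRUE to colSums() and rowSums().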
OLD:
Assuming your data frame is called df and your column is called Percentages, you can do:
if (all(df$Percentages > 0 & df$Percentages < 1)) {
  print("df has no values less than 0 or greater than 1")
} else {
  print("df has values less than 0 or greater than 1")
}