Imagine a dataset:
df1 <- tibble::tribble(~Client, ~Client_ID, ~Year, ~NPurchases, ~Age, ~Sex, ~Hobby,
"Alex Smith", 88888, 2022, NA, 30, "Male", "",
"Alex Smith", 88888, 2022, 45, 30, "Male", "Stamps",
"Alex Smith", 88888, 2021, 32, 29, "Male", "Stamps",
"Eugene Hugs", 788272, 2022, 2, 19, "Male", "Cats")
Or, the same dataset for visual representation:
The core columns are Client_ID and Year. For example, Alex Smith has two inputs for 2022. Could you, please, suggest nice methods to:
- Remove rows with more missing values
- Find duplicated rows with different missing data, look at them (to see if there is no error), and then remove the rows with fewer data.
Any nice suggestions will be much appreciated :)
For example, my instant idea to solve was:
unique(rbindlist(list(df1))[!is.na(NPurchases), .(Client, Client_ID, Year, Age, NPurchases, Sex, Hobby)])
But I am sure there should be more universal and tidy methods :)
DESIRED OUTPUTS:
- For the first variant:
- For the second variant: Similar solution that would provide the data on the rows it deleted.
There is a great solution by dplyr. Can you suggest other nice solutions?
CodePudding user response:
Please try filter_at
function from dplyr
library(dplyr)
df2 <- df1 %>% filter_at(vars(everything()), ~!(.x %in% c(NA, ' ')))
Created on 2023-01-21 with reprex v2.0.2
# A tibble: 3 × 7
Client Client_ID Year NPurchases Age Sex Hobby
<chr> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 Alex Smith 88888 2022 45 30 Male Stamps
2 Alex Smith 88888 2021 32 29 Male Stamps
3 Eugene Hugs 788272 2022 2 19 Male Cats
CodePudding user response:
Alternatively, you may try the below code which is without the filter_at
df2 <- df1 %>% mutate(sum=rowSums(across(everything(), ~(.x %in% c(NA, ' '))))) %>%
filter(sum==0) %>% select(-sum)
Created on 2023-01-21 with reprex v2.0.2
# A tibble: 3 × 7
Client Client_ID Year NPurchases Age Sex Hobby
<chr> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 Alex Smith 88888 2022 45 30 Male Stamps
2 Alex Smith 88888 2021 32 29 Male Stamps
3 Eugene Hugs 788272 2022 2 19 Male Cats