I'm importing and tidying a large number of data sets and trying to streamline my code.
While the data sets are similar, more recent ones have different column headings.
For example:
df1 <- data.frame(a = c(1, 1, 3), b = 4:6, c = 5:7)
df2 <- data.frame(a = c(1, 2, 1, 1), b1 = 1:4, c = 8:11)
I have created a function to tidy this data, ensuring that, where it is present, b1 is renamed to b, and filtering a = 1 in all cases, as follows (this is a simplified version of the various bits of wrangling).
fun1 <- function(x) {
  if(any(grep('b1', colnames(x)) > 0)) {
    x %>%
      filter(a == "1") %>%
      rename(b = b1)
  } else {
    x %>%
      filter(a == 1)
  }
}
fun1(df1)
However, this still requires the filter(a == 1) step to be repeated within the function for both if() and else().
I would therefore like to filter(a == 1) all datasets first, and then apply the if(), else(), rename() stage, e.g.:
fun1 <- function(x) {
  x %>%
    filter(a == "1") %>%
    if(any(grep('b1', colnames(x)) > 0)) {
      rename(x, b = b1)
    } else {
    }
}
fun1(df1)
However, this returns the following error:
Error in if (.) any(grep("b1", colnames(x)) > 0) else { :
argument is not interpretable as logical
In addition: Warning message:
In if (.) any(grep("b1", colnames(x)) > 0) else { :
the condition has length > 1 and only the first element will be used
What am I getting wrong?
Thanks Jack
CodePudding user response:
You could also chain the if/else part as shown below:
fun1 <- function(x) {
  x %>%
    {if ('b1' %in% names(.)) rename(., b = b1) else .} %>%
    filter(a == 1)
}
fun1(df2)
a b c
1 1 1 8
2 1 3 10
3 1 4 11
fun1(df1)
a b c
1 1 4 5
2 1 5 6
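As for why the original attempt fails: %>% passes its left-hand side as the first argument of the next call, and the first argument of if is the condition, so the data frame itself ends up being tested (hence `if (.)` in the error message). The braces above stop that. A minimal sketch of an alternative that avoids the conditional altogether, assuming dplyr 1.0.0 or later, uses any_of() inside rename(); the name fun2 is made up for illustration:

```r
library(dplyr)

df1 <- data.frame(a = c(1, 1, 3), b = 4:6, c = 5:7)
df2 <- data.frame(a = c(1, 2, 1, 1), b1 = 1:4, c = 8:11)

# fun2 is a hypothetical name, not from the question.
# any_of() ignores names that are absent, so b1 is renamed to b
# only in data sets that actually contain b1.
fun2 <- function(x) {
  x %>%
    rename(any_of(c(b = "b1"))) %>%
    filter(a == 1)
}

names(fun2(df1))
names(fun2(df2))
```

Both calls should give the column names a, b and c. Adding further old/new pairs to the named vector would cover other renamed headings in the same step.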
CodePudding user response:
We can use dplyr with rename_with and filter, so we get a function that is more versatile and will work with unexpected names in any of the variables.
First rename with a function that extracts only the letters from the names, then filter the rows where a == 1, all within a single pipe chain.
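library(dplyr)
library(stringr)  # str_extract() comes from stringr
my_wrangler <- function(df) {
  df %>%
    # keep the leading run of letters in each name, e.g. "b1" becomes "b"
    rename_with(.cols = everything(), ~ str_extract(.x, "[[:alpha:]]+")) %>%
    filter(a == 1)
}
my_wrangler(df2)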
a b c
1 1 1 8
2 1 3 10
3 1 4 11
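One caveat with stripping everything but letters: two columns such as b1 and b2 would both become b. A rough base R sketch of the same name cleaning with a uniqueness check follows; letters_only is a hypothetical helper, not part of dplyr or stringr:

```r
# letters_only is a hypothetical helper for illustration.
# It drops every non-letter character from each name and stops
# if two cleaned names would collide.
letters_only <- function(nms) {
  out <- gsub("[^[:alpha:]]", "", nms)
  if (anyDuplicated(out) > 0) {
    stop("cleaned names are not unique: ",
         paste(unique(out[duplicated(out)]), collapse = ", "))
  }
  out
}

df2 <- data.frame(a = c(1, 2, 1, 1), b1 = 1:4, c = 8:11)
names(df2) <- letters_only(names(df2))
names(df2)
```

For df2 this leaves the names a, b and c; a data set containing both b1 and b2 would raise an error instead of silently producing duplicate headings.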
CodePudding user response:
Updated Solution: Here is a far more concise way of doing it, suggested by dear Mr. Onyambu again:
df2 |>
subset(a == 1) |>
(\(.)setNames(., sub('^b1$', 'b', names(.))))()
a b c
1 1 1 8
3 1 3 10
4 1 4 11
We could also do it using the base R pipe (R 4.1 or later) like this (inspired by Mr. Onyambu as always). I defined an anonymous function, so you only need to change the data set at the start of the pipeline:
df1 |>
  {\(x) {
    tmp <- subset(x, a == 1)
    if("b1" %in% colnames(tmp)) {
      names(tmp)[names(tmp) == "b1"] <- "b"
    }
    tmp
  }
  }()
a b c
1 1 4 5
2 1 5 6
Or with df2:
a b c
1 1 1 8
3 1 3 10
4 1 4 11
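Since the question mentions a large number of data sets, the same logic could be given a name and applied to a list. This is only a sketch in base R: tidy_one, tidied and combined are made-up names, and it assumes every data set ends up with the same columns after renaming.

```r
df1 <- data.frame(a = c(1, 1, 3), b = 4:6, c = 5:7)
df2 <- data.frame(a = c(1, 2, 1, 1), b1 = 1:4, c = 8:11)

# tidy_one is a hypothetical name for the anonymous function above.
tidy_one <- function(x) {
  tmp <- subset(x, a == 1)
  # the assignment changes nothing when no column is called b1
  names(tmp)[names(tmp) == "b1"] <- "b"
  tmp
}

# Apply the same tidying step to every data set, then stack the results.
tidied <- lapply(list(df1 = df1, df2 = df2), tidy_one)
combined <- do.call(rbind, tidied)
names(combined)
```

The combined data frame holds the two matching rows from df1 followed by the three from df2, all under the headings a, b and c.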
CodePudding user response:
fun1 <- function(x) {
  store <- x %>% filter(a == 1)
  # Rename only when b1 is present; otherwise return the filtered data unchanged
  if ("b1" %in% colnames(store)) {
    rename(store, b = b1)
  } else {
    store
  }
}
fun1(df2)
a b c
1 1 1 8
2 1 3 10
3 1 4 11