I'm working on the dataframe of an inquiry that looks like:
User ID | 2012-01-01 | 2012-02-01 | 2012-02-01 |
---|---|---|---|
Cell 1 | NA | 2 | NA |
Cell 3 | 1 | NA | 5 |
I would like to find the date (the column name ) of the first non null column ( excluding the User ID column ) , the name of last non null column, and the duration between these to dates for each user ID.
Thank you !
I've tried :
df$min_date<-apply(df[-1], 1, function(x)
x[which.min(which(is.na(x) == FALSE))])
and
df$min_date<-apply(df[-1], 1, function(x)
colnames(x[min(which(is.na(x) == FALSE))]))
but it didn't work
CodePudding user response:
How about this:
library(dplyr)
library(tidyr)
d <- tibble::tribble(
~"User ID", ~"2012-01-01", ~"2012-02-01", ~"2012-02-01",
"Cell 1", NA, 2, NA,
"Cell 3", 1, NA, 5)
d %>%
pivot_longer(-1, names_to="date", values_to = "vals") %>%
na.omit() %>%
mutate(date = lubridate::ymd(date)) %>%
group_by(`User ID`) %>%
summarise(first = first(date),
last = last(date)) %>%
mutate(diff = last - first)
#> # A tibble: 2 × 4
#> `User ID` first last diff
#> <chr> <date> <date> <drtn>
#> 1 Cell 1 2012-02-01 2012-02-01 0 days
#> 2 Cell 3 2012-01-01 2012-02-01 31 days
Created on 2022-12-13 by the reprex package (v2.0.1)
And here's a base R way (though using lubridate) that is more in keeping with your original idea:
d <- tibble::tribble(
~"User ID", ~"2012-01-01", ~"2012-02-01", ~"2012-02-01",
"Cell 1", NA, 2, NA,
"Cell 3", 1, NA, 5)
d <- tibble::tribble(
~"User ID", ~"2012-01-01", ~"2012-02-01", ~"2012-02-01",
"Cell 1", NA, 2, NA,
"Cell 3", 1, NA, 5)
mind <- apply(d[,-1], 1, function(x)
colnames(d[,-1])[min(which(!is.na(x)))])
maxd <- apply(d[,-1], 1, function(x)
colnames(d[,-1])[max(which(!is.na(x)))])
d$min_date <- lubridate::ymd(mind)
d$max_date <- lubridate::ymd(maxd)
d$diff <- d$max_date - d$min_date
d
#> # A tibble: 2 × 7
#> `User ID` `2012-01-01` `2012-02-01` `2012-02-01` min_date max_date diff
#> <chr> <dbl> <dbl> <dbl> <date> <date> <drtn>
#> 1 Cell 1 NA 2 NA 2012-02-01 2012-02-01 0 days
#> 2 Cell 3 1 NA 5 2012-01-01 2012-02-01 31 days
Created on 2022-12-13 by the reprex package (v2.0.1)
CodePudding user response:
Here is a tidyverse
option:
Be careful you are using Non-syntactic names, moreover the 3rd and 4th column have the same name. This won't work in R:
library(dplyr)
library(tidyr)
df %>%
mutate(across(-c(User, ID), ~case_when(!is.na(.) ~ cur_column()), .names = 'new_{col}')) %>%
unite(non_null, starts_with('new'), na.rm = TRUE, sep = ' ') %>%
mutate(non_null = sub(" .*", "", non_null))
User ID X2012.01.01 X2012.02.01 X2012.02.01.1 non_null
1 Cell 1 NA 2 NA X2012.02.01
2 Cell 3 1 NA 5 X2012.01.01