Home > database >  How to find the column name of the first non nul value of a row
How to find the column name of the first non nul value of a row

Time:12-14

I'm working on the dataframe of an inquiry that looks like:

User ID 2012-01-01 2012-02-01 2012-02-01
Cell 1 NA 2 NA
Cell 3 1 NA 5

I would like to find the date (the column name ) of the first non null column ( excluding the User ID column ) , the name of last non null column, and the duration between these to dates for each user ID.

Thank you !

I've tried :

df$min_date<-apply(df[-1], 1, function(x) 
   x[which.min(which(is.na(x) == FALSE))])

and

df$min_date<-apply(df[-1], 1, function(x) 
   colnames(x[min(which(is.na(x) == FALSE))]))

but it didn't work

CodePudding user response:

How about this:

library(dplyr) 
library(tidyr)
d <- tibble::tribble(
  ~"User ID",   ~"2012-01-01",  ~"2012-02-01",  ~"2012-02-01", 
"Cell 1",   NA, 2,  NA,
"Cell 3",   1,  NA, 5)
d %>% 
  pivot_longer(-1, names_to="date", values_to = "vals") %>%
  na.omit() %>% 
  mutate(date = lubridate::ymd(date)) %>% 
  group_by(`User ID`) %>% 
  summarise(first = first(date), 
            last = last(date)) %>% 
  mutate(diff = last - first)
#> # A tibble: 2 × 4
#>   `User ID` first      last       diff   
#>   <chr>     <date>     <date>     <drtn> 
#> 1 Cell 1    2012-02-01 2012-02-01  0 days
#> 2 Cell 3    2012-01-01 2012-02-01 31 days

Created on 2022-12-13 by the reprex package (v2.0.1)

And here's a base R way (though using lubridate) that is more in keeping with your original idea:

d <- tibble::tribble(
  ~"User ID",   ~"2012-01-01",  ~"2012-02-01",  ~"2012-02-01", 
"Cell 1",   NA, 2,  NA,
"Cell 3",   1,  NA, 5)
d <- tibble::tribble(
  ~"User ID",   ~"2012-01-01",  ~"2012-02-01",  ~"2012-02-01", 
  "Cell 1", NA, 2,  NA,
  "Cell 3", 1,  NA, 5)

mind <- apply(d[,-1], 1, function(x)
  colnames(d[,-1])[min(which(!is.na(x)))])
maxd <- apply(d[,-1], 1, function(x)
  colnames(d[,-1])[max(which(!is.na(x)))])

d$min_date <- lubridate::ymd(mind)
d$max_date <- lubridate::ymd(maxd)
d$diff <- d$max_date - d$min_date

d
#> # A tibble: 2 × 7
#>   `User ID` `2012-01-01` `2012-02-01` `2012-02-01` min_date   max_date   diff   
#>   <chr>            <dbl>        <dbl>        <dbl> <date>     <date>     <drtn> 
#> 1 Cell 1              NA            2           NA 2012-02-01 2012-02-01  0 days
#> 2 Cell 3               1           NA            5 2012-01-01 2012-02-01 31 days

Created on 2022-12-13 by the reprex package (v2.0.1)

CodePudding user response:

Here is a tidyverse option:

Be careful you are using Non-syntactic names, moreover the 3rd and 4th column have the same name. This won't work in R:

library(dplyr)
library(tidyr)

df %>% 
  mutate(across(-c(User, ID), ~case_when(!is.na(.) ~ cur_column()), .names = 'new_{col}')) %>%
  unite(non_null, starts_with('new'), na.rm = TRUE, sep = ' ') %>% 
  mutate(non_null = sub(" .*", "", non_null))
  User ID X2012.01.01 X2012.02.01 X2012.02.01.1    non_null
1 Cell  1          NA           2            NA X2012.02.01
2 Cell  3           1          NA             5 X2012.01.01
  • Related