Add a column that counts the number of rows until the first 1, by group in R

Time:10-09

I have the following dataset:

test_df=data.frame(Group=c(1,1,1,1,2,2),var1=c(1,0,0,1,1,1),var2=c(0,0,1,1,0,0),var3=c(0,1,0,0,0,1))

Group var1 var2 var3
1 1 0 0
1 0 0 1
1 0 1 0
1 1 1 0
2 1 0 0
2 1 0 1

I want to add three columns (out1 to out3), one for each of var1 to var3, that give the number of rows up to and including the first 1 within each Group (0 if the group contains no 1), as shown below:

Group var1 var2 var3 out1 out2 out3
1 1 0 0 1 3 2
1 0 0 1 1 3 2
1 0 1 0 1 3 2
1 1 1 0 1 3 2
2 1 0 0 1 0 2
2 1 0 1 1 0 2

I tried the R code below, repeating it for each of my 3 variables (my actual dataset has many more columns), but it does not work:

test_var1<-select(test_df,Group,var1 )%>% 
  group_by(Group) %>% 
  mutate(out1 = row_number()) %>% 
  filter(var1 != 0) %>% 
  slice(1)

CodePudding user response:

If you only have 3 "out" variables, you can create the three columns as follows:

#1- Your dataset
df=data.frame(Group=rep(1,4),var1=c(1,0,0,1),var2=c(0,0,1,1),var3=c(0,1,0,0))

#2- Find the row number of the first "1" value
# which() gives the integer positions of the 1s; min() picks the first one
df$out1=min(which(df$var1==1))
df$out2=min(which(df$var2==1))
df$out3=min(which(df$var3==1))

If you have more than 3 columns, a loop may be better, for example:

for(i in 1:3){
    # out<i> = position of the first 1 in var<i>
    df[[paste0("out",i)]]=min(which(df[[paste0("var",i)]]==1))
}
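
The snippets above look for the first 1 over the whole data frame, whereas the question asks for it per Group. Below is a minimal base-R sketch of a grouped version. It assumes the question's test_df and that a group without any 1 should get 0, as in the expected output. ave() and match() are base R functions; the names var_cols and out_cols are only illustrative.

```r
# Question's data, including the second group
test_df <- data.frame(Group = c(1, 1, 1, 1, 2, 2),
                      var1  = c(1, 0, 0, 1, 1, 1),
                      var2  = c(0, 0, 1, 1, 0, 0),
                      var3  = c(0, 1, 0, 0, 0, 1))

# Illustrative names: every column starting with "var" gets an "out" twin
var_cols <- grep("^var", names(test_df), value = TRUE)
out_cols <- sub("^var", "out", var_cols)

# match(1, v, nomatch = 0) is the position of the first 1 in v (0 if absent);
# ave() applies it within each Group and recycles the result over the group's rows
test_df[out_cols] <- lapply(test_df[var_cols], function(x) {
  ave(x, test_df$Group, FUN = function(v) match(1, v, nomatch = 0))
})

test_df
```

Because the columns are picked by name pattern, the same code should carry over to more than three var columns without a counter in the loop.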

CodePudding user response:

df <- data.frame(Group=c(1,1,1,1,2,2),
                 var1=c(1,0,0,1,1,1),
                 var2=c(0,0,1,1,0,0),
                 var3=c(0,1,0,0,0,1))

This works for any number of variables, as long as the structure is the same as in the example (i.e. a Group column plus any number of variables that are 0 or 1):

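library(dplyr)
library(tidyr)

df %>% 
  mutate(rownr = row_number()) %>%
  pivot_longer(-c(Group, rownr)) %>%
  group_by(Group, name) %>%
  # count the leading rows before the first 1, then add 1 for the row holding it
  mutate(out = cumsum(value != 1 & (cumsum(value) < 1)) + 1,
         # a count larger than the group size means the group has no 1
         out = ifelse(max(out) > n(), 0, max(out))) %>% 
  ungroup() %>%
  pivot_wider(names_from = name, values_from = c(value, out)) %>% 
  select(-rownr)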

Returns:

  Group value_var1 value_var2 value_var3 out_var1 out_var2 out_var3
  <dbl>      <dbl>      <dbl>      <dbl>    <dbl>    <dbl>    <dbl>
1     1          1          0          0        1        3        2
2     1          0          0          1        1        3        2
3     1          0          1          0        1        3        2
4     1          1          1          0        1        3        2
5     2          1          0          0        1        0        2
6     2          1          0          1        1        0        2
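
A shorter route to the same out columns, offered only as a sketch and not part of the answer above, is to skip the reshaping and apply match() to each var column within its group. It assumes dplyr 1.0 or later (for across()), that the variables all start with "var", and that a group without any 1 should get 0. The output names out_var1 to out_var3 are chosen here to mirror the result above.

```r
library(dplyr)

test_df <- data.frame(Group = c(1, 1, 1, 1, 2, 2),
                      var1  = c(1, 0, 0, 1, 1, 1),
                      var2  = c(0, 0, 1, 1, 0, 0),
                      var3  = c(0, 1, 0, 0, 0, 1))

result <- test_df %>%
  group_by(Group) %>%
  # position of the first 1 within the group, 0 when the group has no 1;
  # the single value per group is recycled over all rows of that group
  mutate(across(starts_with("var"),
                ~ match(1, .x, nomatch = 0L),
                .names = "out_{.col}")) %>%
  ungroup()

result
```

This keeps the original var columns under their own names instead of renaming them to value_var1 and so on.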