Manipulate the first row for an individual in R

Time:11-25

Suppose an individual has several entries (rows) in a data frame. For example:

rm(list = ls())
set.seed(1234)
n <- 3
individualID <- rep(1:3, rep(3, 3))
X <- runif(n * 3, 1, 4)
Y <- rep(runif(n, 1, 4), rep(3, 3))
df1 <- round(data.frame(individualID, X, Y), 3)
df1
 individualID     X     Y
1            1 3.512 1.656
2            1 1.859 1.656
3            1 1.800 1.656
4            2 1.560 3.432
5            2 1.697 3.432
6            2 1.950 3.432
7            3 1.908 2.577
8            3 1.477 2.577
9            3 1.120 2.577

For each individual, I would like to change only the first row so that df1$X equals df1$Y, leaving the remaining rows unchanged.

I should end up with:

 individualID     X     Y
1            1 1.656 1.656
2            1 1.859 1.656
3            1 1.800 1.656
4            2 3.432 3.432
5            2 1.697 3.432
6            2 1.950 3.432
7            3 2.577 2.577
8            3 1.477 2.577
9            3 1.120 2.577

CodePudding user response:

You may try adding a within-group row counter and replacing X with Y where the counter is 1:

library(dplyr)

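df1 %>%
  group_by(individualID) %>%
  mutate(n = row_number()) %>%          # position of each row within its group
  mutate(X = ifelse(n == 1, Y, X)) %>%  # first row of each group takes Y
  ungroup() %>%                         # drop the grouping before returning
  select(-n)                            # remove the helper column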

  individualID     X     Y
         <int> <dbl> <dbl>
1            1  1.66  1.66
2            1  1.86  1.66
3            1  1.8   1.66
4            2  3.43  3.43
5            2  1.70  3.43
6            2  1.95  3.43
7            3  2.58  2.58
8            3  1.48  2.58
9            3  1.12  2.58
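
To see what the helper column holds, the within-group counter can be reproduced without dplyr. The sketch below is only an illustration: it uses base R's ave(), which is not part of the answer above, and rebuilds the group ids from the question with rep().

```r
# Group ids as in the question's df1
individualID <- rep(1:3, each = 3)

# Within-group counter: restarts at 1 for every individual
n <- ave(seq_along(individualID), individualID, FUN = seq_along)
n                 # 1 2 3 1 2 3 1 2 3

# Rows where the counter is 1 are the first row of each individual
first_rows <- which(n == 1)
first_rows        # 1 4 7
```

These are the rows whose X value the ifelse() call replaces with Y.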

CodePudding user response:

You can use row_number() to number rows by group.

library(dplyr)

df1 %>% 
  group_by(individualID) %>% 
  mutate(X = ifelse(row_number() == 1, Y, X)) %>% 
  ungroup()

I get different values from your df1 for some reason, but the result is:

# A tibble: 9 × 3
  individualID     X     Y
         <dbl> <dbl> <dbl>
1            1  2.54  2.54
2            1  2.87  2.54
3            1  2.83  2.54
4            2  3.08  3.08
5            2  3.58  3.08
6            2  2.92  3.08
7            3  2.64  2.64
8            3  1.70  2.64
9            3  3.00  2.64
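
Since the values above differ from those in the question, one way to check the approach against the question's expected output is to enter the printed df1 literally instead of regenerating it. A minimal sketch, where the names result and expected_X are only illustrative:

```r
library(dplyr)

# The question's data entered literally, so the check does not depend on the RNG
df1 <- data.frame(
  individualID = rep(1:3, each = 3),
  X = c(3.512, 1.859, 1.800, 1.560, 1.697, 1.950, 1.908, 1.477, 1.120),
  Y = rep(c(1.656, 3.432, 2.577), each = 3)
)

result <- df1 %>%
  group_by(individualID) %>%
  mutate(X = ifelse(row_number() == 1, Y, X)) %>%
  ungroup()

# X column of the desired output shown in the question
expected_X <- c(1.656, 1.859, 1.800, 3.432, 1.697, 1.950, 2.577, 1.477, 1.120)

# TRUE when only the first row of each individual was replaced by Y
all.equal(result$X, expected_X)
```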

CodePudding user response:

Another approach is to rebuild X within each group from the first Y value followed by the remaining X values:

library(dplyr)

df1 %>%
  group_by(individualID) %>%
  mutate(X = c(Y[1], X[-1])) %>%
  ungroup()

#  individualID     X     Y
#         <int> <dbl> <dbl>
#1            1  1.66  1.66
#2            1  1.86  1.66
#3            1  1.8   1.66
#4            2  3.43  3.43
#5            2  1.70  3.43
#6            2  1.95  3.43
#7            3  2.58  2.58
#8            3  1.48  2.58
#9            3  1.12  2.58
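
The expression c(Y[1], X[-1]) may be easier to follow on plain vectors. The sketch below applies it by hand to the three rows of individual 1 from the question; new_X is just an illustrative name.

```r
# Values for individual 1, taken from the question's df1
X <- c(3.512, 1.859, 1.800)
Y <- c(1.656, 1.656, 1.656)

Y[1]    # first Y value of the group
X[-1]   # X with its first element dropped

# Concatenating the two gives the new X for this individual
new_X <- c(Y[1], X[-1])
new_X   # 1.656 1.859 1.800
```

Inside mutate() after group_by(), the same expression is evaluated once per individual. For a group with a single row, X[-1] is empty, so the result is just Y[1].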