Home > Software engineering >  How to subtract a set of values from each corresponding value in a larger dataframe
How to subtract a set of values from each corresponding value in a larger dataframe

Time:09-15

I have a dataframe (df1) laid out as such:

Condition, Time, Change,
Insulin, 0, 0.00,
Insulin, 5, 4.70,
Insulin, 10, 10.63,
Insulin, 0, 0,
Insulin, 5, 4.48,
Insulin, 10, 8.97,

(this repeats for around 400 rows)

and a dataframe (df2) containing averages for each time point to act as a baseline:

Condition, Time, Change,
Basal, 0, 0,
Basal, 5, 0.81,
Basal, 10, 1.03,

(this does not repeat)

I am trying to subtract the average Change value of each time point in df2 from each value from the corresponding Change value of each time point in df1, to try and get something like this:

Condition, Time, Change, dif
Insulin, 0, 0.00, 0,
Insulin, 5, 4.70, 3.89,
Insulin, 10, 10.63, 9.6,
Insulin, 0, 0, 0,
Insulin, 5, 4.48, 3.67,
Insulin, 10, 8.97, 7.94,

I have tried combining these dataframes so that df2 is present at the end of df1 and using the diff function to find the difference as shown in this thread:

df3 <- rbind(df1, df3)
df3$dif = ave(df3$Change, df3[,"Time"], FUN=diff)

But this subtracts each value from the following value for each time point and I get something like this:

Condition, Time, Change, dif
Insulin, 0, 0.00, 0,
Insulin, 5, 4.70, -0.22,
Insulin, 10, 10.63, -1.66,
Insulin, 0, 0, 0,
Insulin, 5, 4.48, -3.67,
Insulin, 10, 8.97, -7.94,

It appears that diff is not the right function to be using here but I am stuck with what to try next - could anyone advise?

CodePudding user response:

If I understand the problem correctly, should there also be an ID variable in this data frame? To use your example, if you have repeated measures on two participants, you can accomplish your goal using the code below:

df1 <- tibble( ID = c( 1, 1, 1, 2, 2, 2 ), 
               Condition = c( "Insulin", "Insulin", "Insulin", "Insulin", "Insulin", "Insulin" ),
               Time = c( 0, 5, 10, 0, 5, 10 ),
               Change = c( 0.00, 4.70, 10.63, 0, 4.48, 8.97 ) )

df2 <- tibble( Condition = c( "Basal", "Basal", "Basal" ),
               Time = c( 0, 5, 10 ),
               Change = c( 0, 0.81, 1.03 ) )

df1 <- df1 %>% group_by( ID ) %>% mutate( dif = Change - df2$Change )

df1
# A tibble: 6 × 5
# Groups:   ID [2]
     ID Condition  Time Change   dif
  <dbl> <chr>     <dbl>  <dbl> <dbl>
1     1 Insulin       0   0     0   
2     1 Insulin       5   4.7   3.89
3     1 Insulin      10  10.6   9.6 
4     2 Insulin       0   0     0   
5     2 Insulin       5   4.48  3.67
6     2 Insulin      10   8.97  7.94

CodePudding user response:

You should provide code to reproduce your data frames. Nonetheless, I've tried to do it on my own.

df_1 <- data.frame(condition="insulin",time=c(0,5,10,0,5,10),change=c(0,4.7,10.63,0,4.48,8.97))

df_2 <- data.frame(condition="basal",time=c(0,5,10),change=c(0,.81,1.03))

You can use merge and use only the time column for merger. This way the second change value will appear in a column and you can subtract the 2 columns

The following code will do this job:

df_3 <- merge(df_1,df_2,by="time")

df_3$diff <- df_3$change.x - df_3$change.y

  time condition.x change.x condition.y change.y diff
1    0     insulin     0.00       basal     0.00 0.00
2    0     insulin     0.00       basal     0.00 0.00
3    5     insulin     4.70       basal     0.81 3.89
4    5     insulin     4.48       basal     0.81 3.67
5   10     insulin    10.63       basal     1.03 9.60
6   10     insulin     8.97       basal     1.03 7.94
  •  Tags:  
  • r
  • Related