How to sum values in multiple rows to a new column in R?-CodePudding

My dataframe:

structure(list(Observation = c("Apple", "Blueberry", "Cirtus", 
"Dates", "Eggplant"), Topic = 1:5, Gamma = c(0.1, 0.1, 0.2, 0.2, 
0.1)), class = "data.frame", row.names = c(NA, -5L))

  Observation Topic Gamma
1       Apple     1   0.1
2   Blueberry     2   0.1
3      Cirtus     3   0.2
4       Dates     4   0.2
5    Eggplant     5   0.1

How can I tell R to add the values of 1, 3, and 5, and 2 and 4, and then save it in a new column? For example:

Observation	Topic	Gamma	new variable
Apple	1	.10	.40
Blueberry	2	.10	.30
Cirtus	3	.20	.40
Dates	4	.20	.30
Eggplant	5	.10	.40

Essentially, I'd like each observation to have a new value that sums up the gamma scores of topics 1, 3, and 5, as well as topics 2 and 4.

Update: Clarification: I am not trying to add even topic numbers or odd topic numbers. Sometimes it will be a mixture of both. See this new table as an example:

Observation	Topic	Gamma	new variable
Apple	1	.10	.10
Blueberry	2	.10	.70
Cirtus	3	.20	.40
Dates	4	.20	.40
Eggplant	5	.10	.70
Fruits	6	.50	.70

In this example, I left topic 1 alone, added topics 2, 5, and 6, and added topics 3 and 4.

Update: Clarification:

Observation	Topic	Gamma	new variable
Apple	1	.10	.10
Apple	2	.10	.70
Apple	3	.20	.40
Apple	4	.20	.40
Apple	5	.10	.70
Apple	6	.50	.70
Blueberry	1	.20	.20
Blueberry	2	.10	.60
Blueberry	3	.30	.80
Blueberry	4	.50	.80
Blueberry	5	.40	.60
Blueberry	6	.10	.60

In this example, Each fruit (observation) has their own set of values for each topic and I summed the same topics as listed above (2, 5, and 6, 3 and 4) per fruit.

CodePudding user response：

Update II on new request:

library(dplyr)

df %>% 
  group_by(Observation, grp = case_when(Topic %in% 1 ~ 1,
                           Topic %in% c(2,5,6) ~ 2,
                           Topic %in% c(3,4) ~ 3)) %>% 
  mutate(new_variable = sum(Gamma)) %>% 
  ungroup %>% 
  select(-grp)

  Observation Topic Gamma new_variable
   <chr>       <int> <dbl>        <dbl>
 1 Apple           1   0.1          0.1
 2 Apple           2   0.1          0.7
 3 Apple           3   0.2          0.4
 4 Apple           4   0.2          0.4
 5 Apple           5   0.1          0.7
 6 Apple           6   0.5          0.7
 7 Blueberry       1   0.2          0.2
 8 Blueberry       2   0.1          0.6
 9 Blueberry       3   0.3          0.8
10 Blueberry       4   0.5          0.8
11 Blueberry       5   0.4          0.6
12 Blueberry       6   0.1          0.6

Update: on new request of OP. This solution is inspired fully by PaulS solution (credits to him):

library(dplyr)

df %>% 
  group_by(grp = case_when(Topic %in% 1 ~ 1,
                           Topic %in% c(2,5,6) ~ 2,
                           Topic %in% c(3,4) ~ 3)) %>% 
  mutate(new_variable = sum(Gamma)) %>% 
  ungroup %>% 
  select(-grp)

  Observation Topic Gamma new_variable
  <chr>       <int> <dbl>        <dbl>
1 Apple           1   0.1          0.1
2 Blueberry       2   0.1          0.7
3 Cirtus          3   0.2          0.4
4 Dates           4   0.2          0.4
5 Eggplant        5   0.1          0.7
6 Fruits          6   0.5          0.7

First answer: We could sum Gamma after identifying odd and even rows in an ifelse statement: In this case row_number could be replaced by Topic

library(dplyr)

df %>% 
  mutate(new_variable = ifelse(row_number() %% 2 == 1, 
                               sum(Gamma[row_number() %% 2 == 1]), # odd 1,3,5
                               sum(Gamma[row_number() %% 2 == 0])) # even 2,4
         )

  Observation Topic Gamma new_variable
1       Apple     1   0.1          0.4
2   Blueberry     2   0.1          0.3
3      Cirtus     3   0.2          0.4
4       Dates     4   0.2          0.3
5    Eggplant     5   0.1          0.4

data:

structure(list(Observation = c("Apple", "Blueberry", "Cirtus", 
"Dates", "Eggplant"), Topic = 1:5, Gamma = c(0.1, 0.1, 0.2, 0.2, 
0.1)), class = "data.frame", row.names = c(NA, -5L))

Microbenchmark: AndrewGB's base R is fastest

CodePudding user response：

This should do it.

dat <- structure(list(Observation = c("Apple", "Blueberry", "Cirtus", 
                                 "Dates", "Eggplant"), 
                 Topic = 1:5, Gamma = c(0.1, 0.1, 0.2, 0.2, 0.1)), 
            row.names = c(NA, 5L), class = "data.frame")
library(tidyverse)
dat %>% 
  mutate(even = as.numeric(Topic %% 2 == 0)) %>% 
  group_by(even) %>% 
  mutate(new_variable = sum(Gamma))
#> # A tibble: 5 × 5
#> # Groups:   even [2]
#>   Observation Topic Gamma  even new_variable
#>   <chr>       <int> <dbl> <dbl>        <dbl>
#> 1 Apple           1   0.1     0          0.4
#> 2 Blueberry       2   0.1     1          0.3
#> 3 Cirtus          3   0.2     0          0.4
#> 4 Dates           4   0.2     1          0.3
#> 5 Eggplant        5   0.1     0          0.4

^{Created on 2022-05-13 by the reprex package (v2.0.1)}

CodePudding user response：

Another possible solution:

library(dplyr)

df %>% 
  group_by(grp = if_else(Topic %in% c(1, 3, 5), 1, 2)) %>% 
  mutate(new_variable = sum(Gamma)) %>% 
  ungroup %>% 
  select(-grp)

#> # A tibble: 5 × 4
#>   Observation Topic Gamma new_variable
#>   <chr>       <int> <dbl>        <dbl>
#> 1 Apple           1   0.1          0.4
#> 2 Blueberry       2   0.1          0.3
#> 3 Cirtus          3   0.2          0.4
#> 4 Dates           4   0.2          0.3
#> 5 Eggplant        5   0.1          0.4

CodePudding user response：

Update II (but will work with the first update as well)

With base R, we can first create a new grouping column, where we copy the Topic column as factor, then we can change the levels according to what rows you want to group together to sum. Then, we can get the sum of the Gamma column by the Topic and row groups. Then, remove the grp column.

df$grp <- factor(df$Topic)

levels(df$grp) <- list(
  "1" = 1,
  "2" = c(2,5,6),
  "3" = c(3,4)
)

df$new_variable <- ave(df$Gamma, df[,c(1,4)], FUN = sum)

df <- df[,-4]

Output

   Observation Topic Gamma new_variable
1        Apple     1   0.1          0.1
2        Apple     2   0.1          0.7
3        Apple     3   0.2          0.4
4        Apple     4   0.2          0.4
5        Apple     5   0.1          0.7
6        Apple     6   0.5          0.7
7    Blueberry     1   0.2          0.2
8    Blueberry     2   0.1          0.6
9    Blueberry     3   0.3          0.8
10   Blueberry     4   0.5          0.8
11   Blueberry     5   0.4          0.6
12   Blueberry     6   0.1          0.6

Data

df <- structure(list(Observation = c("Apple", "Apple", "Apple", "Apple", 
"Apple", "Apple", "Blueberry", "Blueberry", "Blueberry", "Blueberry", 
"Blueberry", "Blueberry"), Topic = c(1L, 2L, 3L, 4L, 5L, 6L, 
1L, 2L, 3L, 4L, 5L, 6L), Gamma = c(0.1, 0.1, 0.2, 0.2, 0.1, 0.5, 
0.2, 0.1, 0.3, 0.5, 0.4, 0.1)), class = "data.frame", row.names = c(NA, 
-12L))

First Answer

With base R, we can use ave to get the sum for each group. Here, I create the group using a logical since we only have 2 groups.

df$new_variable <- ave(df$Gamma, row.names(df) %in% c(1, 3, 5), FUN=sum)

Output

  Observation Topic Gamma new_variable
1       Apple     1   0.1          0.4
2   Blueberry     2   0.1          0.3
3      Cirtus     3   0.2          0.4
4       Dates     4   0.2          0.3
5    Eggplant     5   0.1          0.4

Or we could get the sum for each grouping of rows and assign to a new column by index.

df$new_variable[c(1, 3, 5)] <- sum(df$Gamma[c(1, 3, 5)], na.rm = T)
df$new_variable[c(2, 4)] <- sum(df$Gamma[c(2, 4)], na.rm = T)