Suppose we have a dataframe df
with a structure like a the following:
df <- read.table(header = TRUE, text = "
subj age trial
A 9 1
A 9 2
A 9 3
B 10 1
B 10 2
C 8 1
C 8 2
C 8 3
C 8 4")
How to elegantly remove the mean age of all subjects from each subject so that the resulting dataframe would look like below?
subj age trial
A 0 1
A 0 2
A 0 3
B 1 1
B 1 2
C -1 1
C -1 2
C -1 3
C -1 4
P.S. - Notice that each subject (first column) has multiple rows, and thus the associated age (second column) is repeated. The number of repetitions depends on the number of trials (third column) and is different across subjects. Thus, the average average across all subjects should be computed without those duplicates.
CodePudding user response:
Use the mutate
function as below. However, note that the mean of your age variable is 8.777778
. So you won't get exactly output as you expect.
library(tidyverse)
df = read.table(
header = TRUE,text="
subj age trial
A 9 1
A 9 2
A 9 3
B 10 1
B 10 2
C 8 1
C 8 2
C 8 3
C 8 4
") %>% as_tibble()
df %>% mutate(age = age - mean(age))
output
# A tibble: 9 x 3
subj age trial
<chr> <dbl> <int>
1 A 0.222 1
2 A 0.222 2
3 A 0.222 3
4 B 1.22 1
5 B 1.22 2
6 C -0.778 1
7 C -0.778 2
8 C -0.778 3
9 C -0.778 4
Even if you use rounding
df %>% mutate(age = age - round(mean(age)))
output
A tibble: 9 x 3
subj age trial
<chr> <dbl> <int>
1 A 0 1
2 A 0 2
3 A 0 3
4 B 1 1
5 B 1 2
6 C -1 1
7 C -1 2
8 C -1 3
9 C -1 4
If you absolutely must get the result as shown in the question, you must subtract 1 from the results <0.
df %>% mutate(age = age - round(mean(age)),
age = ifelse(age<0, age-1, age))
output
# A tibble: 9 x 3
subj age trial
<chr> <dbl> <int>
1 A 0 1
2 A 0 2
3 A 0 3
4 B 1 1
5 B 1 2
6 C -2 1
7 C -2 2
8 C -2 3
9 C -2 4
Update 1
If you don't want to use information from the trial
variable, you can do something like this:
mage = df %>%
distinct(subj, age) %>%
pull(age) %>% mean()
df %>% mutate(age = age - mage)
or like this:
mage = df %>%
group_by(subj) %>%
summarise(age = min(age)) %>%
pull(age) %>% mean()
df %>% mutate(age = age - mage)
output
# A tibble: 9 x 3
subj age trial
<chr> <dbl> <int>
1 A 0 1
2 A 0 2
3 A 0 3
4 B 1 1
5 B 1 2
6 C -1 1
7 C -1 2
8 C -1 3
9 C -1 4
CodePudding user response:
library(dplyr)
df %>%
mutate(mean_age = mean(age[trial == 1], na.rm = TRUE)) %>%
mutate(age = age - mean_age)
# A tibble: 9 × 4
subj age trial mean_age
<chr> <dbl> <int> <dbl>
1 A 0 1 9
2 A 0 2 9
3 A 0 3 9
4 B 1 1 9
5 B 1 2 9
6 C -1 1 9
7 C -1 2 9
8 C -1 3 9
9 C -1 4 9
EDIT:
With a random trial
per subj
:
df %>%
group_by(subj) %>%
mutate(trial_sample = sample(trial, 1)) %>%
ungroup() %>%
mutate(mean_age = mean(age[trial_sample], na.rm = TRUE)) %>%
mutate(age = age - mean_age)
Data (thanks goes to @Marek Fiolka):
df = read.table(
header = TRUE,text="
subj age trial
A 9 1
A 9 2
A 9 3
B 10 1
B 10 2
C 8 1
C 8 2
C 8 3
C 8 4
") %>% as_tibble()