I have 569 rows of data related to breast cancer. In column A, each row either has a value of 'M' or 'B' in the cell (malignant or benign). In column B, the concavity of the nucleus of each tumour is given. I want to find the mean concavity for all malignant tumours, and for all benign tumours, separately.
Edit: the first 25 rows of columns A and B are given below as an example:
> df2
data2.diagnosis data2.concavity_mean
1 M 0.3001000
2 M 0.0869000
3 M 0.1974000
4 M 0.2414000
5 M 0.1980000
6 M 0.1578000
7 M 0.1127000
8 M 0.0936600
9 M 0.1859000
10 M 0.2273000
11 M 0.0329900
12 M 0.0995400
13 M 0.2065000
14 M 0.0993800
15 M 0.2128000
16 M 0.1639000
17 M 0.0739500
18 M 0.1722000
19 M 0.1479000
20 B 0.0666400
21 B 0.0456800
22 B 0.0295600
23 M 0.2077000
24 M 0.1097000
25 M 0.1525000
How do I ask R to give me "the mean of rows in column B, given their value in column A is M" and then "given their value in column A is B"?
Answer:
Assuming your variable A is a factor, a base R approach, shown here on a small example data frame called example, would be:
example <- data.frame(A = as.factor(c('M','B','M', 'B')), B=c(1,2,3,4))
mean(example$B[example$A == 'M'])
#> [1] 2
# for both factor levels simultaneously you can use
by(example$B, example$A, mean)
#> example$A: B
#> [1] 3
#> ------------------------------------------------------------------------
#> example$A: M
#> [1] 2
Note. Created on 2022-01-16 by the reprex package (v2.0.1)
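Applied to the question's own data, the same idea might look like the sketch below. It re-enters the 25 rows printed in the question and assumes the column names shown there (data2.diagnosis, data2.concavity_mean); with the full 569-row data frame you would skip the data.frame() step. It also uses base R's tapply(), which is not part of the answer above but, like by(), computes one mean per diagnosis in a single call and returns them labelled by group.

```r
# The 25 example rows printed in the question, re-entered by hand
# (assumes the column names shown in the question's df2)
df2 <- data.frame(
  data2.diagnosis = c(rep("M", 19), "B", "B", "B", "M", "M", "M"),
  data2.concavity_mean = c(0.3001, 0.0869, 0.1974, 0.2414, 0.1980, 0.1578,
                           0.1127, 0.09366, 0.1859, 0.2273, 0.03299, 0.09954,
                           0.2065, 0.09938, 0.2128, 0.1639, 0.07395, 0.1722,
                           0.1479, 0.06664, 0.04568, 0.02956, 0.2077, 0.1097,
                           0.1525)
)

# Logical subsetting, one call per diagnosis
mean_M <- mean(df2$data2.concavity_mean[df2$data2.diagnosis == "M"])
mean_B <- mean(df2$data2.concavity_mean[df2$data2.diagnosis == "B"])

# tapply() splits the values by diagnosis and applies mean() to each group
group_means <- tapply(df2$data2.concavity_mean, df2$data2.diagnosis, mean)
print(group_means)
```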
Answer:
Reusing the example data from the answers above (which are valid solutions), here is an alternative solution using the tidyverse (specifically its dplyr package):
library(dplyr)  # provides %>%, group_by() and summarize()
example <- data.frame(A = as.factor(c('M','B','M', 'B')), B=c(1,2,3,4))
# creates a new table with one summarized row per level of A
example %>%                     # takes your data frame
  group_by(A) %>%               # groups it by the levels of column A
  summarize(mean_B = mean(B))   # finds the mean of B within each group
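If you would rather not load a package, base R's aggregate() with a formula gives a similar one-row-per-group table. A minimal sketch on the same toy example (note that the mean column keeps the name B here):

```r
example <- data.frame(A = as.factor(c('M','B','M', 'B')), B = c(1,2,3,4))

# Formula interface: summarise B within each level of A
result <- aggregate(B ~ A, data = example, FUN = mean)
print(result)
#>   A B
#> 1 B 3
#> 2 M 2
```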
If you found this or any of the other answers helpful, please accept it as the final answer.
Answer:
As pointed out in the comments, it would help to have a reproducible example with your data (or at least a subset of it) to see what you are dealing with.
Anyway, the solution to your problem should resemble the following (I am using simulated data):
set.seed(1986)
dta = data.frame("type" = c(rep("B", times = 5), rep("M", times = 5)), "nucleus" = rnorm(10))
mean(dta$nucleus[dta$type == "B"]) # Mean concavity for benign.
mean(dta$nucleus[dta$type == "M"]) # Mean concavity for malignant.
Basically, I am just applying the mean() function to two subsets of the data, selecting rows with the [] operator.
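To see what the [] operator is doing, here is a small sketch with made-up values (the names type and nucleus only mirror the simulated columns above): comparing a column with == produces a logical vector, and indexing with that vector keeps only the positions that are TRUE.

```r
# Made-up, deterministic values so the result can be checked by hand
type    <- c("B", "M", "B", "M")
nucleus <- c(0.2, 0.1, 0.4, 0.3)

is_benign <- type == "B"                     # logical vector: TRUE FALSE TRUE FALSE
print(nucleus[is_benign])                    # only the TRUE positions: 0.2 0.4
benign_mean    <- mean(nucleus[is_benign])   # (0.2 + 0.4) / 2
malignant_mean <- mean(nucleus[!is_benign])  # ! flips the index to the "M" rows
```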