How to calculate mean of a certain number of rows in Column B, given they equal a certain value in c-CodePudding

I have 569 rows of data related to breast cancer. In column A, each row either has a value of 'M' or 'B' in the cell (malignant or benign). In column B, the concavity of the nucleus of each tumour is given. I want to find the mean concavity for all malignant tumours, and for all benign tumours, separately.

Edit: first 25 rows of columns A and B given below as an example

> df2
    data2.diagnosis data2.concavity_mean
1                 M            0.3001000
2                 M            0.0869000
3                 M            0.1974000
4                 M            0.2414000
5                 M            0.1980000
6                 M            0.1578000
7                 M            0.1127000
8                 M            0.0936600
9                 M            0.1859000
10                M            0.2273000
11                M            0.0329900
12                M            0.0995400
13                M            0.2065000
14                M            0.0993800
15                M            0.2128000
16                M            0.1639000
17                M            0.0739500
18                M            0.1722000
19                M            0.1479000
20                B            0.0666400
21                B            0.0456800
22                B            0.0295600
23                M            0.2077000
24                M            0.1097000
25                M            0.1525000

How do I ask R to give me "the mean of rows in column B, given their value in column A is M" and then "given their value in column A is B"?

CodePudding user response：

Assuming column A is a factor (the same code also works if it is a character vector), a base R approach, shown here on a small data frame called example, would be:

example <- data.frame(A = as.factor(c('M','B','M', 'B')), B=c(1,2,3,4))

mean(example$B[example$A == 'M'])
#> [1] 2

# for both factor levels simultaneously you can use 
by(example$B, example$A, mean)
#> example$A: B
#> [1] 3
#> ------------------------------------------------------------------------
#> example$A: M
#> [1] 2

Note. Created on 2022-01-16 by the reprex package (v2.0.1)
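To connect this to the data in the question, here is a minimal sketch that uses the column names printed for df2 (data2.diagnosis and data2.concavity_mean). The six rows are copied from rows 18 to 23 of the sample in the question, so the means are for that subset only, not for all 569 rows. Both tapply() and aggregate() are base R functions.

```r
# Rows 18-23 of the df2 sample shown in the question (a small subset, for illustration only)
df2 <- data.frame(
  data2.diagnosis      = c("M", "M", "B", "B", "B", "M"),
  data2.concavity_mean = c(0.1722, 0.1479, 0.06664, 0.04568, 0.02956, 0.2077)
)

# tapply() splits the concavity values by diagnosis and applies mean() to each group;
# the result is a named vector with one element per diagnosis ("B" and "M")
group_means <- tapply(df2$data2.concavity_mean, df2$data2.diagnosis, mean)
print(group_means)

# aggregate() computes the same group means but returns them as a data frame
agg <- aggregate(data2.concavity_mean ~ data2.diagnosis, data = df2, FUN = mean)
print(agg)
```

On the full data set the same two calls should work unchanged, provided the columns keep these names.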

CodePudding user response：

Reusing the example from one of the answers above (which are valid solutions), here is an alternative using the tidyverse (dplyr):

library(dplyr) # provides %>%, group_by() and summarize()

example <- data.frame(A = as.factor(c('M','B','M', 'B')), B=c(1,2,3,4))

#first example creates a new table with summarized values
example %>% #takes your data table
  group_by(A) %>% #groups it by the factors listed in column A
  summarize(mean_A=mean(B)) #finds the mean of each subgroup (from previous step)

If you found this or any of the other answers helpful, please accept it as the final answer.

CodePudding user response：

As pointed out in the comments, it would be nice to have a reproducible example and your data (or at least a subset of it) to see what you are dealing with.

Anyway, the solution to your problem should resemble the following (I am using simulated data):

set.seed(1986)

# Five benign ("B") rows followed by five malignant ("M") rows, with simulated values
dta <- data.frame(type = rep(c("B", "M"), each = 5), nucleus = rnorm(10))

mean(dta$nucleus[dta$type == "B"]) # Mean of the simulated values for benign rows.
mean(dta$nucleus[dta$type == "M"]) # Mean of the simulated values for malignant rows.

Basically, I am just applying the mean() function to two subsets of the data, by selecting rows with the [] operator.
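
To make the subsetting step explicit, the sketch below breaks it into its parts, using the same simulated data as above. The intermediate names (is_benign, benign_values, group_means) are introduced here only for illustration. The last line shows one possible way to get every group at once with split() and sapply(), both base R.

```r
set.seed(1986)

# Simulated data: five benign ("B") rows followed by five malignant ("M") rows
dta <- data.frame(type = rep(c("B", "M"), each = 5), nucleus = rnorm(10))

# Step 1: the comparison returns a logical vector, TRUE where the row is benign
is_benign <- dta$type == "B"
print(is_benign)

# Step 2: indexing with [] keeps only the values where the logical vector is TRUE
benign_values <- dta$nucleus[is_benign]

# Step 3: mean() is applied to that subset
print(mean(benign_values))

# The same idea for all groups at once: split() cuts the values into one
# vector per type, and sapply() applies mean() to each of them
group_means <- sapply(split(dta$nucleus, dta$type), mean)
print(group_means)
```

If the value column contains missing values, mean() returns NA unless na.rm = TRUE is passed.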