How to calculate mean of a certain number of rows in Column B, given they equal a certain value in c

Time:01-17

I have 569 rows of data related to breast cancer. In column A, each row either has a value of 'M' or 'B' in the cell (malignant or benign). In column B, the concavity of the nucleus of each tumour is given. I want to find the mean concavity for all malignant tumours, and for all benign tumours, separately.

Edit: first 25 rows of columns A and B given below as an example

> df2
    data2.diagnosis data2.concavity_mean
1                 M            0.3001000
2                 M            0.0869000
3                 M            0.1974000
4                 M            0.2414000
5                 M            0.1980000
6                 M            0.1578000
7                 M            0.1127000
8                 M            0.0936600
9                 M            0.1859000
10                M            0.2273000
11                M            0.0329900
12                M            0.0995400
13                M            0.2065000
14                M            0.0993800
15                M            0.2128000
16                M            0.1639000
17                M            0.0739500
18                M            0.1722000
19                M            0.1479000
20                B            0.0666400
21                B            0.0456800
22                B            0.0295600
23                M            0.2077000
24                M            0.1097000
25                M            0.1525000

How do I ask R to give me "the mean of rows in column B, given their value in column A is M" and then "given their value in column A is B"?

CodePudding user response:

Assuming your column A is a factor, a base R approach, shown on a small example data frame, would be

example <- data.frame(A = as.factor(c('M','B','M', 'B')), B=c(1,2,3,4))

mean(example$B[example$A == 'M'])
#> [1] 2

# for both factor levels simultaneously you can use 
by(example$B, example$A, mean)
#> example$A: B
#> [1] 3
# ---- #
#> example$A: M
#> [1] 2

Note. Created on 2022-01-16 by the reprex package (v2.0.1)
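
The same idea can be applied to the column names shown in the question's df2 printout. The following is only a sketch: it retypes rows 19 to 23 of that printout by hand rather than loading the full 569-row data set, and it assumes the columns really are named data2.diagnosis and data2.concavity_mean as printed.

```r
# Rows 19-23 of the df2 printout in the question, retyped by hand
# (an assumption: the real df2 has 569 rows with these column names).
df2 <- data.frame(
  data2.diagnosis      = c("M", "B", "B", "B", "M"),
  data2.concavity_mean = c(0.1479, 0.06664, 0.04568, 0.02956, 0.2077)
)

# tapply() splits the values by diagnosis and applies mean() to each group.
group_means <- tapply(df2$data2.concavity_mean, df2$data2.diagnosis, mean)
print(group_means)

# aggregate() computes the same group means but returns a data frame.
means_df <- aggregate(data2.concavity_mean ~ data2.diagnosis,
                      data = df2, FUN = mean)
print(means_df)
```

With the full data set, only the construction of df2 would change; the tapply() and aggregate() calls stay the same.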

CodePudding user response:

Borrowing the example data from one of the answers above (which are valid solutions), here is an alternative using dplyr from the tidyverse:

library(dplyr) # provides %>%, group_by() and summarize()

example <- data.frame(A = as.factor(c('M','B','M', 'B')), B=c(1,2,3,4))

# creates a new table with one row of summarized values per group
example %>% # takes your data table
  group_by(A) %>% # groups it by the factor levels in column A
  summarize(mean_B = mean(B)) # finds the mean of B within each group (from previous step)
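
Two further dplyr variants, as a rough sketch on the same toy data: one reports the group size next to the mean, the other extracts the mean for a single group. The names summary_tbl, mean_B and mean_M are made up for illustration, and the dplyr package is assumed to be installed.

```r
library(dplyr)

example <- data.frame(A = as.factor(c('M', 'B', 'M', 'B')), B = c(1, 2, 3, 4))

# Mean of B and number of rows for each level of A
summary_tbl <- example %>%
  group_by(A) %>%
  summarise(mean_B = mean(B), n = n())
print(summary_tbl)

# Mean of B for one group only: keep the 'M' rows, pull out B, average it
mean_M <- example %>%
  filter(A == "M") %>%
  pull(B) %>%
  mean()
print(mean_M)
```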

If you found this or any of the other answers helpful, please mark it as the accepted answer.

CodePudding user response:

As pointed out in the comments, it would be nice to have a reproducible example and your data (or at least a subset of it) to see what you are dealing with.

Anyway, the solution to your problem should resemble the following (I am using simulated data):

set.seed(1986)

# Five benign ("B") rows followed by five malignant ("M") rows, with simulated values.
dta <- data.frame(type = rep(c("B", "M"), each = 5), nucleus = rnorm(10))

mean(dta$nucleus[dta$type == "B"]) # Mean concavity for benign.
mean(dta$nucleus[dta$type == "M"]) # Mean concavity for malignant.

Basically, I am just applying the mean() function to two subsets of the data, by selecting rows with the [] operator.
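
To make the row selection explicit, here is a small sketch on the same kind of simulated data. It shows that the comparison yields a logical vector, which [] then uses to pick rows, and that split() with sapply() gives both group means in one call. The names is_benign, benign_mean and group_means are only illustrative.

```r
set.seed(1986)
dta <- data.frame(type = rep(c("B", "M"), each = 5), nucleus = rnorm(10))

# The comparison returns one TRUE/FALSE per row
is_benign <- dta$type == "B"
print(is_benign)
print(sum(is_benign)) # number of rows selected

# Indexing with the logical vector keeps only the TRUE rows
benign_mean <- mean(dta$nucleus[is_benign])

# split() cuts the values into one vector per type; sapply() averages each
group_means <- sapply(split(dta$nucleus, dta$type), mean)
print(group_means)
```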
