I'm not really sure how to make matrices as I'm new, so hopefully this makes sense.
Gender | Height |
---|---|
F | 160 |
M | 182 |
M | 175 |
F | 157 |
F | 172 |
M | 195 |
How could I get a vector of height based on unsorted gender? So I want a male vector that contains the values 182, 175, 195 and a female vector that contains the values 160, 157, 172. I have 1000 rows so I'm not sure how I can make it easier. Thanks!
CodePudding user response:
The height can be split by gender with the split command:
df = data.frame(gender=c('F','M','M','F','F','M'),
height=c(160,182,175,157,172,195))
split(x = df$height, f = df$gender)
# $F
# [1] 160 157 172
# $M
# [1] 182 175 195
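As a small follow-up sketch (the variable names `heights`, `male`, and `female` are my own, not from the question), the named list that split returns can be indexed by group name to get the two separate vectors:

```r
df <- data.frame(gender = c('F','M','M','F','F','M'),
                 height = c(160,182,175,157,172,195))

# split() returns a named list with one element per gender
heights <- split(x = df$height, f = df$gender)

# pull each vector out of the list by its name
male   <- heights[["M"]]
female <- heights[["F"]]
```

This scales to 1000 rows unchanged, since nothing here depends on the row order or on how many rows each gender has.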
CodePudding user response:
Welcome to Stack Overflow!!
It's a community norm to try to create code to produce your data. However, the table you gave suffices to understand your question. The reproducible example is below, for your reference.
df = data.frame(gender=c('F','M','M','F','F','M'),
height=c(160,182,175,157,172,195))
Now, there are LOTS of ways to approach your question, and the best answer will depend on how you wish to use the result. The most direct is to subset the height column with a logical condition on gender:
> df$height[df$gender=='F']
[1] 160 157 172
> df$height[df$gender=='M']
[1] 182 175 195
If you don't know how many levels of your factor variable (in this case gender) you might run into, the code below creates one vector for each level:
> tapply(df$height,df$gender,identity,simplify=FALSE) # collect the heights for each gender
$F
[1] 160 157 172
$M
[1] 182 175 195
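If it helps, here is a minimal sketch for checking which levels actually turned up and how many rows each one has, without naming the levels in advance. The variable names `by_gender`, `found`, and `counts` are just illustrative.

```r
df <- data.frame(gender = c('F','M','M','F','F','M'),
                 height = c(160,182,175,157,172,195))

# one vector of heights per level found in the data
by_gender <- split(df$height, df$gender)

# which levels were found, and how many heights fall in each
found  <- names(by_gender)
counts <- lengths(by_gender)

# loop over the groups without hard-coding 'F' or 'M'
for (g in found) {
  cat(g, ":", by_gender[[g]], "\n")
}
```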
Of course, it does depend on what you want to do with these vectors once you've collected them. Vectors by themselves aren't necessarily all that easy to interpret, once you've got thousands of rows.
Are you planning to plot? Summarize?
For summarizing, modern R packages such as dplyr are worth using, but for now I'll stick with the base R example.
To apply a function to each group, you could do
> tapply(df$height,df$gender,max)
  F   M
172 195
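Another base R option, sketched here as an alternative rather than a recommendation, is aggregate with a formula. It returns a data frame instead of a named vector, which can be handier if you want to keep working with the result as a table. The name `means` is my own.

```r
df <- data.frame(gender = c('F','M','M','F','F','M'),
                 height = c(160,182,175,157,172,195))

# mean height per gender, returned as a data frame
# with one row per gender and columns gender and height
means <- aggregate(height ~ gender, data = df, FUN = mean)
means
```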
Here's what a tidyverse solution might look like:
require(magrittr)
require(dplyr)
require(ggplot2)
# summarize
df %>% group_by(gender) %>% summarize(x=mean(height))
# plot a histogram (will look silly here but not with 1000s of rows)
ggplot(df,aes(x=height)) + geom_histogram(binwidth=20) + facet_grid(rows=vars(gender))
This is just meant to be a sampler. I tried to give examples of base R (tapply) and more modern programming approaches.