I have a data frame. In that data frame I have two categorical variables. If I consider the cross product of those 2, then I have one row per element in my data frame.
The table shown below is a table of frequencies. Instead of showing the number of occurrences I would like to show the column test. Any ideas on how to do this? I have been trying to figure it out but I am not finding a way.
CodePudding user response:
If I understand your problem correctly, it seems like you want to pivot the data from long form to wide form. You can do this pretty easily with dplyr and the pivot_wider
function:
First we'll make some sample data (in the future, instead of including your data as an image, use the dput
function to output it in format we can copy and paste to reproduce the problem):
df = data.frame(data_set=c('a','a','a','b','b','b','c','c','c'),
category=rep.int(c('d','e','f'),3),
tests=abs(rnorm(9,sd=3)),
best_threads=rnorm(9, 50, 20))
data_set category tests best_threads
1 a d 3.1162129 15.61119
2 a e 4.0933109 19.71428
3 a f 0.1026443 44.63157
4 b d 3.3482561 47.68211
5 b e 5.9149545 69.19018
6 b f 4.2248788 52.54404
7 c d 3.4384232 41.86539
8 c e 4.2985273 76.49010
9 c f 0.2164352 44.36635
Then we just pivot it to wide form, using data_set
to be the new column names and tests
as the value of those cells. We need to drop best_threads
, either in the pivot_wider function, or beforehand:
library(dplyr)
pivot_wider(df, # Data frame to pivot
-best_threads, # Drop the best_threads variable
names_from = 'data_set', # The variable to use for column names
values_from = 'tests' # The variable to put in the cells
)
# A tibble: 3 × 4
category a b c
<chr> <dbl> <dbl> <dbl>
1 d 3.12 3.35 3.44
2 e 4.09 5.91 4.30
3 f 0.103 4.22 0.216