I've got a large file that looks like this:
SAMPLE1 10
SAMPLE1 10
SAMPLE1 10
SAMPLE1 2
SAMPLE2 10
SAMPLE2 10
SAMPLE2 2
SAMPLE2 2
The file is huge (several gigabytes), and R is killed when I try to read it and then use boxplot. So my idea is to run sort | uniq -c on my file and work with a much smaller file that would look like this (with a 3rd column containing the number of observations):
SAMPLE1 10 3
SAMPLE1 2 1
SAMPLE2 10 2
SAMPLE2 2 2
Is there a way to use base R's boxplot to plot such data?
CodePudding user response:
The ENmisc package has a function, wtd.boxplot, that draws boxplots from weighted observations:
https://www.rdocumentation.org/packages/ENmisc/versions/1.2-7/topics/wtd.boxplot
Alternatively, calculate the weighted quartiles and then draw the boxplot using those values.
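One way the weighted quartiles could be computed straight from the counts file is sketched below. It is a rough illustration, not a definitive method. It assumes the three-column "sample value count" layout from the question (uniq -c itself prints the count first, so the columns would need rearranging), and it uses a simple inverse-CDF quantile definition (the smallest value whose cumulative count reaches 25%, 50% or 75% of the total), which can differ slightly from the hinges that boxplot computes. The variable names and the awk function emit are made up for the example.

```sh
#!/bin/sh
# Demo input copied from the question; in practice this would be the real counts file.
counts='SAMPLE1 10 3
SAMPLE1 2 1
SAMPLE2 10 2
SAMPLE2 2 2'

# Sort by sample, then numerically by value, so cumulative counts run in value order.
summary=$(printf '%s\n' "$counts" | sort -k1,1 -k2,2n | awk '
# Print one line per sample: name min q1 median q3 max n
function emit(   i, k, cum, p, q) {
    if (n == 0) return
    p[1] = 0.25; p[2] = 0.5; p[3] = 0.75
    k = 1; cum = 0
    for (i = 1; i <= n && k <= 3; i++) {
        cum += cnt[i]
        # A quantile is the first value whose cumulative count reaches p * total.
        while (k <= 3 && cum >= p[k] * total) { q[k] = val[i]; k++ }
    }
    print name, val[1], q[1], q[2], q[3], val[n], total
}
# A new sample name starts a new group: flush the previous one and reset.
$1 != name { emit(); name = $1; n = 0; total = 0 }
{ n++; val[n] = $2; cnt[n] = $3; total += $3 }
END { emit() }
')

printf '%s\n' "$summary"
```

The output has one row per sample, so it should be small enough to read into R. There the five numbers could be placed in the stats matrix of the list that bxp expects, which is the function boxplot uses for the actual drawing.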
CodePudding user response:
We can pre-compute five numbers per sample (min, low, mid, upper, max) in bash. The result is small enough to import into R, where we can draw the boxplot from the summary data: