Given a dataset like this with 200,000 rows:
Date Value1 Value2
1 2019-01-01 110 310
2 2019-01-02 70 200
3 2019-01-03 64 115
4 2019-01-04 50 80
5 2019-01-05 70 90
6 2019-01-06 30 49
...
How can I create a dataframe like this for each Type
No_of_rows Value1 Value2
1 10 (sum of first 10 values) (sum of first 10 values)
1 100 (sum of first 100 values) (sum of first 100 values)
then 1,000, 10,000 and so on
I did some finding and came up with this command as.numeric(apply(df[,1:2], 2, sum))
, which apparently finds the sum of two columns, but I don't know how to do it with for specific number of rows (10^n, where n=1,...,4) and store it for each column in new dataframe.
Any guidance please?
CodePudding user response:
Here's a solution based on the tidyverse.
library(tidyverse)
# For reproducibility
set.seed(12345)
# Define some test data
df <- tibble(
Value1=floor(rnorm(200000, 500, 100)),
Value2=floor(rnorm(200000, 500, 100))
)
lapply(
# For powers of 10
1:4,
# Apply this function
function(n) {
df %>%
# Take the first 10^n rows
head(10^n) %>%
# Calculate column means
summarise(
Value1=mean(Value1),
Value2=mean(Value2),
.groups="drop"
) %>%
# Add the index column
add_column(No_of_rows=10^n, .before=1)
}
) %>%
# Concatenate the results
bind_rows()
# A tibble: 4 × 3
No_of_rows Value1 Value2
<dbl> <dbl> <dbl>
1 10 486. 484.
2 100 524 497.
3 1000 504. 502.
4 10000 499. 500.