I am trying to insert a new row of the average value between the same month group and thus create a new item for that row. So for example the three rows:
3MTH_IMPVOL_90%MNY_DF: 23.9
3MTH_IMPVOL_100%MNY_DF: 18.0
3MTH_IMPVOL_110%MNY_DF: 15.0
Will then have 1 row in between each and produce a new entry:
- 3MTH_IMPVOL_90%MNY_DF: 23.9
- 3MTH_IMPVOL_95%MNY_DF: 20.95
- 3MTH_IMPVOL_100%MNY_DF: 18.0
- 3MTH_IMPVOL_105%MNY_DF: 16.5
- 3MTH_IMPVOL_110%MNY_DF: 15.0
Here is some code that I tried myself but didnt work, maybe it helps:
Issue here, want the entire dataset as is with extra rows, not deprecated values:
df %>% group_by(datestamp) %>%
mutate(id = rep(1:n(), length.out = n(), each = 4)) %>%
mutate(item
)%>%
summarize(item = unique(datestamp),
item = paste0(item[1], '_', item[2]),
value_mean = mean(value))
Found this, but doesn't work for me either:
do.call(rbind,
lapply(seq(1, nrow(df), 3), function(i){
x <- jse_vol[ i:(i 1), , drop = FALSE]
x <- x %>% group_by(item)
res <- rbind(x['value'], colSums(x['value'])/2)
rownames(res)[ nrow(res) ] <- paste(rownames(x), collapse = "_")
res
}))
Lastly:
df%>%
data.frame() %>%
rownames_to_column() %>%
mutate_if(is.factor, as.numeric) %>%
group_by(group = gl(n()/2, 2)) %>%
group_map(~ bind_rows(.x, tibble(rowname = paste(.x$rowname, collapse = "_"),
t1 = mean(.x$value),
)))
Dataframe:
datestamp entity short_name item value source
<date> <chr> <chr> <chr> <dbl> <chr>
1 2006-01-03 JALSHTR Index FTSE/JSE Africa All Shares Index 3MTH_IMPVOL_90%MNY_DF 23.9 PSEC
2 2006-01-03 JALSHTR Index FTSE/JSE Africa All Shares Index 6MTH_IMPVOL_100.0%MNY_DF 18.0 PSEC
3 2006-01-03 JALSHTR Index FTSE/JSE Africa All Shares Index 9MTH_IMPVOL_110%MNY_DF 16.2 PSEC
4 2006-01-03 JALSHTR Index FTSE/JSE Africa All Shares Index 12MTH_IMPVOL_90%MNY_DF 21.0 PSEC
5 2006-01-03 JALSHTR Index FTSE/JSE Africa All Shares Index 3MTH_IMPVOL_100%MNY_DF 18 PSEC
6 2006-01-03 JALSHTR Index FTSE/JSE Africa All Shares Index 6MTH_IMPVOL_110.0%MNY_DF 15.8 PSEC
7 2006-01-03 JALSHTR Index FTSE/JSE Africa All Shares Index 9MTH_IMPVOL_90%MNY_DF 21.5 PSEC
8 2006-01-03 JALSHTR Index FTSE/JSE Africa All Shares Index 12MTH_IMPVOL_100%MNY_DF 18.5 PSEC
9 2006-01-03 JALSHTR Index FTSE/JSE Africa All Shares Index 3MTH_IMPVOL_110%MNY_DF 15.0 PSEC
10 2006-01-03 JALSHTR Index FTSE/JSE Africa All Shares Index 6MTH_IMPVOL_90.0%MNY_DF 22.0 PSEC
11 2006-01-03 JALSHTR Index FTSE/JSE Africa All Shares Index 9MTH_IMPVOL_100%MNY_DF 18.3 PSEC
12 2006-01-03 JALSHTR Index FTSE/JSE Africa All Shares Index 12MTH_IMPVOL_110%MNY_DF 16.7 PSEC
13 2006-01-04 JALSHTR Index FTSE/JSE Africa All Shares Index 3MTH_IMPVOL_90%MNY_DF 23.9 PSEC
14 2006-01-04 JALSHTR Index FTSE/JSE Africa All Shares Index 6MTH_IMPVOL_100.0%MNY_DF 18.0 PSEC
15 2006-01-04 JALSHTR Index FTSE/JSE Africa All Shares Index 9MTH_IMPVOL_110%MNY_DF 16.2 PSEC
16 2006-01-04 JALSHTR Index FTSE/JSE Africa All Shares Index 12MTH_IMPVOL_90%MNY_DF 21.0 PSEC
17 2006-01-04 JALSHTR Index FTSE/JSE Africa All Shares Index 3MTH_IMPVOL_100%MNY_DF 18 PSEC
18 2006-01-04 JALSHTR Index FTSE/JSE Africa All Shares Index 6MTH_IMPVOL_110.0%MNY_DF 15.8 PSEC
19 2006-01-04 JALSHTR Index FTSE/JSE Africa All Shares Index 9MTH_IMPVOL_90%MNY_DF 21.5 PSEC
20 2006-01-04 JALSHTR Index FTSE/JSE Africa All Shares Index 12MTH_IMPVOL_100%MNY_DF 18.5 PSEC
21 2006-01-04 JALSHTR Index FTSE/JSE Africa All Shares Index 3MTH_IMPVOL_110%MNY_DF 15.0 PSEC
22 2006-01-04 JALSHTR Index FTSE/JSE Africa All Shares Index 6MTH_IMPVOL_90.0%MNY_DF 22.0 PSEC
23 2006-01-04 JALSHTR Index FTSE/JSE Africa All Shares Index 9MTH_IMPVOL_100%MNY_DF 18.3 PSEC
24 2006-01-04 JALSHTR Index FTSE/JSE Africa All Shares Index 12MTH_IMPVOL_110%MNY_DF 16.7 PSEC
CodePudding user response:
One solution might be something like the following,
df <- data.frame(Name = paste0("Row", 1:10),
Value = runif(10, 10, 100))
df2 <- matrix(nrow = 2*nrow(df) - 1,
ncol = ncol(df))
for (i in 1:nrow(df2)){
if (i %% 2 != 0){
df2[i,1] = df[(i 1)/2,1]
df2[i,2] = df[(i 1)/2,2]
}else
if (i %% 2 == 0){
df2[i,1] = paste0("Average of ",df[(i 1)/2,1], " and ", df[(i 2)/2,1])
df2[i,2] = (df[i/2,2] df[(i 2)/2,2])/2
}
}
CodePudding user response:
Your dataset doesn't make it easy to perform this calculation because you have two important variables (month and percent) buried within a string.
The first step should be to extract these (I've used str_extract() from stringr for 1st number and number before "%").
From there, there are more options. I pivotted the percentage column into rows then calculated the averages as new columns pivotted back to a long format:
library(tidyverse)
set.seed(1337) # I generate random numbers - this is for consistency
# Make an analogue of your dataframe
df <- crossing(
datestamp = as.Date(c("2006-01-03", "2006-01-04")),
mnth = c("3MTH", "6MTH", "9MTH", "12MTH"),
impvol = "_IMPVOL_",
pc = c("90%", "100%", "110%"),
str = "MNY_DF"
) %>%
transmute(datestamp, item = paste0(mnth, impvol, pc, str),
value = round(rnorm(24, 20, 2), 1))
new_rows <- df %>%
mutate(month = as.numeric(str_extract(item, "[0-9] ")),
pc = as.numeric(str_extract(item, "(\\d) (?=%)"))) %>%
arrange(datestamp, month, pc) %>%
select(-item) %>%
pivot_wider(
id_cols = c(datestamp, month),
names_from = pc,
values_from = value
) %>%
mutate(`95` = (`90` `100`) / 2,
`105` = (`100` `110`) / 2) %>%
pivot_longer(
cols = c(`90`, `95`, `100`, `105`, `110`),
names_to = "pc",
values_to = "value"
)