I have a dataframe (df1
) with three columns: Name
, Decade
and Count
. For example:
Name <- c("a","b","c")
Decade <- c(1810,1850,1900)
Count <- c(2,3,1)
df1 <- data.frame(Name,Decade,Count)
print(df1)
Name Decade Count
1 a 1810 2
2 b 1850 3
3 c 1900 1
I wish to create a new dataframe (df2
) with columns Name
and Decade
, with repeating rows of values in df1$Name
and df1$Decade
by df1$Count
. For each repeating row in df2, I wish the df2$Decade
to increase by an increment of 10. The first instance of each df2$Name
and df2$Decade
combination would be equal to df1$Name
and df1$Decade
.
In the above example, my desired output would be:
> print(df2)
Name Decade
1 a 1810
2 a 1820
3 b 1850
4 b 1860
5 b 1870
6 c 1900
I would love to show my workings so far, but I don't know how to start and have been manually doing it in excel. [hangs head in shame]
Thank you for your time in advance.
CodePudding user response:
Here is one option using a combination of tidyverse
and splitstackshape
. After duplicating rows according to Count
, we can use row_number
minus 1 to create a sequence for each group (e.g., 0, 1, 2...), then multiply by 10 to get the 10 year increment and add to the Decade
for that group.
library(tidyverse)
library(splitstackshape)
df2 <- expandRows(df1, "Count") %>%
group_by(Name) %>%
mutate(Decade = Decade (row_number()-1) *10)
Or if you don't want to load another package, we could create the duplicated rows with base R:
df2 <- df1[rep(seq(nrow(df1)), df1$Count),c(1:2)] %>%
group_by(Name) %>%
mutate(Decade = Decade (row_number()-1) *10)
Output
df2
Name Decade
<chr> <dbl>
1 a 1810
2 a 1820
3 b 1850
4 b 1860
5 b 1870
6 c 1900
CodePudding user response:
Sequences should sort this very quickly:
df2 <- df1[rep(seq_len(nrow(df1)), df1$Count), 1:2]
df2$Decade <- df2$Decade (sequence(df1$Count)-1) * 10
df2
# Name Decade
#1 a 1810
#1.1 a 1820
#2 b 1850
#2.1 b 1860
#2.2 b 1870
#3 c 1900