I'm trying to create a new variable in R that holds the initial value of another variable (Crime) for each group (country), where "initial" means the first time period observed for that group (this is panel data). My current data looks like this:
country | year | Crime |
---|---|---|
Albania | 2016 | 2.7369478 |
Albania | 2017 | 2.0109779 |
Argentina | 2002 | 9.474084 |
Argentina | 2003 | 7.7898825 |
Argentina | 2004 | 6.0739941 |
And I want it to look like this:
country | year | Crime | Initial_Crime |
---|---|---|---|
Albania | 2016 | 2.7369478 | 2.7369478 |
Albania | 2017 | 2.0109779 | 2.7369478 |
Argentina | 2002 | 9.474084 | 9.474084 |
Argentina | 2003 | 7.7898825 | 9.474084 |
Argentina | 2004 | 6.0739941 | 9.474084 |
I saw that ddply could do this, but the problem is that it is no longer supported by the latest R updates.
Thank you in advance.
CodePudding user response:
Maybe `arrange` by `year`, then, after grouping by `country`, set `Initial_Crime` to be the `first` `Crime` in the group.
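library(tidyverse)
df %>%
  arrange(year) %>%                       # sort rows so the earliest year comes first
  group_by(country) %>%                   # work within each country
  mutate(Initial_Crime = first(Crime))    # first Crime value per country after sorting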
Output
  country    year Crime Initial_Crime
  <chr>     <int> <dbl>         <dbl>
1 Argentina  2002  9.47          9.47
2 Argentina  2003  7.79          9.47
3 Argentina  2004  6.07          9.47
4 Albania    2016  2.74          2.74
5 Albania    2017  2.01          2.74
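If the original row order should be kept (as in the desired table in the question), one possible sketch is to skip `arrange` and pick the `Crime` value at each country's earliest `year` directly. The data frame `df` below is rebuilt from the question's sample data, and the approach assumes every country has at least one non-missing `year`.

```r
library(dplyr)

# Sample data copied from the question
df <- data.frame(
  country = c("Albania", "Albania", "Argentina", "Argentina", "Argentina"),
  year    = c(2016L, 2017L, 2002L, 2003L, 2004L),
  Crime   = c(2.7369478, 2.0109779, 9.474084, 7.7898825, 6.0739941)
)

result <- df %>%
  group_by(country) %>%
  # Crime at the row with the smallest year within each country
  mutate(Initial_Crime = Crime[which.min(year)]) %>%
  ungroup()

print(result)
```

Because nothing is sorted, this also works when the rows of a country are not already in chronological order.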
CodePudding user response:
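library(data.table)
setDT(data)
setorder(data, country, year)                     # make sure the first row per country is the earliest year
data[, Initial_Crime := Crime[1], by = country]   # first Crime value within each country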
     country year    Crime Initial_Crime
1:   Albania 2016 2.736948      2.736948
2:   Albania 2017 2.010978      2.736948
3: Argentina 2002 9.474084      9.474084
4: Argentina 2003 7.789883      9.474084
5: Argentina 2004 6.073994      9.474084
CodePudding user response:
A data.table solution:
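library(data.table)
setDT(df)
df[, x := seq_len(.N), by = country                                        # row index within each country
  ][x == 1, Initial_Crime := Crime                                          # keep Crime only on each country's first row
  ][, Initial_Crime := nafill(Initial_Crime, type = "locf"), by = country   # carry that value forward within the country
  ][, x := NULL]                                                            # drop the helper column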
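The same per-country idea can also be sketched without any extra packages, using base R's `ave()`. This is only an illustration on the question's sample data; the helper name `first_row` is made up here, and the code assumes every country has at least one non-missing `year`.

```r
# Sample data copied from the question
df <- data.frame(
  country = c("Albania", "Albania", "Argentina", "Argentina", "Argentina"),
  year    = c(2016L, 2017L, 2002L, 2003L, 2004L),
  Crime   = c(2.7369478, 2.0109779, 9.474084, 7.7898825, 6.0739941)
)

# For every row, find the row number of the earliest year in its country.
# ave() applies the function per country and recycles the single result
# over all rows of that country.
first_row <- ave(
  seq_len(nrow(df)), df$country,
  FUN = function(i) i[which.min(df$year[i])]
)

# Look up the Crime value at that row
df$Initial_Crime <- df$Crime[first_row]

print(df)
```

Since the lookup is by earliest `year` rather than by position, the rows do not need to be sorted beforehand.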