Suppose I have a dataset with people born in different years:
ID year birth_year outcome
1 10021 2015 1960 1
2 10021 2016 1960 1
3 10021 2017 1960 1
4 10021 2018 1960 0
5 10021 2019 1960 0
6 10022 2015 1968 1
7 10022 2016 1968 0
8 10022 2017 1968 0
9 10022 2018 1968 0
10 10022 2019 1968 0
11 10023 2015 1968 1
12 10023 2016 1968 1
13 10023 2017 1968 1
14 10023 2018 1968 1
15 10023 2019 1968 1
16 10024 2015 1961 0
17 10024 2016 1961 0
18 10024 2017 1961 0
19 10024 2018 1961 1
20 10024 2019 1961 1
I want to split this dataset into smaller datasets according to birth year, and store them as year1960, year1961 and year1968. Specifically,
> year1960
ID year birth_year outcome
1 10021 2015 1960 1
2 10021 2016 1960 1
3 10021 2017 1960 1
4 10021 2018 1960 0
5 10021 2019 1960 0
> year1961
ID year birth_year outcome
1 10024 2015 1961 0
2 10024 2016 1961 0
3 10024 2017 1961 0
4 10024 2018 1961 1
5 10024 2019 1961 1
> year1968
ID year birth_year outcome
1 10022 2015 1968 1
2 10022 2016 1968 0
3 10022 2017 1968 0
4 10022 2018 1968 0
5 10022 2019 1968 0
6 10023 2015 1968 1
7 10023 2016 1968 1
8 10023 2017 1968 1
9 10023 2018 1968 1
10 10023 2019 1968 1
How do I do this in the fewest steps possible?
CodePudding user response:
There are probably shorter or better ways to do this, but this will work, and you'll end up with an individual data frame for each birth year.
# read data
df <- read.csv('data.csv')
# split data by 'birth_year' into list of data frames
df_split <- split(df, with(df, birth_year))
# rename elements of list
names(df_split) <- paste0('year', names(df_split))
# create individual dataframes from list
list2env(df_split, envir = .GlobalEnv)
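If you want to try the split without a CSV file, here is a minimal sketch that rebuilds the question's data inline. It folds the renaming into the `split()` call by pasting the `year` prefix onto the grouping variable. It also resets the row names so each piece is numbered from 1, as in the desired output. Keeping the pieces in the list `df_split` instead of calling `list2env()` is a design choice assumed here, not something the question asks for.

```r
# Rebuild the example data from the question (stands in for read.csv)
df <- data.frame(
  ID         = rep(c(10021, 10022, 10023, 10024), each = 5),
  year       = rep(2015:2019, times = 4),
  birth_year = rep(c(1960, 1968, 1968, 1961), each = 5),
  outcome    = c(1, 1, 1, 0, 0,
                 1, 0, 0, 0, 0,
                 1, 1, 1, 1, 1,
                 0, 0, 0, 1, 1)
)

# Split and name in one step: the grouping values become the list names
df_split <- split(df, paste0("year", df$birth_year))

# Reset row names so each data frame is numbered 1..n
df_split <- lapply(df_split, function(d) {
  rownames(d) <- NULL
  d
})

names(df_split)           # "year1960" "year1961" "year1968"
nrow(df_split$year1968)   # 10
```

Each piece is then available as `df_split$year1960`, `df_split[["year1961"]]`, and so on. If separate objects in the workspace are still wanted, `list2env(df_split, envir = .GlobalEnv)` can be applied to this list as well.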