To be honest, I am completely stuck, I'm not quite sure how to phrase the title either. I have two datasets, lets say it looks something like this:
Dataset1 (ie GDP related):
Year | Country |
---|---|
2000 | Austria |
2001 | Austria |
2000 | Belgium |
2001 | Belgium |
Dataset2 (TAX-related):
Year | Austria | Belgium |
---|---|---|
2000 | 55 | 48 |
2001 | 51 | 45 |
So what I would like, is to generate some sort of function/loop that essentially says:
if our country variable in dataset1 has a name that is a column name in dataset2, use these observations
Then, conditional on the year and country, I want to create a new variable in dataset1 called tax, apply the country's tax rate from dataset two into dataset1.
So for instance, we know Austria (observation) is also a name of a variable, then I want to get this tax rate from dataset2, and apply 55 for year 2000 and 56 for 2001, for dataset1. And this will go on for all countries and years.
And should thus look like Dataset1 (ie GDP related):
Year | Country | Tax |
---|---|---|
2000 | Austria | 55 |
2001 | Austria | 51 |
2000 | Belgium | 48 |
2001 | Belgium | 45 |
My dataset is quite big, so it is much preferred if I have some sort of algorithm for this
Thanks!
CodePudding user response:
Assuming the first data have more columns, then after reshaping the second data to long with pivot_longer
, do a join with the first data (left_join
) which matches the 'Year', 'Country'
library(dplyr)
library(tidyr)
df2 %>%
pivot_longer(cols = -Year, names_to = 'Country', values_to = 'Tax') %>%
left_join(df1, .)
-output
Year Country Tax
1 2000 Austria 55
2 2001 Austria 51
3 2000 Belgium 48
4 2001 Belgium 45
data
df1 <- structure(list(Year = c(2000L, 2001L, 2000L, 2001L), Country = c("Austria",
"Austria", "Belgium", "Belgium")), class = "data.frame", row.names = c(NA,
-4L))
df2 <- structure(list(Year = 2000:2001, Austria = c(55L, 51L), Belgium = c(48L,
45L)), class = "data.frame", row.names = c(NA, -2L))
CodePudding user response:
This should also work:
library(dplyr)
library(tidyr)
df2 %>%
# pivot_longer(-Year) %>% first solution
pivot_longer(cols = -Year, names_to = 'Country', values_to = 'Tax') %>% # taken from @akrun
arrange(Country)
Year Country Tax
<int> <chr> <int>
1 2000 Austria 55
2 2001 Austria 51
3 2000 Belgium 48
4 2001 Belgium 45