Home > Software engineering >  for-while ifelse loop? (R-Programming)
for-while ifelse loop? (R-Programming)

Time:01-05

To be honest, I am completely stuck, I'm not quite sure how to phrase the title either. I have two datasets, lets say it looks something like this:

Dataset1 (ie GDP related):

Year Country
2000 Austria
2001 Austria
2000 Belgium
2001 Belgium

Dataset2 (TAX-related):

Year Austria Belgium
2000 55 48
2001 51 45

So what I would like, is to generate some sort of function/loop that essentially says:

  • if our country variable in dataset1 has a name that is a column name in dataset2, use these observations

  • Then, conditional on the year and country, I want to create a new variable in dataset1 called tax, apply the country's tax rate from dataset two into dataset1.

    So for instance, we know Austria (observation) is also a name of a variable, then I want to get this tax rate from dataset2, and apply 55 for year 2000 and 56 for 2001, for dataset1. And this will go on for all countries and years.

And should thus look like Dataset1 (ie GDP related):

Year Country Tax
2000 Austria 55
2001 Austria 51
2000 Belgium 48
2001 Belgium 45

My dataset is quite big, so it is much preferred if I have some sort of algorithm for this

Thanks!

CodePudding user response:

Assuming the first data have more columns, then after reshaping the second data to long with pivot_longer, do a join with the first data (left_join) which matches the 'Year', 'Country'

library(dplyr)
library(tidyr)
 df2 %>% 
   pivot_longer(cols = -Year, names_to = 'Country', values_to = 'Tax') %>%
   left_join(df1, .)

-output

  Year Country Tax
1 2000 Austria  55
2 2001 Austria  51
3 2000 Belgium  48
4 2001 Belgium  45

data

df1 <- structure(list(Year = c(2000L, 2001L, 2000L, 2001L), Country = c("Austria", 
"Austria", "Belgium", "Belgium")), class = "data.frame", row.names = c(NA, 
-4L))

df2 <- structure(list(Year = 2000:2001, Austria = c(55L, 51L), Belgium = c(48L, 
45L)), class = "data.frame", row.names = c(NA, -2L))

CodePudding user response:

This should also work:

library(dplyr)
library(tidyr)

df2 %>% 
  # pivot_longer(-Year) %>% first solution
  pivot_longer(cols = -Year, names_to = 'Country', values_to = 'Tax') %>% # taken from @akrun
  arrange(Country)
   Year Country   Tax
  <int> <chr>   <int>
1  2000 Austria    55
2  2001 Austria    51
3  2000 Belgium    48
4  2001 Belgium    45
  •  Tags:  
  • r
  • Related