Home > OS >  Construct a column getting elements from another table in R
Construct a column getting elements from another table in R

Time:07-22

Given a big table, in which there are two important columns:

  1. nation code
  2. time
Name     Nation code     Time       Other
a        IT              2010       x
b        DE              2011       y
c        ES              2009       z

I want to construct another column using those two indicators to take the data from another table, which is built like the following

Nation code     Y2009     Y2010      Y2011
DE              200       100        300
ES              100       150        350
GB              200       250        50
IT              50        150        100

The final result should be the following

Name     Nation code     Time       Other       NewColumn
a        IT              2010       x           150
b        DE              2011       y           300
c        ES              2009       z           100

CodePudding user response:

To get what you want, you'll need to perform two steps: First you'll want df2 in a long format (e.g. using pivot_longer), and then you'd do a left join with df1 (e.g. using left_join).

I.e.:

library(dplyr)
library(tidyr)

df2_long <-
  df2 |> 
  pivot_longer(-Nation_code, names_prefix = "Y", names_to = "Time", values_to = "new_col") |>
  mutate(Time = as.numeric(Time))

# # A tibble: 12 × 3
# Nation_code  Time new_col
# <chr>       <dbl>   <dbl>
#   1 DE           2009     200
# 2 DE           2010     100
# 3 DE           2011     300
# 4 ES           2009     100
# 5 ES           2010     150
# 6 ES           2011     350
# 7 GB           2009     200
# 8 GB           2010     250
# 9 GB           2011      50
# 10 IT           2009      50
# 11 IT           2010     150
# 12 IT           2011     100

df1 |>
  left_join(df2_long, by = c("Nation_code", "Time"))

Output:

# A tibble: 3 × 5
  Name  Nation_code  Time Other new_col
  <chr> <chr>       <dbl> <chr>   <dbl>
1 a     IT           2010 x         150
2 b     DE           2011 y         300
3 c     ES           2009 z         100

Data:

library(readr)

df1 <- read_table("Name     Nation_code     Time       Other
a        IT              2010       x
b        DE              2011       y
c        ES              2009       z")

df2 <- read_table("Nation_code     Y2009     Y2010      Y2011
DE              200       100        300
ES              100       150        350
GB              200       250        50
IT              50        150        100")

CodePudding user response:

base R (and reshape2)

reshape2::melt(df2, "Nation_code", variable.name = "Time", value.name = "NewColumn") |>
  transform(Time = as.integer(sub("\\D", "", Time))) |>
  merge(df1, by = c("Nation_code", "Time"), all.y = TRUE)
#   Nation_code Time NewColumn Name Other
# 1          DE 2011       300    b     y
# 2          ES 2009       100    c     z
# 3          IT 2010       150    a     x

data.table

# library(data.table)
DT1 <- as.data.table(df1)
DT2 <- as.data.table(df2)
DT1[
  melt(DT2, "Nation_code", variable.name = "Time", value.name = "NewColumn"
    )[, Time := as.integer(sub("\\D", "", Time))],
  NewColumn := i.NewColumn, on = .(Nation_code, Time)][]
#      Name Nation_code  Time  Other NewColumn
#    <char>      <char> <int> <char>     <int>
# 1:      a          IT  2010      x       150
# 2:      b          DE  2011      y       300
# 3:      c          ES  2009      z       100

This is a combination of two frequent questions on SO:

CodePudding user response:

library(tidyverse)

df <- read_table("Name     Nation_code     Time       Other
a        IT              2010       x
b        DE              2011       y
c        ES              2009       z")

df1 <- read_table("Nation_code     Y2009     Y2010      Y2011
DE              200       100        300
ES              100       150        350
GB              200       250        50
IT              50        150        100")

df1 <- df1 %>%  
  pivot_longer(-Nation_code, 
               names_to = "Time", 
               values_to = "NewColumn") %>%  
  mutate(Time = Time %>% str_remove_all("Y") %>%  
           as.numeric)

df %>%  
  left_join(df1, by = c("Nation_code", "Time"))

# A tibble: 3 x 5
  Name  Nation_code  Time Other NewColumn
  <chr> <chr>       <dbl> <chr>     <dbl>
1 a     IT           2010 x           150
2 b     DE           2011 y           300
3 c     ES           2009 z           100

CodePudding user response:

Preparing the input tables:

a <- data.frame(
  "Name" = c("a","b","c"),
  "Nation_Code" = c("IT", "DE", "ES"),
  "Time" = c(2010, 2011, 2009),
  "Other" = c("x", "y", "z")
)

b <- data.frame(
  "Nation_Code" = c("DE", "ES", "GB", "IT"),
  "Y2009" = c(200,100,200,50),
  "Y2010" = c(100,150,250,150),
  "Y2011" = c(300,350,50,100)
)

This is how I would go about it: First, you need to convert the second table from the so-called "wide format" to the "long format". You can do this with gather(). This means, that instead of having multiple columns with values, you will have one column containing the former column names and one column containing the values. Next, you have to change the strings "Y20xx" into numerics. This can be done with mutate() which creates a new column or overwrites an old column. I took the last 4 letters of each string and converted them to numerics. Then, you can join the tables using one of the join() functions (https://dplyr.tidyverse.org/reference/mutate-joins.html). The inner_join() only keeps entries which are present in both data frames.

# loading the necessary package    
library (tidyverse)

combined <- b %>%
  gather(key = "Time", value = "NewColumn", -("Nation_Code")) %>% # wide format to long format
  mutate(Time = as.numeric(substr(Time, 2, 5))) %>% # change year format to numeric
  inner_join(a, by= c("Time" = "Time", "Nation_Code" = "Nation_Code")) # join the tables

>   Nation_Code Time NewColumn Name Other
> 1          ES 2009       100    c     z
> 2          IT 2010       150    a     x
> 3          DE 2011       300    b     y
  •  Tags:  
  • r
  • Related