Given a big table, in which there are two important columns:
- nation code
- time
Name Nation code Time Other
a IT 2010 x
b DE 2011 y
c ES 2009 z
I want to construct another column using those two indicators to take the data from another table, which is built like the following
Nation code Y2009 Y2010 Y2011
DE 200 100 300
ES 100 150 350
GB 200 250 50
IT 50 150 100
The final result should be the following
Name Nation code Time Other NewColumn
a IT 2010 x 150
b DE 2011 y 300
c ES 2009 z 100
CodePudding user response:
To get what you want, you'll need to perform two steps: First you'll want df2
in a long format (e.g. using pivot_longer
), and then you'd do a left join with df1
(e.g. using left_join
).
I.e.:
library(dplyr)
library(tidyr)
df2_long <-
df2 |>
pivot_longer(-Nation_code, names_prefix = "Y", names_to = "Time", values_to = "new_col") |>
mutate(Time = as.numeric(Time))
# # A tibble: 12 × 3
# Nation_code Time new_col
# <chr> <dbl> <dbl>
# 1 DE 2009 200
# 2 DE 2010 100
# 3 DE 2011 300
# 4 ES 2009 100
# 5 ES 2010 150
# 6 ES 2011 350
# 7 GB 2009 200
# 8 GB 2010 250
# 9 GB 2011 50
# 10 IT 2009 50
# 11 IT 2010 150
# 12 IT 2011 100
df1 |>
left_join(df2_long, by = c("Nation_code", "Time"))
Output:
# A tibble: 3 × 5
Name Nation_code Time Other new_col
<chr> <chr> <dbl> <chr> <dbl>
1 a IT 2010 x 150
2 b DE 2011 y 300
3 c ES 2009 z 100
Data:
library(readr)
df1 <- read_table("Name Nation_code Time Other
a IT 2010 x
b DE 2011 y
c ES 2009 z")
df2 <- read_table("Nation_code Y2009 Y2010 Y2011
DE 200 100 300
ES 100 150 350
GB 200 250 50
IT 50 150 100")
CodePudding user response:
base R (and reshape2
)
reshape2::melt(df2, "Nation_code", variable.name = "Time", value.name = "NewColumn") |>
transform(Time = as.integer(sub("\\D", "", Time))) |>
merge(df1, by = c("Nation_code", "Time"), all.y = TRUE)
# Nation_code Time NewColumn Name Other
# 1 DE 2011 300 b y
# 2 ES 2009 100 c z
# 3 IT 2010 150 a x
data.table
# library(data.table)
DT1 <- as.data.table(df1)
DT2 <- as.data.table(df2)
DT1[
melt(DT2, "Nation_code", variable.name = "Time", value.name = "NewColumn"
)[, Time := as.integer(sub("\\D", "", Time))],
NewColumn := i.NewColumn, on = .(Nation_code, Time)][]
# Name Nation_code Time Other NewColumn
# <char> <char> <int> <char> <int>
# 1: a IT 2010 x 150
# 2: b DE 2011 y 300
# 3: c ES 2009 z 100
This is a combination of two frequent questions on SO:
- Pivoting wide-to-long, Reshaping data.frame from wide to long format for reference, there's also long-to-wide, Reshape multiple value columns to wide format
- Merge/join, see How to join (merge) data frames (inner, outer, left, right) and What's the difference between INNER JOIN, LEFT JOIN, RIGHT JOIN and FULL JOIN?
CodePudding user response:
library(tidyverse)
df <- read_table("Name Nation_code Time Other
a IT 2010 x
b DE 2011 y
c ES 2009 z")
df1 <- read_table("Nation_code Y2009 Y2010 Y2011
DE 200 100 300
ES 100 150 350
GB 200 250 50
IT 50 150 100")
df1 <- df1 %>%
pivot_longer(-Nation_code,
names_to = "Time",
values_to = "NewColumn") %>%
mutate(Time = Time %>% str_remove_all("Y") %>%
as.numeric)
df %>%
left_join(df1, by = c("Nation_code", "Time"))
# A tibble: 3 x 5
Name Nation_code Time Other NewColumn
<chr> <chr> <dbl> <chr> <dbl>
1 a IT 2010 x 150
2 b DE 2011 y 300
3 c ES 2009 z 100
CodePudding user response:
Preparing the input tables:
a <- data.frame(
"Name" = c("a","b","c"),
"Nation_Code" = c("IT", "DE", "ES"),
"Time" = c(2010, 2011, 2009),
"Other" = c("x", "y", "z")
)
b <- data.frame(
"Nation_Code" = c("DE", "ES", "GB", "IT"),
"Y2009" = c(200,100,200,50),
"Y2010" = c(100,150,250,150),
"Y2011" = c(300,350,50,100)
)
This is how I would go about it: First, you need to convert the second table from the so-called "wide format" to the "long format". You can do this with gather()
. This means, that instead of having multiple columns with values, you will have one column containing the former column names and one column containing the values. Next, you have to change the strings "Y20xx" into numerics. This can be done with mutate()
which creates a new column or overwrites an old column. I took the last 4 letters of each string and converted them to numerics. Then, you can join the tables using one of the join()
functions (https://dplyr.tidyverse.org/reference/mutate-joins.html). The inner_join()
only keeps entries which are present in both data frames.
# loading the necessary package
library (tidyverse)
combined <- b %>%
gather(key = "Time", value = "NewColumn", -("Nation_Code")) %>% # wide format to long format
mutate(Time = as.numeric(substr(Time, 2, 5))) %>% # change year format to numeric
inner_join(a, by= c("Time" = "Time", "Nation_Code" = "Nation_Code")) # join the tables
> Nation_Code Time NewColumn Name Other
> 1 ES 2009 100 c z
> 2 IT 2010 150 a x
> 3 DE 2011 300 b y