I have the following problem in R. I have a first dataframe where I have a variable indicating ID code for loans to specific firms, and I have a second dataframe which has a column with the ID code for the loans and a column for the name of each firm (thus relating ID code to specific names). The two dataframes are not of the same length as the first lists related bank loans over a certain period of time and therefore the same IDs appear multiple times, in a format that looks like:
Data1 <- data.frame(x <- c(1, 2, 2, 2, 3, 3, 4, 5, 5, 5), y <- (runif(10)*100))
colnames(junk) <- c("Loan ID", "Loan Amount")
Loan ID Loan Amount
1 1 93.548552
2 2 22.751827
3 2 8.751289
4 2 29.347396
5 3 97.583019
6 3 32.638592
7 4 67.498956
8 5 78.206820
9 5 78.786014
10 5 42.112703
While the second (let's call it Data2) essentially looks like this:
Data2 <- data.frame(x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), y <- c("name1", "name2", "name2", "name3",
"name4", "name5", "name6", "name6",
"name6", "name7"))
colnames(Data2) <- c("Loan ID", "Firm Name")
Loan ID Firm Name
1 1 name 1
2 2 name 2
3 3 name 2
4 4 name 3
5 5 name 4
6 6 name 5
7 7 name 6
8 8 name 6
9 9 name 6
10 10 name 7
Where as you can see multiple IDs link back to one firm name as at different point in time firms' names were linked to different operations.
Essentially what I want to do is to use Data2 to change the IDs in Data1 with the name of the firms from Data2. What I've tried is to use decode from the decoder package on a list created from Data2 such as:
Keys <- setNames(as.list(Data2$FirmName), Data2$LoanID)
decode(Data1$LoanID, Keys)
Which does not work as each some keys have the same value (i.e. correspond to the same name). I am not really sure how to get around this problem, and I can't change the ID to names manually as in my dataset I have thousands of observations. To reiterate what I want to achieve is that if in Data 1 the Loan ID is 1, then is replaced with name 1, if the ID is 2 then it is replaced with name 2, if it is 3 then it is replaced with name 2 and so on.
CodePudding user response:
It sounds like you're interested in a left join, implemented in R via merge
:
Data1 <- data.frame(loan_id = c(1, 2, 2, 2, 3, 3, 4, 5, 5, 5),
amount = (runif(10)*100))
Data2 <- data.frame(loan_id = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
y = c("name1", "name2", "name2", "name3",
"name4", "name5", "name6", "name6",
"name6", "name7"))
merge(Data1, Data2)
loan_id amount y
1 1 73.4713285 name1
2 2 51.1956226 name2
3 2 57.2294079 name2
4 2 24.3442836 name2
5 3 70.6357522 name2
6 3 67.2642131 name2
7 4 7.6427665 name3
8 5 0.3323552 name4
9 5 36.5309370 name4
10 5 71.3172913 name4