I have the following information:
Policy No Amount Info (1) Info (2) Info (3) Info (4) Info (5)
1001 100 500001 500002 500003 999999 999999
1002 200 600001 600002 999999 999999 999999
1003 300 700001 700002 700003 700004 700005
Above is how the information is provided.
This is how I want it to be represented:
Policy No Amount Info
1001 100 500001
1001 100 500002
1001 100 500003
1002 200 600001
1002 200 600002
1003 300 700001
1003 300 700002
1003 300 700003
1003 300 700004
1003 300 700005
Basically, I want to check each info column in row and if there is 999999 then go to the next row and produce the values up to the info column has 999999.
I have over 100,000 policy no and amount info like this.
Your help is highly appreciated.
CodePudding user response:
Pivot longer and then remove all 999999 values.
library(dplyr)
df %>%
pivot_longer(cols = starts_with("Info"), names_to = "Info") %>%
filter(!Info == 999999)
CodePudding user response:
Possible alternative to using a loop with tidyverse
:
library(tidyverse)
df %>%
mutate(across(everything(), ~na_if(.x, 999999))) %>%
pivot_longer(-c(`Policy No`, `Amount`), names_to = NULL, values_drop_na = TRUE)
Output
`Policy No` Amount value
<int> <int> <int>
1 1001 100 500001
2 1001 100 500002
3 1001 100 500003
4 1002 200 600001
5 1002 200 600002
6 1003 300 700001
7 1003 300 700002
8 1003 300 700003
9 1003 300 700004
10 1003 300 700005
Or if you want to stick with the for loop, then you could do something like this:
output <- NULL
for (i in names(df[,-c(1:2)])) {
x <- df[which(df[,i] != 999999), c("Policy No", "Amount",i)]
names(x)[3] <- "Info"
output <- rbind(output, x)
}
Data
df <- structure(list(`Policy No` = 1001:1003, Amount = c(100L, 200L,
300L), `Info (1)` = c(500001L, 600001L, 700001L), `Info (2)` = c(500002L,
600002L, 700002L), `Info (3)` = c(500003L, 999999L, 700003L),
`Info (4)` = c(999999L, 999999L, 700004L), `Info (5)` = c(999999L,
999999L, 700005L)), class = "data.frame", row.names = c(NA,
-3L))
CodePudding user response:
There are certain circumstances where we need to rotate the data from a wide format to a long format. I had a similar issue like this in some of my code but in reverse. Here is the question I posted. This also provides other options that can be used to solve this problem.
R- How to put multiple cases (rows) in one row
I would use reshape. Here is a link to the r documentation.
https://www.rdocumentation.org/packages/stats/versions/3.6.2/topics/reshape
Here is the data.
policyno <- c (1001, 1002, 1003)
amount <- c (100, 200, 300)
info1 <- c (500001, 600001, 700001)
info2 <- c (500002, 600002, 700002)
info3 <- c (500003, 999999, 700003)
info4 <- c (999999, 999999, 700004)
info5 <- c (999999, 999999, 700005)
data <- data.frame (policyno, amount, info1, info2, info3, info4, info5)
data
There are a few things we need to input for reshape to work. We need the name of the data frame (data), the id variable (policyno). We also need the name of the new variable we want to create (info) and a list of the variables which make up this new variable (info1, info2, info3, info4, info5). For generating this list of variables, there are a few ways to do this. I have seen posts where we list the row number. If we have a very small dataset and the variables (and their row numbers stay the same), then this method is fine. If you are working with a very large dataset where the row numbers might change depending upon how you are subsetting the data, then we can have R identify the row numbers for us. Which and calnames can be used for this purpose.
data2 <- reshape (data,
idvar = "policyno",
v.names = c ("info"),
varying = list (c (which(colnames(data) == "info1"),
which (colnames(data) == "info2"),
which(colnames(data) == "info3"),
which(colnames(data) == "info4"),
which(colnames(data) == "info5")
)
),
direction = "long")
data2
This rotates the data but doesn't remove the 999999 cases. To remove the 999999, we can use subset. As I minor note, this reshape created a variable labeled time which we can remove using select.
data3 <- subset (data2,
select = -c (time),
info != 999999)
data3
Again, this is just one way that can be used to solve the problem. Stackoverflow has a lot of posts about reshape and transpose that are very useful.