I have a dataset that has information on power plants that are operating (currently functioning) or planned (will be opened in the future). However, for a few cases, there are power plants with the status "operational & planned". These power plants already exist but will expand at a given date. I need to correct this, by doubling the entries of this outlier, i.e., when a power plant is "operational & planned" I need to keep the current row (which I will rename to operational) and create a new identical row in which the status is planned. <br />
Simply put, what I need to do is, when there is an "operational & planned" power plant, I need to split it into operational and planned by creating a new row with the same data points. I am not being able to do this with ifelse
. Could someone suggest an approach, please?
Example<br />
Firm Status Production<br />
A Operating 1000<br />
B Planned 1030<br />
C Operating 1000<br />
D Operating & Planned 1000<br />
Output<br />
Firm Status Production<br />
A Operating 1000<br />
B Planned 1030<br />
C Operating 1000<br />
D Operating 1000<br />
D Planned 1000<br />
CodePudding user response:
You can use the package splitstackshape
. You can use this code:
First your data:
df <- data.frame(Firm = c("A", "B", "C", "D"),
Status = c("Operating", "Planned", "Operating", " Operating & Planned"),
Production = c(1000, 1030, 1000, 1000))
Output:
Firm Status Production
1 A Operating 1000
2 B Planned 1030
3 C Operating 1000
4 D Operating & Planned 1000
Next use this code to split the row in two rows with same value:
library(splitstackshape)
df2 <- cSplit(df, "Status", sep = "&", direction = "long")
df2
Output:
Firm Status Production
1: A Operating 1000
2: B Planned 1030
3: C Operating 1000
4: D Operating 1000
5: D Planned 1000
CodePudding user response:
You're looking for tidyr::separate_rows
:
library(tidyr)
separate_rows(df, Status, sep = " & ")
Using @Quinten's dataset:
df <- data.frame(Firm = c("A", "B", "C", "D"),
Status = c("Operating", "Planned", "Operating", "Operating & Planned"),
Production = c(1000, 1030, 1000, 1000))
df %>%
separate_rows(Status, sep = " & ")
# A tibble: 5 x 3
Firm Status Production
<chr> <chr> <dbl>
1 A Operating 1000
2 B Planned 1030
3 C Operating 1000
4 D Operating 1000
5 D Planned 1000