Home > Net >  Combine two rows in R that are separated
Combine two rows in R that are separated

Time:01-30

I am trying to clean the dataset so that all data is in its appropriate cell by combining rows since they are oddly separated. There is an obvious nuance to the dataset in that there are some rows that are correctly coded and there are some that are not.

Here is an example of the data:

Rank Store City Address Transactions Avg. Value Dollar Sales Amt.
40 1404 State College Hamilton Square Shop Center
230 W Hamilton Ave 155548 52.86 8263499
41 2310 Springfield 149 Baltimore Pike 300258 27.24 8211137
42 2514 Erie Yorktown Centre
2501 West 12th Street 190305 41.17 7862624

Here is an example of how I want the data:

Rank Store City Address Transactions Avg. Value Dollar Sales Amt.
40 1404 State College Hamilton Square Shop Center, 230 W Hamilton Ave 155548 52.86 8263499
41 2310 Springfield 149 Baltimore Pike 300258 27.28 8211137
42 2514 Erie Yorktown Centre, 2501 West 12th Street 190305 41.17 7862624

Is there an Excel or R function to fix this, or does anyone know how to write an R functional to correct this?

I read into the CONCATENATE function in excel and realized it was not going to accomplish anything. I figured an R functional would be the only way to fix this.

CodePudding user response:

The concatenate function will work here or in the excel select the columns and by using the merge formula in the formala option you can complete the task.

CodePudding user response:

I recommend checking how the file is being parsed. From the example data you provided, it looks like the address column is being split on ", " and going to the next line. Based on this assumption alone, below is a potential solution using the tidyverse:

library(tidyverse)
original_data <- tibble(Rank = c(40,NA,41,42,NA),
                        Store = c(1404,NA,2310,2514,NA),
                        City = c("State College",NA,"Springfield","Erie",NA),
                        Address = c("Hamilton Square Shop Center",
                                    "230 W Hamilton Ave","149 Baltimore Pike",
                                    "Yorktown Centre","2501 West 12th Street"),
                        Transactions = c(NA,155548,300258,NA,190305),
                        `Avg. Value` = c(NA,52.86,27.24,NA,41.17),
                        `Dollar Sales Amt.` = c(NA,8263499,8211137,NA,7862624))

new_data <- original_data %>%
    fill(Rank:City) %>%
    group_by_at(vars(Rank:City)) %>%
    mutate(Address1 = lag(Address)) %>%
    slice(n()) %>%
    ungroup() %>%
    mutate(Address = if_else(is.na(Address1), Address,
                             str_c(Address1, Address, sep = ", "))) %>%
    select(Rank:`Dollar Sales Amt.`)

new_data
  • Related