Home > Software design >  Converting vertical data to horizontal data extracting first and last date
Converting vertical data to horizontal data extracting first and last date

Time:11-23

I'm relatively new working with R. I have a database in which 2 columns look like the table below:

pt_id Date
1222 20-01-2021
1222 18-11-2018
1222 17-02-2015
1222 21-04-2015
2555 18-01-2002
2555 03-04-2009
2555 25-12-2010

I would like to create a new dataframe in which pt_id are merged, and create 2 columns in which only the first date and the last date are saved. I would like it to look like a table below

pt_id Date_first Date_last
1222 17-02-2015 20-01-2021
2555 18-01-2002 25-12-2010

The table above is just a small example, the database im working with is much larger. These are the packages im working with at the moment:

library(tidyverse)
library(haven)
library(tidyr)
library(dplyr)
library(date)
library(reshape2)
library(foreign)
library(data.table)
library(stringr)
library(plyr)
library(irr)
library(vcd)
library(vctrs)

I hope this is possible, thanks in advance.

CodePudding user response:

You can do:

mydf |>
  mutate(Date = lubridate::dmy(Date)) |> # Only use if the variable is currently set to character
  group_by(pt_id) |>
  filter(Date == min(Date) | Date == max(Date)) |>
  mutate(date_vars = if_else(Date == min(Date), "Date_first", "Date_last")) |> 
  ungroup() |> 
  pivot_wider(pt_id, values_from = Date, names_from = date_vars)

# A tibble: 2 x 3
  pt_id Date_last  Date_first
  <dbl> <date>     <date>    
1  1222 2021-01-20 2015-02-17
2  2555 2010-12-25 2002-01-18

CodePudding user response:

I am also new to R and thought I would have a go, to possibly inspire someone to correct me - this uses only base R and has a very convoluted for loop:

df1$Date <- as.Date(df1$Date)
#Create a new df with a single entry for each patient
new_df <- unique(df1["pt_id"])

#make empty columns for the dates
new_df['date_first'] <- NA
new_df['date_last'] <- NA

#for each patient...
for (i in (1:nrow(new_df))){
  #make an empty list to store the list of dates...
  pt <- c()
  #and for each row in the original dataframe...
  for (j in (1:nrow(df1))){
    #make another empty list to store the single date for each record
    x = c()
    # and if the patient ID is in the row being read...
    if(df1[j,1]==new_df[i,1]){
      #append that date to list x and move on through the original df
      x<-append(x,df1[j,2])
    }
  #then append this list to the list pt...
  pt <- append(pt,x)
  }
#add the min and max values from the list pt to the new df for each entry in the new dataframe and move on to the next patient in the new dataframe
new_df[i,2] <- min(pt)
new_df[i,3] <- max(pt)
}

#make the columns dates again (as they were converted to numeric in the 
max/min)
new_df$date_first <- as.Date(new_df$date_first)
new_df$date_last <- as.Date(new_df$date_last)

print(new_df)
  • Related