Home > OS >  Why are my data frames not merging/joining?
Why are my data frames not merging/joining?

Time:12-14

I am trying to merge or join two dataframes in R but I have tried every combination of both functions and neither seem to be producing the desired result. Both data frames share a column named "CTYNAME" which is a char in both dataframes. Both dfs have a row with 'Butler County' for example, but even when using full_join the final product has a row with only Butler County and a row with Butler County and all of the data from the second data frame, instead of just combining the two rows.

Essentially, there is unique overlap in the 'CTYNAME' column 19 times so I expect a 19 column output but I can only get 0 (using inner_join) or 107 (the sum of columns in both dfs, using full_join).

What I have currently

frank_stops = 
  frank_stops %>%
  filter(State == "Ohio") %>%
  select(4) %>% 
  distinct() %>%
  rename(CTYNAME = 'County') %>% 
  mutate(CTYNAME = paste(CTYNAME, " County")) %>% 
  data.frame()
frank_stops$CTYNAME <- as.character(frank_stops$CTYNAME)


frank_stops


ohio_data = 
  ohio_data %>% 
  rename(CTYNAME = 'CTYNAME') %>%
  filter(AGEGRP == 0, YEAR == 12) %>% 
  data.frame()
ohio_data$CTYNAME <- as.character(ohio_data$CTYNAME)

ohio_data


together = full_join(frank_stops, ohio_data, by="CTYNAME")
together

Console Output:

Rows: 20064 Columns: 80── Column specification ─────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr  (4): SUMLEV, COUNTY, STNAME, CTYNAME
dbl (76): STATE, YEAR, AGEGRP, TOT_POP, TOT_MALE, TOT_FEMALE, WA_MALE, WA_FEMALE, BA_MALE, BA_FEMALE, IA_MALE, IA_FEM...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.Rows: 153 Columns: 4── Column specification ─────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (4): Date of Visit, Town/City, State, County
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.Joining, by = "CTYNAME"

CodePudding user response:

I had a similar issue adding str_squish() on the join key in both data frames prior to the join resolved it.

frank_stops$CTYNAME <-str_squish(frank_stops$CTYNAME) 
ohio_data$CTYNAME<- str_squish(ohio_data$CTYNAME)

will resolve the issue if you have uneven spacing an if that is impacting the join

  • Related