I like to select the last 3 values that are not empty per row from my dataframe:
df <- structure(list(V1 = c("Johannes Gutenberg University of Mainz",
"Eldagsener Str. 38", "Linneper Weg 1", "Gohrstraße 74", "Düppelstraße 36",
"Blutspende: Haus A3"), V2 = c(" Gebäude 900", " 31832 Springe",
" 40885 Ratingen", " 47475 Kamp-Lintfort", " 12163 Berlin", " Ebene -3"
), V3 = c(" Augustuspl. 4", " Germany", " Germany", " Germany",
" Germany", " Zentrum Innere Medizin (ZIM Blutbank / Immunhämatologisches Labor Haus A1"
), V4 = c(" 55131 Mainz", "", "", "", "", " Zentrum Operative Medizin (ZOM"
), V5 = c(" Germany", "", "", "", "", " Oberdürrbacher Str. 6"
), V6 = c("", "", "", "", "", " 97080 Würzburg"), V7 = c("",
"", "", "", "", " Germany")), row.names = 24:29, class = "data.frame")
Sometimes there is non-relevant text information at the beginning. And there are empty cells at the end. The crucial information is always the last 3 non-empty entries per row.
I'd like to go with tidyverse
, but any other solution is also appreciated.
CodePudding user response:
Row wise loop - apply, remove blanks, get last 3 values:
data.frame(t(apply(df, 1, function(i){ tail(i[ i != "" ], 3) })))
# X1 X2 X3
# 24 Augustuspl. 4 55131 Mainz Germany
# 25 Eldagsener Str. 38 31832 Springe Germany
# 26 Linneper Weg 1 40885 Ratingen Germany
# 27 Gohrstraße 74 47475 Kamp-Lintfort Germany
# 28 Düppelstraße 36 12163 Berlin Germany
# 29 Oberdürrbacher Str. 6 97080 Würzburg Germany
Note: if there is a blank in between values, it will be dropped, and the columns might not line up, for example, compare the first row:
df[1, 4] <- ""
data.frame(t(apply(df, 1, function(i){ tail(i[ i != "" ], 3)})))
# X1 X2 X3
# 24 Gebäude 900 Augustuspl. 4 Germany
# 25 Eldagsener Str. 38 31832 Springe Germany
# 26 Linneper Weg 1 40885 Ratingen Germany
# 27 Gohrstraße 74 47475 Kamp-Lintfort Germany
# 28 Düppelstraße 36 12163 Berlin Germany
# 29 Oberdürrbacher Str. 6 97080 Würzburg Germany
CodePudding user response:
library(tidyverse)
df |>
mutate(rn = row_number()) |>
pivot_longer(cols = V1:V7) |>
mutate(isem = value !="") |>
filter(isem) |>
group_by(rn) |>
slice_tail(n=3) |>
select(-name, -isem) |>
mutate(rn = 1:3) |>
pivot_wider(names_from = rn, values_from = value) |>
unnest()
#> # A tibble: 6 × 3
#> `1` `2` `3`
#> <chr> <chr> <chr>
#> 1 " Augustuspl. 4" " 55131 Mainz" " Germany"
#> 2 "Eldagsener Str. 38" " 31832 Springe" " Germany"
#> 3 "Linneper Weg 1" " 40885 Ratingen" " Germany"
#> 4 "Gohrstraße 74" " 47475 Kamp-Lintfort" " Germany"
#> 5 "Düppelstraße 36" " 12163 Berlin" " Germany"
#> 6 " Oberdürrbacher Str. 6" " 97080 Würzburg" " Germany"
CodePudding user response:
Using tidyr and dplyr:
library(dplyr)
library(tidyr)
df %>%
mutate(row = row_number()) %>%
pivot_longer(!row) %>%
filter(value != "") %>%
group_by(row) %>%
slice_tail(n = 3) %>%
mutate(name = paste0("V", 1:3)) %>%
ungroup() %>%
pivot_wider()
# A tibble: 6 × 4
row V1 V2 V3
<int> <chr> <chr> <chr>
1 1 " Augustuspl. 4" " 55131 Mainz" " Germany"
2 2 "Eldagsener Str. 38" " 31832 Springe" " Germany"
3 3 "Linneper Weg 1" " 40885 Ratingen" " Germany"
4 4 "Gohrstraße 74" " 47475 Kamp-Lintfort" " Germany"
5 5 "Düppelstraße 36" " 12163 Berlin" " Germany"
6 6 " Oberdürrbacher Str. 6" " 97080 Würzburg" " Germany"
CodePudding user response:
Assuming that yours last three columns are named "V5", "V6" and "V7" like in the example, you could use filter
in a one line command
filter(df,V5!= "",V6 != "",V7 != "")
and if you only need these columns you could do
df |>
select(V5:V7) |>
filter(V5!= "",V6 != "",V7 != "")