I'm struggling to find a way to organize my data frame in the format I need. The database I'm working with provide me a data frame as it shows in the image:
the thing is there's some date on the column ID and in the NF column there's characters. I'm trying a way to organize the dataframe as it follows: Organized Data frame
Where the column ID have only the ID's and in the NF column I have the ID's related to the respective NF.
I was thinking in a way to extract the ID's between the date format who are in the column and complete the NF column with the NF in the column who are in the same row of the date.
I'm currently doing it manually in spreadsheets but if it increase too much will be impossible and my coding skills are bellow the challenge at the moment.
CodePudding user response:
You could do something like this.
The mutate
makes anything in the nf
column NA unless it starts with 3 digits and a hyphen. You could adjust that regex pattern if needed for your larger dataset. fill
then copies down the nf
values you want. And the filter
removes the rows containing dates.
(You could use %>%
instead of |>
if not on a recent version of R.)
library(tidyverse)
df <- tribble(
~id, ~nf,
"27/07/2021", "001-00011598",
"5683", "BASE P/ NOTEBOOK NBC-50BK C3TECH",
"10180", "SUPORTE MONITOR 17 A 32 T1224N ELG",
"10208", "SUPORTE F160N DUPLO P/ MONITOR 17 A 27 ELG",
"12551", "MINI ADAPTADOR XC-BTT-04 XCELL",
"13121", "GABINETE PINK TGC-PO3P T-DAGGER",
"13306", "CASE HD SSD CGHD-G34 EXBOM",
"27/07/2021", "001-00011599",
"12551", "MINI ADAPTADOR XC-BTT-04 XCELL",
"12674", "SUPORTE FIXO GENIUS ELG",
"13306", "CASE HD SSD CGHD-G34 EXBOM"
)
df2 <- df |>
mutate(nf = if_else(!str_starts(nf, "\\d{3}-"), NA_character_, nf)) |>
fill(nf) |>
filter(!str_detect(id, "\\d{2}/\\d{2}/\\d{4}"))
df2
#> # A tibble: 9 × 2
#> id nf
#> <chr> <chr>
#> 1 5683 001-00011598
#> 2 10180 001-00011598
#> 3 10208 001-00011598
#> 4 12551 001-00011598
#> 5 13121 001-00011598
#> 6 13306 001-00011598
#> 7 12551 001-00011599
#> 8 12674 001-00011599
#> 9 13306 001-00011599
Created on 2022-07-01 by the reprex package (v2.0.1)
CodePudding user response:
Here is a tidyverse
approach:
Within an
ifelse
statement identify the pattern xxx-xxxxxxx all numbers.fill
theNA
down with the identified patternfilter out those with
\
in the string fromID
library(dplyr)
library(stringr)
library(tidyr)
df %>%
mutate(NF = ifelse(str_detect(NF, '\\d \\-\\d '), NF, NA_character_)) %>%
fill(NF, .direction = "down") %>%
filter(!str_detect(ID, '\\/'))
ID NF
1 5683 001-00011598
2 10180 001-00011598
3 10208 001-00011598
4 12551 001-00011598
5 13121 001-00011598
6 13306 001-00011598
7 12551 001-00011599
8 12674 001-00011599
9 13306 001-00011599
10 13306 001-00011600
data:
structure(list(ID = c("27/07/2021", "5683", "10180", "10208",
"12551", "13121", "13306", "27/07/2021", "12551", "12674", "13306",
"27/07/2021", "13306"), NF = c("001-00011598", "BASE P/ NOTEBOOK NBC-50BK C3TECH",
"SUPORTE MONITOR 17 A 32 T1224N ELG", "SUPORTE F160N DUPLO P/ MONITOR 17 A 27 ELG",
"MINI ADAPTADOR XC-BTT-04 XCELL", "GABINETE PINK TGC-P03P T-DAGGER",
"CASE HD SSD CGHD-G34 EXBOM", "001-00011599", "MINI ADAPTADOR XC-BTT-04 XCELL",
"SUPORTE FIXO GENIUS ELG", "CASE HD SSD CGHD-G34 EXBOM", "001-00011600",
"CASE HD SSD CGHD-G34 EXBOM")), class = "data.frame", row.names = c(NA,
-13L))