Here is what my text file data looks like:
1 Abigail Seltzer
6019 8th St NW
Washington, DC-20011
Not Employed Not Employed
6019 8th St. NW
Washington,DC-20011
Credit Card 03/11/2022 $10,000.00 $10,00.00
2 Danita Cole
416 Shepherd St NW
Washington, DC-20005
Not Employed Not Employed
416 Shepherd Street NW
Washington,DC-20011
Credit Card 03/11/2022 $100.00 $350.00
3 Landon James
612 Quintana Pl NW
Washington, DC-20011
Social Inclusion Manager Brookfield Properties
655 New York Ave NW
Washington,DC-20000
Credit Card 03/11/2022 $100.00 $100.00
To be clear, each time there is a number before the name (1 Abigail, 2 Danita) the next 6 rows follow without blank rows, then there is one blank row after the 7th row before the next numbered entry (between 'Credit Card 03/11/2022 $10,000.00 $10,00.00' and '2 Danita')). So there are no blank rows between '1 Abigail Seltzer' and '6019 8th St NW'.
I would like to put each of the grouped rows into their own 7 columns - for each grouping of 7 grows, so:
column1 1 Abigail Seltzer 2 Danita Cole 3 Landon James
column2 6019 8th St NW 416 Shepherd St NW 612 Quintana Pl NW
column 3 Washington, DC-20011 Washington, DC-20005 Washington, DC-20000
And so on.
Thanks!
CodePudding user response:
You can read the text file using readLines
, and convert the resulting vector into a matrix with 8 rows. When you transpose this matrix, all your fields will have their own column
t(matrix(readLines("../mytext.txt"), nrow = 8))
#> [,1] [,2] [,3]
#> [1,] "1 Abigail Seltzer" "6019 8th St NW" "Washington, DC-20011"
#> [2,] "2 Danita Cole" "416 Shepherd St NW" "Washington, DC-20005"
#> [3,] "3 Landon James" "612 Quintana Pl NW" "Washington, DC-20011"
#> [,4]
#> [1,] "Not Employed Not Employed"
#> [2,] "Not Employed Not Employed"
#> [3,] "Social Inclusion Manager Brookfield Properties"
#> [,5] [,6]
#> [1,] "6019 8th St. NW" "Washington,DC-20011"
#> [2,] "416 Shepherd Street NW" "Washington,DC-20011"
#> [3,] "655 New York Ave NW" "Washington,DC-20000"
#> [,7] [,8]
#> [1,] "Credit Card 03/11/2022 $100.00 $100.00" ""
#> [2,] "Credit Card 03/11/2022 $100.00 $350.00" ""
#> [3,] "Credit Card 03/11/2022 $100.00 $100.00" ""
You could convert this into a data frame with as.data.frame
if you wished.
CodePudding user response:
A tidyverse option:
library(tidyverse)
tribble(
~value,
"1 Abigail Seltzer",
"6019 8th St NW",
"Washington, DC-20011",
"Not Employed Not Employed",
"6019 8th St. NW",
"Washington,DC-20011",
"Credit Card 03/11/2022 $10,000.00 $10,00.00",
"2 Danita Cole",
"416 Shepherd St NW",
"Washington, DC-20005",
"Not Employed Not Employed",
"416 Shepherd Street NW",
"Washington,DC-20011",
"Credit Card 03/11/2022 $100.00 $350.00",
"3 Landon James",
"612 Quintana Pl NW",
"Washington, DC-20011",
"Social Inclusion Manager Brookfield Properties",
"655 New York Ave NW",
"Washington,DC-20000",
"Credit Card 03/11/2022 $100.00 $100.00"
) |>
mutate(group = (row_number() - 1) %/% 7) |>
group_by(group) |>
mutate(row = row_number()) |>
pivot_wider(names_from = row, values_from = value) |>
rename("name" = `1`, "street" = `2`, "city" = `3`, "status" = `4`) # example columnrenaming if needed
#> # A tibble: 3 × 8
#> # Groups: group [3]
#> group name street city status `5` `6` `7`
#> <dbl> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 0 1 Abigail Seltzer 6019 8th St NW Washingto… Not E… 6019… Wash… Cred…
#> 2 1 2 Danita Cole 416 Shepherd St NW Washingto… Not E… 416 … Wash… Cred…
#> 3 2 3 Landon James 612 Quintana Pl NW Washingto… Socia… 655 … Wash… Cred…
Created on 2022-05-04 by the reprex package (v2.0.1)