Unequally spaced txt file into dataframe in R

Time:07-19

I have a .txt file which looks like this:

College Names Food rating Critic Rating Student rating

John Hopkins Institute 8.4 8.6 9.2

Stanford : New School 9.4 5.6 9.2

Mayor College 6.4 7.6 4.2

...

These are all space-separated values in a .txt file.

I want to import this data into RStudio as a table like the following (there are many more rows than this):

College Names            Food rating  Critic Rating  Student rating
John Hopkins Institute   8.4          8.6            9.2
Stanford : New School    9.4          5.6            9.2
Mayor College            6.4          7.6            4.2

I think it can be done by using the fact that only the first column contains text and all the columns after it are numbers. Headers can be added later.

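The problem is that the rows aren't evenly spaced: the college names contain different numbers of spaces ("John Hopkins Institute" is three words, "Mayor College" only two), so each row splits into a different number of fields and I cannot import the file directly as a data.table.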
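The idea in the question (only the name contains text, everything after it is numeric) can be sketched in base R by splitting each row on whitespace and treating the last three fields as the ratings. This is a minimal sketch that assumes every data row ends with exactly three numeric ratings and that the first line is the header; the column names `college`, `food`, `critic` and `student` are placeholders chosen here, to be renamed later as the question suggests.

```r
# Sample lines as they might come from readLines() on the .txt file
lines <- c("College Names Food rating Critic Rating Student rating",
           "",
           "John Hopkins Institute 8.4 8.6 9.2",
           "",
           "Stanford : New School 9.4 5.6 9.2",
           "",
           "Mayor College 6.4 7.6 4.2")

# Drop the header row and any blank lines
body <- lines[-1]
body <- body[nzchar(trimws(body))]

# Split every row on runs of whitespace; rows may have different lengths
fields <- strsplit(trimws(body), "\\s+")

# The last three fields are the ratings; everything before them is the name
parse_row <- function(f) {
  n <- length(f)
  data.frame(college = paste(f[seq_len(n - 3)], collapse = " "),
             food    = as.numeric(f[n - 2]),
             critic  = as.numeric(f[n - 1]),
             student = as.numeric(f[n]),
             stringsAsFactors = FALSE)
}

df <- do.call(rbind, lapply(fields, parse_row))
df
```

Because the name is rebuilt with `paste(..., collapse = " ")`, any run of several spaces inside a name is collapsed to a single space.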

Answer:

You can use tidyr::extract with a custom regular expression:

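library(tibble)
library(tidyr)   # extract(); tidyr also re-exports the %>% pipe

d <- tibble(input = c("John Hopkins Institute 8.4 8.6 9.2",
                      "Stanford : New School 9.4 5.6 9.2",
                      "Mayor College 6.4 7.6 4.2"))

# (.+) greedily captures the college name; each ([.\\d]+) captures one rating
d %>% extract(input,
              regex = "(.+) ([.\\d]+) ([.\\d]+) ([.\\d]+)",
              into = c("College Names", "Food rating", "Critic Rating", "Student rating"))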

See the explanation of the regex here: https://regex101.com/r/fmQiU7/1
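The example above builds the tibble by hand. A minimal sketch of applying the same regex to the file itself might look like the following; it assumes the header is the first line and that the blank lines between rows should be ignored, and a temporary file stands in for the real .txt file. `convert = TRUE` asks `extract()` to turn the rating columns into numbers.

```r
library(tibble)
library(tidyr)

# Write a small example file; in practice, point readLines() at your own .txt file
path <- tempfile(fileext = ".txt")
writeLines(c("College Names Food rating Critic Rating Student rating",
             "",
             "John Hopkins Institute 8.4 8.6 9.2",
             "",
             "Stanford : New School 9.4 5.6 9.2",
             "",
             "Mayor College 6.4 7.6 4.2"), path)

# Read the raw lines, then drop the header and the blank lines
raw <- readLines(path)
raw <- raw[-1]
raw <- raw[nzchar(trimws(raw))]

# Same pattern as above; convert = TRUE makes the rating columns numeric
ratings <- extract(tibble(input = raw), input,
                   into = c("College Names", "Food rating",
                            "Critic Rating", "Student rating"),
                   regex = "(.+) ([.\\d]+) ([.\\d]+) ([.\\d]+)",
                   convert = TRUE)
ratings
```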

Answer:

Ideally, you would find a way to read in the file so that it is already split into columns. If that fails, here is a way to structure the data after reading it in:

Read in the data:

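# For a real file, replace text = "..." with file = "colleges.txt" (hypothetical name).
# sep = "\t" (the data contain no tabs) keeps each whole line in a single column, and
# header = TRUE makes the header line that column's name (spaces become dots).
df <- read.table(text = "College Names Food rating Critic Rating Student rating

John Hopkins Institute 8.4 8.6 9.2

Stanford : New School 9.4 5.6 9.2

Mayor College 6.4 7.6 4.2",
                 header = TRUE, quote = "", sep = "\t")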

Extract data into relevant columns:

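library(tidyr)   # extract(); also re-exports the %>% pipe
df %>%
  extract(College.Names.Food.rating.Critic.Rating.Student.rating,
          into = c("College_Name", "Food_rating", "Critic_Rating", "Student_Rating"),
          # (\\D+) captures the name (no digits); each (\\d\\.\\d) captures one rating
          regex = "(\\D+)\\s(\\d\\.\\d)\\s(\\d\\.\\d)\\s(\\d\\.\\d)")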
            College_Name Food_rating Critic_Rating Student_Rating
1 John Hopkins Institute         8.4           8.6            9.2
2  Stanford : New School         9.4           5.6            9.2
3          Mayor College         6.4           7.6            4.2
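The pattern above only accepts ratings of the form `d.d` and names without digits. If the data might contain a rating such as `10` or a name such as "Area 51 College" (a hypothetical row added here purely for illustration), one option is a looser pattern anchored at both ends, so the greedy `(.+)` leaves exactly the last three whitespace-separated numbers for the rating groups. This is a sketch, not a drop-in replacement; `convert = TRUE` converts the rating columns to numeric.

```r
library(tidyr)

df <- data.frame(raw = c("John Hopkins Institute 8.4 8.6 9.2",
                         "Stanford : New School 9.4 5.6 9.2",
                         "Mayor College 6.4 7.6 4.2",
                         "Area 51 College 10 9.5 8"),   # hypothetical row
                 stringsAsFactors = FALSE)

# (.+) allows digits in the name; ([0-9.]+) accepts ratings such as 10 or 9.5
out <- extract(df, raw,
               into = c("College_Name", "Food_rating", "Critic_Rating", "Student_Rating"),
               regex = "^(.+)\\s+([0-9.]+)\\s+([0-9.]+)\\s+([0-9.]+)$",
               convert = TRUE)
str(out)
```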