How to put entire row as column names?-CodePudding

For the following dataframe, I would like to change the column names with the row where the 1st column starts with a word or words. Here it's the 2nd row and the word Company. However, the row can be different like 1st, 5th or 10th row with different dataframe, and word can also be different like Investment and others.

structure(list(X1 = c("", "Company #", "Investments:"
), X2 = c("", "Type", ""), X3 = c("", "Reference", 
""), X4 = c(NA_real_, NA_real_, NA_real_), X5= c("", "Footnotes", 
""), X6 = c(NA_character_, NA_character_, NA_character_)), row.names = c(NA, 
3L), class = "data.frame")

          X1             X2       X3        X4       X5       X6
    <chr>              <chr>    <chr>      <dbl>   <chr>    <chr>
1                                           NA               NA
2   Company #           Type   Reference    NA   Footnotes   NA
3   Investments:                            NA               NA

I'm thinking first to get the row number when the 1st column starts with a word/words, and then use that row number to change to column names, or maybe there are better ways to do that.

names(my_df)<- my_df[row_number,]
my_df <- my_df[-row_number,]

Desired Output

        Company #   Type   Reference    NA   Footnotes    NA
    <chr>           <chr>   <chr>      <dbl>   <chr>    <chr>
3   Investments:                         NA              NA

CodePudding user response：

#row number of the first word in the first column
row_n <- min(which(nzchar(my_df[[1]])))
janitor::row_to_names(my_df, row_n)

output

#     Company # Type Reference NA Footnotes   NA
#3 Investments:                NA           <NA>

Note that you will have non-unique column names (NA) if you do so. You can use clean_names to quickly remedy this.

CodePudding user response：

You could try:

idx <- which(grepl('[^A-Za-z]', my_df$X1))[1]
colnames(my_df) <- my_df[idx, ]
my_df <- my_df[(idx   1):nrow(my_df), ]

Output:

     Company # Type Reference NA Footnotes   NA
3 Investments:                NA           <NA>

This would check if any row begins with a letter, take the first occurrence as column names and keep only rows following it.

CodePudding user response：

Base R:

colnames(df) <- df[2,]
df <- df[-2,]
df

     Company # Type Reference NA Footnotes   NA
1                             NA           <NA>
3 Investments:                NA           <NA>

CodePudding user response：

You could use which to get the row with the string like this:

idx <- which("Company #" == my_df)
names(my_df) <- my_df[idx, ]
my_df <- my_df[-idx,]
my_df
#>      Company # Type Reference NA Footnotes   NA
#> 1                             NA           <NA>
#> 3 Investments:                NA           <NA>

^{Created on 2023-01-05 with reprex v2.0.2}

CodePudding user response：

You can look for your names using which on first column, then assign to colnames with names and finally delete that row. All this in base R:

df <- structure(list(X1 = c("", "Company #", "Investments:"
), X2 = c("", "Type", ""), X3 = c("", "Reference",""), X4 = c(NA_real_, NA_real_, NA_real_), X5= c("", "Footnotes", ""), 
X6 = c(NA_character_, NA_character_, NA_character_)), row.names = c(NA, 3L), class = "data.frame")

row_names <- which(nchar(data.frame(df)[, "X1"]) > 1)[1]

names(df) <- df[row_names, ]
df[-c(row_names),]

Output:

     Company # Type Reference NA Footnotes   NA
1                             NA           <NA>
3 Investments:                NA           <NA>

CodePudding user response：

This uses a keyword in the first column to find the row, then makes sure it has no duplicate names with make.unique and no NAs (character or numeric) with replace.

key <- "Company #"

str <- as.character(dat[dat[,1] == key,][1,])

colnames(dat) <- make.unique(
  replace(str, str %in% "NA" | is.na(str), "Missing"), sep="_")

result

dat
     Company # Type Reference Missing Footnotes Missing_1
1                                  NA                <NA>
2    Company # Type Reference      NA Footnotes      <NA>
3 Investments:                     NA                <NA>

If the first non-empty cell should be picked use this

str <- as.character(dat[nchar(dat[,1]) != 0, ][1,])

colnames(dat) <- make.unique(
  replace(str, str %in% "NA" | is.na(str), "Missing"), sep="_")