For the following data frame, I would like to replace the column names with the first row whose 1st column starts with a word or words. Here it is the 2nd row, and the word is Company. However, in other data frames the row can be different (the 1st, 5th, or 10th, for example), and the word can also be different, such as Investment.
structure(list(X1 = c("", "Company #", "Investments:"
), X2 = c("", "Type", ""), X3 = c("", "Reference",
""), X4 = c(NA_real_, NA_real_, NA_real_), X5= c("", "Footnotes",
""), X6 = c(NA_character_, NA_character_, NA_character_)), row.names = c(NA,
3L), class = "data.frame")
  X1           X2    X3           X4 X5        X6
  <chr>        <chr> <chr>     <dbl> <chr>     <chr>
1 ""           ""    ""           NA ""        NA
2 Company #    Type  Reference    NA Footnotes NA
3 Investments: ""    ""           NA ""        NA
My idea is to first get the number of the row where the 1st column starts with a word or words, and then use that row number to set the column names. Maybe there are better ways to do that.
names(my_df)<- my_df[row_number,]
my_df <- my_df[-row_number,]
Desired output:
  Company #    Type  Reference    NA Footnotes NA
  <chr>        <chr> <chr>     <dbl> <chr>     <chr>
3 Investments: ""    ""           NA ""        NA
CodePudding user response:
#row number of the first word in the first column
row_n <- min(which(nzchar(my_df[[1]])))
janitor::row_to_names(my_df, row_n)
output
# Company # Type Reference NA Footnotes NA
#3 Investments: NA <NA>
Note that you will have non-unique column names (NA) if you do so. You can use clean_names to quickly remedy this.
CodePudding user response:
You could try:
# '^[A-Za-z]' matches values that begin with a letter
idx <- which(grepl('^[A-Za-z]', my_df$X1))[1]
colnames(my_df) <- my_df[idx, ]
my_df <- my_df[(idx + 1):nrow(my_df), ]
Output:
Company # Type Reference NA Footnotes NA
3 Investments: NA <NA>
This finds the first row whose first column begins with a letter, uses that row as the column names, and keeps only the rows that follow it.
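The same steps can be wrapped in a small base R function so they can be reused on data frames where the header sits in a different row. This is only a sketch: promote_header_row and its pattern argument are hypothetical names, not part of any package, and it assumes "starts with a word" means "starts with an ASCII letter".

```r
# Hypothetical helper: promote the first row whose first column starts with
# a letter to the header, then drop that row and everything above it.
promote_header_row <- function(df, pattern = "^[A-Za-z]") {
  first_col <- as.character(df[[1]])
  # grepl() returns FALSE for NA, so missing cells are skipped
  idx <- which(grepl(pattern, first_col))[1]
  if (is.na(idx)) {
    stop("no value in the first column matches the pattern")
  }
  # Convert each cell of the header row to character, one column at a time
  new_names <- vapply(df[idx, , drop = FALSE], as.character,
                      character(1), USE.NAMES = FALSE)
  names(df) <- new_names
  # -seq_len(idx) also behaves correctly when the header is the last row
  df[-seq_len(idx), , drop = FALSE]
}

# Example data from the question
my_df <- structure(list(
  X1 = c("", "Company #", "Investments:"),
  X2 = c("", "Type", ""),
  X3 = c("", "Reference", ""),
  X4 = c(NA_real_, NA_real_, NA_real_),
  X5 = c("", "Footnotes", ""),
  X6 = c(NA_character_, NA_character_, NA_character_)
), row.names = c(NA, 3L), class = "data.frame")

res <- promote_header_row(my_df)
```

Dropping rows with -seq_len(idx) instead of (idx + 1):nrow(my_df) avoids the backwards sequence you would get when the header happens to be the last row. Columns that are NA in the header row still end up with NA names, so you may want to rename them afterwards.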
CodePudding user response:
Base R:
colnames(df) <- df[2,]
df <- df[-2,]
df
Company # Type Reference NA Footnotes NA
1 NA <NA>
3 Investments: NA <NA>
CodePudding user response:
You could use which to get the row with the string like this:
# compare against the first column only, so idx is always a row number
idx <- which(my_df[[1]] == "Company #")
names(my_df) <- my_df[idx, ]
my_df <- my_df[-idx,]
my_df
#> Company # Type Reference NA Footnotes NA
#> 1 NA <NA>
#> 3 Investments: NA <NA>
Created on 2023-01-05 with reprex v2.0.2
CodePudding user response:
You can look for your names using which on the first column, then assign them as column names with names, and finally delete that row. All this in base R:
df <- structure(list(X1 = c("", "Company #", "Investments:"
), X2 = c("", "Type", ""), X3 = c("", "Reference",""), X4 = c(NA_real_, NA_real_, NA_real_), X5= c("", "Footnotes", ""),
X6 = c(NA_character_, NA_character_, NA_character_)), row.names = c(NA, 3L), class = "data.frame")
row_names <- which(nchar(data.frame(df)[, "X1"]) > 1)[1]
names(df) <- df[row_names, ]
df[-c(row_names),]
Output:
Company # Type Reference NA Footnotes NA
1 NA <NA>
3 Investments: NA <NA>
CodePudding user response:
This uses a keyword in the first column to find the row, then makes sure it has no duplicate names with make.unique and no NAs (character or numeric) with replace.
key <- "Company #"
str <- as.character(dat[dat[,1] == key,][1,])
colnames(dat) <- make.unique(
replace(str, str %in% "NA" | is.na(str), "Missing"), sep="_")
result
dat
Company # Type Reference Missing Footnotes Missing_1
1 NA <NA>
2 Company # Type Reference NA Footnotes <NA>
3 Investments: NA <NA>
If the first non-empty cell in the first column should be picked instead, use this:
str <- as.character(dat[nchar(dat[,1]) != 0, ][1,])
colnames(dat) <- make.unique(
replace(str, str %in% "NA" | is.na(str), "Missing"), sep="_")