I need to remove specific text/values from all cells it is present within a dataset. This should be the equivalent of the 'Find and Replace All' tool within Excel. This cannot be simply removing/omitting all 'NA' values as several entries are NA and need to remain such, thus meaning it needs to be removing "NA |". This also cannot be limited to just the columns listed below and must be applied to the entire data frame, as there are 20 additional columns to the right of V6.
Input:
V1 V2 V3 V4 V5 V6
NA|TR1000 NA|chr1 NA|1000 1200 A
TR1000 chr1 1100 NA|1200 B
NA|TR1000 NA|chr1 NA|1000 NA C
TR2000 NA|chr2 2000 NA D
TR2000 NA|chr2 2100 2500 E
TR3000 NA 3000 3500 F
NA|TR3000 chr3 NA|3000 3500 F
I Need:
V1 V2 V3 V4 V5 V6
TR1000 chr1 1000 1200 A
TR1000 chr1 1100 1200 B
TR1000 chr1 1000 NA C
TR2000 chr2 2000 NA D
TR2000 chr2 2100 2500 E
TR3000 NA 3000 3500 F
TR3000 chr3 3000 3500 F
CodePudding user response:
This should do it:
df[] = lapply(df, sub, pattern = "NA|", replacement = "", fixed = TRUE)
Demo:
df = read.table(text = 'V1 V2 V3 V4 V5 V6
NA|TR1000 NA|chr1 NA|1000 1200 A
TR1000 chr1 1100 NA|1200 B
NA|TR1000 NA|chr1 NA|1000 NA C
TR2000 NA|chr2 2000 NA D
TR2000 NA|chr2 2100 2500 E
TR3000 NA 3000 3500 F
NA|TR3000 chr3 NA|3000 3500 F ', header = T)
df[] = lapply(df, sub, pattern = "NA|", replacement = "", fixed = TRUE)
df
# V1 V2 V3 V4 V5 V6
# 1 TR1000 chr1 1000 1200 A
# 2 TR1000 chr1 1100 1200 B
# 3 TR1000 chr1 1000 <NA> C
# 4 TR2000 chr2 2000 <NA> D
# 5 TR2000 chr2 2100 2500 E
# 6 TR3000 <NA> 3000 3500 F
# 7 TR3000 chr3 3000 3500 F
If you like using dplyr
, here's a method there:
df %>%
mutate(across(everything(), sub, pattern = "NA|", replacement = "", fixed = TRUE))
CodePudding user response:
type.convert(data.frame(sub("NA|", "", as.matrix(df), fixed = TRUE)), as.is = TRUE)
V1 V2 V3 V4 V5 V6
1 TR1000 chr1 1000 1200 A
2 TR1000 chr1 1100 1200 B
3 TR1000 chr1 1000 NA C
4 TR2000 chr2 2000 NA D
5 TR2000 chr2 2100 2500 E
6 TR3000 <NA> 3000 3500 F
7 TR3000 chr3 3000 3500 F