Home > Back-end >  Removing specific values in R from entire dataframe
Removing specific values in R from entire dataframe

Time:11-16

I need to remove specific text/values from all cells it is present within a dataset. This should be the equivalent of the 'Find and Replace All' tool within Excel. This cannot be simply removing/omitting all 'NA' values as several entries are NA and need to remain such, thus meaning it needs to be removing "NA |". This also cannot be limited to just the columns listed below and must be applied to the entire data frame, as there are 20 additional columns to the right of V6.

Input:

V1            V2           V3           V4          V5         V6
NA|TR1000     NA|chr1      NA|1000      1200        A            
TR1000        chr1         1100         NA|1200     B            
NA|TR1000     NA|chr1      NA|1000      NA          C             
TR2000        NA|chr2      2000         NA          D              
TR2000        NA|chr2      2100         2500        E            
TR3000        NA           3000         3500        F            
NA|TR3000     chr3         NA|3000      3500        F            

I Need:

V1         V2         V3         V4         V5        V6
TR1000     chr1       1000       1200       A            
TR1000     chr1       1100       1200       B            
TR1000     chr1       1000       NA         C             
TR2000     chr2       2000       NA         D              
TR2000     chr2       2100       2500       E            
TR3000     NA         3000       3500       F            
TR3000     chr3       3000       3500       F            

CodePudding user response:

This should do it:

df[] = lapply(df, sub, pattern = "NA|", replacement = "", fixed = TRUE)

Demo:

df = read.table(text = 'V1            V2           V3           V4          V5         V6
NA|TR1000     NA|chr1      NA|1000      1200        A            
TR1000        chr1         1100         NA|1200     B            
NA|TR1000     NA|chr1      NA|1000      NA          C             
TR2000        NA|chr2      2000         NA          D              
TR2000        NA|chr2      2100         2500        E            
TR3000        NA           3000         3500        F            
NA|TR3000     chr3         NA|3000      3500        F            ', header = T)

df[] = lapply(df, sub, pattern = "NA|", replacement = "", fixed = TRUE)

df
  #       V1   V2   V3   V4 V5 V6
  # 1 TR1000 chr1 1000 1200  A   
  # 2 TR1000 chr1 1100 1200  B   
  # 3 TR1000 chr1 1000 <NA>  C   
  # 4 TR2000 chr2 2000 <NA>  D   
  # 5 TR2000 chr2 2100 2500  E   
  # 6 TR3000 <NA> 3000 3500  F   
  # 7 TR3000 chr3 3000 3500  F   

If you like using dplyr, here's a method there:

df %>% 
  mutate(across(everything(), sub, pattern = "NA|", replacement = "", fixed = TRUE))

CodePudding user response:

type.convert(data.frame(sub("NA|", "", as.matrix(df), fixed = TRUE)), as.is = TRUE)

      V1   V2   V3   V4 V5 V6
1 TR1000 chr1 1000 1200  A   
2 TR1000 chr1 1100 1200  B   
3 TR1000 chr1 1000   NA  C   
4 TR2000 chr2 2000   NA  D   
5 TR2000 chr2 2100 2500  E   
6 TR3000 <NA> 3000 3500  F   
7 TR3000 chr3 3000 3500  F   
  • Related