Home > database >  How to filter rows in R data frame with multiple pairs of column values
How to filter rows in R data frame with multiple pairs of column values

Time:10-30

I have a dataframe of economics data, among others the infant mortality rate and the region of the world. I extracted outlier values per region.

                         income infant   region oil
Saudi.Arabia               1530  650.0     Asia yes
Afganistan                   75  400.0     Asia  no
Libya                      3010  300.0   Africa yes
Zambia                      310  259.0   Africa  no
Guinea                       79  216.0   Africa  no
Burma                        73  200.0     Asia  no

I have (1) the values of the outliers and (2) the name of the corresponding region for each value.

> out$out # values
[1] 300.0 170.0 650.0 400.0  44.8  43.3
> out$names[out$group] # name of region corresponding to each outlier
[1] "Africa"   "Americas" "Asia"     "Asia"     "Europe"   "Europe" 

I now want to filter the dataframe for the rows corresponding to each value pair. E.g. I want the rowname for the row with an infant value of 170 and region being Americas. I cannot use the joint condition of the column values being in both lists, because there are other rows with the same values but different groups (as they are not outliers in their group).

I know that I could do this in two steps, but I wonder if there is a more elegant solution in R to do this.

Thanks a lot in advance!

CodePudding user response:

df <- data.frame(income = c(1530, 75, 3010, 310, 79, 73), 
                 infant = c(650, 400, 300, 259, 216, 200), 
                 region = c("Asia", "Asia", "Africa", "Africa", "Africa", "Asia"), 
                 oil = c("yes", "no", "yes", "no", "no", "no"), 
                 row.names = c("Saudi.Arabia", "Afganistan", "Libya", 
                              "Zambia", "Guinea", "Burma"))

df

#               income infant region oil
# Saudi.Arabia   1530    650   Asia yes
# Afganistan       75    400   Asia  no
# Libya          3010    300 Africa yes
# Zambia          310    259 Africa  no
# Guinea           79    216 Africa  no
# Burma            73    200   Asia  no

x <- c(300.0, 170.0, 650.0, 400.0,  44.8,  43.3)
y <- c("Africa",   "Americas", "Asia",     "Asia",     "Europe",   "Europe")


df[paste0(df$infant, df$region) %in% paste0(x, y),]

#               income infant region oil
# Saudi.Arabia   1530    650   Asia yes
# Afganistan       75    400   Asia  no
# Libya          3010    300 Africa yes
  • Related