Home > Software engineering >  How to preform loc with one condition that include two columns
How to preform loc with one condition that include two columns

Time:03-22

Hello guys I need your help.

I have df with two columns A and B both of them are columns with string values

example:

df_1 = pd.DataFrame(data={
    "A":['a','b','c'],
    "B":['a x d','z y w','q m c'] #string values not a list
})
print(df_1)

#output
   A      B
0  a  a x d
1  b  z y w
2  c  q m c

now what I'm trying to do is to preform loc in the df_1 to get all the row that col B cointain the string value in col A.

In this example the output i want is the first and the third rows

   A      B
0  a  a x d # 'a x d' contain value 'a'
2  c  q m c # 'q m c' contain value 'c'

I have tried different loc condition but got unhashable type: 'Series' error:

df_1.loc[df_1["B"].str.contains(df_1["A"])] #TypeError: unhashable type: 'Series'
df_1.loc[df_1["A"] in df_1["B"]] #TypeError: unhashable type: 'Series'

I really don't want to use a for/while loop because of the size of the df.

any idea how can I preform this? Thanks

CodePudding user response:

Cute!

from operator import contains

df_1.loc[map(contains, *map(df_1.get, ['B', 'A']))]

   A      B
0  a  a x d
2  c  q m c

And without the import but with an ugly dunder ... /-:

df_1.loc[map(str.__contains__, *map(df_1.get, ['B', 'A']))]

Also, to the OP... don't accept this answer. I was just golfing. You should never use this in production code as it is less than transparent due to the use of 2 maps and a splat.

And just to be super duper clear...

mozway's is the better one

CodePudding user response:

There is no vectorial method, to map in using two columns. You need to loop here:

mask = [a in b for a,b in zip(df_1['A'], df_1['B'])]

df_1.loc[mask]

Output:

   A      B
0  a  a x d
2  c  q m c
comparison of speed (3000 rows)
# operator.contains
518 µs ± 4.61 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

# list comprehension
554 µs ± 3.84 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

# numpy.apply_along_axis
7.32 ms ± 58.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

# apply
20.7 ms ± 379 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

CodePudding user response:

You can try:

out = df_1.loc[df_1.apply(lambda x: x['A'] in x['B'], axis=1)]
print(out)

# Output
   A      B
0  a  a x d
2  c  q m c

Alternative with numpy:

df_1.loc[np.apply_along_axis(lambda x: x[0] in x[1], axis=1, arr=df_1)]

CodePudding user response:

When accessing two different columns in pandas you can use .apply()

df.apply(lambda row: row['A'] in row['B'], axis = 1)

This creates a Boolean Series which you can use for your loc condition which selects only the columns where 'A' is in 'B':

df.loc[df.apply(lambda row: row['A'] in row['B'], axis = 1)]

CodePudding user response:

df_1[df_1.apply(lambda x:x['A'] in x['B'],axis=1)]
  • Related