How can we search for a string based on a group of records in a dataframe?


I have a dataframe that looks like this.

import pandas as pd
import numpy as np
# data stored in a dictionary
details = {
    'address_id': [111, 111, 111, 111, 222, 222, 222, 222, 333],
    'mydate':['2022-01-24', '2022-01-24', '2022-03-28', '2022-03-28', '2022-01-24', '2022-01-24', '2022-03-28', '2022-03-28', '2022-01-24'],
    'mystring': ['att', 'verizon', 'comcast', 'verizon', 'att', 'verizon', 'att', 'verizon', 'verizon']
}
df = pd.DataFrame(details)


Within a group of rows that share the same address_id but have different dates, I want to see whether a string is NOT found. Basically, I want to see if 'att' is found on earlier dates and missing on later dates. If 'att' shows up on both earlier and later dates, I don't care.

The logic is:

'att' shows up for 111 on 2022-01-24, and 'att' is missing for 111 on 2022-03-28, so the 2022-03-28 rows for 111 should be flagged.
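To make that logic concrete, it can help to first reduce the data to one True/False value per (address_id, mydate) pair, recording whether 'att' appears on that date. This is a minimal sketch, assuming an exact match on 'att' (via Series.eq) is what's wanted rather than a substring match; the name has_att is illustrative only.

```python
import pandas as pd

details = {
    'address_id': [111, 111, 111, 111, 222, 222, 222, 222, 333],
    'mydate': ['2022-01-24', '2022-01-24', '2022-03-28', '2022-03-28', '2022-01-24',
               '2022-01-24', '2022-03-28', '2022-03-28', '2022-01-24'],
    'mystring': ['att', 'verizon', 'comcast', 'verizon', 'att', 'verizon', 'att',
                 'verizon', 'verizon'],
}
df = pd.DataFrame(details)

# For each (address_id, mydate) pair: does 'att' appear on that date?
# (exact match assumed; groupby sorts the keys, so dates come out ascending per id)
has_att = (
    df['mystring'].eq('att')
      .groupby([df['address_id'], df['mydate']])
      .any()
)
print(has_att)
```

Reading down each address_id: 111 goes True then False (the case to flag), 222 stays True on both dates (ignored), and 333 never has 'att' at all.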

I want to end up with a dataframe like this.

    address_id  mydate      mystring    ismissing
0   111         2022-01-24  att         False
1   111         2022-01-24  verizon     False
2   111         2022-03-28  comcast     True
3   111         2022-03-28  verizon     True
4   222         2022-01-24  att         False
5   222         2022-01-24  verizon     False
6   222         2022-03-28  att         False
7   222         2022-03-28  verizon     False
8   333         2022-01-24  verizon     False

Answer:

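Combine two boolean masks with & (element-wise AND):

  • the first (id_has_attr) tells whether 'att' is present on any date for each address_id
  • the second is True for every (address_id, mydate) group in which 'att' does not appear; ANDing it with the first restricts the flag to ids that do have 'att' somewhere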
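# True on every row whose address_id has 'att' on at least one date
id_has_attr = df.groupby('address_id')['mystring'].transform(
    lambda col: col.str.contains('att').any()
)
# True on every row whose (address_id, mydate) group contains no 'att',
# limited to ids that have 'att' somewhere
df['ismissing'] = df.groupby(['address_id', 'mydate'])['mystring'].transform(
    lambda col: not col.str.contains('att').any()
) & id_has_attr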
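The two masks above do not look at date order: if 'att' appeared only on a later date, the earlier date would be flagged as well. If "earlier present, later missing" should be enforced strictly, one possible order-aware variant is sketched below. It assumes the dates are ISO-formatted strings (so they sort chronologically) and an exact match on 'att' instead of str.contains; the names presence, has_att and seen_before are illustrative, not from the original answer.

```python
import pandas as pd

details = {
    'address_id': [111, 111, 111, 111, 222, 222, 222, 222, 333],
    'mydate': ['2022-01-24', '2022-01-24', '2022-03-28', '2022-03-28', '2022-01-24',
               '2022-01-24', '2022-03-28', '2022-03-28', '2022-01-24'],
    'mystring': ['att', 'verizon', 'comcast', 'verizon', 'att', 'verizon', 'att',
                 'verizon', 'verizon'],
}
df = pd.DataFrame(details)

# One row per (address_id, mydate): is 'att' present on that date?
# groupby sorts keys by default, so dates are ascending within each id
# (valid here because the dates are ISO strings).
presence = (
    df['mystring'].eq('att')
      .groupby([df['address_id'], df['mydate']])
      .any()
      .rename('has_att')
      .reset_index()
)

# True if 'att' appeared on any strictly earlier date for the same address_id:
# shift by one date, then check whether the running count is positive.
presence['seen_before'] = (
    presence.groupby('address_id')['has_att']
    .transform(lambda s: s.astype(int).shift(fill_value=0).cumsum().gt(0))
)

# Missing = present on an earlier date but absent on this date.
presence['ismissing'] = presence['seen_before'] & ~presence['has_att']

# Broadcast the per-date flag back onto every original row (left merge keeps row order).
result = df.merge(presence[['address_id', 'mydate', 'ismissing']],
                  on=['address_id', 'mydate'], how='left')
print(result)
```

On the sample data this gives the same ismissing column as the expected output; the two approaches only differ when 'att' first appears on a later date.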


   address_id      mydate mystring  ismissing
0         111  2022-01-24      att      False
1         111  2022-01-24  verizon      False
2         111  2022-03-28  comcast       True
3         111  2022-03-28  verizon       True
4         222  2022-01-24      att      False
5         222  2022-01-24  verizon      False
6         222  2022-03-28      att      False
7         222  2022-03-28  verizon      False
8         333  2022-01-24  verizon      False