Home > Blockchain >  Remove all the occurrences of a term in a list
Remove all the occurrences of a term in a list

Time:06-07

I am trying to remove the United States listed in the organization columns within a list for all the rows of an extensive dataset.

My df looks something like this:

ID Organizations
1 ['education', 'health', 'United States', 'facebook']
2 ['health', 'Airlines', 'WHO', 'United States']

...

I want my output to look like this:

ID Organizations
1 ['education', 'health','facebook']
2 ['health', 'Airlines', 'WHO']

The code I tried:

df=df['organizations'].remove("United States")

gave me the following error:

AttributeError: 'Series' object has no attribute 'remove'

CodePudding user response:

You would need to loop here, using apply:

df['Organizations'].apply(lambda l: l.remove('United States'))

Or a list comprehension:

df['Organizations'] = [[x for x in l if x != 'United States'] for l in df['Organizations']]

Output:

   ID                  Organizations
0   1  [education, health, facebook]
1   2        [health, Airlines, WHO]

Note that the first one will fail if you don't have 'United States' in all the lists

handling NaNs

df['Organizations'] = [[x for x in l if x != 'United States']
                       if isinstance(l, list) else l
                       for l in df['Organizations']]

Used input:

df = pd.DataFrame({'ID': [1, 2], 
                   'Organizations': [['education', 'health', 'United States', 'facebook'],
                                     ['health', 'Airlines', 'WHO', 'United States']]})

CodePudding user response:

You could also consider:

(df.explode('Organizations')
  .query('Organizations != "United States"').groupby('ID')
  .agg(list).reset_index())

   ID                  Organizations
0   1  [education, health, facebook]
1   2        [health, Airlines, WHO]

CodePudding user response:

Another way is to explode, drop the values, then aggregate back with groupby:

df['Organizations'] = df['Organizations'].explode()\
   .loc[lambda x: x!='United States']]\
   .groupby(level=0).agg(list)
  • Related