Filtering data in the same column in pandas

Time:09-30

I have a table that looks like this (shown as an image in the original post), and I want to delete rows that have both a 'Pfam' and a 'SMART' analysis under the same protein accession code. At the same time, I want to keep entries that have only a 'Pfam' analysis without 'SMART'. I've written some code, but unfortunately it doesn't work:

if (df_filtered['analysis']=='Pfam')&(df_filtered['analysis']=='SMART'):
    df_filtered.drop(index=df_filtered[df_filtered['analysis']=='Pfam'].index)

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Could someone help me, please?
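The error comes from the `if` statement: `df_filtered['analysis'] == 'Pfam'` produces a boolean Series (one value per row), and Python's `if` cannot reduce a whole Series to a single True/False. There is also a logic problem: a single row can never equal both 'Pfam' and 'SMART', so combining the two masks with `&` is False for every row. A minimal sketch on a small made-up frame (the data is illustrative, not the asker's table):

```python
import pandas as pd

# Small illustrative frame (made-up data)
df_filtered = pd.DataFrame({'analysis': ['Pfam', 'SMART', 'Pfam']})

is_pfam = df_filtered['analysis'] == 'Pfam'    # boolean Series, one value per row
is_smart = df_filtered['analysis'] == 'SMART'

# Element-wise AND: no single row can be both 'Pfam' and 'SMART'
both_in_same_row = is_pfam & is_smart
print(both_in_same_row.tolist())                # [False, False, False]

# To get one bool, reduce each Series explicitly with .any()
print(is_pfam.any() and is_smart.any())         # True: the column contains both values

try:
    if is_pfam & is_smart:                      # same pattern as in the question
        pass
except ValueError as exc:
    print(type(exc).__name__)                   # ValueError
```

The per-group question ("does this accession have both?") therefore needs a group-wise operation rather than a row-wise `if`.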

Answer:

If I understand correctly (IIUC), say we have the following dataframe:

>>> import pandas as pd
>>> df = pd.DataFrame({'group': list('AABCDD'), 'analysis': ['SMART', 'Pfam', 'SMART', 'Pfam', 'SMART', 'Pfam']})
>>> df
  group analysis
0     A    SMART
1     A     Pfam
2     B    SMART
3     C     Pfam
4     D    SMART
5     D     Pfam

You only want to remove rows whose analysis is 'SMART' when the same group also has a 'Pfam' row. So only rows 0 and 4 are removed here:

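# Count the distinct analysis types in each row's group
df['nunique'] = df.groupby('group')['analysis'].transform('nunique')
# Drop 'SMART' rows that belong to a group with more than one analysis type
df[~((df['analysis'] == 'SMART') & (df['nunique'] > 1))]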
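Note that `nunique > 1` only says a group contains more than one distinct analysis type. If the real table also contains other types (an assumption, since the original table was only an image), a 'SMART' row could be dropped from a group that has no 'Pfam' row at all. A sketch that tests for 'Pfam' explicitly instead, using `Series.isin` on the groups that contain a 'Pfam' row; the extra group E with a 'PROSITE' row is hypothetical data added to show the difference:

```python
import pandas as pd

# Answer's frame plus a hypothetical group E that has SMART but no Pfam
df = pd.DataFrame({'group': list('AABCDDEE'),
                   'analysis': ['SMART', 'Pfam', 'SMART', 'Pfam',
                                'SMART', 'Pfam', 'SMART', 'PROSITE']})

# Groups that contain at least one 'Pfam' row
pfam_groups = df.loc[df['analysis'] == 'Pfam', 'group'].unique()

# Drop 'SMART' rows only in those groups
mask = (df['analysis'] == 'SMART') & df['group'].isin(pfam_groups)
result = df[~mask]

# For comparison: the nunique-based filter also drops row 6 (E, SMART)
nunique = df.groupby('group')['analysis'].transform('nunique')
by_nunique = df[~((df['analysis'] == 'SMART') & (nunique > 1))]

print(result.index.tolist())      # [1, 2, 3, 5, 6, 7]
print(by_nunique.index.tolist())  # [1, 2, 3, 5, 7]
```

With only 'Pfam' and 'SMART' in the column, both filters give the same result.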

Output:

  group analysis  nunique
1     A     Pfam        2
2     B    SMART        1
3     C     Pfam        1
5     D     Pfam        2
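The output above still carries the helper `nunique` column, and the original `df` is modified in place. A sketch of the same logic wrapped in a function that leaves its input untouched; the function name `drop_smart_where_multiple` and the column name `protein_accession` are hypothetical, chosen to match the question's description of a protein accession code:

```python
import pandas as pd

def drop_smart_where_multiple(df: pd.DataFrame, key: str = 'protein_accession',
                              col: str = 'analysis') -> pd.DataFrame:
    """Drop 'SMART' rows in groups that have more than one distinct analysis type.

    Same logic as the answer, but the per-group count is kept in a local
    Series instead of being written to the frame as a helper column.
    """
    nunique = df.groupby(key)[col].transform('nunique')
    return df[~((df[col] == 'SMART') & (nunique > 1))]

# Hypothetical data using the assumed 'protein_accession' column name
proteins = pd.DataFrame({
    'protein_accession': ['P1', 'P1', 'P2', 'P3'],
    'analysis': ['Pfam', 'SMART', 'SMART', 'Pfam'],
})
kept = drop_smart_where_multiple(proteins)
print(kept.index.tolist())  # [0, 2, 3]
```

Passing `key=` and `col=` lets the same function work with whatever the real column names are.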