Home > Mobile >  If string is name of one of column, take rows including this string, if not remove row with python
If string is name of one of column, take rows including this string, if not remove row with python

Time:09-15

I have two files. First one looks like this:

A B C D ...
0.002 0.25 1.25 2.45 3.54
0.15 0.45 2.35 0.48 4.57
... ... ... ... ...

and the second one :

letter ID
A 12
T 12
C 23
D, J 15
K 45
M, A, C 35

If one of column name of first file does not appear in the second file then remove it from second file, like this:

letter ID
A 12
C 23
D 15
A, C 35

Thanks.

CodePudding user response:

You could use loc to filter your df2 based on df1's columns:

>>> df2.loc[df2.letter.isin(list(df1))]

  letter  ID
0      A  12
2      C  23

list(df1) will return df1's columns as a list which can be passed into isin which is used on the 'letter' column in df2.


Given your edits, you can use the following approach:

df2_exp = df2.assign(g=df2.groupby('letter').ngroup(),
           letter=df2.letter.str.split(',')
           ).explode('letter')

df2_exp.loc[
    df2_exp.letter.str.strip().isin(list(df1)
                                    )
    ].groupby('g').agg({'ID':'first','letter':lambda letter: ','.join(letter)
                        }).reset_index(drop=True)

prints:

   ID letter
0  12      A
1  23      C
2  15      D
3  35   A, C
  • Related