I have two files. First one looks like this:
A | B | C | D | ... |
---|---|---|---|---|
0.002 | 0.25 | 1.25 | 2.45 | 3.54 |
0.15 | 0.45 | 2.35 | 0.48 | 4.57 |
... | ... | ... | ... | ... |
and the second one :
letter | ID |
---|---|
A | 12 |
T | 12 |
C | 23 |
D, J | 15 |
K | 45 |
M, A, C | 35 |
If one of column name of first file does not appear in the second file then remove it from second file, like this:
letter | ID |
---|---|
A | 12 |
C | 23 |
D | 15 |
A, C | 35 |
Thanks.
CodePudding user response:
You could use loc
to filter your df2 based on df1's columns:
>>> df2.loc[df2.letter.isin(list(df1))]
letter ID
0 A 12
2 C 23
list(df1)
will return df1's columns as a list which can be passed into isin
which is used on the 'letter' column in df2.
Given your edits, you can use the following approach:
df2_exp = df2.assign(g=df2.groupby('letter').ngroup(),
letter=df2.letter.str.split(',')
).explode('letter')
df2_exp.loc[
df2_exp.letter.str.strip().isin(list(df1)
)
].groupby('g').agg({'ID':'first','letter':lambda letter: ','.join(letter)
}).reset_index(drop=True)
prints:
ID letter
0 12 A
1 23 C
2 15 D
3 35 A, C