In df1
I have a column for Line, Generation, ID, and Sex.
What I want to do is find out how many there are, in df1
, of a certain combination. These combinations, occurring in a specific Line and Generation, are able to be found using another df (df2). Obviously, I've set it up so that this second df has the combinations I'm looking for. The results for this will be like, line A, Generation 2020A has a total of 1 row for row ['A', 'A', 'A', 'A'] in df2
. Line B, Generation 2020B, has a total of two rows for row ['A', 'C', 'T', 'G'] in df2
.
df1
Line | ID | Sex | Generation | SNP-1 | SNP-2 | SNP-3 | SNP-4 |
---|---|---|---|---|---|---|---|
A | 1 | F | 2020A | A | A | A | A |
B | 2 | F | 2020B | A | C | T | G |
B | 3 | F | 2020B | A | C | T | G |
df2
SNP-1 | SNP-2 | SNP-3 | SNP-4 |
---|---|---|---|
A | A | A | A |
A | C | T | G |
CodePudding user response:
You can use merge
and then do value_counts
to achieve this.
import pandas as pd
df1 = pd.DataFrame([['A','2020A', 'A', 'A', 'A', 'A'], ['B','2020B', 'A', 'C', 'T', 'G'],['B','2020B', 'A', 'C', 'T', 'G']], columns= ['Line','Generation','SNP-1', 'SNP-2', 'SNP-3', 'SNP-4'])
df2 = pd.DataFrame([['A', 'A', 'A', 'A'],['A', 'C', 'T', 'G']], columns=['SNP-1', 'SNP-2', 'SNP-3', 'SNP-4'])
df_merge = df1.merge(df2, on=['SNP-1', 'SNP-2', 'SNP-3', 'SNP-4'])
print(df_merge)
print('\n', df_merge.value_counts(['Line', 'Generation']))
Output:
Line Generation SNP-1 SNP-2 SNP-3 SNP-4
0 A 2020A A A A A
1 B 2020B A C T G
2 B 2020B A C T G
Line Generation
B 2020B 2
A 2020A 1
dtype: int64