Python pandas - grouping and plotting-CodePudding

In df1 I have a column for Line, Generation, ID, and Sex.

What I want to do is find out how many there are, in df1, of a certain combination. These combinations, occurring in a specific Line and Generation, are able to be found using another df (df2). Obviously, I've set it up so that this second df has the combinations I'm looking for. The results for this will be like, line A, Generation 2020A has a total of 1 row for row ['A', 'A', 'A', 'A'] in df2. Line B, Generation 2020B, has a total of two rows for row ['A', 'C', 'T', 'G'] in df2.

df1

Line	ID	Sex	Generation	SNP-1	SNP-2	SNP-3	SNP-4
A	1	F	2020A	A	A	A	A
B	2	F	2020B	A	C	T	G
B	3	F	2020B	A	C	T	G

df2

SNP-1	SNP-2	SNP-3	SNP-4
A	A	A	A
A	C	T	G

CodePudding user response：

You can use merge and then do value_counts to achieve this.

import pandas as pd    
df1 = pd.DataFrame([['A','2020A',   'A',    'A',    'A',    'A'], ['B','2020B', 'A',    'C',    'T',    'G'],['B','2020B',  'A',    'C',    'T',    'G']], columns= ['Line','Generation','SNP-1',   'SNP-2',    'SNP-3',    'SNP-4'])
df2 = pd.DataFrame([['A',   'A',    'A',    'A'],['A',  'C',    'T',    'G']], columns=['SNP-1',    'SNP-2',    'SNP-3',    'SNP-4'])

df_merge = df1.merge(df2, on=['SNP-1',  'SNP-2',    'SNP-3',    'SNP-4'])
print(df_merge)

print('\n', df_merge.value_counts(['Line', 'Generation']))

Output:

  Line Generation SNP-1 SNP-2 SNP-3 SNP-4
0    A      2020A     A     A     A     A
1    B      2020B     A     C     T     G
2    B      2020B     A     C     T     G

 Line  Generation
B     2020B         2
A     2020A         1
dtype: int64