Home > OS >  Is there a way to group this and compare?
Is there a way to group this and compare?

Time:04-07

Quick question about pandas.

How am I able to find the amount of times this list shows up in this DataFrame?

['A', 'A', 'A', 'A']

Line Generation SNP_1 SNP_2 SNP_3 SNP_3
B 2021 A A A A
A 2022 A C C G
A 2022 A C T C
A 2022 A C C G

The idea is: for each row, from column at position [2:] and onwards, does the list match the row, in that same order.

While grouping would be nice it's not neccessary, i.e. groupby(['Line', 'Generation']) and count how many times the list occurs yielding results: "1" for group: Line B/Generation 2021, "0" for Line A/Generation 2022.

CodePudding user response:

grp = df.set_index(["Line", "Generation"]).apply(tuple, axis=1).reset_index(drop=True)
print(df.groupby(["Line", "Generation", grp]).size())
# Line  Generation              
# A     2022        (A, C, C, G)    2
#                   (A, C, T, C)    1
# B     2021        (A, A, A, A)    1
# dtype: int64

CodePudding user response:

One option is to apply a lambda that compares row with list:

out = df.filter(like='SNP').apply(lambda x: x.tolist()==['A', 'A', 'A', 'A'], axis=1).sum()

Another is to use numpy and use its vectorized operations:

out = (df.filter(like='SNP').to_numpy() == np.array(['A', 'A', 'A', 'A'])[:, None]).all(axis=1).sum()

Output:

1  # there is only one row that matches the list in that order
  • Related