I have a dataframe with two columns, 'Group' and 'Sample_Number'. Each group should contain the sample number 11 exactly once, followed by sample numbers in the range 21 to 29 (for example, 21, 22, 23, 24, 25, 26, 27, 28, 29) and then by sample numbers in the range 31 to 39 (for example, 31, 32, 33, 34, 35, 36, 37, 38, 39). In other words, each group should have exactly one sample number 11, at least one sample number in the range 21 to 29, and at least one sample number in the range 31 to 39.
I want my code to go through each group and:
Check whether the group contains sample number 11.
Check whether the group contains at least one sample number in the range 21 to 29.
Check whether the group contains at least one sample number in the range 31 to 39.
If any of these three conditions is not met, the code should remove the entire group from the dataframe.
Below is the dataframe in table format:
Group | Sample_Number |
---|---|
Z007 | 11 |
Z007 | 21 |
Z007 | 22 |
Z007 | 23 |
Z007 | 31 |
Z007 | 32 |
Z008 | 11 |
Z008 | 31 |
Z008 | 32 |
Z008 | 33 |
Z009 | 11 |
Z009 | 21 |
Z009 | 22 |
Z009 | 23 |
Z010 | 21 |
Z010 | 22 |
Z010 | 23 |
Z010 | 24 |
Z010 | 31 |
Z010 | 32 |
Z010 | 33 |
Z010 | 34 |
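import pandas as pd
# Group labels must be quoted strings, and the list of rows must be closed before `columns=`.
df = pd.DataFrame([['Z007', 11], ['Z007', 21], ['Z007', 22], ['Z007', 23], ['Z007', 31], ['Z007', 32],
                   ['Z008', 11], ['Z008', 31], ['Z008', 32], ['Z008', 33],
                   ['Z009', 11], ['Z009', 21], ['Z009', 22], ['Z009', 23],
                   ['Z010', 21], ['Z010', 22], ['Z010', 23], ['Z010', 24],
                   ['Z010', 31], ['Z010', 32], ['Z010', 33], ['Z010', 34]],
                  columns=['Group', 'Sample_Number'])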
The code should remove group 'Z008' because it has no sample number in the range 21 to 29. It should remove group 'Z009' because it has no sample number in the range 31 to 39. It should also remove group 'Z010' because it does not have sample number 11.
Expected answer is below:
Group | Sample_Number |
---|---|
Z007 | 11 |
Z007 | 21 |
Z007 | 22 |
Z007 | 23 |
Z007 | 31 |
Z007 | 32 |
I could only do this for sample number 11, and I am struggling to do the same for the sample numbers in the ranges 21 to 29 and 31 to 39. Below is my code for sample number 11:
invalid_group_no = [i for i in df['Group'].unique() if
df[df['Group']== i]["Sample_Number"].to_list().count(11)!=1]
Can anyone please help me with the other sample numbers? Please feel free to implement your own ways. Any help is appreciated.
CodePudding user response:
Try this:
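# Keep only the groups that appear in all three sets (note: range() excludes its upper bound).
groups = (
    set(df['Group'][df['Sample_Number'] == 11])
    & set(df['Group'][df['Sample_Number'].isin(range(21, 30))])
    & set(df['Group'][df['Sample_Number'].isin(range(31, 40))])
)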
df = df[df['Group'].isin(groups)]
Group Sample_Number
0 Z007 11
1 Z007 21
2 Z007 22
3 Z007 23
4 Z007 31
5 Z007 32
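As an alternative, the three checks from the question can also be written as a per-group predicate and passed to `groupby(...).filter(...)`. The following is only a minimal sketch: the helper name `is_complete` and the variable `result` are made up for illustration. It also assumes that sample number 11 must appear exactly once per group, as the question states, whereas the set-based answer above only checks that 11 is present.

```python
import pandas as pd

# Sample data copied from the question.
rows = [
    ("Z007", 11), ("Z007", 21), ("Z007", 22), ("Z007", 23), ("Z007", 31), ("Z007", 32),
    ("Z008", 11), ("Z008", 31), ("Z008", 32), ("Z008", 33),
    ("Z009", 11), ("Z009", 21), ("Z009", 22), ("Z009", 23),
    ("Z010", 21), ("Z010", 22), ("Z010", 23), ("Z010", 24),
    ("Z010", 31), ("Z010", 32), ("Z010", 33), ("Z010", 34),
]
df = pd.DataFrame(rows, columns=["Group", "Sample_Number"])


def is_complete(samples: pd.Series) -> bool:
    """Hypothetical helper: True if one group's sample numbers pass all three checks."""
    has_single_11 = (samples == 11).sum() == 1   # assumption: exactly one 11
    has_20s = samples.between(21, 29).any()      # between() is inclusive on both ends
    has_30s = samples.between(31, 39).any()
    return bool(has_single_11 and has_20s and has_30s)


# filter() keeps every row of each group for which the predicate returns True.
result = df.groupby("Group").filter(lambda g: is_complete(g["Sample_Number"]))
print(result)
```

On the sample data this keeps only the six 'Z007' rows. The predicate form is slower than the set intersection on very large frames, but it keeps all the rules in one place, which may help if more conditions are added later.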