I want to filter a pandas data-frame. Obviously I can do that with a few loops but will not be efficient for huge amounts of data, so I wanted to ask for some idea on how to do it.
I have data organised for example as,
c1=t c2=detector(1,2,or3) c3
1 1
1 2
2 1
2 2
2 3
3 2
3 3
4 1
4 3
...
It can be seen as times of a simulation, c1=t
, and c2
is the type of detector that is simulated. And for some times you do not have data of one of the detectors randomly, so I want to divide the data in dataframes each for same-detectors times.
The first column is a index increasing column, and the second column can have only 3 values (1,2,3), I want to separate this dataframe in 3,
- One that contains all rows where for a same
c1
where you have 1, and 2 values - One that contains all rows where for a same
c1
where you have 1, and 3 values - One that contains all rows where for a same
c1
where you have 2, and 3 values - The last that contain all three values 1,2,3
There are no unique c1
rows.
i want to know how will be the pythonic way to filter this dataframe.
CodePudding user response:
You can aggregate set
s, so ordering of values in c2
is not important:
s = df.groupby('c1')['c2'].agg(set)
print (s)
c1
1 {1, 2}
2 {1, 2, 3}
3 {2, 3}
4 {1, 3}
Name: c2, dtype: object
Then filter c1
by matching sets:
df1 = df[df['c1'].isin(s.index[s.eq(set([1,2]))])]
df2 = df[df['c1'].isin(s.index[s.eq(set([1,3]))])]
df3 = df[df['c1'].isin(s.index[s.eq(set([2,3]))])]
df4 = df[df['c1'].isin(s.index[s.eq(set([1,2,3]))])]
Another idea is create dictionary of DataFrames:
s = df.groupby('c1')['c2'].agg(set)
print (s)
c1
1 {1, 2}
2 {1, 2, 3}
3 {2, 3}
4 {1, 3}
Name: c2, dtype: object
vals = [(1,2),(1,3),(2,3),(1,2,3)]
dfs = {'_'.join(map(str, x)): df[df['c1'].isin(s.index[s.eq(set(x))])] for x in vals}
print (dfs)
{'1_2': c1 c2
0 1 1
1 1 2, '1_3': c1 c2
7 4 1
8 4 3, '2_3': c1 c2
5 3 2
6 3 3, '1_2_3': c1 c2
2 2 1
3 2 2
4 2 3}
print (dfs['1_2'])
print (dfs['1_3'])
print (dfs['2_3'])
print (dfs['1_2_3'])