I need to access and extract information from a Dataframe that is used for other colleagues in a research group.
The DataFrame structure is:
zee.loc[zee['layer']=='EMB2'].loc[zee['roi']==0]
e et eta phi deta dphi samp hash det layer roi eventNumber
2249 20.677443 20.675829 0.0125 -1.067651 0.025 0.024544 3 2030015444 2 EMB2 0 2
2250 21.635288 21.633598 0.0125 -1.043107 0.025 0.024544 3 2030015445 2 EMB2 0 2
2251 -29.408310 -29.406013 0.0125 -1.018563 0.025 0.024544 3 2030015446 2 EMB2 0 2
2252 43.127533 43.124165 0.0125 -0.994020 0.025 0.024544 3 2030015447 2 EMB2 0 2
2253 -3.025344 -3.025108 0.0125 -0.969476 0.025 0.024544 3 2030015448 2 EMB2 0 2
... ... ... ... ... ... ... ... ... ... ... ... ...
4968988 -5.825550 -5.309279 0.4375 -0.454058 0.025 0.024544 3 2030019821 2 EMB2 0 3955
4968989 39.750645 36.227871 0.4375 -0.429515 0.025 0.024544 3 2030019822 2 EMB2 0 3955
4968990 80.568573 73.428436 0.4375 -0.404971 0.025 0.024544 3 2030019823 2 EMB2 0 3955
4968991 -28.921751 -26.358652 0.4375 -0.380427 0.025 0.024544 3 2030019824 2 EMB2 0 3955
4968992 55.599472 50.672146 0.4375 -0.355884 0.025 0.024544 3 2030019825 2 EMB2 0 3955
So, I need to work only with the layer: EMB2 and the columns: et, eta, phi. To pick up these columns, I'm using the following code:
EtEtaPhi, EventLens = [], []
events = set(zee.loc[zee['layer']=='EMB2']['eventNumber'].to_numpy())
roi = set(zee.loc[zee['layer']=='EMB2']['roi'].to_numpy())
for ee in events:
for rr in roi:
if len(zee.loc[zee['layer']=='EMB2'].loc[zee['eventNumber']==ee].loc[zee['roi']==rr])==0: break
EtEtaPhi.append(zee[['et','eta','phi']].loc[zee['layer']=='EMB2'].loc[zee['eventNumber']==ee].loc[zee['roi']==rr].to_numpy())
EventLens.append(len(EtEtaPhi[-1]))
But to read 4000 events take so long time, almost one second per event. This result isn't good, almost one hour just to extract those columns!
Is there some way to extract columns from a DataFrame more efficiently and faster?
CodePudding user response:
The code
zee[['et','eta','phi']].loc[zee['layer']=='EMB2']
which you already have somewhere in there should do what you asked for. The rest is not needed.
CodePudding user response:
Just use .loc
:
sample = zee.loc[zee["layer"].eq("EMB2"), ["et","eta","phi"]]