first of all: thank you for all the questions and answers. So far, I always found a solution to my problems here. However, with the following problem I'm stuck:
I have a dataframe as this:
Jan_x Feb_x Mar_x Apr_x ... driest driest_rr DMAI Station_id
0 -433 -398 -18 508 ... Mar_x 2684 37.189000 2
1 -95 -102 164 631 ... Mar_x 2732 30.568445 10
2 59 272 691 1165 ... Jan_x 1970 40.237462 12
3 30 239 696 1108 ... Feb_x 3548 43.941148 13
4 -1128 -1193 -985 -667 ... Feb_x 12715 334.828246 15
(995 rows in total)
The first 12 columns are monthly mean temperature values (in 0.01 degrees), the last column ('Station_id') is an identifier for climate stations. From another dataframe containing precipitation data I got the driest month ('driest') and it's precipitation amount ('driest_rr'; in 0.01 mm). Finally, 'DMAI' is an annual aridity index already calculated in the step before. Now I want to compute another Aridity Index (for meteorologists/climate scientists: the Pinna Combinative Index) that includes both annual mean temperature and precipitation (already included in 'DMAI') and mean temperature and precipitation of the driest month. The equation is:
DMAI = P/(T 10)
PCI = 0.5 (DMAI (12Pd/Td 10))
with P,T annual mean temperature and precipitation and Pd,Td mean temperature and precipitation of the driest month (in mm and °C respectively)
I already have:
df['PCI'] = 0.5 * (df.loc[:,'DMAI'] (12*(df.loc[:,'driest_rr']/100)))/(df.loc[:,'Mar_x'] 10))
which works. However, the driest month is not always March, I need the one specified in the column 'driest'.
df['PCI'] = 0.5 * (df.loc[:,'DMAI'] (12*(df.loc[:,'driest_rr']/100)))/(df.loc[:,df_dmai.loc[:,'driest']] 10))
does not work however.
Is there a way to solve this?
I found a few similar question, like this one here:
How can I select a specific column from each row in a Pandas DataFrame?
However, the answers that I found use either the deprecated df.lookup()
or a numpy workaround, so they don't help me in this case.
CodePudding user response:
You can use loc method or iloc you can find these methods by adding a . after your dataframe name then click tab
CodePudding user response:
pandas
has a lot of numpy
behind it, and so the workaround from the pandas docs is very easy to plug right back into your DataFrame:
In [27]: df = pd.DataFrame({'select': ['a', 'b', 'c', 'b', 'c', 'a'], 'a': range(6), 'b': range(6, 12), 'c': range(12, 18)})
In [28]: idx, cols = pd.factorize(df['select'])
In [29]: df['chosen'] = df.reindex(cols, axis=1).to_numpy()[np.arange(len(df)), idx]
In [30]: df
Out[30]:
select a b c chosen
0 a 0 6 12 0
1 b 1 7 13 7
2 c 2 8 14 14
3 b 3 9 15 9
4 c 4 10 16 16
5 a 5 11 17 5