Home > Software engineering >  Pandas select a specific column from each row
Pandas select a specific column from each row

Time:03-21

first of all: thank you for all the questions and answers. So far, I always found a solution to my problems here. However, with the following problem I'm stuck:

I have a dataframe as this:

     Jan_x  Feb_x  Mar_x  Apr_x  ...  driest  driest_rr        DMAI  Station_id
0     -433   -398    -18    508  ...   Mar_x       2684   37.189000           2
1      -95   -102    164    631  ...   Mar_x       2732   30.568445          10
2       59    272    691   1165  ...   Jan_x       1970   40.237462          12
3       30    239    696   1108  ...   Feb_x       3548   43.941148          13
4    -1128  -1193   -985   -667  ...   Feb_x      12715  334.828246          15

(995 rows in total)

The first 12 columns are monthly mean temperature values (in 0.01 degrees), the last column ('Station_id') is an identifier for climate stations. From another dataframe containing precipitation data I got the driest month ('driest') and it's precipitation amount ('driest_rr'; in 0.01 mm). Finally, 'DMAI' is an annual aridity index already calculated in the step before. Now I want to compute another Aridity Index (for meteorologists/climate scientists: the Pinna Combinative Index) that includes both annual mean temperature and precipitation (already included in 'DMAI') and mean temperature and precipitation of the driest month. The equation is:

DMAI = P/(T 10)

PCI = 0.5 (DMAI (12Pd/Td 10))

with P,T annual mean temperature and precipitation and Pd,Td mean temperature and precipitation of the driest month (in mm and °C respectively)

I already have:

df['PCI']      = 0.5 * (df.loc[:,'DMAI']  (12*(df.loc[:,'driest_rr']/100)))/(df.loc[:,'Mar_x'] 10))

which works. However, the driest month is not always March, I need the one specified in the column 'driest'.

df['PCI']      = 0.5 * (df.loc[:,'DMAI']  (12*(df.loc[:,'driest_rr']/100)))/(df.loc[:,df_dmai.loc[:,'driest']] 10))

does not work however. Is there a way to solve this? I found a few similar question, like this one here: How can I select a specific column from each row in a Pandas DataFrame? However, the answers that I found use either the deprecated df.lookup() or a numpy workaround, so they don't help me in this case.

CodePudding user response:

You can use loc method or iloc you can find these methods by adding a . after your dataframe name then click tab

CodePudding user response:

pandas has a lot of numpy behind it, and so the workaround from the pandas docs is very easy to plug right back into your DataFrame:

In [27]: df = pd.DataFrame({'select': ['a', 'b', 'c', 'b', 'c', 'a'], 'a': range(6), 'b': range(6, 12), 'c': range(12, 18)})

In [28]: idx, cols = pd.factorize(df['select'])

In [29]: df['chosen'] = df.reindex(cols, axis=1).to_numpy()[np.arange(len(df)), idx]

In [30]: df
Out[30]:
  select  a   b   c  chosen
0      a  0   6  12       0
1      b  1   7  13       7
2      c  2   8  14      14
3      b  3   9  15       9
4      c  4  10  16      16
5      a  5  11  17       5
  • Related