Home > Back-end >  Pandas Cumsum by Same Name Columns Not Working
Pandas Cumsum by Same Name Columns Not Working

Time:09-17

I have an unusual task in that I need to calculate a cumsum answer by column with the same column name and then by rows in months (1-12). I think that there is a problem in python with using cumsum on non-unique column names. For example, my data looks like this in DataFrame df:

month   Adj_Prod            Adj_Prod            Adj_Prod
1       441215.6939045699   441215.6939045699   441215.6939045699
2       519150.654291369    519150.654291369    519150.654291369
3       479624.188336129    479624.188336129    479624.188336129
4       444504.2012028889   444504.2012028889   444504.2012028889
5       327232.2926583468   327232.2926583468   327232.2926583468
6       270260.9249640972   270260.9249640972   270260.9249640972
7       245660.8295624059   245660.8295624059   245660.8295624059
8       174337.2269775      174337.2269775      174337.2269775
9       249508.2308332307   281902.4467308491   240007.62178783625
10      335156.9175691739   409076.49866234337  370223.7513004631
11      410662.9605211421   413049.84287055075  454906.31546153163
12      443616.23516782525  445685.3862165406   446864.45394745696

I need my final answer to look like this:

month   Adj_Prod    Adj_Prod    Adj_Prod
1       441216      441216      441216
2       960366      960366      960366
3      1439991     1439991     1439991
4      1884495     1884495     1884495
5      2211727     2211727     2211727
6      2481988     2481988     2481988
7      2727649     2727649     2727649
8      2901986     2901986     2901986
9      3151494     3183888     3141994
10     3486651     3592965     3512217
11     3897314     4006015     3967124
12     4340930     4451700     4413988

This should work but the cumsum does nothing - it just gives me the exact same data in the DataFrame df above. Here is my most basic attempt:

df.groupby('month')['Adj_Prod'].cumsum()

CodePudding user response:

Although it's highly encourage to not have the dataframe with same column names, you can use df.apply

df['Adj_Prod'].apply(lambda x: x.cumsum())

        Adj_Prod      Adj_Prod      Adj_Prod
0   4.412157e 05  4.412157e 05  4.412157e 05
1   9.603663e 05  9.603663e 05  9.603663e 05
2   1.439991e 06  1.439991e 06  1.439991e 06
3   1.884495e 06  1.884495e 06  1.884495e 06
4   2.211727e 06  2.211727e 06  2.211727e 06
5   2.481988e 06  2.481988e 06  2.481988e 06
6   2.656325e 06  2.656325e 06  2.656325e 06
7   2.896333e 06  2.896333e 06  2.896333e 06
8   3.266557e 06  3.266557e 06  3.266557e 06
9   3.721463e 06  3.721463e 06  3.721463e 06
10  4.168327e 06  4.168327e 06  4.168327e 06
  • Related