Calculating last and third value difference of each same column value in a dataframe in loops-CodePudding

I am trying to calculate the difference between the last and third value for specific month and year for each column (ex1, ex2,...) in my dataset and then save into new dataframe.

My dataset looks like this:

month	year	ex1	ex2
12	1995	55	55
12	1995	46	33
12	1995	33	12
12	1995	15	17
12	1995	6	16
12	1995	35	32
12	1995	67	22
12	1995	43	25
12	1995	31	26
12	1995	34	11
12	1995	53	14
12	1995	72	60
1	1996	34	90
1	1996	55	14
1	1996	58	24
1	1996	54	23
1	1996	33	20
1	1996	24	45
1	1996	23	33
1	1996	15	38
1	1996	11	50
1	1996	79	55
1	1996	80	71
1	1996	88	74

So for example, I want to calculate for ex1:

Last value of month 12 and year 1995 -> 72
Third value of month 12 and year 1995 -> 33
Difference: 72 - 33 = 39
Last value of month 1 and year 1996 -> 88
Third value of month 1 and year 1996 -> 58
Difference: 88 - 58 = 30

For ex2:

Last value of month 12 and year 1995 -> 88
Third value of month 12 and year 1995 -> 58
Difference: 88 - 58 = 30
Last value of month 1 and year 1996 -> 74
Third value of month 1 and year 1996 -> 24
Difference: 74 - 24 = 50

After that, I want to write these values a new dataframe as follow:

Name	Month	Year	Difference
ex1	12	1995	39
ex1	1	1996	30
ex2	12	1995	30
ex2	1	1996	50

I tried to do the following:

df = pd.read_excel ('dataset') #See above
df_new = pd.DataFrame(columns=["Name","Month","Year","Difference"])

for colname, colitems in df.iloc.[:,2:].iteritems():
    for rownum, rowitem in colitems.iteritems():

        if (df["month"][rownum] != df["month"][rownum 1]) & (df["year"][rownum] != df["year"][rownum 1]):
            last= df[colname][rownum]
            third= df[colname][2]
            diff = last - third
            df_new.assign(name = df[colname][rownum], month = df["month"][rownum], 
                                    year= df["year"][rownum], difference= diff)

When I try to run this, I had the KeyError: 79 in the if clause.

Also, my code calculates the difference only for month 12 and year 1995 for ex1. How can I move on to next month and year and the next value: ex2. Am I in the right direction? Do you have any suggestions that would help my problem?

If there's anything I've left out that would be helpful to solving this, please let me know so I can rectify it immediately.

CodePudding user response：

The basic operation you are looking for is a one-liner:

res = df.groupby(['month','year']).last() - df.groupby(['month','year']).nth(2)
print(res)

            ex1  ex2
month year          
1     1996   30   50
12    1995   39   48

To get this into the shape of your desired output, try as follows:

res.reset_index(drop=False,inplace=True)
res.sort_values(['year','month'], inplace=True)

res = pd.melt(res, 
              id_vars=['month','year'], 
              value_vars=['ex1','ex2'], 
              var_name = 'name', 
              value_name='difference')

res = res.loc[:, ['name','month','year','difference']]
print(res)

  name  month  year  difference
0  ex1     12  1995          39
1  ex1      1  1996          30
2  ex2     12  1995          48
3  ex2      1  1996          50

CodePudding user response：

grp = df.groupby(['Month', 'Year'])

Then the following will return the differences you desire,

grp.last() - grp.nth(2)

No looping required :)

EDIT: here is the synonymous example I tested -- the answer is not to the requested shape, but the data is good

df = pd.DataFrame(
    {
        "month": [1, 1, 1, 1, 2, 2, 2, 2, 1, 1, 1, 1, 2, 2, 2, 2],
        "year": [1999, 1999, 1999, 1999, 1999, 1999, 1999, 1999, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000],
        "ex1": [11, 22, 33, 44, 55, 66, 77, 88, 22, 44, 66, 88, 1010, 1212, 1414, 1616],
        "ex2": [22, 44, 66, 88, 1010, 1212, 1414, 1616, 11, 22, 33, 44, 55, 66, 77, 88],
    }
)

Returns the following:

>>>         
            ex1  ex2
month year
1     1999   11   22
      2000   22   11
2     1999   11  202
      2000  202   11