Home > other >  Calculating last and third value difference of each same column value in a dataframe in loops
Calculating last and third value difference of each same column value in a dataframe in loops

Time:08-09

I am trying to calculate the difference between the last and third value for specific month and year for each column (ex1, ex2,...) in my dataset and then save into new dataframe.

My dataset looks like this:

month year ex1 ex2
12 1995 55 55
12 1995 46 33
12 1995 33 12
12 1995 15 17
12 1995 6 16
12 1995 35 32
12 1995 67 22
12 1995 43 25
12 1995 31 26
12 1995 34 11
12 1995 53 14
12 1995 72 60
1 1996 34 90
1 1996 55 14
1 1996 58 24
1 1996 54 23
1 1996 33 20
1 1996 24 45
1 1996 23 33
1 1996 15 38
1 1996 11 50
1 1996 79 55
1 1996 80 71
1 1996 88 74

So for example, I want to calculate for ex1:

  • Last value of month 12 and year 1995 -> 72

  • Third value of month 12 and year 1995 -> 33

  • Difference: 72 - 33 = 39

  • Last value of month 1 and year 1996 -> 88

  • Third value of month 1 and year 1996 -> 58

  • Difference: 88 - 58 = 30

For ex2:

  • Last value of month 12 and year 1995 -> 88

  • Third value of month 12 and year 1995 -> 58

  • Difference: 88 - 58 = 30

  • Last value of month 1 and year 1996 -> 74

  • Third value of month 1 and year 1996 -> 24

  • Difference: 74 - 24 = 50

After that, I want to write these values a new dataframe as follow:

Name Month Year Difference
ex1 12 1995 39
ex1 1 1996 30
ex2 12 1995 30
ex2 1 1996 50

I tried to do the following:

df = pd.read_excel ('dataset') #See above
df_new = pd.DataFrame(columns=["Name","Month","Year","Difference"])

for colname, colitems in df.iloc.[:,2:].iteritems():
    for rownum, rowitem in colitems.iteritems():

        if (df["month"][rownum] != df["month"][rownum 1]) & (df["year"][rownum] != df["year"][rownum 1]):
            last= df[colname][rownum]
            third= df[colname][2]
            diff = last - third
            df_new.assign(name = df[colname][rownum], month = df["month"][rownum], 
                                    year= df["year"][rownum], difference= diff)

When I try to run this, I had the KeyError: 79 in the if clause.

Also, my code calculates the difference only for month 12 and year 1995 for ex1. How can I move on to next month and year and the next value: ex2. Am I in the right direction? Do you have any suggestions that would help my problem?

If there's anything I've left out that would be helpful to solving this, please let me know so I can rectify it immediately.

CodePudding user response:

The basic operation you are looking for is a one-liner:

res = df.groupby(['month','year']).last() - df.groupby(['month','year']).nth(2)
print(res)

            ex1  ex2
month year          
1     1996   30   50
12    1995   39   48

To get this into the shape of your desired output, try as follows:

res.reset_index(drop=False,inplace=True)
res.sort_values(['year','month'], inplace=True)

res = pd.melt(res, 
              id_vars=['month','year'], 
              value_vars=['ex1','ex2'], 
              var_name = 'name', 
              value_name='difference')

res = res.loc[:, ['name','month','year','difference']]
print(res)

  name  month  year  difference
0  ex1     12  1995          39
1  ex1      1  1996          30
2  ex2     12  1995          48
3  ex2      1  1996          50

CodePudding user response:

grp = df.groupby(['Month', 'Year'])

Then the following will return the differences you desire,

grp.last() - grp.nth(2)

No looping required :)

EDIT: here is the synonymous example I tested -- the answer is not to the requested shape, but the data is good

df = pd.DataFrame(
    {
        "month": [1, 1, 1, 1, 2, 2, 2, 2, 1, 1, 1, 1, 2, 2, 2, 2],
        "year": [1999, 1999, 1999, 1999, 1999, 1999, 1999, 1999, 2000, 2000, 2000, 2000, 2000, 2000, 2000, 2000],
        "ex1": [11, 22, 33, 44, 55, 66, 77, 88, 22, 44, 66, 88, 1010, 1212, 1414, 1616],
        "ex2": [22, 44, 66, 88, 1010, 1212, 1414, 1616, 11, 22, 33, 44, 55, 66, 77, 88],
    }
)

Returns the following:

>>>         
            ex1  ex2
month year
1     1999   11   22
      2000   22   11
2     1999   11  202
      2000  202   11
  • Related