Conditional assignment in pandas python based on consecutive years values

Time:02-16

I have a dataset that looks like this:

 A    B    year   CompanyName   
 1    nan  1999      tesla
 4     3   2000      tesla
 nan  nan  2001      tesla
 2    nan  2002      tesla
 3    nan  1999      BMW
 2    -1   2000      BMW
 2    nan  2002      BMW

I have data for several companies, and I want to calculate a new column B (the B column shown above is the result I expect), equal to the difference between the value of A in the current row and in the previous row, subject to these conditions:

  1. The difference is computed only if both consecutive years have valid values; if a year is missing, the difference should not be calculated.
  2. If a company's value of A is missing for one of the two years, the difference column should be nan.
  3. The difference must be computed within the same company; I don't want differences taken across different companies.
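One alternative way to encode all three conditions at once (not the approach used in the question or the answer) is to reshape the data: give each company its own column and each year in the full range its own row. A missing year then becomes a NaN row, and a column-wise diff() never crosses companies. This is a minimal sketch assuming the sample data is loaded into a DataFrame called df; the names wide, diffs, long and out are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question (column B is the value we want to compute).
df = pd.DataFrame({
    "A": [1, 4, np.nan, 2, 3, 2, 2],
    "year": [1999, 2000, 2001, 2002, 1999, 2000, 2002],
    "CompanyName": ["tesla"] * 4 + ["BMW"] * 3,
})

# One column per company, one row per year; absent (year, company) pairs become NaN.
wide = df.pivot(index="year", columns="CompanyName", values="A")

# Give every year in the overall range a row, so a gap in the years
# appears as a NaN row instead of being silently skipped.
all_years = pd.Index(range(int(df["year"].min()), int(df["year"].max()) + 1), name="year")
wide = wide.reindex(all_years)

# Year-over-year difference inside each company column: NaN whenever
# either year is missing or has a NaN value in A.
diffs = wide.diff()

# Back to long format and attach B to the original rows (left join keeps row order).
long = diffs.reset_index().melt(id_vars="year", var_name="CompanyName", value_name="B")
out = df.merge(long, on=["year", "CompanyName"], how="left")
print(out)
```

Reindexing to the full year range is what makes condition 1 hold: without it, BMW's 2000 and 2002 rows would sit next to each other and be differenced.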

I wrote this code:

 finalData[~(finalData.year.diff(1).isnull()) & (finalData.year.diff(1) == 1) &
 (finalData.CompanyName)]

but I am stuck on the CompanyName condition. I appreciate your time and help.

P.S. I sort my data by CompanyName and then by year before computing this difference.
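Because the rows are grouped by company and sorted by year, the company condition from the attempt above can be written as a comparison between each row and the previous one, then combined with the year check into a single mask. This is a minimal sketch assuming finalData holds the sample data and each company's rows are contiguous and in year order; same_company and consecutive_year are illustrative names.

```python
import numpy as np
import pandas as pd

# Sample data; assumes each company's rows are contiguous and sorted by year.
finalData = pd.DataFrame({
    "A": [1, 4, np.nan, 2, 3, 2, 2],
    "year": [1999, 2000, 2001, 2002, 1999, 2000, 2002],
    "CompanyName": ["tesla"] * 4 + ["BMW"] * 3,
})

# Condition 3: the previous row belongs to the same company.
same_company = finalData["CompanyName"].eq(finalData["CompanyName"].shift())

# Condition 1: the previous row is exactly one year earlier.
consecutive_year = finalData["year"].diff().eq(1)

# Condition 2 needs no extra check: diff() is already NaN if either A value is NaN.
finalData["B"] = finalData["A"].diff().where(same_company & consecutive_year)
print(finalData)
```

In this sample the year check alone happens to reject the tesla-to-BMW boundary (2002 followed by 1999), but same_company is still needed in general, for example when one company's last year is exactly one less than the next company's first year.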

Answer:

Assuming that your data is sorted by year within each company (if it is not, use df = df.sort_values(by='year')), you can do:

g = df.groupby('CompanyName')
# diff of A within each company, kept only where the previous row is exactly the prior year
df['B'] = g['A'].diff().where(df['year'].eq(g['year'].shift() + 1))

Output:

     A    B  year CompanyName
0  1.0  NaN  1999       tesla
1  4.0  3.0  2000       tesla
2  NaN  NaN  2001       tesla
3  2.0  NaN  2002       tesla
4  3.0  NaN  1999         BMW
5  2.0 -1.0  2000         BMW
6  2.0  NaN  2002         BMW
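To see what each part of the one-liner contributes, here is a sketch that stores the intermediate pieces as hypothetical helper columns (diff_A, prev_year_plus_1, keep) next to the data. It assumes the same sample data, already sorted by year within each company.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [1, 4, np.nan, 2, 3, 2, 2],
    "year": [1999, 2000, 2001, 2002, 1999, 2000, 2002],
    "CompanyName": ["tesla"] * 4 + ["BMW"] * 3,
})

g = df.groupby("CompanyName")

# Hypothetical helper columns, one per piece of the answer's expression.
steps = df.assign(
    diff_A=g["A"].diff(),                    # A minus the previous A of the same company
    prev_year_plus_1=g["year"].shift() + 1,  # previous year of the same company + 1 (NaN on its first row)
)
# True only where the previous row of the same company is exactly one year earlier.
steps["keep"] = steps["year"].eq(steps["prev_year_plus_1"])
# where() keeps diff_A on rows where keep is True and puts NaN everywhere else.
steps["B"] = steps["diff_A"].where(steps["keep"])
print(steps)
```

Row 6 (BMW, 2002) shows why the mask is needed: diff_A is 0.0 there, but the previous BMW row is 2000, so keep is False and B becomes NaN.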