Calculate duration in years for multiple variables in a column

Time:02-05

I am trying to calculate how long each store has been open in years. Here is an example of the dataset:

year store name
2000 Store A
2001 Store A
2002 Store A
2003 Store A
2000 Store B
2001 Store B
2002 Store B
2000 Store C

I'm not sure how to calculate the difference between the max and min year for each store name, since the years for all stores are in the same column. Should I put the result into a new column using pandas?

CodePudding user response:

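You need to use a groupby: group the rows by store name, then subtract each group's minimum year from its maximum. The result is a Series with one value per store: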

g = df.groupby('store name')['year']

out = g.max()-g.min()
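
For a self-contained version of this, here is a minimal sketch that rebuilds the example data from the question. The way `df` is constructed and the name `years_open` are assumptions for illustration only.

```python
import pandas as pd

# Example data copied from the question
df = pd.DataFrame({
    "year": [2000, 2001, 2002, 2003, 2000, 2001, 2002, 2000],
    "store name": ["Store A"] * 4 + ["Store B"] * 3 + ["Store C"],
})

# One group of years per store
g = df.groupby("store name")["year"]

# Latest year minus earliest year, indexed by store name
years_open = g.max() - g.min()
print(years_open)
```

With this data the result is 3 for Store A, 2 for Store B and 0 for Store C. If the first year should count as a full year of being open, you would probably want to add 1 to the result.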

CodePudding user response:

You can use groupby and transform to create an additional column in the same dataframe.

df["years open"] = df.groupby("store name")["year"].transform(lambda x: x.max()-x.min())
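
As a variation on the same idea, the sketch below passes the built-in aggregation names "max" and "min" to transform instead of a lambda, which should give the same column. The construction of `df` is an assumption that mirrors the example in the question.

```python
import pandas as pd

# Example data copied from the question
df = pd.DataFrame({
    "year": [2000, 2001, 2002, 2003, 2000, 2001, 2002, 2000],
    "store name": ["Store A"] * 4 + ["Store B"] * 3 + ["Store C"],
})

g = df.groupby("store name")["year"]

# transform returns one value per original row, so the result
# lines up with df and can be assigned directly as a new column
df["years open"] = g.transform("max") - g.transform("min")
print(df)
```

Every row of a given store gets the same value, so the original rows are kept and the duration is simply repeated alongside them.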