I am trying to calculate how long each store has been open in years. Here is an example of the dataset:
year | store name |
---|---|
2000 | Store A |
2001 | Store A |
2002 | Store A |
2003 | Store A |
2000 | Store B |
2001 | Store B |
2002 | Store B |
2000 | Store C |
I'm not sure how to calculate the difference between the max and min year for each store name, since all the years are in the same column. Should I put the result into a new column using pandas?
CodePudding user response:
You need to use a groupby:
g = df.groupby('store name')['year']
out = g.max()-g.min()
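For reference, here is a minimal sketch of that approach run against the sample data from the question. The DataFrame construction is my own reconstruction of the table above, so adjust the column names if yours differ.

```python
import pandas as pd

# Sample data rebuilt from the table in the question (assumed column names).
df = pd.DataFrame({
    "year": [2000, 2001, 2002, 2003, 2000, 2001, 2002, 2000],
    "store name": ["Store A"] * 4 + ["Store B"] * 3 + ["Store C"],
})

# Group the years by store, then subtract the earliest year from the latest.
g = df.groupby("store name")["year"]
out = g.max() - g.min()

# `out` is a Series indexed by store name, with one value per store.
print(out)
```

Note that this gives the span between the first and last year, so a store that only appears in one year (Store C here) gets 0. If you want to count the number of calendar years covered instead, you could add 1 to the result.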
CodePudding user response:
You can use groupby and transform to create an additional column in the same dataframe.
df["years open"] = df.groupby("store name")["year"].transform(lambda x: x.max()-x.min())
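As a rough illustration, this is what the transform version might look like on the sample data from the question. The DataFrame below is my reconstruction of that table, and the `years open` column name follows the line above.

```python
import pandas as pd

# Sample data rebuilt from the table in the question (assumed column names).
df = pd.DataFrame({
    "year": [2000, 2001, 2002, 2003, 2000, 2001, 2002, 2000],
    "store name": ["Store A"] * 4 + ["Store B"] * 3 + ["Store C"],
})

# transform returns a result aligned with the original rows,
# so every row of a store gets that store's max-minus-min value.
df["years open"] = df.groupby("store name")["year"].transform(
    lambda x: x.max() - x.min()
)

print(df)
```

Unlike the plain groupby result, which has one row per store, this keeps the original shape of the dataframe and repeats the value on each row of the same store.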