I have a DataFrame which is like that:
date | Col.A | Col.B | Col.C | ... |
---|---|---|---|---|
2022 | 1 | 1 | 1 | ... |
2023 | 2 | 2 | 2 | ... |
2024 | 3 | 3 | 3 | ... |
And I want to add Columns where the Diff is calculated and looks like that afterwards:
date | Col.A | Diff Col.A | Col.B | Diff Col.B | Col.C | Diff Col.C | ... |
---|---|---|---|---|---|---|---|
2022 | 1 | nan | 1 | nan | 1 | nan | ... |
2023 | 2 | 1 | 2 | 1 | 2 | 1 | ... |
2024 | 3 | 1 | 3 | 1 | 3 | 1 | ... |
I tried it like that, but it doesnt work
df = pd.read_excel('example.xlsx', header=0).set_index(['date])
df.diff()
How can I do that in Pandas
CodePudding user response:
I rename new columns names by DataFrame.add_suffix
for correct sorting without specify all columns names:
df = pd.read_excel('example.xlsx', header=0, index_col=['date'])
df = pd.concat([df, df.diff().add_suffix(' diff')], axis=1).sort_index(axis=1)
print (df)
Col.A Col.A diff Col.B Col.B diff Col.C Col.C diff
date
2022 1 NaN 1 NaN 1 NaN
2023 2 1.0 2 1.0 2 1.0
2024 3 1.0 3 1.0 3 1.0
CodePudding user response:
df = pd.concat([df,df.diff().add_prefix('Diff ')], axis=1)
df = df.reindex(columns=sorted(df.columns, key=lambda x: (x.split(' ')[-1], x.split(' ')[0])))
df
###
date Col.A Diff Col.A Col.B Diff Col.B Col.C Diff Col.C
0 2022 1 NaN 1 NaN 1 NaN
1 2023 2 1.0 2 1.0 2 1.0
2 2024 3 1.0 3 1.0 3 1.0