I have a df like the one below (pivoted from long to wide), with the value within each 'firm' column representing sales.
date | firm_a | firm_b | firm_c |
---|---|---|---|
2022-01-01 | 1000 | 1000 | 1000 |
2022-02-01 | 1500 | 500 | 2000 |
2022-03-01 | 2000 | 100 | 5000 |
I want to calculate the absolute difference in sales from the first value (2022-01-01) for each 'firm' column.
What I have done so far is to create a new column (adding the '_x' suffix) for each firm variable, where the absolute difference from the first value is calculated.
df['firm_a_x'] = df['firm_a'] - df['firm_a'].iat[0]
df['firm_b_x'] = df['firm_b'] - df['firm_b'].iat[0]
df['firm_c_x'] = df['firm_c'] - df['firm_c'].iat[0]
df = df[[col for col in df.columns if col.endswith('_x')]]
The code works as intended, providing the result below:
date | firm_a_x | firm_b_x | firm_c_x |
---|---|---|---|
2022-01-01 | 0 | 0 | 0 |
2022-02-01 | 500 | -500 | 1000 |
2022-03-01 | 1000 | -900 | 4000 |
The problem is that while this code works for dataframes with small number of variables, it becomes inefficient with large number of columns, as I will have to write this
df['firm_name_x'] = df['firm_name'] - df['firm_name'].iat[0]
piece of code several times for each variable.
Therefore, my question is: Is there a way I can calculate the absolute difference across all the columns of a dataframe (except index/first column) in a simpler, faster, way?
Thank you!!
CodePudding user response:
If need processing all columns without date
column subtract first row of DataFrame and rename columns by DataFrame.add_suffix
:
df = df.set_index('date')
df = df.sub(df.iloc[0]).add_suffix('_x')
print (df)
firm_a_x firm_b_x firm_c_x
date
2022-01-01 0 0 0
2022-02-01 500 -500 1000
2022-03-01 1000 -900 4000
If there is multiple columns and need processing only firm
columns:
cols = df.filter(like='firm').columns
df[cols] = df[cols].sub(df[cols].iloc[0])
df = df.rename(columns=dict(zip(cols, cols '_x')))
print (df)
date firm_a_x firm_b_x firm_c_x
0 2022-01-01 0 0 0
1 2022-02-01 500 -500 1000
2 2022-03-01 1000 -900 4000