Home > other >  Calculate absolute difference for several columns together pandas
Calculate absolute difference for several columns together pandas

Time:08-18

I have a df like the one below (pivoted from long to wide), with the value within each 'firm' column representing sales.

date firm_a firm_b firm_c
2022-01-01 1000 1000 1000
2022-02-01 1500 500 2000
2022-03-01 2000 100 5000

I want to calculate the absolute difference in sales from the first value (2022-01-01) for each 'firm' column.

What I have done so far is to create a new column (adding the '_x' suffix) for each firm variable, where the absolute difference from the first value is calculated.

df['firm_a_x'] = df['firm_a'] - df['firm_a'].iat[0]
df['firm_b_x'] = df['firm_b'] - df['firm_b'].iat[0]
df['firm_c_x'] = df['firm_c'] - df['firm_c'].iat[0]

df = df[[col for col in df.columns if col.endswith('_x')]]

The code works as intended, providing the result below:

date firm_a_x firm_b_x firm_c_x
2022-01-01 0 0 0
2022-02-01 500 -500 1000
2022-03-01 1000 -900 4000

The problem is that while this code works for dataframes with small number of variables, it becomes inefficient with large number of columns, as I will have to write this

df['firm_name_x'] = df['firm_name'] - df['firm_name'].iat[0]

piece of code several times for each variable.

Therefore, my question is: Is there a way I can calculate the absolute difference across all the columns of a dataframe (except index/first column) in a simpler, faster, way?

Thank you!!

CodePudding user response:

If need processing all columns without date column subtract first row of DataFrame and rename columns by DataFrame.add_suffix:

df = df.set_index('date')
df = df.sub(df.iloc[0]).add_suffix('_x')
print (df)
            firm_a_x  firm_b_x  firm_c_x
date                                    
2022-01-01         0         0         0
2022-02-01       500      -500      1000
2022-03-01      1000      -900      4000

If there is multiple columns and need processing only firm columns:

cols = df.filter(like='firm').columns

df[cols] = df[cols].sub(df[cols].iloc[0])
df = df.rename(columns=dict(zip(cols, cols   '_x')))
print (df)
         date  firm_a_x  firm_b_x  firm_c_x
0  2022-01-01         0         0         0
1  2022-02-01       500      -500      1000
2  2022-03-01      1000      -900      4000
  • Related