Home > Software engineering >  Pandas: Calculate Difference between a row and all other rows and create column with the name
Pandas: Calculate Difference between a row and all other rows and create column with the name

Time:05-06

We have data as below

   Name    value1 Value2 finallist
0  cosmos  10     20      [10,20]
1  network 30     40      [30,40]
2  unab    20     40      [20,40]

is there any way to do difference between all the rows

Something final output like

   Name    value1 Value2 finallist  cosmos  network  unab
0  cosmos  10     20      [10,20]    0      40       30
1  network 30     40      [30,40]    40     0        10
2  unab    20     40      [20,40]    30     10       0

Data has different types of names and each name should be a column

CodePudding user response:

You want the pairwise absolute difference of the sum of the values for each row. The easiest might be to use the underlying numpy array.

absolute difference of the sum of the "value" columns

# get sum of values per row and convert to numpy array
a = df['value1'].filter(regex='(?i)value').sum(1).to_numpy()

# compute the pairwise difference, create a DataFrame and join
df2 = df.join(pd.DataFrame(abs(a-a[:,None]), columns=df['Name'], index=df.index))

output:

      Name  value1  Value2 finallist  cosmos  network  unab
0   cosmos      10      20  [10, 20]       0       40    30
1  network      30      40  [30, 40]      40        0    10
2     unab      20      40  [20, 40]      30       10     0

CodePudding user response:

I'm not quite sure how you wish to calculate the values in the columns so I just took the difference of value1, but it's irrelevant for the problem in this case. I'd do it the following way:

for name in df['name']:
  vals = []
  for idx, row in df.iterrows():
     vals.append(int(df.loc[df['name'] == name].value1) - row.value1)
  df[name] = vals

So for each name, for which you want to make a column, you go through and calculate the difference, add it to a list and then use that list as the column values for the new named column.

It's a quick fix, possible that there's a better way, but I hope this helps!

  • Related