Home > Blockchain >  looping through columns that generates multiple columns automatically
looping through columns that generates multiple columns automatically

Time:06-27

Table:

  • Need to ignore timestamp.
  • n2-n1, n3- n2 ..... rows are not present in my case.
  • Need a fast solution, since the dataset is somewhat large.
time n1 n2 n3 n5 n6 n7 n8 n9 n10 n11 n2 - n1 n3 - n2 n4 - n3 ... n11-n10
0 11:50 1 2 3 4 4 4 4 4 4 4 ... ... ... ... ...
1 12:50 5 6 7 8 8 8 8 8 8 8 ... ... ... ... ...
2 13:50 8 7 6 5 5 5 5 5 5 5 ... ... ... ... ...

Use this code to fiddle:

import pandas as pd
import numpy as np

time = ['11:50', '12:50', '13:50']
data_1 = {'time': time,
          'n1': [1, 5, 8],
          'n2': [2, 6 ,7],
          'n3': [3, 7 ,6],
          'n5': [4, 8, 5],
          'n6': [4, 8, 5],
          'n7': [4, 8, 5],
          'n8': [4, 8, 5],
          'n9': [4, 8, 5],
          'n10': [4, 8, 5],
          'n11': [4, 8, 5],
        }

df1 = pd.DataFrame(data = data_1)
df1

I need to generate n2-n1, n3-n2...n11-n10 columns. With all differences respectively.

CodePudding user response:

You can use a for loop with zip():

for fst, snd in zip(df1.columns[1:], df1.columns[2:]):
    df1[f"{snd}-{fst}"] = df1[snd] - df1[fst]

print(df1)

This outputs:

    time  n1  n2  n3  n5  n6  n7  n8  n9  n10  n11  n2-n1  n3-n2  n5-n3  n6-n5  n7-n6  n8-n7  n9-n8  n10-n9  n11-n10
0  11:50   1   2   3   4   4   4   4   4    4    4      1      1      1      0      0      0      0       0        0
1  12:50   5   6   7   8   8   8   8   8    8    8      1      1      1      0      0      0      0       0        0
2  13:50   8   7   6   5   5   5   5   5    5    5     -1     -1     -1      0      0      0      0       0        0

CodePudding user response:

You can use DataFrame.diff method over the columns (axis=1) to compute consecutive column differences

df1 = df1.join(
    df1.filter(like='n')  # get only the columns that start with 'n'
       .diff(axis=1) # consecutive column differences 
       .dropna(axis=1)  # drop the first column: n1 - nothing = NaN
       .rename(columns=lambda col: f'{col}-n{int(col[1:]) - 1}') # relabel the columns
)  

Output:

time n1 n2 n3 n5 n6 n7 n8 n9 n10 n11 n2-n1 n3-n2 n5-n4 n6-n5 n7-n6 n8-n7 n9-n8 n10-n9 n11-n10
0 11:50 1 2 3 4 4 4 4 4 4 4 1 1 1 0 0 0 0 0 0
1 12:50 5 6 7 8 8 8 8 8 8 8 1 1 1 0 0 0 0 0 0
2 13:50 8 7 6 5 5 5 5 5 5 5 -1 -1 -1 0 0 0 0 0 0
  • Related