Home > Net >  How to create new rows based on range of two series?
How to create new rows based on range of two series?

Time:09-07

I have a dataframe like this:

df = pd.DataFrame({"ID":[1, 1],
                   "CLASS":["A","B"],
                   "TOP":[12.4, 29.1],
                   "BOT":[29.1, 32.8]})

I would like to create a new dataframe, but with a specific range of samples (for example, 5) between my TOP and BOT values.

I know I can create a range with 5 samples with np.linspace, but how can I put this range like two new series alternating TOP and BOT?

range_A = np.linspace(df["TOP"][df["CLASS"] == "A"].min(), df["BOT"][df["CLASS"] == "A"].min(), 5)
range_A
Out[36]: array([12.4  , 16.575, 20.75 , 24.925, 29.1  ])
    
range_B = np.linspace(df["TOP"][df["CLASS"] == "B"].min(), df["BOT"][df["CLASS"] == "B"].min(), 5)
range_B
Out[37]: array([29.1  , 30.025, 30.95 , 31.875, 32.8  ])

The new dataframe doesn't need to have 5 samples for each CLASS. So it should be like this:

enter image description here

Anyone could help me?

CodePudding user response:

IIUC, you can use a function:

def topbot(d):
    a = np.linspace(d['TOP'].min(), d['BOT'].max(), num=5)
    return pd.DataFrame({'TOP': a[:-1], 'BOT': a[1:]})

out = (df
   .groupby(['ID', 'CLASS'])
   .apply(topbot)
   .droplevel(-1)
   .reset_index()
)

output:

   ID CLASS     TOP     BOT
0   1     A  12.400  16.575
1   1     A  16.575  20.750
2   1     A  20.750  24.925
3   1     A  24.925  29.100
4   1     B  29.100  30.025
5   1     B  30.025  30.950
6   1     B  30.950  31.875
7   1     B  31.875  32.800

CodePudding user response:

You could do by groupby apply, then shifting the results within the groups.

df = pd.DataFrame({"ID":[1, 1],
                   "CLASS":["A","B"],
                   "TOP":[12.4, 29.1],
                   "BOT":[29.1, 32.8]})

df = (
    df.groupby(['ID','CLASS'])
      .apply(lambda x: np.linspace(x.values.min(), x.values.max(), 5))
      .explode()
      .reset_index(name='TOP')
      .assign(BOT=lambda x: x.groupby(['ID','CLASS'])['TOP'].shift(-1))
      .dropna()
)

Output

   ID CLASS     TOP     BOT
0   1     A    12.4  16.575
1   1     A  16.575   20.75
2   1     A   20.75  24.925
3   1     A  24.925    29.1
5   1     B    29.1  30.025
6   1     B  30.025   30.95
7   1     B   30.95  31.875
8   1     B  31.875    32.8

CodePudding user response:

def func(x,n):
    s = pd.Series(np.linspace(x.TOP.min(), x.BOT.min(), n))
    y = pd.DataFrame(np.repeat(x.values, n - 1, axis=0), columns=x.columns)
    y.TOP = s
    y.BOT = s.shift(-1)
    return y


df = pd.DataFrame({"ID": [1, 1],
                   "CLASS": ["A", "B"],
                   "TOP": [12.4, 29.1],
                   "BOT": [29.1, 32.8]})

df = df.groupby('CLASS', group_keys=False).apply(func,n=5)
print(df)

Prints:

  ID CLASS     TOP     BOT
0  1     A  12.400  16.575
1  1     A  16.575  20.750
2  1     A  20.750  24.925
3  1     A  24.925  29.100
0  1     B  29.100  30.025
1  1     B  30.025  30.950
2  1     B  30.950  31.875
3  1     B  31.875  32.800
  • Related