Dataframe to multiIndex for sktime format

Time:02-03

I have multivariate time series data in this format (a pd.DataFrame indexed on Time):

Existing format

I am trying to use sktime, which needs the data in a MultiIndex layout:

Desired format

Is it possible to transform the existing DataFrame into the new format?

CodePudding user response:

Edit: here's a more straightforward and probably faster solution using row slicing.

import pandas as pd

df = pd.DataFrame({
    'time':range(5),
    'a':[f'a{i}' for i in range(5)],
    'b':[f'b{i}' for i in range(5)],
})

w = 3  # window length
w_starts = range(0, len(df) - (w - 1))  # start position of each window

# iterate through the overlapping windows, tag each with its 'instance'
# number, then concatenate and index on (instance, time)
roll_df = pd.concat(
    df.iloc[s:s + w].assign(instance=i) for (i, s) in enumerate(w_starts)
).set_index(['instance', 'time'])

print(roll_df)

Output

                a   b
instance time        
0        0     a0  b0
         1     a1  b1
         2     a2  b2
1        1     a1  b1
         2     a2  b2
         3     a3  b3
2        2     a2  b2
         3     a3  b3
         4     a4  b4
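
The same idea can be wrapped in a small helper so the window length becomes a parameter. This is only a sketch: the name `to_windows` and its arguments are made up for illustration, and it assumes the time values live in an ordinary column rather than in the index.

```python
import pandas as pd


def to_windows(df, w, time_col="time"):
    """Stack overlapping windows of length w under an (instance, time) MultiIndex.

    Hypothetical helper, not part of pandas or sktime.
    """
    if w < 1 or w > len(df):
        raise ValueError("w must be between 1 and len(df)")
    n_windows = len(df) - w + 1
    # one positional slice per window start, tagged with its window number
    parts = [df.iloc[s:s + w].assign(instance=s) for s in range(n_windows)]
    return pd.concat(parts).set_index(["instance", time_col])


df = pd.DataFrame({
    "time": range(5),
    "a": [f"a{i}" for i in range(5)],
    "b": [f"b{i}" for i in range(5)],
})
roll_df = to_windows(df, w=3)
print(roll_df.loc[1])  # the rows of window 1, i.e. time 1, 2 and 3
```

Each row of the input is copied once per window it belongs to, so the result has roughly `w` times as many rows as the original frame.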

CodePudding user response:

Here's one way to achieve the desired result:

import numpy as np
import pandas as pd

# Assumes Time, A and B are regular columns of df; if Time is the index,
# call df = df.reset_index() first
w = 3                        # window length
n_windows = len(df) - w + 1  # number of overlapping windows

# Create the instance column: each window number repeated once per row
instance = np.repeat(range(n_windows), w)

# Collect the Time values of each rolling window
time = np.concatenate([df.Time[i:i + w].values for i in range(n_windows)])

# Collect the A values of each rolling window
a = np.concatenate([df.A[i:i + w].values for i in range(n_windows)])

# Collect the B values of each rolling window
b = np.concatenate([df.B[i:i + w].values for i in range(n_windows)])

# Create a new DataFrame with the desired format
new_df = pd.DataFrame({'Instance': instance, 'Time': time, 'A': a, 'B': b})

# Set the MultiIndex on the new DataFrame
new_df = new_df.set_index(['Instance', 'Time'])
new_df

The output dataframe
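
For anyone who wants to run this end to end, here is a self-contained variant that avoids one list comprehension per column by computing every window's row positions up front. It is a sketch rather than the answer's exact method: the sample data is invented, and it again assumes `Time` is a regular column.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data using the column names from the answer above
df = pd.DataFrame({
    "Time": range(5),
    "A": [f"a{i}" for i in range(5)],
    "B": [f"b{i}" for i in range(5)],
})

w = 3                        # window length
n_windows = len(df) - w + 1  # number of overlapping windows

# Row positions of every window: row i of `pos` is [i, i+1, ..., i+w-1]
pos = np.arange(n_windows)[:, None] + np.arange(w)

# Select all window rows in one positional lookup, then label the windows
new_df = df.iloc[pos.ravel()].copy()
new_df["Instance"] = np.repeat(np.arange(n_windows), w)
new_df = new_df.set_index(["Instance", "Time"])
print(new_df)
```

Because the selection is positional, every column of `df` is carried along without being named, which helps when there are more variables than just A and B.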
