I have a dataframe with three columns. I want to build an array from this dataframe. I want to use a window
to move across the rows. I want to start from row 0 until end, and put the value of three columns in one row of array based on the column b
in data frame.
For example, with window size of 3, I put the 1,2,3
which are the three value in the first window and in column a
put with the three value in column c
, -1,1,5
and at the last column the value of column b is 1
. This process will contintue until the column b
changed. Then, for the next value of column b
we have the same process. Thank you in advance
Here is the dataframe which I have:
import pandas as pd
df = pd.DataFrame()
df ['a'] = [1,2,3,4,5,6,7,8,9,10, 1,2,3]
df ['b'] = [1,1,1,1,2,2,2,2,7,7,7,7,7]
df ['c'] = [-1,1,5,6,7,3,1,6,8,3,9,10,1]
df
a b c
0 1 1 -1
1 2 1 1
2 3 1 5
3 4 1 6
4 5 2 7
5 6 2 3
6 7 2 1
7 8 2 6
8 9 7 8
9 10 7 3
10 1 7 9
11 2 7 10
12 3 7 1
The array which I want is:
array([[ 1, 2, 3, -1, 1, 5, 1],
[ 2, 3, 4, 1, 5, 6, 1],
[ 5, 6, 7, 7, 3, 1, 2],
[ 6, 7, 8, 3, 1, 6, 2],
[ 9, 10, 1, 8, 3, 9, 7],
[10, 1, 2, 3, 9, 10, 7],
[ 1, 2, 3, 9, 10, 1, 7]])
I tried, but I think that I am far away from a solution.
w = 3
A = np.zeros((w,len(df)-w 1))
for i in range(len(df)-w 1):
A[:,i] = df.iloc[i:w i,0]
A = A.T
CodePudding user response:
IIUC, you could use a GroupBy.apply
and a bit of numpy:
# group by consecutive b
group = df['b'].diff().abs().gt(0).cumsum()
# or if simply groupby by 'b' overall (not consecutive)
# group = 'b'
def flatten_group(g):
# pandas' rolling only aggregates, doing a custom rolling
# that flattens the columns a & c and add the 1st value of b
return [np.r_[g.iloc[i:i 3][['a', 'c']].to_numpy().ravel('F'),
[g['b'].iloc[0]]]
for i in range(len(g)-2)]
# apply per group and stack as numpy array
a = np.vstack(df.groupby(group).apply(flatten_group).explode())
output:
array([[ 1, 2, 3, -1, 1, 5, 1],
[ 2, 3, 4, 1, 5, 6, 1],
[ 5, 6, 7, 7, 3, 1, 2],
[ 6, 7, 8, 3, 1, 6, 2],
[ 9, 10, 1, 8, 3, 9, 7],
[10, 1, 2, 3, 9, 10, 7],
[ 1, 2, 3, 9, 10, 1, 7]])
CodePudding user response:
another way:
res = []
for n,g in df.groupby('b'):
for r in g[['a','c']].rolling(3):
if len(r)==3:
res.append(r.T.values.ravel().tolist() [n])
np.array(res)
'''
array([[ 1, 2, 3, -1, 1, 5, 1],
[ 2, 3, 4, 1, 5, 6, 1],
[ 5, 6, 7, 7, 3, 1, 2],
[ 6, 7, 8, 3, 1, 6, 2],
[ 9, 10, 1, 8, 3, 9, 7],
[10, 1, 2, 3, 9, 10, 7],
[ 1, 2, 3, 9, 10, 1, 7]])