I'm trying to create a dataframe populated by repeating rows based on an existing steady sequence.
For example, a sequence increasing in steps of 3 from 6 up to (but not including) 18 can be generated using np.arange(6, 18, 3), which gives array([ 6, 9, 12, 15]).
How would I go about generating a dataframe in this way?
How could I get the below if I wanted 6 repeated rows?
0 1 2 3
0 6.0 9.0 12.0 15.0
1 6.0 9.0 12.0 15.0
2 6.0 9.0 12.0 15.0
3 6.0 9.0 12.0 15.0
4 6.0 9.0 12.0 15.0
5 6.0 9.0 12.0 15.0
6 6.0 9.0 12.0 15.0
The reason for creating this matrix is that I then want to add a pd.Series row-wise to it.
CodePudding user response:
Here is a solution using NumPy broadcasting which avoids Python loops, lists, and excessive memory allocation (as done by np.repeat):
pd.DataFrame(np.broadcast_to(np.arange(6, 18, 3), (6, 4)))
To understand why this is more efficient than other solutions, refer to the np.broadcast_to() docs (https://numpy.org/doc/stable/reference/generated/numpy.broadcast_to.html), which note that "more than one element of a broadcasted array may refer to a single memory location."
This means that no matter how many rows you create before passing the array to pandas, NumPy only really allocates a single row; the 2D array is a read-only view that refers to that row's data once per row.
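As a rough check of the memory-sharing claim, the sketch below inspects the broadcast array before it is handed to pandas. The variable names row and view are only illustrative, and whether pandas later copies the data when building the DataFrame depends on the pandas version and settings, so the check is limited to the NumPy side.

```python
import numpy as np
import pandas as pd

row = np.arange(6, 18, 3)             # the single row that actually holds data
view = np.broadcast_to(row, (6, 4))   # 6 x 4 view onto that one row

# A stride of 0 along axis 0 means every "row" starts at the same memory address.
print(view.strides[0])                # 0
print(np.shares_memory(view, row))    # True
print(view.flags.writeable)           # False: broadcast views are read-only

df = pd.DataFrame(view)
print(df.shape)                       # (6, 4)
```

Because the view is read-only, any in-place modification should be done on the DataFrame (or on a copy of the array), not on the broadcast array itself.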
CodePudding user response:
pd.DataFrame([np.arange(6, 18, 3)]*7)
Alternatively:
pd.DataFrame(np.repeat([np.arange(6, 18, 3)], 7, axis=0))
0 1 2 3
0 6 9 12 15
1 6 9 12 15
2 6 9 12 15
3 6 9 12 15
4 6 9 12 15
5 6 9 12 15
6 6 9 12 15
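For the follow-up step mentioned in the question, here is a minimal sketch of adding a Series to the repeated-row DataFrame. It assumes "row-wise" means one value per row, aligned on the row index; the offsets Series is made up for illustration and is not from the original post.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.repeat([np.arange(6, 18, 3)], 7, axis=0))

# Hypothetical per-row offsets (index 0..6), purely for illustration.
offsets = pd.Series(np.arange(7) * 0.5)

# axis=0 aligns the Series index with the DataFrame's row index,
# so offsets[i] is added to every column of row i.
result = df.add(offsets, axis=0)
print(result)
```

If the Series is instead meant to supply one value per column, the default df.add(series) (axis="columns") would be the call to try.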