Say I have the following dataframe:
value length
0 2.4 3
1 5.3 6
2 17.1 8
3 11.9 11
df = pd.DataFrame([(2.4, 3),(5.3, 6),(17.1, 8),(11.9, 11)], columns=['value', 'length'])
Column length
is of integer and strictly increasing. I would like to insert rows where there there is a gap of more than 1
in column length
, and the values of the new rows take the value immediately before it. Here is the expected output:
value length
0 2.4 3
1 2.4 4
2 2.4 5
3 5.3 6
4 5.3 7
5 17.1 8
6 17.1 9
7 17.1 10
8 11.9 11
What would be a good way to do this? Thanks.
CodePudding user response:
You can use df['length'].diff
to compute the number of repeats for each index, and df.index.repeat
to repeat the items of the index per those values. Then you can index the dataframe using the resulting repeated index, and use df.assign
to add a column containing a range from the smallest value of the length
column to the largest value (both inclusive).
df = (
df.loc[
df.index.repeat(
df['length']
.diff()
.shift(-1, fill_value=1)
)
]
.reset_index(drop=True)
.assign(length=np.arange(
df['length'].min(),
df['length'].max() 1
))
)
Output:
>>> df
value length
0 2.4 3
1 2.4 4
2 2.4 5
3 5.3 6
4 5.3 7
5 17.1 8
6 17.1 9
7 17.1 10
8 11.9 11
Compact version:
df = df.loc[df.index.repeat(df['length'].diff().shift(-1, fill_value=1))].reset_index(drop=True).assign(length=np.arange(df['length'].min(), df['length'].max() 1))
CodePudding user response:
Let us do set_index
reindex
out = df.set_index('length').\
reindex(range(df.length.min(),df.length.max() 1),method = 'ffill').reset_index()
length value
0 3 2.4
1 4 2.4
2 5 2.4
3 6 5.3
4 7 5.3
5 8 17.1
6 9 17.1
7 10 17.1
8 11 11.9
CodePudding user response:
Another option using reindex
:
df.set_index('length')
.reindex(np.arange(df.length.min(), df.length.max() 1))
.ffill()
.reset_index()
length value
0 3 2.4
1 4 2.4
2 5 2.4
3 6 5.3
4 7 5.3
5 8 17.1
6 9 17.1
7 10 17.1
8 11 11.9
Running example: https://akuiper.com/console/fsm1iI6PIhxA