My initial example dataframe is in the following format:
>>> import pandas as pd
>>> d = {'n': ['one', 'two', 'three', 'four'],
'initial': [3, 4, 10, 10],
'final': [3, 7, 11, 7],}
>>> df = pd.DataFrame(d)
>>> df
n initial final
0 one 3 3
1 two 4 7
2 three 10 11
3 four 10 7
What I hope to achieve is to duplicate the values in the n
column a number of times corresponding to the values between those in the initial
and final
columns.
For example, in the first row, initial
and final
hold the same value, so there should be one instance of 'one'
in the output dataframe's n
column. For the second row, initial
and final
are three numbers apart, so there should be four repetitions of 'two'
, and so on. If final
is less than initial
, there should be no instances of the value in n
in the output.
There should also be a count
column which counts up from the value in the initial
column to the value in the final
column. My expected output is as follows:
n count
0 one 3
1 two 4
2 two 5
3 two 6
4 two 7
5 three 10
6 three 11
I've tried using reindex
with a new index based on df.final - df.initial 1
, but this does not handle the negative values as in the fourth row of the example dataframe.
CodePudding user response:
In your case create the list
with range then explode
# in order to save running time , df = df.query('initial<=final')
df['count'] = [ range(x, y 1) for x, y in zip(df['initial'], df['final'])]
out = df.explode('count').dropna()
Out[13]:
n initial final count
0 one 3 3 3
1 two 4 7 4
1 two 4 7 5
1 two 4 7 6
1 two 4 7 7
2 three 10 11 10
2 three 10 11 11
CodePudding user response:
out = (df[['n']].join(df.apply(lambda x: range(x[1], x[2] 1), axis=1)
.explode().to_frame('count'))
.dropna(subset=['count']))
out
n count
0 one 3
1 two 4
2 two 5
3 two 6
4 two 7
5 three 10
6 three 11