I have a dataframe read in from some CSV data. I have two numbers in seperate columns, representing a range and would like to do the following;
A) iterate through the data set B) At each new index, calculate the range C) create a total number of new rows based on the sum of that range (doesn't need to be in the existing dataframe) D) And for each new row, insert incremental values through that range, concatanating specified column data, and retaining the original additional details from the other columns.
Example:
value | range1 | range2 | color |
---|---|---|---|
a | 0 | 3 | blue |
b | 4 | 6 | yellow |
Result:
output | value | colour |
---|---|---|
a1 | a | blue |
a2 | a | blue |
a3 | a | blue |
b4 | b | yellow |
b5 | b | yellow |
b6 | b | yellow |
CodePudding user response:
Here's one way to do it:
import pandas as pd
df = pd.DataFrame({
'value' : ['a', 'b'],
'range1' : [0, 4],
'range2' : [3, 6],
'color' : ['blue', 'yellow']
})
df['output'] = df.apply(lambda x: [x['value'] str(i 1) for i in range(x['range1'], x['range2'])], axis=1)
df = df.explode('output', ignore_index=True)[['output', 'value', 'color']]
print(df)
Output
output value color
0 a1 a blue
1 a2 a blue
2 a3 a blue
3 b5 b yellow
4 b6 b yellow
CodePudding user response:
Apart from that it's not clear why range 0 is not counted,
but here is one way:
df = df.loc[df.index.repeat(df.range2 - df.range1 1)]
df['output'] = df.value (df.range1 df.groupby('value').cumcount()).map(str)
output:
>>
value range1 range2 color output
0 a 0 3 blue a0
0 a 0 3 blue a1
0 a 0 3 blue a2
0 a 0 3 blue a3
1 b 4 6 yellow b4
1 b 4 6 yellow b5
1 b 4 6 yellow b6