I have a DataFrame like this:
df = pd.DataFrame({'Column 1': ['a', 'a', 'b', 'c'],
'Column 2': [2, 2, 3, 4],
'Column 3': [100, 110, 120, 130]}
)
>
Column 1 Column 2 Column 3
0 a 2 100
1 a 2 110
2 b 3 120
3 c 4 130
and I need a new DF like this:
df = pd.DataFrame({'Column 1': ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'c'],
'New Column': ['a1', 'a2', 'a3', 'a4', 'b1', 'b2', 'b3', 'c1', 'c2', 'c3', 'c4'],
'Column 3': [100, 100, 110, 110, 120, 120, 120, 130, 130, 130, 130]}
)
Column 1 New Column Column 3
0 a a1 100
1 a a2 100
2 a a3 110
3 a a4 110
4 b b1 120
5 b b2 120
6 b b3 120
7 c c1 130
8 c c2 130
9 c c3 130
10 c c4 130
I did it with 2 loops using itterrows and grouping by the "key" Column 1-Column 3, but it takes a long time to run and probably isn't the best solution, so I was wondering if there's a better way.
CodePudding user response:
Use index.repeat
loc
to scale up the DataFrame based on the numbers in Column 2
then reset_index
to convert to unique range index. Then insert
the New Column
into the df
using groupby cumcount
:
# Scale up the DataFrame
df = df.loc[df.index.repeat(df.pop('Column 2'))].reset_index(drop=True)
# Insert new column in the correct place
df.insert(
1, 'New Column',
# Create New Column based on new Column 1 Values
df['Column 1'] df.groupby('Column 1').cumcount().add(1).astype(str)
)
df
:
Column 1 New Column Column 3
0 a a1 100
1 a a2 100
2 a a3 110
3 a a4 110
4 b b1 120
5 b b2 120
6 b b3 120
7 c c1 130
8 c c2 130
9 c c3 130
10 c c4 130