Home > other >  How to expand DataFrame rows based on a column's value?
How to expand DataFrame rows based on a column's value?

Time:10-22

I have a DataFrame like this:

df = pd.DataFrame({'Column 1': ['a', 'a', 'b', 'c'],
                  'Column 2': [2, 2, 3, 4],
                  'Column 3': [100, 110, 120, 130]}
                  )

>
  Column 1  Column 2  Column 3
0        a         2       100
1        a         2       110
2        b         3       120
3        c         4       130

and I need a new DF like this:

df = pd.DataFrame({'Column 1': ['a', 'a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c', 'c'],
                  'New Column': ['a1', 'a2', 'a3', 'a4', 'b1', 'b2', 'b3', 'c1', 'c2', 'c3', 'c4'],
                  'Column 3': [100, 100, 110, 110, 120, 120, 120, 130, 130, 130, 130]}
                  )

   Column 1 New Column  Column 3
0         a         a1       100
1         a         a2       100
2         a         a3       110
3         a         a4       110
4         b         b1       120
5         b         b2       120
6         b         b3       120
7         c         c1       130
8         c         c2       130
9         c         c3       130
10        c         c4       130

I did it with 2 loops using itterrows and grouping by the "key" Column 1-Column 3, but it takes a long time to run and probably isn't the best solution, so I was wondering if there's a better way.

CodePudding user response:

Use index.repeat loc to scale up the DataFrame based on the numbers in Column 2 then reset_index to convert to unique range index. Then insert the New Column into the df using groupby cumcount:

# Scale up the DataFrame
df = df.loc[df.index.repeat(df.pop('Column 2'))].reset_index(drop=True)
# Insert new column in the correct place
df.insert(
    1, 'New Column',
    # Create New Column based on new Column 1 Values
    df['Column 1']   df.groupby('Column 1').cumcount().add(1).astype(str)
)

df:

   Column 1 New Column  Column 3
0         a         a1       100
1         a         a2       100
2         a         a3       110
3         a         a4       110
4         b         b1       120
5         b         b2       120
6         b         b3       120
7         c         c1       130
8         c         c2       130
9         c         c3       130
10        c         c4       130
  • Related