Home > front end >  Pandas group by column and add numbered rows to each group
Pandas group by column and add numbered rows to each group

Time:09-28

I have the following data stored in a pandas.DataFrame object named df. The column id is a unique identifier and the remaining columns are irrelevant for the purpose of this question.

id x1 x2 x3
0 01001 523.41 639673 1222.13
1 01002 54.832 33746 615.443
2 01003 48.3824 45196 934.142

I want to know if there's a way to group by id and use assign to add multiple numbered rows to each group.

In other words, I want to use range to add an arbitrary number of rows to each id. The desired result looks as follows:

id x1 x2 x3 new_col
0 01001 523.41 639673 1222.13 2020
0 01001 523.41 639673 1222.13 2021
1 01002 54.832 33746 615.443 2020
1 01002 54.832 33746 615.443 2021
2 01003 48.3824 45196 934.142 2020
2 01003 48.3824 45196 934.142 2021

I was hoping something along these lines would work.

df = df.groupby('id').assign(new_col=range(2020, 2022))

CodePudding user response:

You can repeat the index and use numpy.tile to tile the new data:

import numpy as np

new = ['2020', '2021']

df2 = df.loc[df.index.repeat(len(new))]
df2['new_col'] = np.tile(new, len(df))

output:

     id        x1      x2        x3 new_col
0  1001  523.4100  639673  1222.130    2020
0  1001  523.4100  639673  1222.130    2021
1  1002   54.8320   33746   615.443    2020
1  1002   54.8320   33746   615.443    2021
2  1003   48.3824   45196   934.142    2020
2  1003   48.3824   45196   934.142    2021

Alternatively:

new = range(2020, 2022)
df2 = df.loc[df.index.repeat(len(new))]
df2['new_col'] = new.start   df2.groupby('id').cumcount()

CodePudding user response:

We can also create a MultiIndex.from_product and reindex relative to level=0, to scale up the DataFrame, and reset_index to convert the inner level of the MultiIndex into a column:

df = df.reindex(
    index=pd.MultiIndex.from_product(
        [df.index, range(2020, 2022)],
        names=[None, 'new_col']
    ),
    level=0
).reset_index(level='new_col')

df:

   new_col    id        x1      x2        x3
0     2020  1001  523.4100  639673  1222.130
0     2021  1001  523.4100  639673  1222.130
1     2020  1002   54.8320   33746   615.443
1     2021  1002   54.8320   33746   615.443
2     2020  1003   48.3824   45196   934.142
2     2021  1003   48.3824   45196   934.142

We can reorder new_col to the end if needed:

df = df.reindex(
    index=pd.MultiIndex.from_product(
        [df.index, range(2020, 2022)],
        names=[None, 'new_col']
    ),
    level=0
).reset_index(level='new_col')

# Re-order columns
df = df.reindex(
    columns=df.columns[df.columns != 'new_col'].union(['new_col'], sort=False)
)

df:

     id        x1      x2        x3  new_col
0  1001  523.4100  639673  1222.130     2020
0  1001  523.4100  639673  1222.130     2021
1  1002   54.8320   33746   615.443     2020
1  1002   54.8320   33746   615.443     2021
2  1003   48.3824   45196   934.142     2020
2  1003   48.3824   45196   934.142     2021

Setup:

import pandas as pd

df = pd.DataFrame({
    'id': [1001, 1002, 1003],
    'x1': [523.41, 54.832, 48.3824],
    'x2': [639673, 33746, 45196],
    'x3': [1222.13, 615.443, 934.142]
})
  • Related