I have the following data stored in a pandas.DataFrame object named df
. The column id
is a unique identifier and the remaining columns are irrelevant for the purpose of this question.
id | x1 | x2 | x3 | |
---|---|---|---|---|
0 | 01001 | 523.41 | 639673 | 1222.13 |
1 | 01002 | 54.832 | 33746 | 615.443 |
2 | 01003 | 48.3824 | 45196 | 934.142 |
I want to know if there's a way to group by id
and use assign
to add multiple numbered rows to each group.
In other words, I want to use range
to add an arbitrary number of rows to each id
. The desired result looks as follows:
id | x1 | x2 | x3 | new_col | |
---|---|---|---|---|---|
0 | 01001 | 523.41 | 639673 | 1222.13 | 2020 |
0 | 01001 | 523.41 | 639673 | 1222.13 | 2021 |
1 | 01002 | 54.832 | 33746 | 615.443 | 2020 |
1 | 01002 | 54.832 | 33746 | 615.443 | 2021 |
2 | 01003 | 48.3824 | 45196 | 934.142 | 2020 |
2 | 01003 | 48.3824 | 45196 | 934.142 | 2021 |
I was hoping something along these lines would work.
df = df.groupby('id').assign(new_col=range(2020, 2022))
CodePudding user response:
You can repeat the index and use numpy.tile
to tile the new data:
import numpy as np
new = ['2020', '2021']
df2 = df.loc[df.index.repeat(len(new))]
df2['new_col'] = np.tile(new, len(df))
output:
id x1 x2 x3 new_col
0 1001 523.4100 639673 1222.130 2020
0 1001 523.4100 639673 1222.130 2021
1 1002 54.8320 33746 615.443 2020
1 1002 54.8320 33746 615.443 2021
2 1003 48.3824 45196 934.142 2020
2 1003 48.3824 45196 934.142 2021
Alternatively:
new = range(2020, 2022)
df2 = df.loc[df.index.repeat(len(new))]
df2['new_col'] = new.start df2.groupby('id').cumcount()
CodePudding user response:
We can also create a MultiIndex.from_product
and reindex
relative to level=0
, to scale up the DataFrame, and reset_index
to convert the inner level of the MultiIndex into a column:
df = df.reindex(
index=pd.MultiIndex.from_product(
[df.index, range(2020, 2022)],
names=[None, 'new_col']
),
level=0
).reset_index(level='new_col')
df
:
new_col id x1 x2 x3
0 2020 1001 523.4100 639673 1222.130
0 2021 1001 523.4100 639673 1222.130
1 2020 1002 54.8320 33746 615.443
1 2021 1002 54.8320 33746 615.443
2 2020 1003 48.3824 45196 934.142
2 2021 1003 48.3824 45196 934.142
We can reorder new_col
to the end if needed:
df = df.reindex(
index=pd.MultiIndex.from_product(
[df.index, range(2020, 2022)],
names=[None, 'new_col']
),
level=0
).reset_index(level='new_col')
# Re-order columns
df = df.reindex(
columns=df.columns[df.columns != 'new_col'].union(['new_col'], sort=False)
)
df
:
id x1 x2 x3 new_col
0 1001 523.4100 639673 1222.130 2020
0 1001 523.4100 639673 1222.130 2021
1 1002 54.8320 33746 615.443 2020
1 1002 54.8320 33746 615.443 2021
2 1003 48.3824 45196 934.142 2020
2 1003 48.3824 45196 934.142 2021
Setup:
import pandas as pd
df = pd.DataFrame({
'id': [1001, 1002, 1003],
'x1': [523.41, 54.832, 48.3824],
'x2': [639673, 33746, 45196],
'x3': [1222.13, 615.443, 934.142]
})