I have a DataFrame, for example:
  my_label  value
0        A      1
1        A     85
2        B     65
3        B     41
4        B     21
5        C      3
I want to group by my_label and pad each group so that its length is a multiple of a given number, filling the new rows with the group's last value. For example, padding to a multiple of 4 would give:
   my_label  value
0         A      1
1         A     85
2         A     85
3         A     85
4         B     65
5         B     41
6         B     21
7         B     21
8         C      3
9         C      3
10        C      3
11        C      3
I wrote a solution that I expected to work, but for some reason the reindex does not add the new rows at the end of each group.
def _pad(group, seq_len):
    pad_number = seq_len - (len(group) % seq_len)
    if pad_number != seq_len:
        group = group.reindex(range(len(group) + pad_number)).ffill()
    return group

df = (df.groupby('my_label')
        .apply(_pad, 4)
        .reset_index(drop=True))
Here is the code to build the DataFrame above for testing:
import pandas as pd
df = pd.DataFrame({"my_label":["A","A","B","B","B","C"], "value":[1,85,65,41,21,3]})
Answer:
You can concatenate, per group, a dummy DataFrame with as many empty rows as are missing, then forward fill with ffill:
N = 4
out = (df
   .groupby('my_label', group_keys=False)
   .apply(lambda d: pd.concat([d, pd.DataFrame(columns=d.columns,
                                               index=range(N-len(d)))]))
   .ffill()
   .reset_index(drop=True)
)
or, directly concatenating the last row as many times as needed:
(df
   .groupby('my_label', group_keys=False)
   .apply(lambda d: pd.concat([d, d.loc[[d.index[-1]]*(N-len(d))]]))
   .reset_index(drop=True)
)
output:
   my_label  value
0         A      1
1         A     85
2         A     85
3         A     85
4         B     65
5         B     41
6         B     21
7         B     21
8         C      3
9         C      3
10        C      3
11        C      3
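Note that both snippets above pad each group up to exactly N rows, so a group longer than N is left unchanged rather than padded to the next multiple. A minimal sketch of a variant that pads to the next multiple of N follows; the helper name `pad_to_multiple` and its parameters are made up for illustration and are not part of pandas.

```python
import pandas as pd


def pad_to_multiple(df, key, n):
    """Repeat the last row of each group until its length is a multiple of n."""
    parts = []
    for _, d in df.groupby(key, sort=False):
        # Number of rows missing to reach the next multiple of n
        # (0 when the group length is already a multiple of n).
        missing = -len(d) % n
        # Select every row by position, then the last row `missing` more times.
        positions = list(range(len(d))) + [len(d) - 1] * missing
        parts.append(d.iloc[positions])
    return pd.concat(parts, ignore_index=True)


df = pd.DataFrame({"my_label": ["A", "A", "B", "B", "B", "C"],
                   "value": [1, 85, 65, 41, 21, 3]})
out = pad_to_multiple(df, "my_label", 4)
```

Because the rows are selected by position with `iloc`, no missing values are introduced and the column dtypes stay as they were. A group of 5 rows would be padded to 8 rows with `n=4`.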
Answer:
You can simply solve this by creating an index that represents your desired output, aligning that to your existing data, and then forward filling.
index = pd.MultiIndex.from_product([df['my_label'].unique(), range(4)], names=['my_label', None])
out = (
    df.set_index(
        ['my_label', df.groupby('my_label').cumcount()]
    )
    .reindex(index, method='ffill')
)
print(out)
            value
my_label
A        0    1.0
         1   85.0
         2   85.0
         3   85.0
B        0   65.0
         1   41.0
         2   21.0
         3   21.0
C        0    3.0
         1    3.0
         2    3.0
         3    3.0
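As for why the `_pad` function in the question does not pad at the end of each group: `reindex` matches index labels, not positions, and each group keeps the labels it had in the original frame (group B has labels 2, 3, 4). Reindexing that group to `range(4)` asks for labels 0 to 3, so the rows labelled 0 and 1 come back empty and the row labelled 4 is dropped. A minimal sketch of a fix, assuming it is acceptable to reset each group's index before reindexing:

```python
import pandas as pd


def _pad(group, seq_len):
    # Rows missing to reach the next multiple of seq_len (0 if none).
    pad_number = -len(group) % seq_len
    # Relabel the rows 0..len(group)-1 so that labels and positions agree.
    group = group.reset_index(drop=True)
    # The new labels now sit after the existing rows; ffill copies the last row.
    return group.reindex(range(len(group) + pad_number)).ffill()


df = pd.DataFrame({"my_label": ["A", "A", "B", "B", "B", "C"],
                   "value": [1, 85, 65, 41, 21, 3]})
padded = pd.concat([_pad(g, 4) for _, g in df.groupby("my_label")],
                   ignore_index=True)
```

Groups that needed padding pass through NaN before the forward fill, so the `value` column may come back as float rather than int.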