Home > Enterprise >  repeat rows to get data for all weeks
repeat rows to get data for all weeks

Time:02-16

I wonder, if there is a less clumsy way to create all combinations for all weeks for the following example:

df = pd.DataFrame({
    'Gender': ['female', 'female', 'female', 'female'],
    'Bla': ['a', 'a', 'a', 'a'],
    'Week': [1, 2, 3, 4]
})
print(df)

number_of_weeks = 52
temp = df[['Gender', 'Bla']].drop_duplicates()
results = temp.loc[temp.index.repeat(number_of_weeks)]
results["Week"] = range(1, len(results)   1, 1)
print(results)

This kind of work but it also should work for situations with more than 1 "level" for a categorical variable.

df = pd.DataFrame({
    'Gender': ['female', 'female', 'female', 'female'],
    'Bla': ['a', 'b', 'a', 'a'],
    'Week': [1, 2, 3, 4]
})

The above code would expand this to infeasible week numbers.

CodePudding user response:

A first approach can be set Week as index and group by other columns:

fill_week = lambda x: x.reindex(range(1, 53)).ffill().reset_index()

out = df.set_index('Week').groupby(['Gender', 'Bla'], as_index=False) \
        .apply(fill_week).reset_index(drop=True)
print(out)

# Output
     Week  Gender Bla
0       1  female   a
1       2  female   a
2       3  female   a
3       4  female   a
4       5  female   a
..    ...     ...  ..
99     48  female   b
100    49  female   b
101    50  female   b
102    51  female   b
103    52  female   b

[104 rows x 3 columns]

CodePudding user response:

Try with concat:

temp = df[["Gender", "Bla"]].drop_duplicates()
output = pd.concat([temp]*52, ignore_index=True).assign(week=list(range(1,53))*temp.shape[0])

>>> output
     Gender Bla  week
0    female   a     1
1    female   b     2
2    female   a     3
3    female   b     4
4    female   a     5
..      ...  ..   ...
99   female   b    48
100  female   a    49
101  female   b    50
102  female   a    51
103  female   b    52

[104 rows x 3 columns]

CodePudding user response:

Let us fix your code:

s = df[['Gender', 'Bla']].drop_duplicates()
out = s.reindex(s.index.repeat(52))
out['Week'] = np.tile(range(1, 53), len(s))

Gender Bla  Week
female   a     1
female   a     2
female   a     3
female   a     4
female   a     5
female   a     6
female   a     7
...
female   a    51
female   a    52
female   b     1
...
female   b    51
female   b    52

CodePudding user response:

This is the same as to 'cross merge' using pd.merge:

gender = pd.DataFrame(['female'], columns={'gender'})
bla = pd.DataFrame(['a', 'b', 'a'], columns={'bla'})
week = pd.DataFrame(range(1,5), columns={'week'})

pd.merge(gender, bla, how='cross').merge(week, how='cross')

Output:

    gender bla  week
0   female   a     1
1   female   a     2
2   female   a     3
3   female   a     4
4   female   b     1
5   female   b     2
6   female   b     3
7   female   b     4
8   female   a     1
9   female   a     2
10  female   a     3
11  female   a     4

What's not perfectly clear from your question is whether ['gender', 'bla'] is actually one set of tuples (and thus you would only do a single cross-merge of that tuple with week), or whether gender and bla are also to be permuted against each other (example above does that, cross all female with all a). Cross-merge can do both, of course.

  • Related