I wonder, if there is a less clumsy way to create all combinations for all weeks for the following example:
df = pd.DataFrame({
'Gender': ['female', 'female', 'female', 'female'],
'Bla': ['a', 'a', 'a', 'a'],
'Week': [1, 2, 3, 4]
})
print(df)
number_of_weeks = 52
temp = df[['Gender', 'Bla']].drop_duplicates()
results = temp.loc[temp.index.repeat(number_of_weeks)]
results["Week"] = range(1, len(results) 1, 1)
print(results)
This kind of work but it also should work for situations with more than 1 "level" for a categorical variable.
df = pd.DataFrame({
'Gender': ['female', 'female', 'female', 'female'],
'Bla': ['a', 'b', 'a', 'a'],
'Week': [1, 2, 3, 4]
})
The above code would expand this to infeasible week numbers.
CodePudding user response:
A first approach can be set Week
as index and group by other columns:
fill_week = lambda x: x.reindex(range(1, 53)).ffill().reset_index()
out = df.set_index('Week').groupby(['Gender', 'Bla'], as_index=False) \
.apply(fill_week).reset_index(drop=True)
print(out)
# Output
Week Gender Bla
0 1 female a
1 2 female a
2 3 female a
3 4 female a
4 5 female a
.. ... ... ..
99 48 female b
100 49 female b
101 50 female b
102 51 female b
103 52 female b
[104 rows x 3 columns]
CodePudding user response:
Try with concat
:
temp = df[["Gender", "Bla"]].drop_duplicates()
output = pd.concat([temp]*52, ignore_index=True).assign(week=list(range(1,53))*temp.shape[0])
>>> output
Gender Bla week
0 female a 1
1 female b 2
2 female a 3
3 female b 4
4 female a 5
.. ... .. ...
99 female b 48
100 female a 49
101 female b 50
102 female a 51
103 female b 52
[104 rows x 3 columns]
CodePudding user response:
Let us fix your code:
s = df[['Gender', 'Bla']].drop_duplicates()
out = s.reindex(s.index.repeat(52))
out['Week'] = np.tile(range(1, 53), len(s))
Gender Bla Week
female a 1
female a 2
female a 3
female a 4
female a 5
female a 6
female a 7
...
female a 51
female a 52
female b 1
...
female b 51
female b 52
CodePudding user response:
This is the same as to 'cross merge' using pd.merge
:
gender = pd.DataFrame(['female'], columns={'gender'})
bla = pd.DataFrame(['a', 'b', 'a'], columns={'bla'})
week = pd.DataFrame(range(1,5), columns={'week'})
pd.merge(gender, bla, how='cross').merge(week, how='cross')
Output:
gender bla week
0 female a 1
1 female a 2
2 female a 3
3 female a 4
4 female b 1
5 female b 2
6 female b 3
7 female b 4
8 female a 1
9 female a 2
10 female a 3
11 female a 4
What's not perfectly clear from your question is whether ['gender', 'bla']
is actually one set of tuples (and thus you would only do a single cross-merge of that tuple with week), or whether gender
and bla
are also to be permuted against each other (example above does that, cross all female
with all a
). Cross-merge can do both, of course.