I'm creating a dataset for the FiFa worldcup 2022 and I'm trying to create a column that counts the number of matches a given country is unbeaten (i.e. has only won or drawn in previous matches) before the start of a given match.
I've been trying to get the logic going using a counter that loops over a list of results, but I can't get it to work on a team level.
My desired output would look something like this:
Date | Team | Result | Desired output |
---|---|---|---|
2021-06-27 | NL | lose | |
2021-09-01 | NL | win | 0 |
2021-09-04 | NL | win | 1 |
2021-09-08 | NL | win | 2 |
2021-09-15 | NL | draw | 3 |
2021-09-18 | NL | Lose | 4 |
2021-10-01 | NL | Win | 0 |
2021-10-05 | NL | Win | 1 |
2021-06-27 | UK | lose | |
2021-09-01 | NL | win | 0 |
2021-09-04 | UK | draw | 1 |
2021-08-27 | UK | lose | 2 |
2021-08-31 | UK | win | 0 |
I would greatly appreciate any help.
CodePudding user response:
I believe the provided output is incorrect as NL is still unbeaten on 2021-09-01
.
I would thus use:
group = df['Result'][::-1].str.lower().eq('lose').groupby(df['Team']).cumsum()
df['days_unbeaten'] = df.groupby([df['Team'], group]).cumcount()
output:
Date Team Result Desired output days_unbeaten
0 2021-06-27 NL lose NaN 0
1 2021-09-01 NL win 0.0 0
2 2021-09-04 NL win 1.0 1
3 2021-09-08 NL win 2.0 2
4 2021-09-15 NL draw 3.0 3
5 2021-09-18 NL Lose 4.0 4
6 2021-10-01 NL Win 0.0 0
7 2021-10-05 NL Win 1.0 1
8 2021-06-27 UK lose NaN 0
9 2021-09-01 NL win 0.0 2
10 2021-09-04 UK draw 1.0 0
11 2021-08-27 UK lose 2.0 1
12 2021-08-31 UK win 0.0 0