I would like to extract all values that meet a particular condition from a preexisting DataFrame and form them into a list. Is there a way that I can insert all of these values into each cell that meets another condition?
I began with a DataFrame similar to as follows:
Table_1
| ID | Team Assigned | Date Team Start | Date Team End | Service Date |
__________________________________________________________________________
| 23 | Red | 2022-09-01 | 2022-09-29 | 2022-08-15
| 23 | Blue | 2022-08-01 | 2022-09-15 | 2022-09-12
| 23 | Green | 2022-09-27 | 2022-09-30 | 2022-09-29
| 23 | Black | 2022-08-01 | 2022-08-17 | 2022-05-04
| 23 | Purple | 2022-08-15 | 2022-09-30 | 2022-08-17
| 07 | Blue | 2022-07-03 | 2022-09-30 | 2022-07-05
| 07 | Red | 2022-07-03 | 2022-07-05 | 2022-01-01
| 07 | Purple | 2022-05-01 | 2022-06-24 | 2022-05-15
I want to pull teams where the ID's service Date is within the team start / end date. From there I want to paste a comma separated list into the column where the ID and service date exist.
I was able to create an object that at least gathers the information using this:
for id in df['ID']:
a = df[(df['Service Date'] <= df['Date Team End']) &
(df['Service Date'] >= df['Date Team Start'])
a = a.groupby(['case no', 'Service Date'])['Team Assigned'].apply(lambda
x: list(np.unique(x)))
This gives me an object, but I'm not sure how I can paste those values into each cell with the respective ID and Service Date
I am aiming for something similar to the following:
Table_1
| ID | Team Assigned | Date Team Start | Date Team End | Service Date | Teams Involved
_________________________________________________________________________|
| 23 | Red | 2022-09-01 | 2022-09-29 | 2022-08-15 | Blue, Black, Purple
| 23 | Blue | 2022-08-01 | 2022-09-15 | 2022-09-12 | Red, Blue, Purple
| 23 | Green | 2022-09-27 | 2022-09-30 | 2022-09-29 | Red, Green, Purple
| 23 | Black | 2022-08-01 | 2022-08-17 | 2022-08-01 | Blue, Black
| 23 | Purple | 2022-08-15 | 2022-09-30 | 2022-08-17 | Blue, Black, Purple
| 07 | Blue | 2022-07-03 | 2022-09-30 | 2022-07-05 | Blue, Red
| 07 | Red | 2022-07-03 | 2022-07-05 | 2022-08-01 | Blue
| 07 | Purple | 2022-05-01 | 2022-06-24 | 2022-05-15 | Purple
EDIT:
So I think I found an (albeit unorthodox) solution that appears to work for me thus for. I used the code chunk above with some slight edits:
for id in df['ID']:
a = df[(df['Service Date'] <= df['Date Team End']) &
(df['Service Date'] >= df['Date Team Start'])
a = pd.DataFrame(a.groupby(['case no', 'Service Date'])['Team
Assigned'].apply(lambda x:
list(np.unique(x)))).reset_index().rename(columns = {'Team
Assigned':'List of Teams'}
Then merged this dataframe with my old one:
df2 = pd.merge(df, a, on = ['ID', 'Service Date'], how = 'left)
It sure does feel messy, and I need to verify that everything is working correctly, but this at first glance seems to work.
CodePudding user response:
Considering that OP's dataframe is as follows
df = pd.DataFrame({'ID': [23, 23, 23, 23, 23, 7, 7, 7],
'Team Assigned': ['Red', 'Blue', 'Green', 'Black', 'Purple', 'Blue', 'Red', 'Purple'],
'Date Team Start': ['2022-09-01', '2022-08-01', '2022-09-27', '2022-08-01', '2022-08-15', '2022-07-03', '2022-07-03', '2022-05-01'],
'Date Team End': ['2022-09-29', '2022-09-15', '2022-09-30', '2022-08-17', '2022-09-30', '2022-09-30', '2022-07-05', '2022-06-24'],
'Service Date': ['2022-08-15', '2022-09-12', '2022-09-29', '2022-05-04', '2022-08-17', '2022-07-05', '2022-01-01', '2022-05-15']})
# Convert date columns to datetime
df['Date Team Start'] = pd.to_datetime(df['Date Team Start'])
df['Date Team End'] = pd.to_datetime(df['Date Team End'])
df['Service Date'] = pd.to_datetime(df['Service Date'])
[Out]:
ID Team Assigned Date Team Start Date Team End Service Date
0 23 Red 2022-09-01 2022-09-29 2022-08-15
1 23 Blue 2022-08-01 2022-09-15 2022-09-12
2 23 Green 2022-09-27 2022-09-30 2022-09-29
3 23 Black 2022-08-01 2022-08-17 2022-05-04
4 23 Purple 2022-08-15 2022-09-30 2022-08-17
5 7 Blue 2022-07-03 2022-09-30 2022-07-05
6 7 Red 2022-07-03 2022-07-05 2022-01-01
7 7 Purple 2022-05-01 2022-06-24 2022-05-15
Considering that OP wants to
pull teams where the ID's service Date is within the team start / end date. From there I want to paste a comma separated list into the column where the ID and service date exist.
If I understood correctly, one can write a custom lambda function, such as
df['Teams Involved'] = df.apply(lambda x: ', '.join(df[(df['ID'] == x['ID']) & (df['Service Date'] >= x['Date Team Start']) & (df['Service Date'] <= x['Date Team End'])]['Team Assigned'].unique()), axis=1)
[Out]:
ID Team Assigned ... Service Date Teams Involved
0 23 Red ... 2022-08-15 Blue, Green
1 23 Blue ... 2022-09-12 Red, Blue, Purple
2 23 Green ... 2022-09-29 Green
3 23 Black ... 2022-05-04 Red, Purple
4 23 Purple ... 2022-08-17 Red, Blue, Green, Purple
5 7 Blue ... 2022-07-05 Blue
6 7 Red ... 2022-01-01 Blue
7 7 Purple ... 2022-05-15 Purple
Notes:
df['Teams Involved']
is a way to create a new column', '.join
means that one will be joining the list of teams with a comma and space between each team.df[(df['ID'] == x['ID'])
means that theID
is the same as theID
in the row.(df['Service Date'] >= x['Date Team Start'])
indicates that theService Date
is greater than or equal to theDate Team Start
.(df['Service Date'] <= x['Date Team End'])
means that theService Date
is less than or equal to theDate Team End
.df[(df['ID'] == x['ID']) & (df['Service Date'] >= x['Date Team Start']) & (df['Service Date'] <= x['Date Team End'])]
indicates that the dataframedf
is filtered to only include the conditions that we just mentioned in the previous points.df[...]['Team Assigned'].unique()
is a way to return the uniqueTeam Assigned
that satisfy the conditions indicated before.axis=1
is to indicate that the function will be applied to each row.Even though the output in is not equal to the one in the question, in both it is relatively similar to it. And, as OP is "aiming for something similar", I believe that this satisfies OP's requirement.