How to insert list of values into a "cell" of a Pandas DataFrame only if condition is met-CodePudding

I would like to extract all values that meet a particular condition from a preexisting DataFrame and form them into a list. Is there a way that I can insert all of these values into each cell that meets another condition?

I began with a DataFrame similar to as follows:

Table_1

| ID    | Team Assigned | Date Team Start | Date Team End | Service Date |
__________________________________________________________________________
| 23    |  Red          |   2022-09-01    |  2022-09-29   | 2022-08-15
| 23    |  Blue         |   2022-08-01    |  2022-09-15   | 2022-09-12
| 23    |  Green        |   2022-09-27    |  2022-09-30   | 2022-09-29
| 23    |  Black        |   2022-08-01    |  2022-08-17   | 2022-05-04
| 23    |  Purple       |   2022-08-15    |  2022-09-30   | 2022-08-17
| 07    |  Blue         |   2022-07-03    |  2022-09-30   | 2022-07-05
| 07    |  Red          |   2022-07-03    |  2022-07-05   | 2022-01-01
| 07    |  Purple       |   2022-05-01    |  2022-06-24   | 2022-05-15

I want to pull teams where the ID's service Date is within the team start / end date. From there I want to paste a comma separated list into the column where the ID and service date exist.

I was able to create an object that at least gathers the information using this:

for id in df['ID']:
   a = df[(df['Service Date'] <= df['Date Team End']) &
          (df['Service Date'] >= df['Date Team Start'])
   a = a.groupby(['case no', 'Service Date'])['Team Assigned'].apply(lambda 
   x: list(np.unique(x)))

This gives me an object, but I'm not sure how I can paste those values into each cell with the respective ID and Service Date

I am aiming for something similar to the following:

Table_1

| ID    | Team Assigned | Date Team Start | Date Team End | Service Date | Teams Involved
_________________________________________________________________________|
| 23    |  Red          |   2022-09-01    |  2022-09-29   | 2022-08-15   | Blue, Black, Purple
| 23    |  Blue         |   2022-08-01    |  2022-09-15   | 2022-09-12   | Red, Blue, Purple
| 23    |  Green        |   2022-09-27    |  2022-09-30   | 2022-09-29   | Red, Green, Purple
| 23    |  Black        |   2022-08-01    |  2022-08-17   | 2022-08-01   | Blue, Black 
| 23    |  Purple       |   2022-08-15    |  2022-09-30   | 2022-08-17   | Blue, Black, Purple
| 07    |  Blue         |   2022-07-03    |  2022-09-30   | 2022-07-05   | Blue, Red
| 07    |  Red          |   2022-07-03    |  2022-07-05   | 2022-08-01   | Blue
| 07    |  Purple       |   2022-05-01    |  2022-06-24   | 2022-05-15   | Purple

EDIT:

So I think I found an (albeit unorthodox) solution that appears to work for me thus for. I used the code chunk above with some slight edits:

for id in df['ID']:
   a = df[(df['Service Date'] <= df['Date Team End']) &
          (df['Service Date'] >= df['Date Team Start'])
   a = pd.DataFrame(a.groupby(['case no', 'Service Date'])['Team 
   Assigned'].apply(lambda x: 
   list(np.unique(x)))).reset_index().rename(columns = {'Team 
   Assigned':'List of Teams'}

Then merged this dataframe with my old one:

df2 = pd.merge(df, a, on = ['ID', 'Service Date'], how = 'left)

It sure does feel messy, and I need to verify that everything is working correctly, but this at first glance seems to work.

CodePudding user response：

Considering that OP's dataframe is as follows

df = pd.DataFrame({'ID': [23, 23, 23, 23, 23, 7, 7, 7],
                     'Team Assigned': ['Red', 'Blue', 'Green', 'Black', 'Purple', 'Blue', 'Red', 'Purple'],
                        'Date Team Start': ['2022-09-01', '2022-08-01', '2022-09-27', '2022-08-01', '2022-08-15', '2022-07-03', '2022-07-03', '2022-05-01'],
                        'Date Team End': ['2022-09-29', '2022-09-15', '2022-09-30', '2022-08-17', '2022-09-30', '2022-09-30', '2022-07-05', '2022-06-24'],
                        'Service Date': ['2022-08-15', '2022-09-12', '2022-09-29', '2022-05-04', '2022-08-17', '2022-07-05', '2022-01-01', '2022-05-15']})

# Convert date columns to datetime
df['Date Team Start'] = pd.to_datetime(df['Date Team Start'])
df['Date Team End'] = pd.to_datetime(df['Date Team End'])
df['Service Date'] = pd.to_datetime(df['Service Date'])

[Out]:
   ID Team Assigned Date Team Start Date Team End Service Date
0  23           Red      2022-09-01    2022-09-29   2022-08-15
1  23          Blue      2022-08-01    2022-09-15   2022-09-12
2  23         Green      2022-09-27    2022-09-30   2022-09-29
3  23         Black      2022-08-01    2022-08-17   2022-05-04
4  23        Purple      2022-08-15    2022-09-30   2022-08-17
5   7          Blue      2022-07-03    2022-09-30   2022-07-05
6   7           Red      2022-07-03    2022-07-05   2022-01-01
7   7        Purple      2022-05-01    2022-06-24   2022-05-15

Considering that OP wants to

pull teams where the ID's service Date is within the team start / end date. From there I want to paste a comma separated list into the column where the ID and service date exist.

If I understood correctly, one can write a custom lambda function, such as

df['Teams Involved'] = df.apply(lambda x: ', '.join(df[(df['ID'] == x['ID']) & (df['Service Date'] >= x['Date Team Start']) & (df['Service Date'] <= x['Date Team End'])]['Team Assigned'].unique()), axis=1)

[Out]:
   ID Team Assigned  ... Service Date            Teams Involved
0  23           Red  ...   2022-08-15               Blue, Green
1  23          Blue  ...   2022-09-12         Red, Blue, Purple
2  23         Green  ...   2022-09-29                     Green
3  23         Black  ...   2022-05-04               Red, Purple
4  23        Purple  ...   2022-08-17  Red, Blue, Green, Purple
5   7          Blue  ...   2022-07-05                      Blue
6   7           Red  ...   2022-01-01                      Blue
7   7        Purple  ...   2022-05-15                    Purple

Notes:

df['Teams Involved'] is a way to create a new column
', '.join means that one will be joining the list of teams with a comma and space between each team.
df[(df['ID'] == x['ID']) means that the ID is the same as the ID in the row.
(df['Service Date'] >= x['Date Team Start']) indicates that the Service Date is greater than or equal to the Date Team Start.
(df['Service Date'] <= x['Date Team End']) means that the Service Date is less than or equal to the Date Team End.
df[(df['ID'] == x['ID']) & (df['Service Date'] >= x['Date Team Start']) & (df['Service Date'] <= x['Date Team End'])] indicates that the dataframe df is filtered to only include the conditions that we just mentioned in the previous points.
df[...]['Team Assigned'].unique() is a way to return the unique Team Assigned that satisfy the conditions indicated before.
axis=1 is to indicate that the function will be applied to each row.
Even though the output in is not equal to the one in the question, in both it is relatively similar to it. And, as OP is "aiming for something similar", I believe that this satisfies OP's requirement.