Compare the values in two columns and extract the values of a third column in a dataframe-CodePudding

Compare the values in two columns and extract the values of a third column in a dataframe

df =

Location	teams	goals
1	A	5
1	B	6
2	A	7
2	B	5
2	C	6
3	B	7

Expected output df =

Location	A Team	A Team goals	B Team	B Team Goals	C Team	C Team goals
1	1	5	1	6	0	NA
2	1	7	1	5	1	6
3	0	NA	1	7	0	NA

CodePudding user response：

Reshape the dataframe using pivot to get the goals. Check for the non null values in goals to identify the teams then join to get the result

goals = df.pivot(*df.columns)
teams = s.notna().astype(int)

teams.add_suffix(' Team').join(goals.add_suffix(' Team Goals'))

Result

teams     A Team  B Team  C Team  A Team Goals  B Team Goals  C Team Goals
Location                                                                  
1              1       1       0           5.0           6.0           NaN
2              1       1       1           7.0           5.0           6.0
3              0       1       0           NaN           7.0           NaN

CodePudding user response：

To compare the values in two columns of a dataframe and extract the values of a third column, you can use the pandas library in Python.

First, you can use the pivot_table method to create a new dataframe that has the values of the teams column as columns, the values of the Location column as the index, and the values of the goals column as the values. This will create a dataframe that looks like this:

        A    B    C
Location
1       5    6  NaN
2       7    5    6
3     NaN    7  NaN

Next, you can use the fillna method to replace the NaN values with 0. This will create a dataframe that looks like this:

        A    B    C
Location
1       5    6    0
2       7    5    6
3       0    7    0

Finally, you can use the reset_index method to reset the index of the dataframe and then rename the columns to create the final output dataframe. Here is an example of how you can do this:

import pandas as pd

# Create the original dataframe
df = pd.DataFrame({'Location': [1, 1, 2, 2, 2, 3],
                   'teams': ['A', 'B', 'A', 'B', 'C', 'B'],
                   'goals': [5, 6, 7, 5, 6, 7]})

# Create a pivot table with the values of the teams column as columns,
# the values of the Location column as the index, and the values of
# the goals column as the values
pivot_df = df.pivot_table(index='Location', columns='teams', values='goals')

# Fill the NaN values with 0
pivot_df = pivot_df.fillna(0)

# Reset the index of the pivot table
pivot_df = pivot_df.reset_index()

# Rename the columns of the pivot table to create the final output dataframe
output_df = pivot_df.rename(columns={'Location': 'Location',
                                     'A': 'A Team',
                                     'B': 'B Team',
                                     'C': 'C Team'})

# Print the output dataframe
print(output_df)

This will create an output dataframe that looks like this:

   Location  A Team  B Team  C Team
0         1       5       6       0
1         2       7       5       6
2         3       0       7       0

I hope this helps! Let me know if you have any other questions.

CodePudding user response：

Example Code

data = {'Location': {0: 1, 1: 1, 2: 2, 3: 2, 4: 2, 5: 3},
        'teams': {0: 'A', 1: 'B', 2: 'A', 3: 'B', 4: 'C', 5: 'B'},
        'goals': {0: 5, 1: 6, 2: 7, 3: 5, 4: 6, 5: 7}}
df = pd.DataFrame(data)

First

aggregation with groupby

(df.groupby(['Location', 'teams'])['goals'].agg(['count', sum])
 .unstack().swaplevel(0, 1, axis=1).sort_index(axis=1))

output:

teams   A               B               C
        count   sum     count   sum     count   sum
Location                        
1       1.0     5.0     1.0     6.0     NaN     NaN
2       1.0     7.0     1.0     5.0     1.0     6.0
3       NaN     NaN     1.0     7.0     NaN     NaN

Second

Let's create idx to change columns

idx = pd.MultiIndex.from_product([df['teams'].unique(), ['Team', 'Team Goal']]).map(lambda x: ' '.join(x))

idx

Index(['A Team', 'A Team Goal', 'B Team', 'B Team Goal', 'C Team', 'C Team Goal'], dtype='object')

Last

change columns and reset_index (include First code)

(df.groupby(['Location', 'teams'])['goals'].agg(['count', sum])
 .unstack().swaplevel(0, 1, axis=1).sort_index(axis=1)
 .set_axis(idx, axis=1).reset_index())

output

    Location    A Team  A Team Goal B Team  B Team Goal C Team  C Team Goal
0   1           1.0     5.0         1.0     6.0         NaN     NaN
1   2           1.0     7.0         1.0     5.0         1.0     6.0
2   3           NaN     NaN         1.0     7.0         NaN     NaN