Compare the values in two columns and extract the values of a third column in a dataframe
df =
Location | teams | goals |
---|---|---|
1 | A | 5 |
1 | B | 6 |
2 | A | 7 |
2 | B | 5 |
2 | C | 6 |
3 | B | 7 |
Expected output df =
Location | A Team | A Team goals | B Team | B Team Goals | C Team | C Team goals |
---|---|---|---|---|---|---|
1 | 1 | 5 | 1 | 6 | 0 | NA |
2 | 1 | 7 | 1 | 5 | 1 | 6 |
3 | 0 | NA | 1 | 7 | 0 | NA |
CodePudding user response:
Reshape the dataframe using pivot to get the goals. Check for the non null values in goals
to identify the teams
then join
to get the result
goals = df.pivot(*df.columns)
teams = s.notna().astype(int)
teams.add_suffix(' Team').join(goals.add_suffix(' Team Goals'))
Result
teams A Team B Team C Team A Team Goals B Team Goals C Team Goals
Location
1 1 1 0 5.0 6.0 NaN
2 1 1 1 7.0 5.0 6.0
3 0 1 0 NaN 7.0 NaN
CodePudding user response:
To compare the values in two columns of a dataframe and extract the values of a third column, you can use the pandas library in Python.
First, you can use the pivot_table method to create a new dataframe that has the values of the teams column as columns, the values of the Location column as the index, and the values of the goals column as the values. This will create a dataframe that looks like this:
A B C
Location
1 5 6 NaN
2 7 5 6
3 NaN 7 NaN
Next, you can use the fillna method to replace the NaN values with 0. This will create a dataframe that looks like this:
A B C
Location
1 5 6 0
2 7 5 6
3 0 7 0
Finally, you can use the reset_index method to reset the index of the dataframe and then rename the columns to create the final output dataframe. Here is an example of how you can do this:
import pandas as pd
# Create the original dataframe
df = pd.DataFrame({'Location': [1, 1, 2, 2, 2, 3],
'teams': ['A', 'B', 'A', 'B', 'C', 'B'],
'goals': [5, 6, 7, 5, 6, 7]})
# Create a pivot table with the values of the teams column as columns,
# the values of the Location column as the index, and the values of
# the goals column as the values
pivot_df = df.pivot_table(index='Location', columns='teams', values='goals')
# Fill the NaN values with 0
pivot_df = pivot_df.fillna(0)
# Reset the index of the pivot table
pivot_df = pivot_df.reset_index()
# Rename the columns of the pivot table to create the final output dataframe
output_df = pivot_df.rename(columns={'Location': 'Location',
'A': 'A Team',
'B': 'B Team',
'C': 'C Team'})
# Print the output dataframe
print(output_df)
This will create an output dataframe that looks like this:
Location A Team B Team C Team
0 1 5 6 0
1 2 7 5 6
2 3 0 7 0
I hope this helps! Let me know if you have any other questions.
CodePudding user response:
Example Code
data = {'Location': {0: 1, 1: 1, 2: 2, 3: 2, 4: 2, 5: 3},
'teams': {0: 'A', 1: 'B', 2: 'A', 3: 'B', 4: 'C', 5: 'B'},
'goals': {0: 5, 1: 6, 2: 7, 3: 5, 4: 6, 5: 7}}
df = pd.DataFrame(data)
First
aggregation with groupby
(df.groupby(['Location', 'teams'])['goals'].agg(['count', sum])
.unstack().swaplevel(0, 1, axis=1).sort_index(axis=1))
output:
teams A B C
count sum count sum count sum
Location
1 1.0 5.0 1.0 6.0 NaN NaN
2 1.0 7.0 1.0 5.0 1.0 6.0
3 NaN NaN 1.0 7.0 NaN NaN
Second
Let's create idx
to change columns
idx = pd.MultiIndex.from_product([df['teams'].unique(), ['Team', 'Team Goal']]).map(lambda x: ' '.join(x))
idx
Index(['A Team', 'A Team Goal', 'B Team', 'B Team Goal', 'C Team', 'C Team Goal'], dtype='object')
Last
change columns and reset_index
(include First code)
(df.groupby(['Location', 'teams'])['goals'].agg(['count', sum])
.unstack().swaplevel(0, 1, axis=1).sort_index(axis=1)
.set_axis(idx, axis=1).reset_index())
output
Location A Team A Team Goal B Team B Team Goal C Team C Team Goal
0 1 1.0 5.0 1.0 6.0 NaN NaN
1 2 1.0 7.0 1.0 5.0 1.0 6.0
2 3 NaN NaN 1.0 7.0 NaN NaN