I want to add a new column that shows the difference between the two exams as a percentage.
import pandas as pd
exam_1 = {
'Name': ['Jonn', 'Tomas', 'Fran', 'Olga', 'Veronika', 'Stephan'],
'Mat': [85, 75, 50, 93, 88, 90],
'Science': [96, 97, 99, 87, 90, 88],
'Reading': [80, 60, 72, 86, 84, 77],
'Wiritng': [78, 82, 88, 78, 86, 82],
'Lang': [77, 79, 77, 72, 90, 92],
}
exam_2 = {
'Name': ['Jonn', 'Tomas', 'Fran', 'Olga', 'Veronika', 'Stephan'],
'Mat': [80, 80, 90, 90, 85, 80],
'Science': [50, 60, 85, 90, 66, 82],
'Reading': [60, 75, 55, 90, 85, 60],
'Wiritng': [56, 66, 90, 82, 60, 80],
'Lang': [80, 78, 76, 90, 77, 66],
}
df_1 = pd.DataFrame(exam_1)
df_2 = pd.DataFrame(exam_2)
#cmp = pd.merge(df_1, df_2, how="outer", on=["Name"], suffixes=("_1", "_2"))
cmp = (
    pd.merge(df_1, df_2, how="outer", on=["Name"], suffixes=("_1", "_2"))
    .set_index("Name")
    .sort_index(axis=1)
    .reset_index()
)
print(cmp)
The output of the above code is shown below:
Name Lang_1 Lang_2 Mat_1 Mat_2 Reading_1 Reading_2 Science_1 Science_2 Wiritng_1 Wiritng_2
0 Jonn 77 80 85 80 80 60 96 50 78 56
1 Tomas 79 78 75 80 60 75 97 60 82 66
2 Fran 77 76 50 90 72 55 99 85 88 90
3 Olga 72 90 93 90 86 90 87 90 78 82
4 Veronika 90 77 88 85 84 85 90 66 86 60
5 Stephan 92 66 90 80 77 60 88 82 82 80
What I want is to add a new column after each pair of compared values. Is there a built-in function for that? The constant part, such as Name, can change; in the future there may be three constant columns, so I would like a built-in, reusable approach.
I tried doing it manually, but that is not reusable.
What I want exactly is shown below:
Name Lang_1 Lang_2 Lang_Res Mat_1 Mat_2 Mat_Res Reading_1 Reading_2 Reading_Res Science_1 Science_2 Science_Res Writing_1 Writing_2 Writing_Res
0 Jonn 77 80 Lang_data 85 80 Mat_data 80 60 Reading_data 96 50 Science_data 78 56 Writing_data
1 Tomas 79 78 Lang_data 75 80 Mat_data 60 75 Reading_data 97 60 Science_data 82 66 Writing_data
2 Fran 77 76 Lang_data 50 90 Mat_data 72 55 Reading_data 99 85 Science_data 88 90 Writing_data
3 Olga 72 90 Lang_data 93 90 Mat_data 86 90 Reading_data 87 90 Science_data 78 82 Writing_data
4 Veronika 90 77 Lang_data 88 85 Mat_data 84 85 Reading_data 90 66 Science_data 86 60 Writing_data
5 Stephan 92 66 Lang_data 90 80 Mat_data 77 60 Reading_data 88 82 Science_data 82 80 Writing_data
CodePudding user response:
You can start by making a list of every column that has the suffix _2, and then use pandas.DataFrame.insert with pandas.Index.get_loc in a list comprehension to insert each result column right after its _2 column.
Try this:
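# Collect every column name that ends with "_2"
edge_cols = cmp.columns.str.extractall(r"(\w+_2)")[0].tolist()
# Insert a placeholder "<prefix>_Res" column right after each "_2" column
[cmp.insert(cmp.columns.get_loc(col) + 1, col.split("_")[0] + "_Res", col.split("_")[0] + "_Data") for col in edge_cols]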
# Output :
print(cmp.to_string())
Name Lang_1 Lang_2 Lang_Res Mat_1 Mat_2 Mat_Res Reading_1 Reading_2 Reading_Res Science_1 Science_2 Science_Res Wiritng_1 Wiritng_2 Wiritng_Res
0 Jonn 77 80 Lang_Data 85 80 Mat_Data 80 60 Reading_Data 96 50 Science_Data 78 56 Wiritng_Data
1 Tomas 79 78 Lang_Data 75 80 Mat_Data 60 75 Reading_Data 97 60 Science_Data 82 66 Wiritng_Data
2 Fran 77 76 Lang_Data 50 90 Mat_Data 72 55 Reading_Data 99 85 Science_Data 88 90 Wiritng_Data
3 Olga 72 90 Lang_Data 93 90 Mat_Data 86 90 Reading_Data 87 90 Science_Data 78 82 Wiritng_Data
4 Veronika 90 77 Lang_Data 88 85 Mat_Data 84 85 Reading_Data 90 66 Science_Data 86 60 Wiritng_Data
5 Stephan 92 66 Lang_Data 90 80 Mat_Data 77 60 Reading_Data 88 82 Science_Data 82 80 Wiritng_Data
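The placeholder strings above can be swapped for real numbers. Here is a minimal sketch of that, assuming "percentage difference" means the change from exam 1 to exam 2 relative to exam 1 (the question does not pin down the formula, so this definition is an assumption). The small frame is only a two-subject stand-in for the merged cmp from the question.

```python
import pandas as pd

# Two-subject stand-in for the merged frame from the question.
cmp = pd.DataFrame({
    "Name": ["Jonn", "Tomas"],
    "Lang_1": [77, 79],
    "Lang_2": [80, 78],
    "Mat_1": [85, 75],
    "Mat_2": [80, 80],
})

# Collect the "_2" columns up front so the frame is not changed while we scan it.
second_cols = [col for col in cmp.columns if col.endswith("_2")]

for col in second_cols:
    prefix = col[: -len("_2")]
    first = cmp[f"{prefix}_1"]
    # Assumed formula: percentage change from exam 1 to exam 2.
    pct = ((cmp[col] - first) / first * 100).round(2)
    # Place the result directly after the "_2" column.
    cmp.insert(cmp.columns.get_loc(col) + 1, f"{prefix}_Res", pct)

print(cmp.to_string())
```

If an exam 1 score can be 0, the division gives inf or NaN, so you may want to handle that case separately.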
CodePudding user response:
If I understand correctly, you're hoping to compute a column from two other columns that are related.
What I suggest is this.
- Keep your basic column prefixes in a list.
prefixes = ['Lang', 'Mat', 'Reading', ...]
- Use these prefixes to automate the lookup and calculation on each column. Let's say we want to store the average of items _1 and _2 for every prefix.
# df is the merged frame (called cmp in the question)
for prefix in prefixes:
    column1 = df[f"{prefix}_1"]
    column2 = df[f"{prefix}_2"]
    averaged = (column1 + column2) / 2
    df.loc[:, f"{prefix}_average"] = averaged
This will add an average column for every category you have a prefix for.
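Since the question mentions that the constant columns may change, the prefix list could also be derived from the column names instead of being typed by hand. Below is a rough sketch of that idea; add_pct_columns, key_cols and out_suffix are hypothetical names made up for this example (not a pandas built-in), and the percentage formula (change relative to exam 1) is an assumption.

```python
import pandas as pd

def add_pct_columns(df, key_cols, suffixes=("_1", "_2"), out_suffix="_pct"):
    """Return a copy of df with one percentage-change column per prefix."""
    first, second = suffixes
    out = df.copy()
    # Derive prefixes from the columns ending in the first suffix, skipping key columns.
    prefixes = [
        col[: -len(first)]
        for col in df.columns
        if col not in key_cols and col.endswith(first)
    ]
    for prefix in prefixes:
        old = out[f"{prefix}{first}"]
        new = out[f"{prefix}{second}"]
        # Assumed formula: change from the first exam to the second, in percent.
        out[f"{prefix}{out_suffix}"] = ((new - old) / old * 100).round(2)
    return out

# Two-subject stand-in for the merged frame from the question.
merged = pd.DataFrame({
    "Name": ["Fran", "Olga"],
    "Reading_1": [72, 86],
    "Reading_2": [55, 90],
    "Science_1": [99, 87],
    "Science_2": [85, 90],
})

result = add_pct_columns(merged, key_cols=["Name"])
print(result.to_string())
```

If more constant columns are added later, only key_cols needs to change. The new columns are appended at the end here; use DataFrame.insert instead if they should sit next to each pair.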