Given DataFrames:
a = pd.DataFrame({"Question Skill": ["Algebra", "Patterns"],
"Average": [56,76],
"SD": [45,30]})
b = pd.DataFrame({"Question No.": [1, 2, 3, 4, 5],
"Question Skill": ['Algebra', 'Patterns', 'Algebra', 'Patterns', 'Patterns']
Below is the required output:
c = pd.DataFrame({"Question Skill": ["Algebra","Patterns"],
"Question No.": [[1,3],[2,4,5]],
"Average": [56,76],
"SD": [45,30]})
CodePudding user response:
You probably wanna create a function if you want to do more question skill categories, but for just 2 categories, this should be sufficient
algebra=[b['Question No.'][i] for i in range(len(b)) if b['Question Skill'][i]=='Algebra']
patterns=[b['Question No.'][i] for i in range(len(b)) if b['Question Skill'][i]=='Patterns']
a['Question No.']=[algebra, patterns]
CodePudding user response:
You can get it done in SQL style with some additional processing mixed in.
import pandas as pd
a = pd.DataFrame(
{"Question Skill": ["Algebra", "Patterns"], "Average": [56, 76], "SD": [45, 30]}
)
b = pd.DataFrame(
{
"Question No.": [1, 2, 3, 4, 5],
"Question Skill": ["Algebra", "Patterns", "Algebra", "Patterns", "Patterns"],
}
)
c = (
pd.merge(a, b, how="left", on=["Question Skill"])
.groupby("Question Skill")["Question No.", "Average", "SD",]
.agg(lambda x: list(set(x))[0] if len(list(set(x))) == 1 else list(set(x)))
.reset_index()
)
print(c)
Output:
Question Skill Question No. Average SD
0 Algebra [1, 3] 56 45
1 Patterns [2, 4, 5] 76 30
CodePudding user response:
You can aggregate Question No.
column in b
first, then map the result to a
a['Question No.'] = a['Question Skill'].map(b.groupby("Question Skill")["Question No."].apply(list))
print(a)
Question Skill Average SD Question No.
0 Algebra 56 45 [1, 3]
1 Patterns 76 30 [2, 4, 5]