I have a data frame df that has the following information:
| Project | class | x |y | z |
| --- | --- | --- | --- | --- |
| Project_A | c.java |a1 | a2 | |
| Project_A | c.java | | | a3 |
| Project_b | t.java |b1 |b2 | |
I need to combine these lines if the values of project and Class are equal among them.
Based on the previous example, the aspected output in this case have be:
| Project | class | x |y | z |
| --- | --- | --- | --- | --- |
| Project_A | c.java |a1 |a2 | a3 |
| Project_b | t.java |b1 |b2 | |
it is also important to note that the way the dataset was constructed, there is no risk of rewriting values; so, in other words, you will never have such a situation:
| Project | class | x |y | z |
| --- | --- | --- | ---| --- |
| Project_A | c.java |a1 | a2 | |
| Project_A | c.java |a_x | a_y|a_z |
| Project_b | t.java |b1 |b2 | |
How can do it?
CodePudding user response:
This will groupby the project and class and then find the first value of the grouping. I'm not sure if your data will allow for something like this since if there is another example of data in the column for the project/class combination it might mess up some of your data
df.groupby(['Project', 'class'], as_index=False).agg('first')
CodePudding user response:
You can use groupby()
on both columns and sum the others:
X = pd.DataFrame({
'Project':["Project_A","Project_A",'Project_b'],
'class':["c.java","c.java","t.java"],
'x':["a1",None,"b1"],
'y':["a2",None,"b2"],
'z':[None,"a3",None]})
Y= X.groupby(['Project','class']).sum()
print(Y)
Output:
x y z
Project class
Project_A c.java a1 a2 a3
Project_b t.java b1 b2 0