I have a dataframe with 3 columns (x, y and data). I need to combine every two rows and end up with two data columns. Essentially, I need to merge the rows that share the same x and y coordinates into one while keeping the two data values separate, giving 4 columns (x, y, u and v).
Original data example:
x y data
-----------------------
1 1 0.2
1 1 0.5
1 2 0.7
1 2 0.2
...
Expected output:
x y u v
----------------------------
1 1 0.2 0.5
1 2 0.7 0.2
...
CodePudding user response:
If sorting your dataframe by coordinates is not an issue, you can try the following. It assumes every (x, y) pair appears exactly twice:
import pandas as pd

df = pd.DataFrame(dict(
x=[1,1,1,1],
y=[1,1,2,2],
data=[0.2,0.5,0.7,0.2]
)).sort_values(by=["x", "y"])
# Label the rows alternately: first row of each pair is "u", second is "v"
df["label"] = ["u", "v"] * (len(df) // 2)
df = df.pivot(index=["x", "y"], columns="label", values="data").reset_index()
print(df)
x | y | u | v |
---|---|---|---|
1 | 1 | 0.2 | 0.5 |
1 | 2 | 0.7 | 0.2 |
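If the rows are not guaranteed to arrive in neat alternating order, one possible variation is to number the rows inside each (x, y) group with groupby(...).cumcount() and derive the labels from that. This is only a sketch: the names pos and result are made up for illustration, and it still assumes at most two rows per coordinate pair.

```python
import pandas as pd

df = pd.DataFrame({
    "x": [1, 1, 1, 1],
    "y": [1, 1, 2, 2],
    "data": [0.2, 0.5, 0.7, 0.2],
})

# Position of each row within its (x, y) group: 0 for the first, 1 for the second.
pos = df.groupby(["x", "y"]).cumcount()

# Turn positions into column names (assumption: no group has more than two rows).
df["label"] = pos.map({0: "u", 1: "v"})

result = (
    df.pivot(index=["x", "y"], columns="label", values="data")
      .reset_index()
      .rename_axis(columns=None)  # drop the leftover "label" name on the columns
)
print(result)
```

Because the label comes from the position within the group, no prior sort is needed and the original row order within each pair decides which value ends up in u and which in v.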
CodePudding user response:
Group by x and y, convert each group's data values to a list, then build a dataframe from those lists and concatenate it with the grouped x and y columns:
tmp = df.groupby(['x','y'])['data'].apply(list).reset_index()
tmp = pd.concat([tmp.drop('data', axis=1), pd.DataFrame(tmp['data'].tolist(), columns=['u', 'v'])], axis=1)
Output:
>>> tmp
x y u v
0 1 1 0.2 0.5
1 1 2 0.7 0.2
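The list-based approach breaks if any (x, y) pair has more than two rows. As a hedged alternative, the same result can probably be reached with named aggregation, taking the first and last value of each group, after checking the group sizes. The names sizes and result are illustrative only, and note that "first" and "last" skip missing values, so this sketch assumes data contains no NaN.

```python
import pandas as pd

df = pd.DataFrame({
    "x": [1, 1, 1, 1],
    "y": [1, 1, 2, 2],
    "data": [0.2, 0.5, 0.7, 0.2],
})

# Guard: first/last only maps cleanly to u/v when each pair has exactly two rows.
sizes = df.groupby(["x", "y"]).size()
if not (sizes == 2).all():
    raise ValueError("expected exactly two rows per (x, y) pair")

# Named aggregation: u is the first data value in each group, v the last.
result = df.groupby(["x", "y"], as_index=False).agg(
    u=("data", "first"),
    v=("data", "last"),
)
print(result)
```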