I have a dataframe cut from a much larger dataframe:
import pandas as pd
data = {'Name': [5, 5, 6, 6, 7, 7],
'Value': [1, 2, 1, 2, 1, 2]
}
df = pd.DataFrame(data)
Name Value
0 5 1
1 5 2
2 6 1
3 6 2
4 7 1
5 7 2
Ideal Output:
Name Value Value2
0 5 1 2
1 6 1 2
2 7 1 2
I need a way to split the dataframe into 2 separate dataframes based on the "Value" column. The rows with '1' in the Value column and the rows with '2' in the Value column need to be split up.
The best/end goal solution is to have one Name with the 1 and 2 be separate columns in the same dataframe. My idea so far is to split the two and combine them so the data is side by side all tracing back to a single Name.
CodePudding user response:
You can group by your Name column and aggregate your values into a list
.
out = df.groupby('Name').agg(list).reset_index()
- Use a
DataFrame
constructor to break the value and assign them back:
out[['Value1','Value2']] = pd.DataFrame(out.Value.tolist(), index= out.index)
>>> out.drop('Value',axis=1)
Name Value1 Value2
0 5 1 2
1 6 1 2
2 7 1 2
- Use
pd.concat
:
>>> pd.concat([out['Name'],
pd.DataFrame(out["Value"].to_list(), columns=['Value1', 'Value2'])],
axis=1)
Name Value1 Value2
0 5 1 2
1 6 1 2
2 7 1 2
Complete code to answer your comments:
Sample DF
data = {'Name': [5, 5, 6, 6, 7, 7],
'Value': [1, 2, 1, 2, 1, 2]
}
df = pd.DataFrame(data)
Name Value
0 5 1
1 5 2
2 6 1
3 6 2
4 7 1
5 7 2
Answer:
out = df.groupby('Name').agg(list).reset_index()
out[['Value1','Value2']] = pd.DataFrame(out.Value.tolist(), index= out.index)
out.drop('Value',axis=1)
Name Value1 Value2
0 5 1 2
1 6 1 2
2 7 1 2
Additional columns:
data = {'Name': [5, 5, 6, 6, 7, 7],
'Value': [1, 2, 1, 2, 1, 2],
'extra':[1,2,3,4,5,6]
}
df = pd.DataFrame(data)
out = df.groupby('Name').agg({'Value':list}).reset_index()
out[['Value1','Value2']] = pd.DataFrame(out.Value.tolist(), index= out.index)
out.drop('Value',axis=1,inplace=True)
result = pd.merge(df.drop('Value',axis=1),out,on='Name',how='left')
>>>result
Name extra Value1 Value2
0 5 1 1 2
1 5 2 1 2
2 6 3 1 2
3 6 4 1 2
4 7 5 1 2
5 7 6 1 2
CodePudding user response:
Use pandas.DataFrame.groupby()
:
>>> df
Name Value
0 5 1
1 5 2
2 6 1
3 6 2
4 7 1
5 7 2
>>> dfs = [d for _, d in df.groupby('Value')]
>>> dfs
[ Name Value
0 5 1
2 6 1
4 7 1,
Name Value
1 5 2
3 6 2
5 7 2]
>>> dfs[0]
Name Value
0 5 1
2 6 1
4 7 1
>>> dfs[1]
Name Value
1 5 2
3 6 2
5 7 2