Home > Net >  Splitting a Dataframe not based on a string, but a value in a column
Splitting a Dataframe not based on a string, but a value in a column

Time:12-11

I have a dataframe cut from a much larger dataframe:

import pandas as pd


data = {'Name': [5, 5, 6, 6, 7, 7],
        'Value': [1, 2, 1, 2, 1, 2]
            }

df = pd.DataFrame(data)

    Name    Value
0     5       1
1     5       2
2     6       1
3     6       2
4     7       1
5     7       2

Ideal Output:
    Name    Value  Value2
0     5       1      2
1     6       1      2
2     7       1      2

I need a way to split the dataframe into 2 separate dataframes based on the "Value" column. The rows with '1' in the Value column and the rows with '2' in the Value column need to be split up.

The best/end goal solution is to have one Name with the 1 and 2 be separate columns in the same dataframe. My idea so far is to split the two and combine them so the data is side by side all tracing back to a single Name.

CodePudding user response:

You can group by your Name column and aggregate your values into a list.

out = df.groupby('Name').agg(list).reset_index()
  1. Use a DataFrame constructor to break the value and assign them back:
out[['Value1','Value2']] = pd.DataFrame(out.Value.tolist(), index= out.index)

>>> out.drop('Value',axis=1)

   Name  Value1  Value2
0     5       1       2
1     6       1       2
2     7       1       2
  1. Use pd.concat:
>>> pd.concat([out['Name'],
              pd.DataFrame(out["Value"].to_list(), columns=['Value1', 'Value2'])],
              axis=1)

   Name  Value1  Value2
0     5       1       2
1     6       1       2
2     7       1       2

Complete code to answer your comments:

Sample DF

data = {'Name': [5, 5, 6, 6, 7, 7],
        'Value': [1, 2, 1, 2, 1, 2]
            }
df = pd.DataFrame(data)

   Name  Value
0     5      1
1     5      2
2     6      1
3     6      2
4     7      1
5     7      2

Answer:

out = df.groupby('Name').agg(list).reset_index()
out[['Value1','Value2']] = pd.DataFrame(out.Value.tolist(), index= out.index)
out.drop('Value',axis=1)

   Name  Value1  Value2
0     5       1       2
1     6       1       2
2     7       1       2

Additional columns:

data = {'Name': [5, 5, 6, 6, 7, 7],
        'Value': [1, 2, 1, 2, 1, 2],
        'extra':[1,2,3,4,5,6]
            }
df = pd.DataFrame(data)
out = df.groupby('Name').agg({'Value':list}).reset_index()
out[['Value1','Value2']] = pd.DataFrame(out.Value.tolist(), index= out.index)
out.drop('Value',axis=1,inplace=True)

result = pd.merge(df.drop('Value',axis=1),out,on='Name',how='left')

>>>result
 
   Name  extra  Value1  Value2
0     5      1       1       2
1     5      2       1       2
2     6      3       1       2
3     6      4       1       2
4     7      5       1       2
5     7      6       1       2

CodePudding user response:

Use pandas.DataFrame.groupby():

>>> df
   Name  Value
0     5      1
1     5      2
2     6      1
3     6      2
4     7      1
5     7      2

>>> dfs = [d for _, d in df.groupby('Value')]

>>> dfs
[   Name  Value
 0     5      1
 2     6      1
 4     7      1,
    Name  Value
 1     5      2
 3     6      2
 5     7      2]

>>> dfs[0]
   Name  Value
0     5      1
2     6      1
4     7      1

>>> dfs[1]
   Name  Value
1     5      2
3     6      2
5     7      2
  • Related