How to split a data frame with categorical data into two by category in Python

Time:09-27

I have a dataframe with two columns: 'response', which holds numerical data, and 'treatment', which holds binary categorical data ('water' or 'beer').

How do I split the dataframe into two data series, one with the 'response' values for the 'water' treatment and the other with the 'response' values for the 'beer' treatment?

Thanks,

CodePudding user response:

Try something like this:

import pandas as pd

data = {
    'response': [1, 2, 3, 4, 5],
    'treatment': ['water', 'beer', 'beer', 'water', 'beer']
}
df = pd.DataFrame(data)

data2 = {
    'beer': [(df['treatment'][i], df['response'][i]) if df['treatment'][i] == 'beer' else None for i in range(len(df))],
    'water': [(df['treatment'][i], df['response'][i]) if df['treatment'][i] == 'water' else None for i in range(len(df))]
}
df2 = pd.DataFrame(data2)

print(df2)

Result:

❯ python test.py
        beer       water
0       None  (water, 1)
1  (beer, 2)        None
2  (beer, 3)        None
3       None  (water, 4)
4  (beer, 5)        None
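
If the goal is two plain Series of 'response' values rather than one combined frame, a boolean mask is a common alternative. This is a minimal sketch that reuses the sample data above; the variable names water and beer are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'response': [1, 2, 3, 4, 5],
    'treatment': ['water', 'beer', 'beer', 'water', 'beer'],
})

# Keep only the rows for one treatment, then take the 'response' column.
# Each result is a pandas Series that keeps the original row index.
water = df.loc[df['treatment'] == 'water', 'response']
beer = df.loc[df['treatment'] == 'beer', 'response']

print(water.tolist())  # [1, 4]
print(beer.tolist())   # [2, 3, 5]
```

Call reset_index(drop=True) on either Series if the original row labels are not needed.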

CodePudding user response:

Use pandas.DataFrame.groupby with a list comprehension to return the two dataframes.

import pandas as pd
from io import StringIO

s = """response,treatment
1,water
82,beer
425,water
111,water
933,beer
66,beer
"""

df = pd.read_csv(StringIO(s))

out = [d for _, d in df.groupby('treatment')]

# Output :

print(out)
[   response treatment
 1        82      beer
 4       933      beer
 5        66      beer,
    response treatment
 0         1     water
 2       425     water
 3       111     water]

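To access one of the dataframes, use out[0] or out[1]. groupby sorts the group keys by default, so out[0] holds the 'beer' rows and out[1] holds the 'water' rows.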
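
Looking groups up by position can be fragile, so one possible variation is to key them by treatment name and keep only the 'response' column, which gives the two Series the question asks for. This is a sketch using the same sample data; the name by_treatment is hypothetical.

```python
import pandas as pd
from io import StringIO

s = """response,treatment
1,water
82,beer
425,water
111,water
933,beer
66,beer
"""

df = pd.read_csv(StringIO(s))

# Iterating over the grouped 'response' column yields (group name, Series) pairs,
# so the dict maps each treatment name to its 'response' Series.
by_treatment = {name: grp for name, grp in df.groupby('treatment')['response']}

print(by_treatment['water'].tolist())  # [1, 425, 111]
print(by_treatment['beer'].tolist())   # [82, 933, 66]
```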
