I have a dataframe with two columns: 'response', which holds numerical data, and 'treatment', which holds binary categorical data ('water' or 'beer').
How do I split the dataframe into two Series, one with the 'response' values for the 'water' treatment and the other with the 'response' values for the 'beer' treatment?
Thanks,
CodePudding user response:
Try something like this:
import pandas as pd

data = {
    'response': [1, 2, 3, 4, 5],
    'treatment': ['water', 'beer', 'beer', 'water', 'beer']
}
df = pd.DataFrame(data)

# One column per treatment: each cell holds a (treatment, response) tuple
# where the row matches that treatment, and None otherwise.
data2 = {
    'beer': [(df['treatment'][i], df['response'][i]) if df['treatment'][i] == 'beer' else None for i in range(len(df))],
    'water': [(df['treatment'][i], df['response'][i]) if df['treatment'][i] == 'water' else None for i in range(len(df))]
}
df2 = pd.DataFrame(data2)
print(df2)
Result:
❯ python test.py
        beer       water
0       None  (water, 1)
1  (beer, 2)        None
2  (beer, 3)        None
3       None  (water, 4)
4  (beer, 5)        None
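If the goal is two plain Series rather than a table of tuples, a boolean mask may be a more direct route. This is only a minimal sketch reusing the sample data above; the variable names `water` and `beer` are illustrative and not from the question.

```python
import pandas as pd

df = pd.DataFrame({
    'response': [1, 2, 3, 4, 5],
    'treatment': ['water', 'beer', 'beer', 'water', 'beer'],
})

# Keep only the 'response' values of the rows whose treatment matches.
# Each result is a pandas Series that retains the original row index.
water = df.loc[df['treatment'] == 'water', 'response']
beer = df.loc[df['treatment'] == 'beer', 'response']

print(water.tolist())  # [1, 4]
print(beer.tolist())   # [2, 3, 5]
```

Call reset_index(drop=True) on either Series if you want the index to start again from 0.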
CodePudding user response:
Use pandas.DataFrame.groupby with a list comprehension to return the two dataframes.
import pandas as pd
from io import StringIO
s = """response,treatment
1,water
82,beer
425,water
111,water
933,beer
66,beer
"""
df = pd.read_csv(StringIO(s))
out = [d for _, d in df.groupby('treatment')]
# Output :
print(out)
[   response treatment
1        82      beer
4       933      beer
5        66      beer,
   response treatment
0         1     water
2       425     water
3       111     water]
To access one of the dataframes, use out[0] (beer) or out[1] (water); groupby sorts the group keys by default, so 'beer' comes first.
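Since the question asks for Series rather than dataframes, one possible variation is to select the 'response' column on the groupby object before iterating, so each group comes back as a Series. This is a sketch under that assumption; the dictionary name `series_by_treatment` is made up for illustration.

```python
import pandas as pd
from io import StringIO

s = """response,treatment
1,water
82,beer
425,water
111,water
933,beer
66,beer
"""
df = pd.read_csv(StringIO(s))

# Selecting 'response' on the groupby object yields (key, Series) pairs,
# which are collected here into a dict keyed by treatment name.
series_by_treatment = {name: grp for name, grp in df.groupby('treatment')['response']}

water = series_by_treatment['water']
beer = series_by_treatment['beer']

print(water.tolist())  # [1, 425, 111]
print(beer.tolist())   # [82, 933, 66]
```

Looking the Series up by treatment name avoids relying on the position of each group in a list.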