Here is a dataframe data_1
.
data_1=pd.DataFrame({'id':['1','1','1','1','1','2','2','2','2','2'],
'date':['20220325','20220325','20220325','20220327','20220327','20220705','20220705','20220706','20220706','20220706'],
'base':["wt","bmi","p","wt","wt","bmi","bmi","wt","p","bmi"],
'value':['75','21','25','76','77','19','18','85','23','19']},
)
data_1['id'] = pd.to_numeric(data_1['id'], errors='coerce')
data_1['date'] = pd.to_numeric(data_1['date'], errors='coerce')
data_1['value'] = pd.to_numeric(data_1['value'], errors='coerce')
I want to make this data_1
as follows:
data_1=pd.DataFrame({'id':[1,1,1,2,2,2],
'date':[20220325,20220327,20220327,20220705,20220705,20220706],
'wt':[75,76,77,"","",85],
'bmi':[21,"","",19,18,19],
'p':[25,"","","","",23]})
I tried pivot_table
,but the output is not the same as I expected.
Moreover, I need to save the data_1
as csv file, but the there are no columns id
and date
in the csv file that I made.
Is there any method to change the data_1
as my expected output?
CodePudding user response:
Because you want the duplicates still in your pivot table, this was an interesting question. My solution is to make a pivot table for the non-duplicates and the duplicates and then concatenate them together.
tf = data_1[['id', 'date', 'base']].duplicated()
df1 = data_1[~tf]
df2 = data_1[tf]
df1 = pd.pivot_table(df1, values='value', index=['id', 'date'], columns='base').reset_index()
df2 = pd.pivot_table(df2, values='value', index=['id', 'date'], columns='base').reset_index()
data_1 = pd.concat([df1, df2]).fillna('')
Output:
>>> data_1
base id date bmi p wt
0 1 20220325 21.0 25.0 75.0
1 1 20220327 76.0
2 2 20220705 19.0
3 2 20220706 19.0 23.0 85.0
0 1 20220327 77.0
1 2 20220705 18.0
Then to a csv file use data_1.to_csv(file_path, index=False)
CodePudding user response:
This is a variation on a pivot
:
(data_1.assign(id2=data_1.groupby(['id', 'date', 'base']).cumcount())
.pivot(index=['id', 'id2', 'date'], columns='base', values='value')
.convert_dtypes().astype(str).replace('<NA>', '')
[data_1['base'].unique()]
.droplevel(1)
.to_csv('/tmp/output.csv')
)
output csv:
id,date,wt,bmi,p
1,20220325,75,21,25
1,20220327,76,,
1,20220327,77,,
2,20220705,,19,
2,20220706,85,19,23
2,20220705,,18,