Similar to pivot table in Python-CodePudding

Here is a dataframe data_1.

data_1=pd.DataFrame({'id':['1','1','1','1','1','2','2','2','2','2'],
                 'date':['20220325','20220325','20220325','20220327','20220327','20220705','20220705','20220706','20220706','20220706'],
                 'base':["wt","bmi","p","wt","wt","bmi","bmi","wt","p","bmi"],
                 'value':['75','21','25','76','77','19','18','85','23','19']},
                   )
data_1['id'] = pd.to_numeric(data_1['id'], errors='coerce')
data_1['date'] = pd.to_numeric(data_1['date'], errors='coerce')
data_1['value'] = pd.to_numeric(data_1['value'], errors='coerce')

I want to make this data_1 as follows:

data_1=pd.DataFrame({'id':[1,1,1,2,2,2],
                 'date':[20220325,20220327,20220327,20220705,20220705,20220706],
                 'wt':[75,76,77,"","",85],
                 'bmi':[21,"","",19,18,19],
                 'p':[25,"","","","",23]})

I tried pivot_table ,but the output is not the same as I expected. Moreover, I need to save the data_1 as csv file, but the there are no columns id and date in the csv file that I made. Is there any method to change the data_1 as my expected output?

CodePudding user response：

Because you want the duplicates still in your pivot table, this was an interesting question. My solution is to make a pivot table for the non-duplicates and the duplicates and then concatenate them together.

tf = data_1[['id', 'date', 'base']].duplicated()

df1 = data_1[~tf]
df2 = data_1[tf]

df1 = pd.pivot_table(df1, values='value', index=['id', 'date'], columns='base').reset_index()
df2 = pd.pivot_table(df2, values='value', index=['id', 'date'], columns='base').reset_index()

data_1 = pd.concat([df1, df2]).fillna('')

Output:

>>> data_1
base  id      date   bmi     p    wt
0      1  20220325  21.0  25.0  75.0
1      1  20220327              76.0
2      2  20220705  19.0            
3      2  20220706  19.0  23.0  85.0
0      1  20220327              77.0
1      2  20220705  18.0

Then to a csv file use data_1.to_csv(file_path, index=False)

CodePudding user response：

This is a variation on a pivot:

(data_1.assign(id2=data_1.groupby(['id', 'date', 'base']).cumcount())
       .pivot(index=['id', 'id2', 'date'], columns='base', values='value')
       .convert_dtypes().astype(str).replace('<NA>', '')
       [data_1['base'].unique()]
       .droplevel(1)
       .to_csv('/tmp/output.csv')
)

output csv:

id,date,wt,bmi,p
1,20220325,75,21,25
1,20220327,76,,
1,20220327,77,,
2,20220705,,19,
2,20220706,85,19,23
2,20220705,,18,