while trying to pivot a table I get an error I don't understand how to fix.
My code is:
import numpy as np
import pandas as pd
df1=pd.read_csv(r'C:\Users\Documents\Python\Data.csv')
df_com = df1.groupby(['CommentOwner','DiscussionId'])
y=df_com.nunique()
y=y.reset_index()
p=y.pivot(index="CommentOwner", columns="DiscussionId", values=['CommentOwner','DiscussionId','CommentCreation_min','CommentCreation_max','CommentCreation_count','AnswerId']).fillna(0)
I used reset_index() so I can use the columns 'CommentOwner','DiscussionId' after they were removed during the group by.
when I run this code I get this mistake:
TypeError: value should be a 'Timedelta', 'NaT', or array of those. Got 'int' instead.
when I try this code it does works:
import numpy as np
import pandas as pd
df1=pd.read_csv(r'C:\Users\Documents\Python\Data.csv')
df_com = df1.groupby(['CommentOwner','DiscussionId'])
y=df_com.nunique()
y.to_csv(r'C:\Users\Documents\Python\y.csv')
y_x=pd.read_csv(r'C:\Users\Documents\Python\y.csv')
p=y_x.pivot(index="CommentOwner", columns="DiscussionId", values=['CommentOwner','DiscussionId','CommentCreation_min','CommentCreation_max','CommentCreation_count','AnswerId']).fillna(0)
The code worked when I didn't use reset_index(), but rather saved the data frame as csv and then read it again.
I hope my question is clear. Any idea why this happens?
There must be a nicer way to do it without saving the output and reuploading it.
Thanks!
CodePudding user response:
The problem is you fill null values with 0 for all columns even for datetime64
columns.
You should do something like:
p = (y.pivot(...)
.fillna({my_datetime_col1: pd.NaT, my_datetime_col2: pd.NaT})
.fillna(0))
The first fillna
replace null values for all DatetimeIndex
columns then the second one replace other missing values.
CodePudding user response:
IIUC use:
df1=pd.read_csv(r'C:\Users\Documents\Python\Data.csv')
p = df1.groupby(['CommentOwner','DiscussionId']).nunique().unstack(fill_value=0)
Btw, your solution should working if remove 'CommentOwner','DiscussionId'
from parameter value
like:
p=y.pivot(index="CommentOwner",
columns="DiscussionId",
values=['CommentCreation_min','CommentCreation_max',
'CommentCreation_count','AnswerId']).fillna(0)