I have a pandas DataFrame with 10 columns, some of which are of object dtype. df looks something like this:
date        ID   Cat      Chan    Num
2021-04-02  abc  Chrome   Direct  8
2021-04-02  xyz  Safari   Search  5
2021-04-02  pqr  Firefox  Direct  6
2021-04-02  dcf  Chrome   NA      1
2021-04-03  abc  Safari   Search  3
2021-04-03  xyz  Safari   Search  4
2021-04-03  pqr  NA       Direct  7
2021-04-03  dcf  Chrome   Direct  5
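For reference, a minimal snippet that builds this sample data (I'm showing the NA cells as missing values via np.nan here; they could just as well be literal 'NA' strings):
import pandas as pd
import numpy as np
# sample data matching the table above
df = pd.DataFrame({
    'date': ['2021-04-02'] * 4 + ['2021-04-03'] * 4,
    'ID':   ['abc', 'xyz', 'pqr', 'dcf'] * 2,
    'Cat':  ['Chrome', 'Safari', 'Firefox', 'Chrome', 'Safari', 'Safari', np.nan, 'Chrome'],
    'Chan': ['Direct', 'Search', 'Direct', np.nan, 'Search', 'Search', 'Direct', 'Direct'],
    'Num':  [8, 5, 6, 1, 3, 4, 7, 5],
})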
What I need is a loop that first picks out only the object dtype columns in this df and then creates a new DataFrame for each of those columns. Each new DataFrame should have the values from the original column as column headers, and one row per date with the count of each value. Something like this:
First new df, df_cat:
date        Chrome  Safari  Firefox  NA
2021-04-02  2       1       1        0
2021-04-03  1       2       0        1
Second new df, df_chan:
date        Direct  Search  NA
2021-04-02  2       1       1
2021-04-03  2       2       0
Since the column "Num" in the original is of int dtype, I don't want a new df for that column.
I tried something like the following, but it gives an error:
AttributeError: 'DataFrame' object has no attribute 'item':
mylist = list(df.select_dtypes(include=['object']).columns)
for item in mylist:
    category_item = pd.get_dummies(df.item)
    category_item.head()
Could someone please help me? I am not that good with Pandas, and this seems like a big one for me to fix.
CodePudding user response:
You could set_index with "date" and fill the missing values; then apply get_dummies, groupby and sum to each of the object dtype columns to get the desired outcome. (The AttributeError in your attempt comes from df.item: attribute access looks for an attribute or column literally named "item", not the value of the loop variable; df[item] is how you select a column by a variable.)
# index by date and treat missing values as the literal string 'NA'
df0 = df0.set_index('date').fillna('NA')
# one-hot encode each object column, then count occurrences per date
out = {col: pd.get_dummies(df0[col]).groupby(level=0).sum() for col in df0.select_dtypes(include=[object]).columns}
Output:
ID:
            abc  dcf  pqr  xyz
date
2021-04-02    1    1    1    1
2021-04-03    1    1    1    1
Cat:
            Chrome  Firefox  NA  Safari
date
2021-04-02       2        1   0       1
2021-04-03       1        0   1       2
Chan:
            Direct  NA  Search
date
2021-04-02       2   1       1
2021-04-03       2   0       2
I used a dictionary here to map column names to the new DataFrames. You could store them in a list too.
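For example, a quick sketch of how the result can be used afterwards:
# pull out individual frames by the original column name
df_cat = out['Cat']    # counts of Chrome/Safari/Firefox/NA per date
df_chan = out['Chan']  # counts of Direct/Search/NA per date
# or loop over all of them
for col, counts in out.items():
    print(col)
    print(counts)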