I have a DF with 16 columns, I need to work on 2 of them to generate a table (creating a DF in this case), the first column is Type (has 2 type, Subscriber and one-time user, and the second is day (days of a week), following is the DS (part, the actual is more than 16K rows and 16 columns), I am working on:
Type,Trip_in_min,Day
One-time user,25,Sunday
Subscriber,11,Sunday
Subscriber,4,Sunday
One-time user,11,Sunday
Subscriber,9,Sunday
One-time user,4,Sunday
Subscriber,2,Sunday
One-time user,3,Sunday
Subscriber,2,Sunday
Subscriber,8,Sunday
One-time user,12,Monday
Subscriber,11,Monday
One-time user,2,Monday
Subscriber,2,Monday
One-time user,5,Tuesday
Subscriber,9,Tuesday
One-time user,6,Tuesday
Subscriber,16,Tuesday
I can get the following results, based on the above input
Subscribers One Time Users
Sunday 6 4
Monday 2 2
Tuesday 2 2
Wednesday 0 0
Thursday 0 0
Friday 0 0
Saturday 0 0
I have the following code:
import pandas as pd
file = pd.read_csv(r"D:\OneDrive\DataAnalysis\Test2.txt")
df = pd.DataFrame(file)
user_type = 0
regular = 0
regular1 = 0
regular2 = 0
regular3 = 0
regular4 = 0
regular5 = 0
regular6 = 0
one_time_user = 0
one_time_user1 = 0
one_time_user2 = 0
one_time_user3 = 0
one_time_user4 = 0
one_time_user5 = 0
one_time_user6 = 0
day_idx = 0
new_list = pd.DataFrame({'Subscribers': [0, 0, 0, 0, 0, 0, 0], 'One Time Users': [0, 0, 0, 0, 0, 0, 0]}, index=['Sunday', 'Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday'])
for index1, row in df.iterrows():
for i in range(0, len(df)):
user_type = df['Type'].iloc[i]
day = df['Day'].iloc[i]
if day == 'Sunday':
if user_type == 'Subscriber':
regular = 1
else:
one_time_user = 1
elif day == 'Monday':
if user_type == 'Subscriber':
regular1 = 1
else:
one_time_user1 = 1
elif day == 'Tuesday':
if user_type == 'Subscriber':
regular2 = 1
else:
one_time_user2 = 1
elif day == 'Wednesday':
if user_type == 'Subscriber':
regular3 = 1
else:
one_time_user3 = 1
elif day == 'Thursday':
if user_type == 'Subscriber':
regular4 = 1
else:
one_time_user4 = 1
elif day == 'Friday':
if user_type == 'Subscriber':
regular5 = 1
else:
one_time_user5 = 1
elif day == 'Saturday':
if user_type == 'Subscriber':
regular6 = 1
else:
one_time_user6 = 1
break
#print('Regular:', regular)
#print('One Tine User:', one_time_user)
new_list.at["Sunday", "Subscribers"] = regular
new_list.at['Sunday', "One Time Users"] = one_time_user
new_list.at["Monday", "Subscribers"] = regular1
new_list.at['Monday', "One Time Users"] = one_time_user1
new_list.at["Tuesday", "Subscribers"] = regular2
new_list.at['Tuesday', "One Time Users"] = one_time_user2
new_list.at["Wednesday", "Subscribers"] = regular3
new_list.at['Wednesday', "One Time Users"] = one_time_user3
new_list.at["Thursday", "Subscribers"] = regular4
new_list.at['Thursday', "One Time Users"] = one_time_user4
new_list.at["Friday", "Subscribers"] = regular5
new_list.at['Friday', "One Time Users"] = one_time_user5
new_list.at["Saturday", "Subscribers"] = regular6
new_list.at['Saturday', "One Time Users"] = one_time_user6
print(new_list)
But as you can see, it is a not an efficient one, I have been trying on this for days, so it will not be easy for me to list (or even remember) what I tried
Any suggestion to improve the code please? I just could not figure out how to make the proper looping
Thanks
CodePudding user response:
The no-loop version, just incase you want it...
- uses pandas' groupby to get the results you want
- uses pandas' pivot to create the layout you want
import pandas as pd
import io
data="""Type,Trip_in_min,Day
One-time user,25,Sunday
Subscriber,11,Sunday
Subscriber,4,Sunday
One-time user,11,Sunday
Subscriber,9,Sunday
One-time user,4,Sunday
Subscriber,2,Sunday
One-time user,3,Sunday
Subscriber,2,Sunday
Subscriber,8,Sunday
One-time user,12,Monday
Subscriber,11,Monday
One-time user,2,Monday
Subscriber,2,Monday
One-time user,5,Tuesday
Subscriber,9,Tuesday
One-time user,6,Tuesday
Subscriber,16,Tuesday"""
# Using StringIO as a pretend file...
df = pd.read_csv(io.StringIO(data))
print()
print(df)
# count the number of rows per Day/Type
df_agg = df.groupby(['Day','Type'],as_index=False).size()
print()
print(df_agg)
# Pivot the tyoe to be columns instead
df_pivot = df_agg.pivot(index='Day', columns='Type')
print()
print(df_pivot)