Home > OS >  Python Pandas, counting all the occurrences of an item in a CSV and assigning that count to a unique
Python Pandas, counting all the occurrences of an item in a CSV and assigning that count to a unique

Time:09-24

So I have CSV with ~1300 rows. Each row is the time a machine was used For example:

Machine Name,   Timestamp,            Total Revenue
Washer #25,     2021-08-22 06:07:21,   11.00
Dryer #39,      2021-08-22 06:37:41,   1.00

I want to put together a dataset with data from this CSV. I have so far gotten total machines used per hour, the total revenue per hour, and am working on grabbing the amount of times a given machine is used per week. The best I have done with the last part is that I have counted the amount of times a machine is used per week and added it as a column to the end of the dataframe. So every time a machine comes up it shows it total count. Looks like:

Machine Name    Timestamp                Total Revenue    Machine Use Count
Washer #25       2021-08-22 06:07:21     11.00            17
Dryer #39        2021-08-22 06:37:41     1.00             83
Dryer #39        2021-08-25 08:45:15     1.00             83

What I would prefer it to look like, would be the name of the machine, followed by the total count it appears in the week. There are only 39 machines, but they each show up multiple times. I would like it to look like:

Machine Name     Total Revenue    Machine Use Count
Washer #1           100.00           17
Washer #2           55.50            28
...                 ...              ...
Dryer #39           1.00             83

My current code looks like this

import csv
import pandas as pd

csv = 'CSV PATH'

df = pd.read_csv(csv)

df['Timestamp'] = pd.to_datetime(df['Timestamp'])

hourlyMachineUseCount = (df.groupby(df['Timestamp'].dt.floor('h'))['Machine Name'].count()) # Sorts by day, counting amount of machines used per hour

totalHourlyRevenue = (df.groupby(df['Timestamp'].dt.floor('h'))['Total Revenue'].sum()) # Gives back the total revenue per hour per day 

machineWeeklyCount = (df.groupby('Machine Name').count()) # Counts how many times a machine name appears in a week

testVar = df['Machine Name'].unique()

df['Machine Use Count'] = df.groupby('Machine Name')['Machine Name'].transform('count') # counts how many times a machine is used per week, adds to df at end of dataset
print(df)

I have all the pieces I just have not yet figured out how to get the last piece perfect and how to put them all together. I appreciate any and all help!

CodePudding user response:

Try this:

pd.pivot_table(df, index='Machine Name', values=['Total Revenue','Machine Use Count'], aggfunc=max)
  • Related