Finding string contained in CSV file and computing a sum

Time:09-30

It's my first time working with pandas, so I am trying to wrap my head around all of its functionality.

Essentially, I want to download my bank statements in CSV and search for a keyword (e.g. steam) and compute the money I spent.

I was able to use pandas to locate the lines that contain my keyword, but I do not know how to iterate through them and add the cost of each purchase to a running total.

As the image I uploaded shows, I am able to find the lines containing my keyword in the dataframe. What I want is, for each line found, to take the content of col1 and sum those values together.

Attempt At Code

# importing pandas module
import pandas as pd

keyword = input("Enter the keyword you wish to search in the statement: ")


# reading the bank statement CSV file (a local file, not a URL)
df = pd.read_csv('accountactivity.csv',header=None)

dff=df.loc[df[1].str.contains(keyword,case=False)]

value=df.values[68][2] #Fetches value of a specific cell in the CSV/dataframe created

print(dff)

print(value)

EDIT: I was almost able to complete the code I wanted using only the CSV reader, but I can't get that code to find substrings. It only works if I enter the exact string as it appears on the statement: entering netflix doesn't work, I would need to write NETFLIX.COM _V. Here is another screenshot of that working code. I essentially want to mimic it, but with the ability to match substrings.

Working Code using CSV reader

import csv

data=[]

with open("accountactivity.csv") as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        data.append(row)


keyword = input("Enter the keyword you wish to search in the statement: ")

col = [x[1] for x in data]

Sum = 0

if keyword in col:
    for x in range(0, len(data)):
        if keyword == data[x][1]:
            PartialSum=float(data[x][2])
            Sum=Sum + PartialSum
            print(data[x][1])

    print("The sum for expenses at ",keyword," is of: ",Sum,"$",sep = '')

else:
    print("Keyword returned no results.")
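For the substring problem described in the EDIT, one possible approach with the csv module alone is to lowercase both the keyword and the transaction name and test with `in` instead of `==`. This is only a minimal sketch: the sample rows and the `sum_expenses` helper are hypothetical, and it assumes the column layout from the question (name in column 1, money spent in column 2).

```python
import csv
import io

# Hypothetical sample rows: date, name, money spent, money received (no header).
SAMPLE = """01/02/2022,NETFLIX.COM _V,16.99,
01/05/2022,STEAM PURCHASE,25.00,
01/09/2022,PAYROLL,,1500.00
01/15/2022,Netflix.com _V,16.99,
"""

def sum_expenses(csvfile, keyword):
    """Sum column 2 over rows whose column 1 contains keyword, ignoring case."""
    needle = keyword.lower()
    total = 0.0
    for row in csv.reader(csvfile):
        # Skip short rows and rows with an empty "money spent" cell.
        if len(row) < 3 or not row[2].strip():
            continue
        # Substring test instead of an exact comparison.
        if needle in row[1].lower():
            total += float(row[2])
    return total

total = sum_expenses(io.StringIO(SAMPLE), "netflix")
print(f"The sum for expenses at netflix is: {total:.2f}$")
```

With a real statement, the `io.StringIO(SAMPLE)` argument would be replaced by the file object from `open("accountactivity.csv", newline="")`.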

The format of the CSV is the following:

column 0 Date of transaction

column 1 Name of transaction

column 2 Money spent from account

column 3 Money received to account

The CSV file downloaded directly from my bank has no headers. So I refer to columns using col[0] etc...
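Since the file has no header row, pandas can also be given column labels at read time through the `names` parameter of `read_csv`, so that columns can be referred to by name rather than by position. The labels and sample rows below are hypothetical, chosen only to mirror the layout listed above.

```python
import io
import pandas as pd

# Hypothetical sample rows in the same four-column, header-less layout.
SAMPLE = """01/02/2022,NETFLIX.COM _V,16.99,
01/05/2022,STEAM PURCHASE,25.00,
01/09/2022,PAYROLL,,1500.00
"""

# Hypothetical labels; any names can be used here.
columns = ["date", "name", "spent", "received"]

# header=None tells pandas the first line is data, names= supplies the labels.
df = pd.read_csv(io.StringIO(SAMPLE), header=None, names=columns)

# Empty cells are read as NaN, which sum() skips by default.
print(df["spent"].sum())
```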

Thanks for your help, I will continue meanwhile to look at how to potentially do this.

CodePudding user response:

dff[dff.columns[col_index]].sum()

where col_index is the index of the column you want to sum together.
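As a small illustration of that expression, here is a sketch on a hand-built dataframe. The rows are made up for the example; the filter line is the one from the question, and `col_index = 2` assumes the "money spent" column.

```python
import pandas as pd

# Made-up rows in the question's layout: date, name, spent, received.
df = pd.DataFrame([
    ["01/02/2022", "NETFLIX.COM _V", 16.99, None],
    ["01/05/2022", "STEAM PURCHASE", 25.00, None],
    ["01/09/2022", "PAYROLL", None, 1500.00],
])

# Keep only the rows whose name contains the keyword, ignoring case.
dff = df.loc[df[1].str.contains("steam", case=False)]

col_index = 2  # position of the "money spent" column
total = dff[dff.columns[col_index]].sum()
print(total)
```

Because the columns are labelled 0, 1, 2, 3 when no header is given, `dff[col_index].sum()` should give the same result here.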

CodePudding user response:

Thanks everyone for your help. I ended up understanding better how dataframes work in pandas, and I used the command dff[dff.columns[col_index]].sum() (which was suggested to me by Jonny Kong), with col_index set to the column of interest (in my case column 2, which contains my expenses). It computes the sum of my expenses for the searched keyword, which is what I need!

#Importing pandas module
import pandas as pd

#Keyword searched through bank statement
keyword = input("Enter the keyword you wish to search in the statement: ")


#Reading the bank statement CSV file
df = pd.read_csv('accountactivity.csv',header=None)

#Creating dataframe from bank statement with lines that match search keyword
dff=df.loc[df[1].str.contains(keyword,case=False)]

#Sum the column which contains total money spent on the keyword searched
Sum=dff[dff.columns[2]].sum()

#Prints the created dataframe
print("\n",dff,"\n")

#Prints the sum of expenses for the keyword searched
print("The sum for expenses at ",keyword," is of: ",Sum,"$",sep = '')
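Two edge cases may be worth guarding against in the filter above: a row with an empty transaction name (which `str.contains` reports as NaN and which then breaks boolean indexing), and a keyword containing regex characters such as `.` or `(`. The sketch below uses the `na=False` and `regex=False` arguments of `str.contains` for that; the `expenses_for` helper and the sample rows are hypothetical.

```python
import io
import pandas as pd

# Hypothetical sample with one row whose name cell is empty.
SAMPLE = """01/02/2022,NETFLIX.COM _V,16.99,
01/05/2022,,4.50,
01/15/2022,NETFLIX.COM _V,16.99,
"""

def expenses_for(df, keyword):
    """Sum column 2 over rows whose column 1 contains keyword literally."""
    # na=False treats missing names as non-matches;
    # regex=False matches the keyword as plain text, not as a pattern.
    mask = df[1].str.contains(keyword, case=False, na=False, regex=False)
    # Coerce anything non-numeric in the amount column to NaN before summing.
    amounts = pd.to_numeric(df.loc[mask, 2], errors="coerce")
    return amounts.sum()

df = pd.read_csv(io.StringIO(SAMPLE), header=None)
total = expenses_for(df, "netflix.com")
print(f"The sum for expenses at netflix.com is: {total:.2f}$")
```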


Again, thanks everyone for helping and supporting me through my first post on SO!
