Cleaning .csv text data in Python

Time:07-30

I recently wrote a Python program that imports my finances from a .csv file and transfers them to Google Sheets. However, I am struggling to figure out how to clean up the transaction names that my bank gives me.

Example: ME DC SI XXXXXXXXXXXXXXXX NETFLIX should just be NETFLIX, POS XXXXXXXXXXXXXXXX STEAM PURCHASE should just be STEAM, and so on.

Forgive me if this is a stupid question; I am a newbie when it comes to coding and am just looking to use it to automate certain things in my life.

import csv
import gspread
import time

# Month to import
MONTH = 'June'

# The file we need to extract data from
file = f'HDFC_{MONTH}_2022.csv'

def hdfcFin(file):
    '''Read the bank's .csv export and return a list of transaction tuples.'''
    # Collect the rows in a local list so repeated calls don't accumulate data
    transactions = []
    with open(file, mode='r', newline='') as csv_file:
        csv_reader = csv.reader(csv_file)
        for row in csv_reader:
            date = row[0]
            name = row[1]
            expense = float(row[2])
            income = float(row[3])
            category = 'other'
            transactions.append((date, name, expense, income, category))
    return transactions
        
# Authenticate with the service-account JSON credentials
sa = gspread.service_account()
sh = sa.open('Personal Finances')

wks = sh.worksheet(MONTH)

rows = hdfcFin(file)

for row in rows:
    # Sheet column order: date, name, category, expense, income
    wks.insert_row([row[0], row[1], row[4], row[2], row[3]], 8)
    # Time delay because of API restrictions
    time.sleep(2)

CodePudding user response:

If the names have no consistent format that identifies the merchant, you can use a dictionary of key-value pairs: if a name matches a key exactly, replace it with the corresponding value.

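# Map each raw bank description to the clean name it should become
d = {'ME DC SI XXXXXXXXXXXXXXXX NETFLIX': 'NETFLIX', 'POS XXXXXXXXXXXXXXXX STEAM PURCHASE': 'STEAM'}
test = 'POS XXXXXXXXXXXXXXXX STEAM PURCHASE'
# Replace the name only when the whole string is a key in the dictionary
if test in d:
    test = d[test]
print(test)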

Output:

STEAM
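
The dictionary lookup above only fires when the whole description is a key, so every new variant of a bank string needs its own entry. A minimal sketch of a looser variant, assuming a merchant keyword always appears somewhere in the description: the `KEYWORDS` table and the `clean_name` helper are hypothetical names introduced here for illustration, not part of the original answer.

```python
# Hypothetical keyword -> clean-name table; extend it with your own merchants.
KEYWORDS = {
    "NETFLIX": "NETFLIX",
    "STEAM": "STEAM",
}

def clean_name(name, keywords=KEYWORDS):
    """Return the clean name for the first keyword found in name, else name unchanged."""
    upper = name.upper()  # compare case-insensitively
    for keyword, clean in keywords.items():
        if keyword in upper:
            return clean
    return name  # no keyword matched: keep the bank's description

print(clean_name("ME DC SI XXXXXXXXXXXXXXXX NETFLIX"))    # NETFLIX
print(clean_name("POS XXXXXXXXXXXXXXXX STEAM PURCHASE"))  # STEAM
print(clean_name("ATM WITHDRAWAL"))                       # ATM WITHDRAWAL
```

In the question's loop this could be applied as `name = clean_name(row[1])`. Unknown descriptions pass through unchanged, so they are easy to spot in the sheet and add to the table later.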

If you only need the last word of the name, you can split on spaces and take the final element. Note that this gives PURCHASE, not STEAM, for the Steam example above.

test = 'ME DC SI XXXXXXXXXXXXXXXX NETFLIX'
# Split on whitespace and keep only the last word
test = test.split()[-1]
print(test)

Output:

NETFLIX
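
Taking the last word works for the Netflix string but not when extra words follow the merchant. A possible alternative, sketched here under the assumption that the masked number is always a standalone run of four or more `X` characters and that the merchant is the first word after it, is to cut everything up to the mask with a regular expression. The `merchant_from_name` helper and the sample rows are made up for illustration.

```python
import csv
import io
import re

# Assumption: the masked account/card number is a standalone run of 4+ "X"
# characters, and the merchant name starts right after it.
MASKED_NUMBER = re.compile(r"^.*?\bX{4,}\b\s*")

def merchant_from_name(name):
    """Return the first word after the masked number, or name unchanged."""
    match = MASKED_NUMBER.match(name)
    if match is None:
        return name  # no masked number found: leave the name alone
    words = name[match.end():].split()
    return words[0] if words else name

# Made-up sample rows in the question's column order: date, name, expense, income
sample = io.StringIO(
    "01/06/22,ME DC SI XXXXXXXXXXXXXXXX NETFLIX,649.0,0.0\n"
    "02/06/22,POS XXXXXXXXXXXXXXXX STEAM PURCHASE,529.0,0.0\n"
    "03/06/22,SALARY CREDIT,0.0,50000.0\n"
)
cleaned = [merchant_from_name(row[1]) for row in csv.reader(sample)]
print(cleaned)  # ['NETFLIX', 'STEAM', 'SALARY CREDIT']
```

Merchants whose names span several words would be truncated to the first word by this sketch, so a lookup table may still be needed for those.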