How to count number of similar names to edit specific column in csv file using pandas?

Time:04-21

I have a Python assignment that I have been working on for a while, and this last part has me stumped because I don't know how to go about it. I am supposed to count how many entries a column already has so that the rest of my code does not start numbering again at 1. Here is what the column looks like:

[Image: CSV file column that shows similar names]

I currently have the code below for searching, and it works fine. I have been using the pandas library for this entire project. It pulls all the strings that contain the name I enter (e.g. E_LAPALMA). "Terminal" is the name of that column and YRN is the name of the file, loaded as a DataFrame.

searchcell = input("What would you like to search? ")
var1 = YRN[YRN['Terminal'].str.contains(searchcell)]

if not var1.empty:
    print("Matches")
    print(var1)
    editline()
else:
    print("No matches")

How can I read the number at the end of the matching names and then add 1 to it for the new cells I have to enter? For example, if the last cell containing E_LAPALMA is E_LAPALMA_03, how can I make the code add my new cell and name it E_LAPALMA_04?

If possible, is there also a way to edit the numbering of the existing cells? For example, I need to change E_LAPALMA_02 to E_LAPALMA_04.

CodePudding user response:

Can the number be part of the search? There are multiple ways to do this, but I guess the easiest would be to get all entries that contain the search term and take the alphabetical maximum. Strings that are identical except for a number at the end sort correctly, provided the numbers are zero-padded to the same width (01, 02, ..., 10). Then extract the number from that entry, increment it by one, and put it back into string format.

searchcell = input("What would you like to search? ")
var1 = YRN[YRN['Terminal'].str.contains(searchcell)]

if not var1.empty:
    print("Matches")
    # get entry with the highest ending number
    max_entry = var1['Terminal'].max()
    # construct new entry by first recreating the part up until the last underscore ...
    new_entry = "_".join(max_entry.split("_")[:-1]) + "_"
    # ... and then converting the existing number to int, increasing it by one, and
    # formatting it as a string with a leading zero if it is shorter than 2 digits
    new_entry += f"{(int(max_entry.split('_')[-1]) + 1):02d}"
else:
    print("No matches")

There are multiple options for adding that new row to the DataFrame; see, for example, this thread: Create a Pandas Dataframe by appending one row at a time
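As one possible sketch of the appending step: pd.concat can add a single-row frame to the existing one. The sample data and the helper name append_terminal are made up for illustration, and the sketch assumes only the Terminal column needs a value.

```python
import pandas as pd

# Hypothetical stand-in for the YRN DataFrame from the question.
YRN = pd.DataFrame({"Terminal": ["E_LAPALMA_01", "E_LAPALMA_02", "E_LAPALMA_03"]})


def append_terminal(df, new_entry):
    """Return a new DataFrame with one extra row whose 'Terminal' is new_entry."""
    new_row = pd.DataFrame([{"Terminal": new_entry}])
    # ignore_index=True renumbers the index 0..n-1 after concatenating.
    return pd.concat([df, new_row], ignore_index=True)


YRN = append_terminal(YRN, "E_LAPALMA_04")
print(YRN)
```

If the real file has more columns, the new row would hold NaN in those columns unless values for them are added to the dictionary as well.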

If you want to make sure that the new entry ends up next to the other entries of its group, you can sort the DataFrame alphabetically after each insertion. It seems like your data is already ordered alphabetically, so that shouldn't change any order except for where the new entry goes.
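A minimal sketch of that re-sorting, again on made-up data (the F_OTHER_01 row is hypothetical and only there to show the new entry moving into place):

```python
import pandas as pd

# Hypothetical data: the new entry was appended at the bottom, out of order.
YRN = pd.DataFrame(
    {"Terminal": ["E_LAPALMA_01", "E_LAPALMA_02", "F_OTHER_01", "E_LAPALMA_03"]}
)

# Sort alphabetically by the Terminal column and rebuild a clean 0..n-1 index.
YRN = YRN.sort_values("Terminal").reset_index(drop=True)
print(YRN)
```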

If a number can be part of the search (e.g. E_LAPALMA_02) but the code should still find any higher numbers in that group and add the new value correctly, you have to strip the trailing digits from your searchcell value before using it to find all matching entries.
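One way to strip the trailing digits is a small regular expression. This is only a sketch: the helper name strip_trailing_number is hypothetical, and it assumes the number always sits at the very end of the term, optionally preceded by an underscore.

```python
import re


def strip_trailing_number(term):
    """Remove a trailing '_<digits>' (or bare '<digits>') suffix from a search term."""
    return re.sub(r"_?\d+$", "", term)


print(strip_trailing_number("E_LAPALMA_02"))
print(strip_trailing_number("E_LAPALMA"))
```

The cleaned term can then be passed to str.contains in place of the raw input.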

Also, as you might guess, this can get very fishy with short searches, because it assumes there will only be one matching group. If you have a search term such as "E", it will match every entry containing an "E", take the highest-numbered entry of the alphabetically last matching group, and add a new entry for that group.
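A possible guard against that, sketched under the assumption that every matching entry has the form PREFIX_NUMBER: split off the number, refuse to continue when more than one prefix matches, and compare the numbers as integers rather than as strings. The function name next_terminal_name and the sample data are hypothetical, and regex=False is used here so the search term is treated as plain text.

```python
import pandas as pd


def next_terminal_name(terminals, search):
    """Return the next numbered name for the single group matching `search`."""
    matches = terminals[terminals.str.contains(search, regex=False)]
    if matches.empty:
        raise ValueError("no matches")
    # Split 'E_LAPALMA_03' into the prefix 'E_LAPALMA' and the number '03'.
    parts = matches.str.rsplit("_", n=1, expand=True)
    prefixes = parts[0].unique()
    if len(prefixes) != 1:
        raise ValueError(f"search matches several groups: {sorted(prefixes)}")
    # Compare numerically so that e.g. 10 ranks above 9.
    highest = int(parts[1].astype(int).max())
    return f"{prefixes[0]}_{highest + 1:02d}"


terminals = pd.Series(["E_LAPALMA_01", "E_LAPALMA_03", "E_LAPALMA_02", "F_OTHER_01"])
print(next_terminal_name(terminals, "E_LAPALMA"))
```

With this data, a search for just "E" would raise an error instead of silently picking one of the two groups.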
