Home > Blockchain >  Conditionally subtracting Pandas Dataframe rows based on row names
Conditionally subtracting Pandas Dataframe rows based on row names

Time:10-30

I am working with a large dataset, but the problem can be summarized using the smaller dataset below:

import pandas as pd
df = pd.DataFrame({"Filename":["fileName1_uniqueTag1", "fileName2_uniqueTag1", "fileName3_uniqueTag1", "fileName1_uniqueTag2", "fileName2_uniqueTag2", "fileName3_uniqueTag2"], 
                   "measurement":[1336.564888, 1090.852579, 990.320323, 1202.522612, 1098.045258, 923.600277],})
print(df)
>>>
               Filename  measurement
0  fileName1_uniqueTag1  1336.564888
1  fileName2_uniqueTag1  1090.852579
2  fileName3_uniqueTag1   990.320323
3  fileName1_uniqueTag2  1202.522612
4  fileName2_uniqueTag2  1098.045258
5  fileName3_uniqueTag2   923.600277

There are three different file names with two unique tags for each file name in the "Filename" column. The goal is to calculate the ratio of the measurements for uniqueTag1/uniqueTag2 for each file. The result should look something like this:

    Filename  uniqueTag2/uniqueTag1
0  fileName1               0.899711
1  fileName2               1.006593
2  fileName3               0.932627

I can make lists of the three different file names and two different tags using:

nameList = df["Filename"].tolist()
fileNames = []                              #empty list to fill with different base file names
uniqueTags = []                             #empty list to fill with unique tags
for name in nameList:                       #iterate through list of full file names
    subStrings = name.split("_")            #splits each base file name at the underscore
    if subStrings[0] not in fileNames:      #if the base file name isn't already in the file names list...
        fileNames.append(subStrings[0])     #append it
    if subStrings[1] not in uniqueTags:     #if the unique tag isn't already in the unique tags list...
        uniqueTags.append(subStrings[1])    #append it

I think that I could access individual measurement values by making the Filenames into indexes and using df.at(), but this seems horribly messy and I'm certain there must be a much better way to do this using functionality within Pandas. Any suggestions?

CodePudding user response:

You can do something like this, which is pretty straightforward, using str.split():

df[['Filename','uniquetag']] = df['Filename'].str.split('_', expand=True)
tag1 = df.loc[df['uniquetag'] == 'uniqueTag1'].set_index('Filename')['measurement']
tag2 = df.loc[df['uniquetag'] == 'uniqueTag2'].set_index('Filename')['measurement']
tag2 / tag1

CodePudding user response:

Try

df[['one','two']] = df.filename.str.split("_",expand=True)

Then groupby using those 2 columns

  • Related