Home > database >  Finding the difference between values with the same name in a merged CSV file
Finding the difference between values with the same name in a merged CSV file

Time:06-23

I need to find the difference between values with the same names. I have two csv files that I merged together and placed in another csv file to have a side by side comparison of the number differences.

Below is the sample merged csv file:

Q1Count             Q1Names               Q2Count                   Q2Names
2                     candy                     2                     candy
9                     apple                     8                     apple
10                    bread                     5                 pineapple
4                      pies                    12                     bread
3                   cookies                     4                      pies
32                chocolate                     3                   cookies
[Total count: 60]                               27                 chocolate
NaN                     NaN                     [Total count: 61]       

All the names are the same (almost), but I would like to have a way to make a new row space for the new name that popped up under Q2Names, pinapple.

Below is the code I implemented so far:

import pandas as pd
import csv
    
Q1ReportsDir='/path/to/Q1/Reports/'
Q2ReportsDir='/path/to/Q2/Reports/'

Q1lineCount = f'{Q1ReportsDir}Q1Report.csv'
Q2lineCount = f'{Q2ReportsDir}Q2Report.csv'

merged_destination = f'{Q2ReportsDir}DifferenceReport.csv'

diffDF = [pd.read_csv(p) for p in (Q1lineCount, Q2lineCount)]
merged_dataframe = pd.concat(diffDF, axis=1)
merged_dataframe.to_csv(merged_destination, index=False)


diffGenDF = pd.read_csv(merged_destination)

# getting Difference
diffGenDF ['Difference'] = diffGenDF ['Q1Count'] - diffGenDF ['Q2Count']
diffGenDF = diffGenDF [['Difference', 'Q1Count', 'Q1Names', 'Q2Count ', 'Q2Names']]

diffGenDF.to_csv(merged_destination, index=False)

So, making a space under Q1Names and adding a 0 under Q1Count in the same row where pineapple is under column Q2Names would make this easier to see an accurate difference between the values.

Q1Count              Q1Names               Q2Count                   Q2Names
2                      candy                     2                     candy
9                      apple                     8                     apple
0                                                5                 pineapple
10                     bread                    12                     bread
4                       pies                     4                      pies
3                    cookies                     3                   cookies
32                 chocolate                    27                 chocolate
[Total count: 60]                               [Total count: 61]

The final desired output I would get if I can get past that part is this:

Difference          Q1Count                  Q1Names          Q2Count                  Q2Names
0                         2                    candy                2                    candy
1                         9                    apple                8                    apple
-5                        0                                         5                pineapple
-2                       10                    bread               12                    bread
0                         4                     pies                4                     pies
0                         3                  cookies                3                  cookies
5                        32                chocolate               27                chocolate
                         [Total count: 60]                         [Total count: 61]

CodePudding user response:

I was able to get your same results using a pd.merge with the dataframe you provided

df_merge = pd.merge(df1, df2, left_on = 'Q1Names', right_on = 'Q2Names', how = 'outer')
df_merge[['Q1Count', 'Q2Count']] = df_merge[['Q1Count', 'Q2Count']].fillna(0)
df_merge[['Q1Names', 'Q2Names']] = df_merge[['Q1Names', 'Q2Names']].fillna('')
df_merge['Difference'] = df_merge['Q1Count'].sub(df_merge['Q2Count'])
  • Related