Home > Enterprise >  How to update field in CSV_1 with values from CSV_2?
How to update field in CSV_1 with values from CSV_2?

Time:11-13

I have two CSV files, table1 and table2, and I am trying to find a way to update the Price field in table1 with price values from table2 if-and-only-if table1 and table2 share the same value in the Name field.

My two tables look like this:

Table 1:

   Price  Name   ATT1  ATT2
0     12   APPL    69    81
1    900  GOOGL   303   392
2     32    INV    39     9

Table 2:

   Price     Name 
0   1231      APPL
1     39  FACEBOOK

Successfully performing this join/merge/update would see:

  • APPL's price changed to 1231 (from table2)
  • GOOGL and INV unchanged (no matching names from table2)
  • FACEBOOK left out (no matching name from table1)
Expected Final Table:

   Name   ATT1  ATT2   Price
0   APPL    69    81  1231.0
1  GOOGL   303   392     900
2    INV    39     9     32

Current code:

table1 = pd.read_csv('path/table1.csv', index_col=0)
table2 = pd.read_csv('path/table2.csv', index_col=0)
print(table1)
print(table2)

new_table = table1[["Name ", "ATT1", "ATT2"]].merge(table2[["Price", "Name "]], on="Name ", how="left")
print(new_table)

leads to the following, GOOGL and INV being nullified:

Current Final Table:

   Name   ATT1  ATT2   Price
0   APPL    69    81  1231.0
1  GOOGL   303   392     NaN
2    INV    39     9     NaN

CodePudding user response:

Please give sample data, I had to hard code it and it was a pain, but I believe this is what you want.

import numpy as np
import pandas as pd

table1 = [
    ['Price', 'Name', 'ATT1', 'ATT2'],
    [12, 'APPL', 69, 81],
    [900, 'GOOGL', 303, 392],
    [32, 'INV', 39, 9]
]

table2 = [
    ['Price', 'Name'],
    [1231, 'APPL'],
    [39, 'FACEBOOK']
]

column_names_1 = table1.pop(0)
column_names_2 = table2.pop(0)

df1 = pd.DataFrame(table1, columns=column_names_1)
df2 = pd.DataFrame(table2, columns=column_names_2)

merged_df = df1.merge(df2, how='left', on='Name')
merged_df['Price'] = merged_df['Price_y'].fillna(merged_df['Price_x']).astype(int)
final_df = merged_df.drop(['Price_x', 'Price_y'], axis=1)
final_df

Output:

    Name    ATT1    ATT2    Price
0   APPL    69      81      1231
1   GOOGL   303     392     900
2   INV     39      9       32

CodePudding user response:

Here's how to do this without Pandas:

table1.csv

Price,Name,ATT1,ATT2
12,APPL,69,81
900,GOOGL,303,392
32,INV,39,9

table2.csv

Price,Name
1231,APPL
39,FACEBOOK

My code:

import csv

# Start w/T2, since T1 is the basis for FINAL

# Create a small dict of {t2_name: t2_price, ...}
t2_data = {}

with open('table2.csv', newline='') as f:
    reader = csv.reader(f)
    next(reader)  # discard header

    for row in reader:
        t2_price = row[0]
        t2_name = row[1]
        t2_data[t2_name] = t2_price

# Build FINAL by finding matching names between T1 and T2, and then overwriting price

final_rows = []

with open('table1.csv', newline='') as f:
    reader = csv.reader(f)
    final_rows.append(next(reader))    # set final header

    for row in reader:
        t1_price = row[0]
        t1_name  = row[1]
        if t1_name in t2_data:         # find shared names
            t2_price = t2_data[t1_name]
            t1_price = t2_price        # overwrite
            row[0]   = t1_price        # ...

        final_rows.append(row)

# Write FINAL

with open('final.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(final_rows)

gives me this final.csv:

Price,Name,ATT1,ATT2
1231,APPL,69,81
900,GOOGL,303,392
32,INV,39,9
  • Related