I have two CSV files, table1 and table2, and I am trying to find a way to update the Price field in table1 with price values from table2 if-and-only-if table1 and table2 share the same value in the Name field.
My two tables look like this:
Table 1:
Price Name ATT1 ATT2
0 12 APPL 69 81
1 900 GOOGL 303 392
2 32 INV 39 9
Table 2:
Price Name
0 1231 APPL
1 39 FACEBOOK
Successfully performing this join/merge/update would see:
- APPL's price changed to
1231
(from table2) - GOOGL and INV unchanged (no matching names from table2)
- FACEBOOK left out (no matching name from table1)
Expected Final Table:
Name ATT1 ATT2 Price
0 APPL 69 81 1231.0
1 GOOGL 303 392 900
2 INV 39 9 32
Current code:
table1 = pd.read_csv('path/table1.csv', index_col=0)
table2 = pd.read_csv('path/table2.csv', index_col=0)
print(table1)
print(table2)
new_table = table1[["Name ", "ATT1", "ATT2"]].merge(table2[["Price", "Name "]], on="Name ", how="left")
print(new_table)
leads to the following, GOOGL and INV being nullified:
Current Final Table:
Name ATT1 ATT2 Price
0 APPL 69 81 1231.0
1 GOOGL 303 392 NaN
2 INV 39 9 NaN
CodePudding user response:
Please give sample data, I had to hard code it and it was a pain, but I believe this is what you want.
import numpy as np
import pandas as pd
table1 = [
['Price', 'Name', 'ATT1', 'ATT2'],
[12, 'APPL', 69, 81],
[900, 'GOOGL', 303, 392],
[32, 'INV', 39, 9]
]
table2 = [
['Price', 'Name'],
[1231, 'APPL'],
[39, 'FACEBOOK']
]
column_names_1 = table1.pop(0)
column_names_2 = table2.pop(0)
df1 = pd.DataFrame(table1, columns=column_names_1)
df2 = pd.DataFrame(table2, columns=column_names_2)
merged_df = df1.merge(df2, how='left', on='Name')
merged_df['Price'] = merged_df['Price_y'].fillna(merged_df['Price_x']).astype(int)
final_df = merged_df.drop(['Price_x', 'Price_y'], axis=1)
final_df
Output:
Name ATT1 ATT2 Price
0 APPL 69 81 1231
1 GOOGL 303 392 900
2 INV 39 9 32
CodePudding user response:
Here's how to do this without Pandas:
table1.csv
Price,Name,ATT1,ATT2
12,APPL,69,81
900,GOOGL,303,392
32,INV,39,9
table2.csv
Price,Name
1231,APPL
39,FACEBOOK
My code:
import csv
# Start w/T2, since T1 is the basis for FINAL
# Create a small dict of {t2_name: t2_price, ...}
t2_data = {}
with open('table2.csv', newline='') as f:
reader = csv.reader(f)
next(reader) # discard header
for row in reader:
t2_price = row[0]
t2_name = row[1]
t2_data[t2_name] = t2_price
# Build FINAL by finding matching names between T1 and T2, and then overwriting price
final_rows = []
with open('table1.csv', newline='') as f:
reader = csv.reader(f)
final_rows.append(next(reader)) # set final header
for row in reader:
t1_price = row[0]
t1_name = row[1]
if t1_name in t2_data: # find shared names
t2_price = t2_data[t1_name]
t1_price = t2_price # overwrite
row[0] = t1_price # ...
final_rows.append(row)
# Write FINAL
with open('final.csv', 'w', newline='') as f:
writer = csv.writer(f)
writer.writerows(final_rows)
gives me this final.csv:
Price,Name,ATT1,ATT2
1231,APPL,69,81
900,GOOGL,303,392
32,INV,39,9