I have 2 csv files as below. I have to add the Lat and Long features from one csv to another based on the ID
Based on the ID, I have to add the columns Latitude and Longitude to CSV 1 (ID 1 has different values and ID 2 has different values and so on.
Can anyone please tell me how to do this using python?
CodePudding user response:
I recommend you use the pandas
library. If you've not come across it before, it's a very widely-used library for data manipulation that makes this task very easy. However, it's not part of the standard library that comes packaged with Python, so depending on your environment you may need to install it using PyPi. Guides like this might help if you've not done that before.
With pandas installed in your environment, you can run the following code (substitute in your file paths for csv1.csv
and csv2.csv
).
import pandas as pd
# Load the csv files into dataframes and set the index of each to the 'ID' column
df1 = pd.read_csv('csv1.csv').set_index('ID')
df2 = pd.read_csv('csv2.csv').set_index('ID')
# Join using how='outer' to keep all rows from both dataframes
joined = df1.join(df2, how='outer')
print(joined)
# Save to a new csv file
joined.to_csv('joined.csv')
I made some simple sample data to demonstrate the result of running the code:
csv1.csv:
ID,my_feature
1,banana
2,apple
3,pear
csv2.csv:
ID,latitude,longitude
1,7,-4
4,10,15
Printed output:
my_feature latitude longitude
ID
1 banana 7.0 -4.0
2 apple NaN NaN
3 pear NaN NaN
4 NaN 10.0 15.0
Output to joined.csv:
ID,my_feature,latitude,longitude
1,banana,7.0,-4.0
2,apple,,
3,pear,,
4,,10.0,15.0