How do I achieve this in Python. I know there is a vlookup function in excel but if there is a way in Python, I prefer to do it in Python. Basically my goal is to get data from CSV2 column Quantity and write the data to column Quantity of CSV1 based on Bin_Name. The script should not copy all the value at once, it must be by selecting a Bin_Name. Ex: For today, I would like to get the data from Bin_Name ABCDE of CSV2 to CSV1 then it will write the data in column Quantity of CSV1. If this is possible, I will be very grateful and will learn a lot from this. Thank you very much in advance.
CSV1 CSV2
Bin_Name Quantity Bin_Name Quantity
A A 43
B B 32
C C 28
D D 33
E E 37
F F 38
G G 39
H H 41
CodePudding user response:
Here is one way to achieve this in Python without using
- Read the two CSV files into two separate lists of dictionaries, where each dictionary represents a row in the CSV file.
- Iterate over the list of dictionaries from CSV1, and for each dictionary, search for a matching Bin_Name in the list of dictionaries from CSV2.
- If a match is found, update the Quantity value in the dictionary from CSV1 with the Quantity value from the matching dictionary in CSV2.
- Write the updated list of dictionaries from CSV1 back to a new CSV file.
Here is an example implementation of the above steps:
# Import the csv module to read and write CSV files
import csv
# Open the two CSV files in read mode
with open("CSV1.csv", "r") as csv1_file, open("CSV2.csv", "r") as csv2_file:
# Use the csv reader to read the CSV files into lists of dictionaries
csv1_reader = csv.DictReader(csv1_file)
csv1_data = list(csv1_reader)
csv2_reader = csv.DictReader(csv2_file)
csv2_data = list(csv2_reader)
# Iterate over the list of dictionaries from CSV1
for row in csv1_data:
# Search for a matching Bin_Name in the list of dictionaries from CSV2
match = next((r for r in csv2_data if r["Bin_Name"] == row["Bin_Name"]), None)
# If a match is found, update the Quantity value in the dictionary from CSV1
# with the Quantity value from the matching dictionary in CSV2
if match:
row["Quantity"] = match["Quantity"]
# Open a new CSV file in write mode
with open("updated_csv1.csv", "w") as updated_csv1_file:
# Use the csv writer to write the updated list of dictionaries to the new CSV file
csv1_writer = csv.DictWriter(updated_csv1_file, fieldnames=csv1_reader.fieldnames)
csv1_writer.writeheader()
csv1_writer.writerows(csv1_data)
CodePudding user response:
I would simply use pandas built-in functions in this case and there is no need for loops.
So, assuming that there is no duplicate bin names, try the code below to copy the whole column :
df1= pd.read_csv("file1.csv")
df2= pd.read_csv("file2.csv")
df1["Quantity"]= df2["Quantity"].where(df1["Bin_Name"].eq(df2["Bin_Name"]))
print(df1)
Bin_Name Quantity
0 A 43
1 B 32
2 C 28
3 D 33
4 E 37
5 F 38
6 G 39
7 H 41
If you need to copy only a subset of rows, use boolean indexing with pandas.DataFrame.loc
:
vals= ["A", "B", "C", "D"]
df1.loc[df1["Bin_Name"].isin(vals), "Quantity"] = df2.loc[df1["Bin_Name"].isin(vals), "Quantity"]
print(df1)
Bin_Name Quantity
0 A 43.0
1 B 32.0
2 C 28.0
3 D 33.0
4 E NaN
5 F NaN
6 G NaN
7 H NaN
CodePudding user response:
Hi you can simply iterate CSV2 first, then after gathering wanted value, you can search it in CSV1. I wrote a code below it might help you, but there can be much more efficient ways to do.
def func(wanted_rows: list,csv2df: pd.DataFrame):
'''
--Explanations here
'''
# Iterate csv2df
for index,row in csv2df.iterrows():
# Check if index in the wanted list
if index in wanted_rows:
# Get index of CSV1 for same value
csv1_index = CSV1[CSV1.Bin_Name == row['Bin_Name']].index[0]
CSV1.at[csv1_index,'Quantity'] = row['Quantity']
return df
wanted_list = [1,2,3,4,5]
func(wanted_list,CSV2df)
CodePudding user response:
I am not really sure if I understood your question fully, but let me know if this answers your challenge.
The normally way of doing Excel-type operations in Python is by using the framework Pandas. Using this, you can read, manipulate and save your CSV-files (and many other formats) using Python code.
Setting up the example
EDIT: Ensure you have installed pandas by e.g. typing the following in your terminal: pip install pandas
Since I don't have your CSV-files, I will create them using Pandas, rather than using the built-in read_csv()-method.
import pandas as pd
csv1 = pd.DataFrame.from_dict({
"Bin_Name": ["A","B","C","D","E","F","G","H"],
"Quantity": []
}, orient="index").T
csv2 = pd.DataFrame.from_dict({
"Bin_Name": ["A","B","C","D","E","F","G","H"],
"Quantity": [43, 32, 28, 33, 37, 38, 39, 41]
}, orient="index").T
The way I understood your question, you want to specify which bins should be copied from your csv1
-file to your csv2
-file. In your example, you mention something like this:
# Specify bins you want to copy
bins_to_copy = ["A", "B", "C", "D", "E"]
Now, there are several ways of doing the copying-operation you mentioned. Some better than others. Since you explicitly say "the script should not copy all the value at once", I will give one suggestions that follows you instructions, and one that I believe is a better approach.
Solution 1 (bad - using for-loops)
# Loop through each bin and copy cell value from csv2 to csv1
for bin_to_copy in bins_to_copy:
csv1.loc[csv1["Bin_Name"]==bin_to_copy, "Quantity"] = csv2.loc[csv2["Bin_Name"]==bin_to_copy, "Quantity"]
# OUTPUT:
> csv1
Bin_Name Quantity
0 A 43
1 B 32
2 C 28
3 D 33
4 E 37
5 F None
6 G None
7 H None
This approach does exactly what I believe you are asking for. However, there are several weaknesses with it:
- Looping through rows is a very slow approach compared to using more efficient, built-in methods provided in the Pandas-library
- The approach is vulnerable to situations where you have duplicate bins in either of the CSV-files
- The approach is vulnerable to situations where a bin only exists in one of the CSV-files
- Since we have updated one cell at a time, Pandas doesn't understand that the datatype of the column has changed, and we are still left with
None
for the missing values (and an "object"-type for the column) rather thanNaN
(which would indicate a numeric (float) column datatype).
If I have understood your problem correctly, then a better approach would be as follows
Solution 2 (better - using merge)
# Select the columns with bins from csv1
csv1_bins = csv1["Bin_Name"]
# Select only the rows with the desired bins from csv2
csv2_desired_bins = csv2[csv2["Bin_Name"].isin(bins_to_copy)]
# Merge the columns (just "Quantity" in this case) from csv2 to csv1 using "Bin_Name" as "merging-key"
result = pd.merge(left=csv1_bins, right=csv2_desired_bins, on="Bin_Name", how="left")
# OUTPUT
> result
Bin_Name Quantity
0 A 43
1 B 32
2 C 28
3 D 33
4 E 37
5 F NaN
6 G NaN
7 H NaN
The merge()-method is much more powerful and answers all the challenges I listed solution 1. It is also a more generic version of the join()-method, which according to the documentation is "like an Excel VLOOKUP operation." (which is what you mention would be you Excel equivalent)