Lookup of Data from a CSV file in Python-CodePudding

How do I achieve this in Python. I know there is a vlookup function in excel but if there is a way in Python, I prefer to do it in Python. Basically my goal is to get data from CSV2 column Quantity and write the data to column Quantity of CSV1 based on Bin_Name. The script should not copy all the value at once, it must be by selecting a Bin_Name. Ex: For today, I would like to get the data from Bin_Name ABCDE of CSV2 to CSV1 then it will write the data in column Quantity of CSV1. If this is possible, I will be very grateful and will learn a lot from this. Thank you very much in advance.

CSV1                     CSV2

Bin_Name   Quantity      Bin_Name   Quantity
A                        A          43  
B                        B          32
C                        C          28
D                        D          33 
E                        E          37   
F                        F          38
G                        G          39
H                        H          41

CodePudding user response：

Here is one way to achieve this in Python without using

Read the two CSV files into two separate lists of dictionaries, where each dictionary represents a row in the CSV file.
Iterate over the list of dictionaries from CSV1, and for each dictionary, search for a matching Bin_Name in the list of dictionaries from CSV2.
If a match is found, update the Quantity value in the dictionary from CSV1 with the Quantity value from the matching dictionary in CSV2.
Write the updated list of dictionaries from CSV1 back to a new CSV file.

Here is an example implementation of the above steps:

# Import the csv module to read and write CSV files
import csv

# Open the two CSV files in read mode
with open("CSV1.csv", "r") as csv1_file, open("CSV2.csv", "r") as csv2_file:
    # Use the csv reader to read the CSV files into lists of dictionaries
    csv1_reader = csv.DictReader(csv1_file)
    csv1_data = list(csv1_reader)

    csv2_reader = csv.DictReader(csv2_file)
    csv2_data = list(csv2_reader)

    # Iterate over the list of dictionaries from CSV1
    for row in csv1_data:
        # Search for a matching Bin_Name in the list of dictionaries from CSV2
        match = next((r for r in csv2_data if r["Bin_Name"] == row["Bin_Name"]), None)

        # If a match is found, update the Quantity value in the dictionary from CSV1
        # with the Quantity value from the matching dictionary in CSV2
        if match:
            row["Quantity"] = match["Quantity"]

    # Open a new CSV file in write mode
    with open("updated_csv1.csv", "w") as updated_csv1_file:
        # Use the csv writer to write the updated list of dictionaries to the new CSV file
        csv1_writer = csv.DictWriter(updated_csv1_file, fieldnames=csv1_reader.fieldnames)
        csv1_writer.writeheader()
        csv1_writer.writerows(csv1_data)

CodePudding user response：

I would simply use pandas built-in functions in this case and there is no need for loops.

So, assuming that there is no duplicate bin names, try the code below to copy the whole column :

df1= pd.read_csv("file1.csv")
df2= pd.read_csv("file2.csv")

df1["Quantity"]= df2["Quantity"].where(df1["Bin_Name"].eq(df2["Bin_Name"]))

print(df1)

  Bin_Name  Quantity
0        A        43
1        B        32
2        C        28
3        D        33
4        E        37
5        F        38
6        G        39
7        H        41

If you need to copy only a subset of rows, use boolean indexing with pandas.DataFrame.loc :


vals= ["A", "B", "C", "D"]
df1.loc[df1["Bin_Name"].isin(vals), "Quantity"] = df2.loc[df1["Bin_Name"].isin(vals), "Quantity"]
print(df1)

  Bin_Name  Quantity
0        A      43.0
1        B      32.0
2        C      28.0
3        D      33.0
4        E       NaN
5        F       NaN
6        G       NaN
7        H       NaN

CodePudding user response：

Hi you can simply iterate CSV2 first, then after gathering wanted value, you can search it in CSV1. I wrote a code below it might help you, but there can be much more efficient ways to do.

def func(wanted_rows: list,csv2df: pd.DataFrame):
'''
--Explanations here
'''
# Iterate csv2df
for index,row in csv2df.iterrows():
    # Check if index in the wanted list
    if index in wanted_rows:
        # Get index of CSV1 for same value
        csv1_index = CSV1[CSV1.Bin_Name == row['Bin_Name']].index[0]
        CSV1.at[csv1_index,'Quantity'] = row['Quantity']
return df

wanted_list = [1,2,3,4,5]
func(wanted_list,CSV2df)

CodePudding user response：

I am not really sure if I understood your question fully, but let me know if this answers your challenge.

The normally way of doing Excel-type operations in Python is by using the framework Pandas. Using this, you can read, manipulate and save your CSV-files (and many other formats) using Python code.

Setting up the example

EDIT: Ensure you have installed pandas by e.g. typing the following in your terminal: pip install pandas

Since I don't have your CSV-files, I will create them using Pandas, rather than using the built-in read_csv()-method.

import pandas as pd

csv1 = pd.DataFrame.from_dict({
    "Bin_Name": ["A","B","C","D","E","F","G","H"],
    "Quantity": []
}, orient="index").T

csv2 = pd.DataFrame.from_dict({
    "Bin_Name": ["A","B","C","D","E","F","G","H"],
    "Quantity": [43, 32, 28, 33, 37, 38, 39, 41]
}, orient="index").T

The way I understood your question, you want to specify which bins should be copied from your csv1-file to your csv2-file. In your example, you mention something like this:

# Specify bins you want to copy
bins_to_copy = ["A", "B", "C", "D", "E"]

Now, there are several ways of doing the copying-operation you mentioned. Some better than others. Since you explicitly say "the script should not copy all the value at once", I will give one suggestions that follows you instructions, and one that I believe is a better approach.

Solution 1 (bad - using for-loops)

# Loop through each bin and copy cell value from csv2 to csv1
for bin_to_copy in bins_to_copy:
    csv1.loc[csv1["Bin_Name"]==bin_to_copy, "Quantity"] = csv2.loc[csv2["Bin_Name"]==bin_to_copy, "Quantity"]

# OUTPUT:
> csv1
  Bin_Name Quantity
0        A       43
1        B       32
2        C       28
3        D       33
4        E       37
5        F     None
6        G     None
7        H     None

This approach does exactly what I believe you are asking for. However, there are several weaknesses with it:

Looping through rows is a very slow approach compared to using more efficient, built-in methods provided in the Pandas-library
The approach is vulnerable to situations where you have duplicate bins in either of the CSV-files
The approach is vulnerable to situations where a bin only exists in one of the CSV-files
Since we have updated one cell at a time, Pandas doesn't understand that the datatype of the column has changed, and we are still left with None for the missing values (and an "object"-type for the column) rather than NaN (which would indicate a numeric (float) column datatype).

If I have understood your problem correctly, then a better approach would be as follows

Solution 2 (better - using merge)

# Select the columns with bins from csv1
csv1_bins = csv1["Bin_Name"]

# Select only the rows with the desired bins from csv2
csv2_desired_bins = csv2[csv2["Bin_Name"].isin(bins_to_copy)]

# Merge the columns (just "Quantity" in this case) from csv2 to csv1 using "Bin_Name" as "merging-key"
result = pd.merge(left=csv1_bins, right=csv2_desired_bins, on="Bin_Name", how="left")

# OUTPUT
> result
  Bin_Name Quantity
0        A       43
1        B       32
2        C       28
3        D       33
4        E       37
5        F      NaN
6        G      NaN
7        H      NaN

The merge()-method is much more powerful and answers all the challenges I listed solution 1. It is also a more generic version of the join()-method, which according to the documentation is "like an Excel VLOOKUP operation." (which is what you mention would be you Excel equivalent)