I have two CSV files, CSV_A and CSV_B.csv. I must insert the column (Category) from CSV_B into CSV_A.
The two CSVs share a common column: StockID, and I must add the correct category onto each row by matching the StockID columns.
This can be done using merge, like this:
dfa.merge(dfb, how='left', on='StockID')
but I only want to add the one column, not join the two dataframes.
CSV_A (indexed on StockID):
StockID,Brand,ToolName,Price
ABC123,Maxwell,ToolA,1.25
BCD234,Charton,ToolB,2.22
CDE345,Bingley,ToolC,3.33
DEF789,Charton,ToolD,1.44
CSV_B:
PurchDate,Supplier,StockID,Category
20201005,Sigmat,BCD234,CatShop
20210219,Zorbak,AAA111,CatWares
20210307,Phillips
20210417,Tandey,CDE345,CatThings
20210422,Stapek,BBB222,CatElse
20210502,Zorbak,ABC123,CatThis
20210512,Zorbak,CCC999,CatThings
20210717,Phillips,DEF789,CatShop
My task is to insert a Cat
field into CSV_A, matching each inserted Category with its correct StockID.
Note1: CSV_A is indexed on the StockID column. CSV_B has the default indexing.
Note2: There are some columns in CSV_B (e.g. row 3) that do not have complete information.
Note3: Adding column "Category" from CSV_B into CSV_A but calling it "Cat" in CSV_A
CodePudding user response:
Use Series.map
to map 'Category'
based on 'StockID'
df_a['Cat'] = df_a['StockID'].map(dict(zip(df_b['StockID'], df_b['Category'])))
Note that for this specific question (i.e. with CSV_A indexed on StockID), the code must be:
df_a['Cat'] = df_a.index.map(dict(zip(df_b['StockID'], df_b['Category'])))
^^^^^
CodePudding user response:
While creating the question I discovered the solution, so decided to post it rather than just deleting the question.
import pandas as pd
dfa = pd.read_csv('csv_a.csv')
dfa.set_index('StockID', inplace=True)
dfb = pd.read_csv('csv_b.csv')
#remove incomplete rows (i.e. without Category/StockID columns)
dfb_tmp = dfb[dfb['StockID'].notnull()]
def myfunc(row):
# NB: Use row.name because row['StockID'] is the index
if row.name in list(dfb_tmp['StockID']):
return dfb_tmp.loc[dfb_tmp['StockID'] == row.name]['Category'].values[0]
dfa['Cat'] = dfa.apply(lambda row: myfunc(row), axis=1)
print(dfa)
Result:
StockID Brand ToolName Price Cat
ABC123 Maxwell ToolA 1.25 CatThis
BCD234 Charton ToolB 2.22 CatShop
CDE345 Bingley ToolC 3.33 CatThings
DEF789 Charton ToolD 1.44 CatShop