I have two CSV files that I am joining on a specific key, cod_enti.
data.csv:
cod_pers,cod_enti,fec_venc
2317422,208,12/04/2022
2086638,212,31/03/2022
2392115,210,02/04/2022
2086638,212,13/03/2022
entid.csv:
cod_enti,cod_mercado
208,40
209,50
210,16
211,40
212,50
My code:
import csv
import numpy as np
from numpy.lib import recfunctions
from time import strftime
from datetime import datetime, date, time, timedelta
from dateutil.relativedelta import relativedelta
#Read the CSV file
str2date = lambda x: datetime.strptime(x, '%d/%m/%Y')
data_datos = np.genfromtxt(r'data.csv', delimiter=',', dtype=None, names=True, converters={'fec_venc':str2date}, encoding="UTF-8")
data_enti = np.genfromtxt(r'entid.csv', delimiter=',', dtype=None, names=True, encoding="UTF-8")
merged_data = recfunctions.join_by('cod_enti', data_datos, data_enti )
print(merged_data)
This gives me the following result:
[(208, 2317422, datetime.datetime(2022, 4, 12, 0, 0), 40)
(210, 2392115, datetime.datetime(2022, 4, 2, 0, 0), 16)
(212, 2086638, datetime.datetime(2022, 3, 13, 0, 0), --)
(212, 2086638, datetime.datetime(2022, 3, 31, 0, 0), 50)]
My problem is that the penultimate row shows -- in the last column (cod_mercado) when it should be 50. Does anyone know what is causing this problem and how I could solve it?
Thank you very much for your help!! :D
CodePudding user response:
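The documentation for join_by says: "Neither r1 nor r2 should have any duplicates along key: the presence of duplicates will make the output quite unreliable. Note that duplicates are not looked for by the algorithm."
Your data.csv contains cod_enti 212 twice, so it falls into exactly this case, and one of the two 212 rows comes back without its cod_mercado value.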
http://pyopengl.sourceforge.net/pydoc/numpy.lib.recfunctions.html
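To confirm that duplicate keys are the cause, and to stay with NumPy only, something like the following may help. It is a minimal sketch, not part of the original code: the columns are typed inline from the sample files, the variable names are made up for illustration, and the lookup assumes that every cod_enti in data.csv also exists in entid.csv and that entid.csv has no repeated keys.

```python
import numpy as np

# Columns from the question, typed inline so the sketch is self-contained.
data_cod_enti = np.array([208, 212, 210, 212])        # data.csv (212 is repeated)
enti_cod_enti = np.array([208, 209, 210, 211, 212])   # entid.csv (keys are unique)
enti_cod_mercado = np.array([40, 50, 16, 40, 50])

# 1) Find the keys that occur more than once in data.csv.
values, counts = np.unique(data_cod_enti, return_counts=True)
duplicated_keys = values[counts > 1]
print(duplicated_keys)

# 2) Look up cod_mercado for every data.csv row instead of using join_by.
#    Sort the lookup table by key, then locate each data key with searchsorted.
order = np.argsort(enti_cod_enti)
sorted_keys = enti_cod_enti[order]
pos = np.searchsorted(sorted_keys, data_cod_enti)

# Assumption: every key in data.csv is present in entid.csv.
assert np.array_equal(sorted_keys[pos], data_cod_enti)

cod_mercado = enti_cod_mercado[order][pos]
print(cod_mercado)
```

Because the lookup goes from each data.csv row into the unique keys of entid.csv, repeated keys on the data.csv side are not a problem here.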
pandas has a more traditional join (DataFrame.merge) that copes with repeated keys, if you are willing to go that far.
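A rough sketch of the pandas route, assuming pandas is installed. The two files are inlined through io.StringIO so the example runs on its own; with real files you would pass the file names to pd.read_csv instead. The names data_csv, entid_csv, and merged are illustrative only.

```python
import io
import pandas as pd

# Sample files from the question, inlined so the sketch is self-contained.
data_csv = io.StringIO(
    "cod_pers,cod_enti,fec_venc\n"
    "2317422,208,12/04/2022\n"
    "2086638,212,31/03/2022\n"
    "2392115,210,02/04/2022\n"
    "2086638,212,13/03/2022\n"
)
entid_csv = io.StringIO(
    "cod_enti,cod_mercado\n"
    "208,40\n"
    "209,50\n"
    "210,16\n"
    "211,40\n"
    "212,50\n"
)

data = pd.read_csv(data_csv)
entid = pd.read_csv(entid_csv)

# Parse the day/month/year dates explicitly, as the str2date converter did.
data["fec_venc"] = pd.to_datetime(data["fec_venc"], format="%d/%m/%Y")

# A left join keeps every row of data.csv, including both rows with key 212.
merged = data.merge(entid, on="cod_enti", how="left")
print(merged)
```

A left join is used here so that the row order and row count of data.csv are preserved; how="inner" would instead drop any rows whose key is missing from entid.csv.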