Join result shows "--" instead of a value in Python - NumPy

Time:04-15

I have two CSV files that I am joining on a shared key, cod_enti.

data.csv:

cod_pers,cod_enti,fec_venc
2317422,208,12/04/2022
2086638,212,31/03/2022
2392115,210,02/04/2022
2086638,212,13/03/2022

entid.csv:

cod_enti,cod_mercado
208,40
209,50
210,16
211,40
212,50

My code:

import numpy as np
from numpy.lib import recfunctions
from datetime import datetime

# Read both CSV files; fec_venc is parsed into datetime objects (day/month/year)
str2date = lambda x: datetime.strptime(x, '%d/%m/%Y')
data_datos = np.genfromtxt(r'data.csv', delimiter=',', dtype=None, names=True, converters={'fec_venc':str2date}, encoding="UTF-8")
data_enti = np.genfromtxt(r'entid.csv', delimiter=',', dtype=None, names=True, encoding="UTF-8")

# Join the two arrays on the shared cod_enti column
merged_data = recfunctions.join_by('cod_enti', data_datos, data_enti)

print(merged_data)

This gives the following result:

[(208, 2317422, datetime.datetime(2022, 4, 12, 0, 0), 40) 
 (210, 2392115, datetime.datetime(2022, 4, 2, 0, 0), 16)  
 (212, 2086638, datetime.datetime(2022, 3, 13, 0, 0), --) 
 (212, 2086638, datetime.datetime(2022, 3, 31, 0, 0), 50)]

My problem is that the penultimate row shows -- for cod_mercado when it should be 50. Does anyone know what is causing this and how I could fix it?

Thank you very much for your help!! :D

Answer:

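The documentation for numpy.lib.recfunctions.join_by says: "Neither r1 nor r2 should have any duplicates along key: the presence of duplicates will make the output quite unreliable. Note that duplicates are not looked for by the algorithm." In data.csv, cod_enti 212 appears twice, so the first array has duplicate keys, which is exactly the case this warns about. The -- is how NumPy prints a masked (missing) entry: join_by returns a masked array by default, and that cell ended up masked because of the duplicate key.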

http://pyopengl.sourceforge.net/pydoc/numpy.lib.recfunctions.html
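If you want to stay with NumPy, one workaround is to skip join_by altogether. Every cod_enti in entid.csv is unique, so this is a many-to-one join: each row of data.csv can look up its cod_mercado with np.searchsorted. The following is a minimal sketch under a few assumptions of my own: io.StringIO strings stand in for the two files, fec_venc is kept as a plain string rather than converted with the question's str2date, and unmatched rows are dropped to mimic join_by's default inner join.

```python
import io

import numpy as np
from numpy.lib import recfunctions

# Assumption: sample data from the question, held in memory instead of files.
DATA_CSV = """cod_pers,cod_enti,fec_venc
2317422,208,12/04/2022
2086638,212,31/03/2022
2392115,210,02/04/2022
2086638,212,13/03/2022
"""

ENTID_CSV = """cod_enti,cod_mercado
208,40
209,50
210,16
211,40
212,50
"""

# fec_venc stays a string here (assumption, to keep the sketch simple).
data_datos = np.genfromtxt(io.StringIO(DATA_CSV), delimiter=",", dtype=None,
                           names=True, encoding="UTF-8")
data_enti = np.genfromtxt(io.StringIO(ENTID_CSV), delimiter=",", dtype=None,
                          names=True, encoding="UTF-8")

# 1. Show the duplicate key that join_by's documentation warns about.
keys, counts = np.unique(data_datos["cod_enti"], return_counts=True)
duplicated = keys[counts > 1]
print("duplicated cod_enti:", duplicated)

# 2. Many-to-one lookup: sort the (unique) lookup keys, then binary-search
#    each cod_enti from data.csv.
order = np.argsort(data_enti["cod_enti"])
sorted_keys = data_enti["cod_enti"][order]
sorted_mercado = data_enti["cod_mercado"][order]
pos = np.searchsorted(sorted_keys, data_datos["cod_enti"])
pos = np.clip(pos, 0, len(sorted_keys) - 1)  # keep indices in range
found = sorted_keys[pos] == data_datos["cod_enti"]  # True where a match exists

# Keep only matched rows (inner-join behaviour) and append cod_mercado.
joined = recfunctions.append_fields(data_datos[found], "cod_mercado",
                                    sorted_mercado[pos][found], usemask=False)
print(joined)
```

Unlike join_by, this keeps the rows in data.csv order instead of sorting them by key, and because only matched rows are kept there are no masked cells.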

pandas has a more conventional join, DataFrame.merge, which handles many-to-one keys like these correctly, if you are willing to add that dependency.
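A minimal pandas sketch of the same join, assuming pandas is installed. As before, io.StringIO strings stand in for data.csv and entid.csv. The validate="many_to_one" argument is an optional safety check I've added: it makes merge raise an error if entid.csv ever contains a repeated cod_enti, while repeated keys on the left side (cod_enti 212 in data.csv) are simply matched to the same cod_mercado.

```python
import io

import pandas as pd

# Assumption: sample data from the question, held in memory instead of files.
DATA_CSV = """cod_pers,cod_enti,fec_venc
2317422,208,12/04/2022
2086638,212,31/03/2022
2392115,210,02/04/2022
2086638,212,13/03/2022
"""

ENTID_CSV = """cod_enti,cod_mercado
208,40
209,50
210,16
211,40
212,50
"""

data_datos = pd.read_csv(io.StringIO(DATA_CSV))
# Dates in the question are day/month/year.
data_datos["fec_venc"] = pd.to_datetime(data_datos["fec_venc"], format="%d/%m/%Y")
data_enti = pd.read_csv(io.StringIO(ENTID_CSV))

# Inner join on cod_enti; validate raises a MergeError if the right-hand
# keys are not unique instead of silently producing odd results.
merged = data_datos.merge(data_enti, on="cod_enti", how="inner",
                          validate="many_to_one")
print(merged)
```

Both rows with cod_enti 212 get cod_mercado 50, and no value is reported as missing.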
