Python with Hadoop project: how to build a reducer to concatenate pairs of values

Time:08-17

I have a small MapReduce project, and since I am new to this I am running into a lot of difficulties, so I would appreciate the help. I have a file that contains the nation, year, and weight. For each nation, I want to list every year followed by its weight. This is my data:

USA, 2019; 0.7
USA, 2020; 0.3
USA, 2021; 0.9
Canada, 2019; 0.6
Canada, 2020; 0.3

This is my mapper:

def idf_country(self, key, values):
  nation, year = key[0], key[1]
  weight = values
  yield nation, (year, weight)

This is what I am trying to get

USA 2019, 0.7; 2020, 0.3; 2021, 0.9
Canada  2019, 0.6; 2020, 0.3

CodePudding user response:

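Your mapper receives each line of the file as its value, so you need to split the line rather than index into the key:

def idf_country(self, key, line):
  # Each input line looks like "USA, 2019; 0.7"; the key is unused.
  nation, data = line.split(', ', 1)
  # Turn "2019; 0.7" into "2019, 0.7" to match the desired output.
  yield nation, data.replace('; ', ', ')

By the time the reducer runs, the values are already grouped by nation, so you can simply join them:

def reducer(self, nation, values):
  # values holds every "year, weight" string emitted for this nation.
  yield nation, '; '.join(values)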
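
To check the logic without a cluster, here is a minimal sketch in plain Python that imitates the map, group, and reduce steps on the sample data from the question. It is not Hadoop or any specific framework: `run_local` and `SAMPLE` are hypothetical names introduced for illustration, and the functions drop the `self` parameter so they can run standalone. The reducer sorts the pairs by year, on the assumption that you want chronological order, because Hadoop does not guarantee the order in which values arrive for a key.

```python
from collections import defaultdict

# Sample rows copied from the question.
SAMPLE = [
    "USA, 2019; 0.7",
    "USA, 2020; 0.3",
    "USA, 2021; 0.9",
    "Canada, 2019; 0.6",
    "Canada, 2020; 0.3",
]


def mapper(line):
    """Turn 'USA, 2019; 0.7' into ('USA', ('2019', '0.7'))."""
    nation, data = line.strip().split(', ', 1)
    year, weight = data.split('; ')
    yield nation, (year, weight)


def reducer(nation, values):
    """Join every (year, weight) pair for one nation, ordered by year."""
    pairs = sorted(values)  # assumption: chronological order is wanted
    yield nation, '; '.join(f'{year}, {weight}' for year, weight in pairs)


def run_local(lines):
    """Hypothetical stand-in for the shuffle phase: group mapper output by key."""
    grouped = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        for key, value in mapper(line):
            grouped[key].append(value)

    results = {}
    for key, values in grouped.items():
        for nation, joined in reducer(key, values):
            results[nation] = joined
    return results


if __name__ == "__main__":
    for nation, joined in run_local(SAMPLE).items():
        print(nation, joined)
```

Once the two functions behave as expected locally, their bodies can be moved back into the job's methods, with `self` restored as the first parameter.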