I have a small MapReduce project, and since I am new to this I am running into a lot of difficulties, so I would appreciate some help. I have a file where each line contains a nation, a year, and a weight. For each nation, I want to list every year followed by its weight. This is my data:
USA, 2019; 0.7
USA, 2020; 0.3
USA, 2021; 0.9
Canada, 2019; 0.6
Canada, 2020; 0.3
This is my mapper:
def idf_country(self, key, values):
    nation, year = key[0], key[1]
    weight = values
    yield nation, (year, weight)
This is what I am trying to get:
USA 2019, 0.7; 2020, 0.3; 2021, 0.9
Canada 2019, 0.6; 2020, 0.3
CodePudding user response:
Your mapper is called once per line of the file, and that line arrives as the value, so the key does not hold the nation and year. Split the line instead of indexing into the key:
def idf_country(self, key, line):
    nation, data = line.split(', ')
    yield nation, data
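The reducer's input is already grouped by nation, so you only need to join the values. To match the format you asked for, swap the separators while joining. Sorting keeps the years in order, since the order of values within a key is not guaranteed:
def reducer(self, nation, values):
    # each value looks like "2019; 0.7"; turn it into "2019, 0.7"
    yield nation, '; '.join(v.replace('; ', ', ') for v in sorted(values))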
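If you want to check the logic before running it on a cluster, the map, group-by-key, and reduce steps can be simulated in plain Python. This is only a rough sketch: the names `map_line`, `reduce_nation`, and `run_job` are made up for illustration and are not part of any MapReduce library, and the in-memory grouping merely stands in for the real shuffle phase. The reducer here also swaps the separators so the result matches the format requested in the question.

```python
from collections import defaultdict

# Sample input, copied from the question
LINES = [
    "USA, 2019; 0.7",
    "USA, 2020; 0.3",
    "USA, 2021; 0.9",
    "Canada, 2019; 0.6",
    "Canada, 2020; 0.3",
]


def map_line(line):
    # "USA, 2019; 0.7" -> ("USA", "2019; 0.7")
    nation, data = line.strip().split(", ", 1)
    yield nation, data


def reduce_nation(nation, values):
    # Sort so the years come out in order, then turn each
    # "2019; 0.7" into "2019, 0.7" and join the pieces with "; "
    parts = (value.replace("; ", ", ") for value in sorted(values))
    yield nation, "; ".join(parts)


def run_job(lines):
    # Stand-in for the shuffle phase: group mapper output by key
    grouped = defaultdict(list)
    for line in lines:
        for key, value in map_line(line):
            grouped[key].append(value)

    # Run the reducer once per key and collect its output
    output = {}
    for key, values in grouped.items():
        for out_key, out_value in reduce_nation(key, values):
            output[out_key] = out_value
    return output


result = run_job(LINES)
for nation, summary in result.items():
    print(nation, summary)
```

Once the output looks right, the bodies of `map_line` and `reduce_nation` can be moved into the mapper and reducer methods of your job class.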