How do I create combinations from CSV in Python?

Time:02-22

I have a Python script that reads a CSV file into dictionaries like so:

import csv

with open('staff.csv') as csv_file:
  csv_reader = csv.DictReader(csv_file)

The staff.csv file has about 50 staff members and is formatted like this:

name,department,block
alex,accounting,1
ian,infotech,2
seth,security,2
rachel,research,3
miranda,manufacturing,3

I need to output all possible combinations of people listed in the file according to the following criteria:

  • Each combination contains five people.
  • Each person is in a different department.
  • Each person is also in a different block.

The total output of all combinations should be written to a new file, such as staff-sorted.csv. The format and sorting aren't important but might look like this:

alex,accounting,1
bart,infotech,4
stacy,security,5
rachel,research,3
manny,manufacturing,2

hilda,infotech,5
nancy,accounting,4
manny,manufacturing,2
rachel,research,3
doug,security,1

How do I take these factors into account? I have no programming experience. I think it might work like this:

  1. Add the first person's name to a "combination" array.
  2. Add that department and block to an "occupied" array.
  3. Go down the list to find the first person with an unoccupied department and block.
  4. Add this person's name to the "combination" array and their department and block to the "occupied" array.
  5. Repeat three more times to fill out the names array.

But I don't know how to do that, or how to make it go through to find other possible combinations...
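
The five steps above amount to a single greedy pass over the list. A rough sketch follows, using a few rows from the samples in this question as stand-in data (the `rows` list and the `greedy_group` name are made up for illustration). It shows why the approach is not enough on its own: it builds at most one group, and it can get stuck with fewer than five people even when a valid group exists.

```python
# Hypothetical in-memory rows standing in for staff.csv (taken from the samples above).
rows = [
    ("alex", "accounting", "1"),
    ("ian", "infotech", "2"),
    ("seth", "security", "2"),
    ("rachel", "research", "3"),
    ("miranda", "manufacturing", "3"),
    ("bart", "infotech", "4"),
    ("stacy", "security", "5"),
    ("manny", "manufacturing", "2"),
]


def greedy_group(rows, size=5):
    """Walk the list once, taking each person whose department and block are free."""
    combination = []
    occupied_depts, occupied_blocks = set(), set()
    for name, dept, block in rows:
        if dept in occupied_depts or block in occupied_blocks:
            continue  # department or block already taken, skip this person
        combination.append(name)
        occupied_depts.add(dept)
        occupied_blocks.add(block)
        if len(combination) == size:
            break
    return combination


group = greedy_group(rows)
print(group)  # ['alex', 'ian', 'rachel', 'stacy']
```

Only four people are picked here, because choosing ian early uses up block 2 and blocks manny, the only remaining manufacturing candidate. Yet alex, bart, stacy, rachel and manny would form a valid group, so finding every combination needs some way to undo earlier choices (backtracking).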

Answer:

Here is code to iteratively do what you have asked. It uses a depth-first search (DFS) approach to traverse the input data and compile a list of all possible five-person groups meeting the selection criteria.

Someone with no programming experience would be hard-pressed to come up with this on their own. Alternative implementations using recursion instead of an explicit stack variable may also be out of reach for a beginning programmer. Nevertheless, hopefully this answer gives you a sense of how to approach problems like this.

        from collections import defaultdict
        from itertools import cycle

        # Assume the csv input data can be put into the following form:
        csvRows = [
            ('name','department','block'),
            ('alex','accounting','1'),
            ('ian','infotech','2'),
            ('seth','security','2'),
            ('rachel','research','3'),
            ('randy','research','3'),
            ('bart','infotech','4'),
            ('stacy','security','5'),
            ('manny','manufacturing','2'),
            ('hilda','infotech','5'),
            ('nancy','accounting','4'),
            ('doug','security','1')
        ]
        records = csvRows[1:]  # skip the header row
        # Map each person to their (department, block); names are assumed unique.
        persons = {person: (dept, block) for person, dept, block in records}
        # Index people by department, then by block.
        depts = defaultdict(lambda: defaultdict(list))
        for person, dept, block in records:
            depts[dept][block].append(person)
        csvResult = []
        groupsOf5 = set()   # completed groups, stored as sorted tuples to avoid duplicates
        groupOf5 = []       # the group currently being built
        deptsInGroup, blocksInGroup = set(), set()
        for person in persons:
            # Each stack entry is (person, backtrack); backtrack=True means
            # "this person's subtree is finished, remove them from the group".
            stack = [(person, False)]
            while stack:
                person, backtrack = stack.pop()
                curDept, curBlock = persons[person]
                if backtrack:
                    deptsInGroup.remove(curDept)
                    blocksInGroup.remove(curBlock)
                    groupOf5.pop()
                elif len(groupOf5) + 1 == 5:
                    # This person completes a group: record it, then undo.
                    groupOf5.append(person)
                    groupsOf5.add(tuple(sorted(groupOf5)))
                    groupOf5.pop()
                else:
                    groupOf5.append(person)
                    deptsInGroup.add(curDept)
                    blocksInGroup.add(curBlock)
                    # Push the backtrack marker first so it is popped last.
                    toAdd = [(person, True)]
                    for dept, blocksInDept in depts.items():
                        if dept in deptsInGroup:
                            continue
                        for block, personsInBlock in blocksInDept.items():
                            if block in blocksInGroup:
                                continue
                            toAdd += zip(personsInBlock, cycle([False]))
                    stack += toAdd
        for group in groupsOf5:
            for person in group:
                csvResult.append((person, *persons[person]))
            csvResult.append(tuple())  # separator row

        for row in csvResult:
            print(row)

The sample output looks like this:

('doug', 'security', '1')
('hilda', 'infotech', '5')
('manny', 'manufacturing', '2')
('nancy', 'accounting', '4')
('randy', 'research', '3')
()
('alex', 'accounting', '1')
('bart', 'infotech', '4')
('manny', 'manufacturing', '2')
('rachel', 'research', '3')
('stacy', 'security', '5')
()
('doug', 'security', '1')
('hilda', 'infotech', '5')
('manny', 'manufacturing', '2')
('nancy', 'accounting', '4')
('rachel', 'research', '3')
()
('alex', 'accounting', '1')
('bart', 'infotech', '4')
('manny', 'manufacturing', '2')
('randy', 'research', '3')
('stacy', 'security', '5')
()
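
For comparison, the recursive alternative mentioned earlier might look roughly like the sketch below. It is not the code that produced the output above; `find_groups` and `extend` are names invented for this illustration. It works on whole rows rather than a name-keyed dictionary, and it only ever looks at rows later in the list, so each group is generated once without needing a set to remove duplicates.

```python
def find_groups(records, size=5):
    """Return every group of `size` rows with all-distinct departments and blocks."""
    groups = []

    def extend(start, group, depts, blocks):
        if len(group) == size:
            groups.append(tuple(group))
            return
        # Only consider rows after the last one chosen, so a group is
        # never produced twice in a different order.
        for i in range(start, len(records)):
            name, dept, block = records[i]
            if dept in depts or block in blocks:
                continue
            # Pass new collections down instead of mutating shared state,
            # so returning from the call is the "backtrack" step.
            extend(i + 1, group + [records[i]], depts | {dept}, blocks | {block})

    extend(0, [], frozenset(), frozenset())
    return groups


# Same sample rows as in the answer, without the header row.
records = [
    ('alex', 'accounting', '1'),
    ('ian', 'infotech', '2'),
    ('seth', 'security', '2'),
    ('rachel', 'research', '3'),
    ('randy', 'research', '3'),
    ('bart', 'infotech', '4'),
    ('stacy', 'security', '5'),
    ('manny', 'manufacturing', '2'),
    ('hilda', 'infotech', '5'),
    ('nancy', 'accounting', '4'),
    ('doug', 'security', '1'),
]

groups = find_groups(records)
print(len(groups))  # 4 for this sample data
```

With about 50 staff members the recursion depth never exceeds five, so Python's recursion limit is not a concern here; the number of groups found, however, can grow large.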

Note that this answer does not focus on the details of getting the data in and out of csv files.
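
For the file handling left out above, a minimal sketch with the standard `csv` module could look like this. The helper names `read_staff` and `write_groups` are made up for illustration; the column names come from the header row shown in the question, and each group is assumed to be a sequence of `(name, department, block)` rows.

```python
import csv


def read_staff(path):
    """Read the staff file into a list of (name, department, block) tuples."""
    with open(path, newline="") as csv_file:
        return [
            (row["name"], row["department"], row["block"])
            for row in csv.DictReader(csv_file)
        ]


def write_groups(path, groups):
    """Write each group's rows, followed by an empty line as a separator."""
    with open(path, "w", newline="") as csv_file:
        writer = csv.writer(csv_file)
        for group in groups:
            writer.writerows(group)
            writer.writerow([])  # empty separator line between groups
```

Assuming the files from the question, `read_staff('staff.csv')` would supply the rows to search, and `write_groups('staff-sorted.csv', groups)` would save the result. If the groups hold only names, as `groupsOf5` does above, each name has to be expanded back into a full row before writing.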
