I'm a Python beginner. I had inspired by some Python courses. This is the example CSV file below.
Name | Location | Number |
---|---|---|
Andrew Platt Andrew | A B C | 100 |
Steven Thunder Andrew | A B C | 50 |
Jeff England Steven | A B C | 30 |
Andrew England Jeff | A B C | 30 |
I want to get a result like that
['Andrew', 180
'Platt', 100
'Steven', 50
'Jeff', 60
'England', 60
'Andrew Platt', 100
'Platt Andrew', 100
'Steven Thunder', 50
'Thunder Andrew', 50
........]
Logic:
- One-word name, e.g. 'Andrew', as it shows rows 1, 2 and 4, so the result is 180 (100 50 30)
- Two-word name, e.g. 'Andrew Platt', as it shows row 1 only, so the result is 100
- Export result to a new CSV file
import csv
#from itertools import chain
#find one-word
filename=open('sample.csv', 'r')
file = csv.DictReader(filename)
one_word=[]
for col in file:
one_word.append(col['Name'].split()) #find one-word
print(one_word)
#list(chain.from_iterable(one_word)) #this is another code I learned
#get result
#find two-word
#get result
#combine
#sorted by value
#export to a new CSV file
My problem is how to get value, i.e. 180..., which means I need to match the word, then get 'Number' and sum them all?
Note: the location is useless, it's just a coding practice.
Updated:
Maybe make 2 lists, i.e. one-word and two-word, then zip
them
CodePudding user response:
Looking at your expected result, I'm not sure how you get:
'Andrew Platt', 100
'Platt Andrew', 50
I see "Andrew Platt" and "Platt Andrew" in the first row, but both two-word combos should have the same value of 100, yes?
import csv
from collections import Counter
from itertools import combinations
from pprint import pprint
one_words = Counter()
two_words = Counter()
with open("input.csv", newline="") as f:
reader = csv.DictReader(f)
for row in reader:
items = row["Name"].split(" ")
# Unique one-word
for item in set(items):
one_words[item] = int(row["Number"])
for two_word in combinations(items, 2):
# Skip combos like [Andrew Andrew]
if len(set(two_word)) == 1:
continue
two_words[" ".join(two_word)] = int(row["Number"])
pprint(one_words)
pprint(two_words)
I got:
Counter({'Andrew': 180,
'Platt': 100,
'Steven': 80,
'England': 60,
'Jeff': 60,
'Thunder': 50})
Counter({'Andrew Platt': 100,
'Platt Andrew': 100,
'Steven Thunder': 50,
'Steven Andrew': 50,
'Thunder Andrew': 50,
'Jeff England': 30,
'Jeff Steven': 30,
'England Steven': 30,
'Andrew England': 30,
'Andrew Jeff': 30,
'England Jeff': 30})
CodePudding user response:
You need to get the unique names, and find combinations of two names. Then you can find if each name (1 or 2 words) is included in the first column.
import pandas as pd
import numpy as np
import itertools
#this is your data
df = pd.DataFrame([['Andrew Platt Andrew', 'Steven Thunder Andrew', 'Jeff England Steven',
'Andrew England Jeff'], [100,50,30,30]] ).transpose()
df.columns = ['names','x']
#get the unique names that appear in the columns
names = df.names.apply(lambda x : x.split(' '))
one_words = np.unique(names.sum())
#get all combinations of two names
two_words = [a ' ' b for a,b in itertools.combinations(one_words, 2)]
#fill the dictionnaries with the values
d_1 = {w : df.loc[df.names.str.contains(w),'x'].sum() for w in one_words}
d_2 = {w : df.loc[df.names.str.contains(w),'x'].sum() for w in two_words}
d = d_1 | d_2 #merge the disctionnaries
The output :
{'Andrew': 180,
'England': 60,
'Jeff': 60,
'Platt': 100,
'Steven': 80,
'Thunder': 50,
'Andrew England': 30,
'Andrew Jeff': 0,
'Andrew Platt': 100,
'Andrew Steven': 0,
'Andrew Thunder': 0,
'England Jeff': 30,
'England Platt': 0,
'England Steven': 30,
'England Thunder': 0,
'Jeff Platt': 0,
'Jeff Steven': 0,
'Jeff Thunder': 0,
'Platt Steven': 0,
'Platt Thunder': 0,
'Steven Thunder': 50}