Write a function that takes one row and returns a list of 2-element tuples: song title word and points

I need to preprocess some data before I can start analyzing it. I have a data frame of Eurovision winners, and I need to create a new data frame that contains the words from each song title, with the song's points paired with each word in a tuple. For example, if the song title is 'Hello World' and the score is 31, I need to create two tuples, ('Hello', 31) and ('World', 31), and add them to a list from which I can create the new data frame.

Sample input

Here is the first row of my dataframe.

Sample Output

The output I want from the first row is

[('Net', 31),('als', 31),('toen', 31)]

Attempt

def TupleGenerator(row):
    list =[]
    for item in ev['Song']: 
        tuple = (item, ev["Points"])
        list.append(tuple)
    return list
 

TupleGenerator(ev.iloc[0])

This is what I have tried so far, but I am not sure how to get the score from the same row to be assigned to the word in the tuple.

Any advice is appreciated, thank you.

CodePudding user response：

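You have the right idea, but your function ignores its row argument and reads from the global ev instead, so it loops over the whole Song column and pairs each title with the entire Points column. Use row["Song"] and row["Points"] so that both values come from the same row. Note also that iterating directly over a string yields individual characters, so you need to split row["Song"] into a sequence of substrings, one per word, and iterate over that sequence. This code shows how one might do that: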

def TupleGenerator(row):
    result = []
    for word in row["Song"].strip('"').split():
        result.append((word, row["Points"]))
    return result

The strip method of strings accepts one optional argument: a string specifying the set of characters to remove from the start and end of the string. In our case, we need to remove the double-quote character ("). Called without arguments, the split method returns a list of the words in the string, treating any run of consecutive whitespace as a single delimiter.
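To see the two string methods on their own, here is a minimal sketch in plain Python. The sample titles are made up for illustration and are not from your data.

```python
# Made-up title with irregular whitespace (two spaces and a tab)
title = '"Net  als\ttoen"'

# strip('"') removes double quotes from both ends of the string only
unquoted = title.strip('"')

# split() with no arguments treats any run of whitespace as one delimiter
words = unquoted.split()
print(words)  # ['Net', 'als', 'toen']

# By contrast, split(" ") splits on every single space and can yield empty strings
print("Net  als".split(" "))  # ['Net', '', 'als']

# Quotes in the middle of the string are left untouched by strip
inner = '"say "hi" now"'.strip('"')
print(inner)  # say "hi" now
```

If your titles can contain other punctuation you want to drop, you could pass more characters to strip, but that depends on what the rest of the column looks like.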

For example, if your df is

import pandas as pd

df = pd.DataFrame(
    {"Year": 1957,
     "Date": "3-Mar",
     "Host City": ["Frankfurt", "Linux"],
     "Winner": ["Netherlands", "Unix"],
     "Song": ['"Net als toen"', '"git hub"'],
     "Performer": ["Corry Brokken", "Stack Overflow"],
     "Points": [31, 32],
     "Margin": [14, 15],
     "Runner-up": ["France", "cyberspace"]
    }
)

Running

for index, row in df.iterrows():
    print(TupleGenerator(row))

gives output

[('Net', 31), ('als', 31), ('toen', 31)]
[('git', 32), ('hub', 32)]
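Since the end goal in the question is a new data frame, here is one possible way to collect the per-row lists into a single list and build a frame from it. This is only a sketch: the column names "Word" and "Points" for the new frame and the name tuple_generator are my own choices, and the two-row df is a cut-down version of the made-up example above.

```python
import pandas as pd

def tuple_generator(row):
    """Return (word, points) pairs for one row, same logic as TupleGenerator."""
    return [(word, row["Points"]) for word in row["Song"].strip('"').split()]

# Cut-down, made-up frame with only the two columns the function reads
df = pd.DataFrame({
    "Song": ['"Net als toen"', '"git hub"'],
    "Points": [31, 32],
})

# Flatten the per-row lists into one list of (word, points) tuples
pairs = []
for _, row in df.iterrows():
    pairs.extend(tuple_generator(row))

# Build the new frame; the column names here are hypothetical
words_df = pd.DataFrame(pairs, columns=["Word", "Points"])
print(words_df)
```

Using extend rather than append keeps pairs a flat list of tuples, which is the shape pd.DataFrame expects when you pass columns.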

I hope this helps. Let me know if there are any questions!