Write a function that takes one row and returns a list of 2-dimension tuples: song title and points

Time:04-11

I need to preprocess some data so that I can start analyzing it. I currently have a data frame containing data on Eurovision winners. I need to create a new data frame that contains the words from each song title, with the song's points assigned to each word in a tuple. For example, if the song name is 'Hello World' and the score is 31, I need to create two tuples, ('Hello', 31) and ('World', 31), and add them to a list from which I can create a new data frame.

Sample input

Here is the first row of my dataframe.

Sample Output

The output I want from the first row is

[('Net', 31),('als', 31),('toen', 31)]

Attempt

def TupleGenerator(row):
    list =[]
    for item in ev['Song']: 
        tuple = (item, ev["Points"])
        list.append(tuple)
    return list
 

TupleGenerator(ev.iloc[0])

This is what I have tried so far, but I am not sure how to get the score from the same row to be assigned to the word in the tuple.

Any advice is appreciated, thank you.

CodePudding user response:

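You have the right idea, but there are two problems. First, the function ignores its row argument and loops over ev['Song'], which is the whole column, so item is an entire song title and ev["Points"] is the whole Points column rather than a single score. Second, even if you switch to row["Song"], iterating over a string directly yields one character at a time. You need to split the string into words, iterate over those, and pair each word with row["Points"]. This code shows one way to do that: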

def TupleGenerator(row):
    result = []
    for word in row["Song"].strip('"').split():
        result.append((word, row["Points"]))
    return result 

The strip method accepts one optional argument, a string specifying the set of characters to remove from both ends of the string; here we pass '"' to remove the surrounding double quotes. Called without arguments, split returns a list of the words in the string, treating any run of consecutive whitespace as a single delimiter.
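To see what those two calls do on their own, here is a small sketch. The title string is made up for illustration (it has a deliberate double space), and the variable names are not from the question.

```python
# Illustrative string, modeled on the quoted song titles in the example data.
title = '"Net als  toen"'  # note the double space between "als" and "toen"

# strip('"') removes quote characters from the start and end only.
unquoted = title.strip('"')

# split() with no arguments splits on runs of whitespace, so the double
# space does not produce an empty string in the result.
words = unquoted.split()

print(unquoted)
print(words)
```

Quote characters in the middle of a title would be left alone by strip, so if your data has those you would need something else, such as replace.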

For example, if your df is

import pandas as pd

df = pd.DataFrame(
    {"Year": 1957,
     "Date": "3-Mar",
     "Host City": ["Frankfurt", "Linux"],
     "Winner": ["Netherlands", "Unix"],
     "Song": ['"Net als toen"', '"git hub"'],
     "Performer": ["Corry Brokken", "Stack Overflow"],
     "Points": [31, 32],
     "Margin": [14, 15],
     "Runner-up": ["France", "cyberspace"]
    }
)

Running

for index, row in df.iterrows():
    print(TupleGenerator(row))

gives output

[('Net', 31), ('als', 31), ('toen', 31)]
[('git', 32), ('hub', 32)]
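Since the end goal is a new data frame, one possible way to get there is to collect the tuples from every row into a single flat list and pass that to pd.DataFrame. This is only a sketch: the column names "Word" and "Points" for the new frame are my assumption, and the example frame is cut down to the two columns the function uses.

```python
import pandas as pd

def TupleGenerator(row):
    result = []
    for word in row["Song"].strip('"').split():
        result.append((word, row["Points"]))
    return result

# Reduced version of the example data: only the columns the function reads.
df = pd.DataFrame({"Song": ['"Net als toen"', '"git hub"'], "Points": [31, 32]})

# Flatten the per-row lists into one list of (word, points) tuples.
pairs = []
for _, row in df.iterrows():
    pairs.extend(TupleGenerator(row))

# "Word" and "Points" are assumed column names for the new frame.
words_df = pd.DataFrame(pairs, columns=["Word", "Points"])
print(words_df)
```

Using extend rather than append keeps pairs as a flat list of tuples instead of a list of lists, which is the shape pd.DataFrame needs to give one row per word.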

I hope this helps. Let me know if there are any questions!
