How to Split a Dictionary Value into 2 Separate Key Values

I currently have a dictionary where the values are:

disney_data = {
    'title': ['Gus (1976)',
    'Johnny Kapahala: Back on Board (2007)',
    'The Adventures of Huck Finn (1993)',
    'The Simpsons (1989)',
    'Atlantis: Milo’s Return (2003)']
}

I would like to split up the title from the year value and have a dictionary like:

new_disney_data = {
    'title' : ['Gus',
    'Johnny Kapahala: Back on Board',
    'The Adventures of Huck Finn',
    'The Simpsons',
    'Atlantis: Milo’s Return'],
    'year' : ['1976',
    '2007',
    '1993',
    '1989',
    '2003']
}

I tried using the following, but I know something is off. I'm still relatively new to Python, so any help would be greatly appreciated!

for value in disney_data.values():
    new_disney_data['title'].append(title[0,-7])
    new_disney_data['year'].append(title[-7,-1])

CodePudding user response：

You're not that far off. Your for-loop iterates over the values of the dict (here, the whole list of titles), but you want to iterate over the individual titles. Also, string slicing uses a colon, [start:stop], not a comma. So this would probably do what you are looking for:

new_disney_data = {"title":[], "year":[]}

for value in disney_data["title"]:
    new_disney_data['title'].append(value[0:-7])
    new_disney_data['year'].append(value[-5:-1])
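
To see why those slice bounds work, here is a small sketch on a single title. It assumes every entry ends with exactly ' (YYYY)': a space, an opening parenthesis, four digits, and a closing parenthesis.

```python
sample = 'Gus (1976)'

# The suffix ' (1976)' is 7 characters long, so [0:-7] drops it.
title = sample[0:-7]
# [-5:-1] takes the four characters just before the closing ')'.
year = sample[-5:-1]

print(repr(title), repr(year))
```

If an entry does not follow that exact layout, the fixed offsets will cut in the wrong place.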

CodePudding user response：

There are two concepts you can use here:

The first would be .split(). This usually works better than indexing into a string at fixed positions (for example, in case someone left a space after the closing parenthesis).
The second would be a list comprehension.

Using these two, here is one possible solution.

titles = [item.split('(')[0].strip() for item in disney_data['title']]

years = [item.split('(')[1].split(')')[0].strip() for item in disney_data['title']]

new_disney_data = {
    'title': titles,
    'year': years
}

print(new_disney_data)

Edit: I also used .strip(). This removes any leading or trailing whitespace (spaces, tabs, or newlines) from the ends of a string.
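
As a rough illustration of why splitting can be more forgiving than fixed offsets, here is a sketch with a made-up entry that has a stray trailing space (the entry is hypothetical, not from the question's data):

```python
messy = 'The Simpsons (1989) '  # hypothetical entry with a trailing space

# Fixed-offset slicing is thrown off by the extra character.
sliced_year = messy[-5:-1]

# Splitting on the parentheses is not affected by it.
split_title = messy.split('(')[0].strip()
split_year = messy.split('(')[1].split(')')[0].strip()

print(repr(sliced_year), repr(split_title), repr(split_year))
```

The slice ends up one character off, while the split-based version still picks out the title and the year.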

CodePudding user response：

new_disney_data = {
    'title': [i[:-6].rstrip() for i in disney_data['title']],
    'year': [i[-5:-1] for i in disney_data['title']]
}

CodePudding user response：

This code can do it:

import re
disney_data = {
    'title': ['Gus (1976)',
              'Johnny Kapahala: Back on Board (2007)',
              'The Adventures of Huck Finn (1993)',
              'The Simpsons (1989)',
              'Atlantis: Milo’s Return (2003)']
}
disney_data['year'] = []
for index,line in enumerate(disney_data.get('title')):
    match = re.search(r'\d{4}', line)
    if match is not None:
        disney_data['title'][index] = line.split('(')[0].strip()
        disney_data['year'].append(match.group())


print(disney_data)

For each entry in the title list, it searches for four consecutive digits. If a match is found, the digits are appended to year and the title is trimmed to the part before the opening parenthesis.
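
Two things to watch with re.search(r'\d{4}', ...): it takes the first four digits anywhere in the string, and an entry with no match leaves the title and year lists with different lengths. A possible variation, sketched below, anchors the pattern to a trailing "(YYYY)" and stores None when no year is found. The function name split_title_year and the sample entries are made up for illustration.

```python
import re

# Match a trailing "(YYYY)" only, so digits inside the title are ignored.
YEAR_AT_END = re.compile(r'^(.*?)\s*\((\d{4})\)\s*$')

def split_title_year(entry):
    """Return (title, year); year is None when no trailing (YYYY) is found."""
    match = YEAR_AT_END.match(entry)
    if match is None:
        return entry.strip(), None
    return match.group(1), match.group(2)

# Hypothetical entries, not from the original data.
samples = ['2001: A Space Odyssey (1968)', 'Untitled Project']
pairs = [split_title_year(s) for s in samples]
result = {'title': [t for t, _ in pairs], 'year': [y for _, y in pairs]}
print(result)
```

Keeping a None placeholder means the title at index i always lines up with the year at index i.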

CodePudding user response：

Something like this:

disney_data = {
    'title': ['Gus (1976)',
    'Johnny Kapahala: Back on Board (2007)',
    'The Adventures of Huck Finn (1993)',
    'The Simpsons (1989)',
    'Atlantis: Milo’s Return (2003)']    
}

new_disney_data = {'title': [], 'year': []}

# Split each entry into a title and a year, and store both in the new dict
for title in disney_data['title']:
    new_disney_data['title'].append(title.split('(')[0].strip())  # text before '(', without the trailing space
    new_disney_data['year'].append(title.split('(')[1].split(')')[0])  # text between '(' and ')'

print(disney_data)
print(new_disney_data)

CodePudding user response：

Using split and replace.

def split(data):
    o = {'title' : [], 'year' : []}
    for (t, y) in [d.replace(')','').split(' (') for d in data['title']]:
        o['title'].append(t)
        o['year'].append(y) 
    return o

Using a regular expression.

import re

def regex(data):
    r = re.compile(r"(.*?) \((\d{4})\)")  # raw string so \( and \d are not treated as string escapes
    
    o = {'title' : [], 'year' : []}
    for (t, y) in [r.findall(d)[0] for d in data['title']]:
        o['title'].append(t)
        o['year'].append(y) 
    return o
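
One more option along the same lines, as a sketch only: str.rpartition splits on the last ' (' in the string, so a title that itself contains parentheses does not break the unpacking. The function name split_last_paren and the second sample entry are made up for illustration, and the code assumes every entry ends with ' (YYYY)'.

```python
def split_last_paren(data):
    out = {'title': [], 'year': []}
    for entry in data['title']:
        # rpartition splits on the LAST ' (' so earlier parentheses stay in the title.
        title, _, tail = entry.rpartition(' (')
        out['title'].append(title)
        # Drop the closing ')' that follows the year.
        out['year'].append(tail.rstrip(')'))
    return out

# 'A (B) (1999)' is a hypothetical entry with parentheses inside the title.
sample_data = {'title': ['Gus (1976)', 'A (B) (1999)']}
print(split_last_paren(sample_data))
```

If an entry has no ' (' at all, rpartition returns an empty title, so this only fits data that follows the expected layout.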