I currently have a dictionary where the values are:
disney_data = {
'title': ['Gus (1976)',
'Johnny Kapahala: Back on Board (2007)',
'The Adventures of Huck Finn (1993)',
'The Simpsons (1989)',
'Atlantis: Milo’s Return (2003)']
}
I would like to split up the title from the year value and have a dictionary like:
new_disney_data = {
'title' : ['Gus',
'Johnny Kapahala: Back on Board',
'The Adventures of Huck Finn',
'The Simpsons',
'Atlantis: Milo’s Return'],
'year' : ['1976',
'2007',
'1993',
'1989',
'2003']
}
I tried using the following, but I know something is off. I'm still relatively fresh to Python, so any help would be greatly appreciated!
for value in disney_data.values():
    new_disney_data['title'].append(title[0,-7])
    new_disney_data['year'].append(title[-7,-1])
CodePudding user response:
You're not that far off. Your for-loop iterates over the values of the dict (here, a single list), but you want to iterate over the titles inside that list. Also, string slicing uses a colon, [start:stop], not a comma. So this would probably do what you are looking for:
new_disney_data = {"title":[], "year":[]}
for value in disney_data["title"]:
    new_disney_data['title'].append(value[0:-7])
    new_disney_data['year'].append(value[-5:-1])
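To see why those indices work, it can help to look at a single title. This is only a small sketch (the variable name `sample` is made up for illustration), and it assumes every entry ends with a space followed by a four-digit year in parentheses:

```python
sample = 'Gus (1976)'

# The suffix ' (1976)' is 7 characters long, so dropping the
# last 7 characters leaves the bare title.
title = sample[0:-7]

# The last 5 characters are '1976)'; stopping at -1 excludes the ')'.
year = sample[-5:-1]

print(title)  # Gus
print(year)   # 1976
```

If an entry does not follow that exact layout (for example, an extra trailing space), fixed offsets like these will silently cut in the wrong place.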
CodePudding user response:
There are two concepts you can use here:
The first is .split(). This is usually more robust than indexing into the string (in case someone placed a space after the brackets, for example).
The second is a list comprehension.
Using these two, here is one possible solution.
titles = [item.split('(')[0].strip() for item in disney_data['title']]
years = [item.split('(')[1].split(')')[0].strip() for item in disney_data['title']]
new_disney_data = {
'title': titles,
'year': years
}
print(new_disney_data)
Edit: I also used .strip(). This removes leading and trailing whitespace (spaces, tabs, or newlines) from a string.
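One caveat with splitting on the first '(': a title that itself contains parentheses would be cut short. A possible variation, assuming the year is always the last parenthesized group, is `str.rpartition`, which splits on the last occurrence of the separator. The helper name `split_title_year` and the second sample title are invented for illustration:

```python
def split_title_year(item):
    # rpartition splits on the LAST '(' and returns (before, sep, after).
    title, _, rest = item.rpartition('(')
    # Drop the closing bracket and any surrounding whitespace from the year.
    year = rest.strip().rstrip(')').strip()
    return title.strip(), year

samples = ['Gus (1976)', 'Example (Special Edition) (2001)']
pairs = [split_title_year(s) for s in samples]
print(pairs)
```

Note that if an entry has no '(' at all, `rpartition` puts the whole string in the last element, so this sketch would return an empty title for it.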
CodePudding user response:
new_disney_data = {
'title': [i[:-6].rstrip() for i in disney_data['title']],
'year': [i[-5:-1] for i in disney_data['title']]
}
CodePudding user response:
This code can do it:
import re
disney_data = {
'title': ['Gus (1976)',
'Johnny Kapahala: Back on Board (2007)',
'The Adventures of Huck Finn (1993)',
'The Simpsons (1989)',
'Atlantis: Milo’s Return (2003)']
}
disney_data['year'] = []
for index, line in enumerate(disney_data.get('title')):
    match = re.search(r'\d{4}', line)
    if match is not None:
        disney_data['title'][index] = line.split('(')[0].strip()
        disney_data['year'].append(match.group())
print(disney_data)
It checks each entry in title for a run of four digits. If one is found, it is appended to year, and the parenthesized part is removed from the title.
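One thing to watch with a bare `\d{4}` search is that it matches the first run of four digits anywhere in the string, so a title that itself contains a number could be misread as the year. A hedged sketch of a stricter pattern, assuming the year always sits in parentheses at the very end of the entry; the names `YEAR_AT_END` and `extract`, and the second sample title, are made up for illustration:

```python
import re

# Title, optional whitespace, then a 4-digit year in parentheses at the end.
YEAR_AT_END = re.compile(r'^(.*?)\s*\((\d{4})\)\s*$')

def extract(line):
    match = YEAR_AT_END.match(line)
    if match is None:
        # No trailing "(YYYY)": keep the title and report no year.
        return line.strip(), None
    return match.group(1), match.group(2)

print(extract('Gus (1976)'))
print(extract('Sample 2049 Story (1999)'))
```

Returning `None` for a missing year is a design choice of this sketch; it keeps the title and year lists the same length if you append both values for every entry.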
CodePudding user response:
Something like this
disney_data = {
'title': ['Gus (1976)',
'Johnny Kapahala: Back on Board (2007)',
'The Adventures of Huck Finn (1993)',
'The Simpsons (1989)',
'Atlantis: Milo’s Return (2003)']
}
new_disney_data = {'title': [], 'year': []}
# Split each entry into a title and a year in the new dict
for title in disney_data['title']:
    new_disney_data['title'].append(title.split('(')[0].strip())  # text before '(', without the trailing space
    new_disney_data['year'].append(title.split('(')[1].split(')')[0])  # text between '(' and ')'
print(disney_data)
print(new_disney_data)
CodePudding user response:
Using split and replace.
def split(data):
    o = {'title' : [], 'year' : []}
    for (t, y) in [d.replace(')','').split(' (') for d in data['title']]:
        o['title'].append(t)
        o['year'].append(y)
    return o
Using a regular expression.
import re
def regex(data):
    # Raw string, so that \( and \d reach the regex engine unchanged
    r = re.compile(r"(.*?) \((\d{4})\)")
    o = {'title' : [], 'year' : []}
    for (t, y) in [r.findall(d)[0] for d in data['title']]:
        o['title'].append(t)
        o['year'].append(y)
    return o
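As a further variation on the same idea, the two lists can be built in one pass by splitting each entry into a pair and then transposing the pairs with `zip`. This is only a sketch, assuming every entry ends in ` (YYYY)` and the list is not empty; `pairs` is an illustrative name, and the data is shortened to two entries:

```python
disney_data = {
    'title': ['Gus (1976)', 'The Simpsons (1989)']
}

# Strip the closing ')' and split once, from the right, on ' ('.
pairs = [t.rstrip(')').rsplit(' (', 1) for t in disney_data['title']]

# zip(*pairs) transposes [[title, year], ...] into (titles, years).
# This unpacking fails on an empty list, hence the assumption above.
titles, years = zip(*pairs)
new_disney_data = {'title': list(titles), 'year': list(years)}
print(new_disney_data)
```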