I'm new to Python. I'm currently extracting music genres from Google using Python, selenium, and pandas.
For example, I extracted a genre from
https://www.google.com/search?q=Anything I Want genre
The song has the genres 'Pop music', 'Pop rock', and 'Pop'.
Now I'm writing them to an Excel file.
if i % 10 == 0: song_excel.to_excel('./Song_Genre.xlsx',index=False)
However, the value currently shows up in the Excel cell like this: Pop musicPop rockPop
When I double-click the cell, it automatically splits into multiple lines:
Pop music
Pop rock
Pop
Printing the value from the code gives the following:
print("genre: {}".format(genre))
genre :
Pop music
Pop rock
Pop
How do I write the extracted words into one cell on a single line, separated by a comma and a space? (Goal) => Pop music, Pop rock, Pop
--- Adding some code
start_row = 2
song_excel = pd.read_excel('./Song_Genre.xlsx')
for i in tqdm.tqdm(range(start_row, len(song_excel))):
    if song_excel.iloc[i, 2] != 0:
        pass
    elif song_excel.iloc[i, 2] == 0:
        song_name = song_excel.iloc[i, 0]
        url = song_excel.iloc[i, 1]
        driver.get(url)
        genre = None
        try:
            genre = driver.find_element(by=By.XPATH, value="/html/body//*[contains(concat(' ', @class, ' '), ' KKHQ8c ')]").text
        except selenium.common.exceptions.NoSuchElementException:
            pass
        if genre != None:
            song_excel.iloc[i, 2] = genre
    if i % 10 == 0:
        song_excel.to_excel('./Song_Genre.xlsx', index=False)
Best,
CodePudding user response:
It seems that the string contains '\n' characters, which Excel interprets as line breaks inside the cell. Replace them with the separator you want before writing the value. Use:
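import pandas as pd

# Sample value as scraped: three genres separated by newlines
gen = '''Pop music
Pop rock
Pop'''
# Solution: replace each newline with a comma and a space
gen = gen.replace('\n', ', ')
# gen is now 'Pop music, Pop rock, Pop'
temp = pd.DataFrame({'test': [gen]})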
Note that without the replacement, Excel behaves as you described.
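
In the loop from the question, the same idea can be applied to the scraped text before it is stored in the DataFrame. Below is a minimal sketch, assuming the element text separates the genres with line breaks. `join_genres` is a hypothetical helper name used only for illustration; it is not part of pandas or selenium.

```python
def join_genres(raw_text, separator=", "):
    # str.splitlines() splits on "\n" as well as "\r\n".
    parts = [line.strip() for line in raw_text.splitlines()]
    # Skip empty lines so the result has no doubled separators.
    return separator.join(part for part in parts if part)


genre = "Pop music\nPop rock\nPop"
print(join_genres(genre))  # Pop music, Pop rock, Pop
```

In the question's loop, this would presumably replace the plain assignment, i.e. song_excel.iloc[i, 2] = join_genres(genre).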