How to add list of tuples in a for loop to data frame where each tuple object is in its own column?

Time:07-28

I have a list of tuples of Wikipedia data that I am scraping. I can get it into a DataFrame, but it is all in one column. I need it broken out into four columns, one for each element of the tuple.

results = wikipedia.search('Kalim_Aajiz')

df = pd.DataFrame()
data = []
for i in results:
  wiki_page = wikipedia.page(i)
  data = wiki_page.title, wiki_page.url, wiki_page.summary, wiki_page.pageid
  dataList = list(data)
  print(dataList)
  df = df.append(dataList)

DATA RESULTS:

0   Kalim Aajiz
1   https://en.wikipedia.org/wiki/Kalim_Aajiz
2   Kalim Aajiz (1920 – 14 February 2015) was an I...
3   47137025
0   Robert Thurman
1   https://en.wikipedia.org/wiki/Robert_Thurman
2   Robert Alexander Farrar Thurman (born August 3...
3   475367
0   Ruskin Bond
1   https://en.wikipedia.org/wiki/Ruskin_Bond
2   Ruskin Bond (born 19 May 1934) is an Anglo Ind...
3   965456
0   Haldhar Nag

EXPECTED RESULTS:

NAME        | URL                                      | DESCRIPTION | ID

Kalim Aajiz  https://en.wikipedia.org/wiki/Kalim_Aajiz  was an I...    47137025

CodePudding user response:

Format it into a list of dictionaries, and then make a DataFrame at the end.

import pandas as pd
import wikipedia

results = wikipedia.search('Kalim_Aajiz')

data_list = []
for i in results:
    wiki_page = wikipedia.page(i)
    data = {'title': wiki_page.title,
            'url': wiki_page.url,
            'summary': wiki_page.summary, 
            'pageid': wiki_page.pageid}
    data_list.append(data)

df = pd.DataFrame(data_list)
df

Output:

              title                                             url                                            summary    pageid
0       Kalim Aajiz       https://en.wikipedia.org/wiki/Kalim_Aajiz  Kalim Aajiz (1920 – 14 February 2015) was an I...  47137025
1    Robert Thurman    https://en.wikipedia.org/wiki/Robert_Thurman  Robert Alexander Farrar Thurman (born August 3...    475367
2       Ruskin Bond       https://en.wikipedia.org/wiki/Ruskin_Bond  Ruskin Bond (born 19 May 1934) is an Anglo Ind...    965456
3       Haldhar Nag       https://en.wikipedia.org/wiki/Haldhar_Nag  Dr. Haldhar Nag (born 31 March 1950) is a Samb...  29466145
4     Sucheta Dalal     https://en.wikipedia.org/wiki/Sucheta_Dalal  Sucheta Dalal (born 1962) is an Indian busines...   4125323
5        Padma Shri        https://en.wikipedia.org/wiki/Padma_Shri  Padma Shri (IAST: padma śrī), also spelled Pad...    442893
6        Vairamuthu        https://en.wikipedia.org/wiki/Vairamuthu  Vairamuthu Ramasamy (born 13 July 1953) is an ...   3604328
7          Sal Khan          https://en.wikipedia.org/wiki/Sal_Khan  Salman Amin Khan (born October 11, 1976), comm...  26464673
8      Arvind Gupta      https://en.wikipedia.org/wiki/Arvind_Gupta  Arvind Gupta is an Indian toy inventor and exp...  29176509
9  Rajdeep Sardesai  https://en.wikipedia.org/wiki/Rajdeep_Sardesai  Rajdeep Sardesai (born 24 May 1965)is an India...   1673653
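
If you would rather keep each row as a tuple, as in the question, the DataFrame constructor should also accept a list of tuples directly when you pass column names. A minimal sketch, assuming the tuples have already been collected; the `rows` list below is a hand-written stand-in for what the loop over `wikipedia.page(i)` would produce:

```python
import pandas as pd

# Stand-in for the (title, url, summary, pageid) tuples built in the loop;
# the values are copied from the output shown in the question.
rows = [
    ("Kalim Aajiz", "https://en.wikipedia.org/wiki/Kalim_Aajiz",
     "Kalim Aajiz (1920 – 14 February 2015) was an I...", 47137025),
    ("Robert Thurman", "https://en.wikipedia.org/wiki/Robert_Thurman",
     "Robert Alexander Farrar Thurman (born August 3...", 475367),
]

# Each tuple becomes one row; columns= names the four tuple positions.
df = pd.DataFrame(rows, columns=["NAME", "URL", "DESCRIPTION", "ID"])
print(df)
```

In the real loop you would do rows.append((wiki_page.title, wiki_page.url, wiki_page.summary, wiki_page.pageid)) and build the frame once after the loop.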

CodePudding user response:

You could just build a dictionary with your for loop and then create the data frame at the end.

For example:

import pandas as pd
import wikipedia

results = wikipedia.search('Kalim_Aajiz')
data1 = {"NAME": [], "URL": [], "DESCRIPTION": [], "ID": []}
for i in results:
  wiki_page = wikipedia.page(i)
  data2 = wiki_page.title, wiki_page.url, wiki_page.summary, wiki_page.pageid
  for key, value in zip(data1.keys(), data2):
      data1[key].append(value)

df = pd.DataFrame(data1)

CodePudding user response:

You could assign a group id to every block of four rows, which makes a pivot possible. Specifically, np.arange(len(df))//4 gives 0,0,0,0,1,1,1,1,... The current index (0,1,2,3,0,1,2,3,...) then identifies which column each value belongs to in the pivot.

import numpy as np

dfp = (
    df.reset_index().assign(s=np.arange(len(df))//4).pivot(index=['s'], columns=['index'])
        .droplevel(0, axis=1).rename_axis(None, axis=1).rename_axis(None, axis=0)
)

dfp.columns = ['NAME','URL','DESCRIPTION','ID']

print(dfp)

Result

             NAME                                           URL                                        DESCRIPTION        ID 
0     Kalim Aajiz     https://en.wikipedia.org/wiki/Kalim_Aajiz  Kalim Aajiz (1920 – 14 February 2015) was an I...  47137025  
1  Robert Thurman  https://en.wikipedia.org/wiki/Robert_Thurman  Robert Alexander Farrar Thurman (born August 3...    475367  
2     Ruskin Bond     https://en.wikipedia.org/wiki/Ruskin_Bond  Ruskin Bond (born 19 May 1934) is an Anglo Ind...    965456  
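
To make the grouping step easier to follow, here is a small self-contained sketch of the same idea. It assumes the stacked frame looks like the one in the question (a single column named 0 with the index 0,1,2,3 repeating); the names `stacked`, `group`, and `wide` are only for illustration, and the values are abbreviated:

```python
import numpy as np
import pandas as pd

# Rebuild the stacked shape from the question: one column named 0,
# with the index 0,1,2,3 repeating for every page.
values = [
    "Kalim Aajiz", "https://en.wikipedia.org/wiki/Kalim_Aajiz",
    "Kalim Aajiz (1920 ...", 47137025,
    "Robert Thurman", "https://en.wikipedia.org/wiki/Robert_Thurman",
    "Robert Alexander ...", 475367,
]
stacked = pd.DataFrame({0: values}, index=[0, 1, 2, 3] * 2)

# Integer division by 4 gives every run of four rows the same group id.
group = np.arange(len(stacked)) // 4   # array([0, 0, 0, 0, 1, 1, 1, 1])

wide = (
    stacked.rename(columns={0: "value"})
    .reset_index()                      # old index becomes a column named "index"
    .assign(s=group)
    .pivot(index="s", columns="index", values="value")
)
wide.columns = ["NAME", "URL", "DESCRIPTION", "ID"]
wide = wide.rename_axis(None)           # drop the leftover "s" index name
print(wide)
```

Passing values= explicitly keeps the pivoted columns flat, so no droplevel call is needed in this variant.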

CodePudding user response:

I don't think you need data at all; you could use the attributes of wiki_page directly:

df = pd.DataFrame(columns=["NAME", "URL", "DESCRIPTION", "ID"])

for i in results:
    wiki_page = wikipedia.page(i)
    df.loc[len(df.index)] = wiki_page.title, wiki_page.url, wiki_page.summary, wiki_page.pageid  

or with pd.concat(), since DataFrame.append() is deprecated (and was removed in pandas 2.0):

for i in results:
    wiki_page = wikipedia.page(i)
    df = pd.concat([
        df,
        # Note the double brackets: one inner list = one row with four columns
        pd.DataFrame([[wiki_page.title, wiki_page.url, wiki_page.summary, wiki_page.pageid]],
            columns=["NAME", "URL", "DESCRIPTION", "ID"])
    ], ignore_index=True)
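
Since pd.concat() builds a new frame on every call, a variant you might consider is to collect the one-row frames in a list and concatenate once after the loop. A rough sketch, assuming the page data is already available; `pages` and `frames` are hypothetical names and the tuples stand in for the attributes of wiki_page:

```python
import pandas as pd

columns = ["NAME", "URL", "DESCRIPTION", "ID"]

# Stand-in for the (title, url, summary, pageid) values read from wiki_page.
pages = [
    ("Kalim Aajiz", "https://en.wikipedia.org/wiki/Kalim_Aajiz",
     "Kalim Aajiz (1920 ...", 47137025),
    ("Ruskin Bond", "https://en.wikipedia.org/wiki/Ruskin_Bond",
     "Ruskin Bond (born 19 May 1934) ...", 965456),
]

frames = []
for page in pages:
    # Wrapping the tuple in a list makes it a single row with four columns.
    frames.append(pd.DataFrame([page], columns=columns))

# One concat at the end; ignore_index=True renumbers the rows 0..n-1.
df = pd.concat(frames, ignore_index=True)
print(df)
```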