How to copy Stackoverflow example Dataframe into a pandas Dataframe for reproduction

Time:02-05

Is there a systematic approach to importing DataFrames copied from Stack Overflow questions into your program? I often see DataFrames like the example below, but I usually have to insert ":" characters or apply some other formatting before pd.read_csv turns the text into a proper pandas DataFrame.

So my questions are:

  • Am I missing something here, or is everyone else also just trying out 2-3 things per copied DataFrame until it's "clean"?
  • The other way around: is there a recommended format for copying your example DataFrame into a Stack Overflow question?

   Name    id                   other           list
0  bren   {00005, 0002,0003}     abc      [[1000, A, 90],[9000, S, 28],[5000, T, 48]]
1  frenn  {00006,0001}           gf       [3000, B, 80], [7000, R, 98]
2  kylie  {00007}                jgj      [600, C, 55]
3  juke   {00009}                gg       [5000, D, 88]

Usually I create a dummy file and paste the DataFrame from Stack Overflow into it. In the dummy file I replace the whitespace between columns with ":". Finally I use:

import pandas as pd
df = pd.read_csv("dummyfile", sep=":")

Otherwise I get problems like the one shown below:

print(df.columns) #output "               list"
                                                               Name  ...  list
0 ben  {00005,      0002,0003} abc    [[1000, A,   90],[9000,    S,  ...  48]]
1 alex {00006,0001} gf         [3000, B,      80], [7000,        R,  ...  None
2 linn {00007}      jgj        [600,  C,      55]  NaN         None  ...  None
3 luke {00009}      gg         [5000, D,      88]  NaN         None  ...  None

I expect a clean dataframe.

Answer:

You can avoid creating a dummy file by using io.StringIO. You can also pass a regular expression as the separator: \s{2,} splits wherever two or more whitespace characters occur, so single spaces inside a value are left alone. A regex separator requires engine="python". Finally, if the example contains an index column, add "ID" (for instance) to the header line of the string and specify index_col in read_csv:

import pandas as pd
from io import StringIO

                 #  added ID
                 #  ||
                 #  \/
data  = StringIO("""ID    Name    id                   other           list
0  bren   {00005, 0002,0003}     abc      [[1000, A, 90],[9000, S, 28],[5000, T, 48]]
1  frenn  {00006,0001}           gf       [3000, B, 80], [7000, R, 98]
2  kylie  {00007}                jgj      [600, C, 55]
3  juke   {00009}                gg       [5000, D, 88]"""
)

# Raw string so that \s is passed to the regex engine unchanged.
print(pd.read_csv(data, sep=r"\s{2,}", engine="python", index_col="ID"))

Output:

     Name                  id other                                         list
ID                                                                              
0    bren  {00005, 0002,0003}   abc  [[1000, A, 90],[9000, S, 28],[5000, T, 48]]
1   frenn        {00006,0001}    gf                 [3000, B, 80], [7000, R, 98]
2   kylie             {00007}   jgj                                 [600, C, 55]
3    juke             {00009}    gg                                [5000, D, 88]
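
The same steps can be wrapped in a small reusable function. This is only a sketch: read_pasted is a hypothetical helper name, not part of pandas, and it assumes columns are separated by at least two whitespace characters and that any index label has already been added to the header line.

```python
from io import StringIO

import pandas as pd


def read_pasted(text, index_col=None):
    """Parse a whitespace-aligned table pasted as a string (hypothetical helper)."""
    return pd.read_csv(
        StringIO(text.strip()),  # wrap the string so no dummy file is needed
        sep=r"\s{2,}",           # split on runs of two or more whitespace characters
        engine="python",         # regex separators need the python engine
        index_col=index_col,
    )


# Shortened sample with an "ID" label added to the header by hand.
sample = """ID  Name   other
0   bren   abc
1   frenn  gf"""

df = read_pasted(sample, index_col="ID")
print(df)
```

Values such as {00007} or [600, C, 55] still arrive as plain strings with this approach; turning them into sets or lists would need a separate parsing step.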

Of course, people asking questions on SO should rather provide a proper constructor to ensure reproducibility.
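
As an illustration of what such a constructor could look like, one option is to print DataFrame.to_dict("list") and paste the result into the question. The data below is made up for the sketch and only loosely follows the example table.

```python
import pandas as pd

# Made-up sample data, loosely modeled on the table in the question.
df = pd.DataFrame(
    {
        "Name": ["bren", "frenn"],
        "other": ["abc", "gf"],
        "list": [[1000, "A", 90], [3000, "B", 80]],
    }
)

# A plain dict of column -> list of values; its printed form can be pasted
# into a question as-is.
constructor = df.to_dict("list")
print(constructor)

# Anyone reading the question can rebuild the frame from that dict.
rebuilt = pd.DataFrame(constructor)
print(rebuilt)
```

Unlike a pasted text table, this keeps Python types such as lists intact, so readers do not have to guess separators or re-parse strings.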
