Is there a systematic approach to importing DataFrames copied from Stack Overflow questions into your program?
I often see DataFrames like the example below, but I usually have to insert some ":" or do other formatting before I can turn them into a proper pandas DataFrame via pd.read_csv.
So my questions are:
- Am I missing something here, or is everyone else also just trying out 2-3 things per copied DataFrame until it's "clean"?
- The other way around: is there a recommended format for putting an example DataFrame into a Stack Overflow question?
Name id other list
0 bren {00005, 0002,0003} abc [[1000, A, 90],[9000, S, 28],[5000, T, 48]]
1 frenn {00006,0001} gf [3000, B, 80], [7000, R, 98]
2 kylie {00007} jgj [600, C, 55]
3 juke {00009} gg [5000, D, 88]
Usually I create a dummy file and paste the DataFrame from the Stack Overflow question into it. In the dummy file I replace the whitespace between the columns with ":". Finally I use:
import pandas as pd
df = pd.read_csv("dummyfile", sep=":")  # columns are separated by ":"
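The manual step above (replacing the column gaps with ":") can also be done in code, without a dummy file. A minimal sketch, assuming the pasted columns are separated by at least two spaces while cells contain at most single spaces; the helper name `to_colon_separated` and the `pasted` text are illustrative, not part of any library:

```python
import re
from io import StringIO

import pandas as pd

# Illustrative pasted text. Assumption: columns are separated by two or more
# spaces, while single spaces may occur inside a cell.
pasted = """
    Name   id                  other  list
0  bren   {00005, 0002,0003}  abc    [[1000, A, 90],[9000, S, 28],[5000, T, 48]]
1  frenn  {00006,0001}        gf     [3000, B, 80], [7000, R, 98]
2  kylie  {00007}             jgj    [600, C, 55]
3  juke   {00009}             gg     [5000, D, 88]
"""


def to_colon_separated(text):
    """Hypothetical helper: replace every run of 2+ spaces with ':' per line."""
    lines = [re.sub(r" {2,}", ":", line.strip()) for line in text.strip().splitlines()]
    return "\n".join(lines)


df = pd.read_csv(StringIO(to_colon_separated(pasted)), sep=":")
print(df)
```

Because the header has one field fewer than the data rows, pandas uses the first column as the index. This still breaks if a cell itself contains ":" or two consecutive spaces.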
Otherwise I get problems like the ones shown below:
print(df.columns)  # output: " list"
Name ... list
0 ben {00005, 0002,0003} abc [[1000, A, 90],[9000, S, ... 48]]
1 alex {00006,0001} gf [3000, B, 80], [7000, R, ... None
2 linn {00007} jgj [600, C, 55] NaN None ... None
3 luke {00009} gg [5000, D, 88] NaN None ... None
I expect a clean dataframe.
Answer:
You can avoid creating a dummy file by using io.StringIO. You can also specify the separator as a regex: \s{2,} splits wherever two or more whitespace characters are found (regex separators require engine="python"). Finally, if the example contains an index column, add a name for it (for instance "ID") to the header line of the string and pass index_col to read_csv:
import pandas as pd
from io import StringIO

# "ID" is added to the header as a name for the unnamed index column.
# Columns must be separated by two or more spaces for the regex below.
data = StringIO("""ID  Name   id                  other  list
0   bren   {00005, 0002,0003}  abc    [[1000, A, 90],[9000, S, 28],[5000, T, 48]]
1   frenn  {00006,0001}        gf     [3000, B, 80], [7000, R, 98]
2   kylie  {00007}             jgj    [600, C, 55]
3   juke   {00009}             gg     [5000, D, 88]""")

# Raw string avoids an invalid-escape warning for "\s"
print(pd.read_csv(data, sep=r"\s{2,}", engine="python", index_col="ID"))
Output:
Name id other list
ID
0 bren {00005, 0002,0003} abc [[1000, A, 90],[9000, S, 28],[5000, T, 48]]
1 frenn {00006,0001} gf [3000, B, 80], [7000, R, 98]
2 kylie {00007} jgj [600, C, 55]
3 juke {00009} gg [5000, D, 88]
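The choice of \s{2,} over \s+ matters because single spaces also appear inside the cells of this example. A small illustration with re.split on one data row (the row string is copied from the example, with its columns assumed to be separated by two spaces):

```python
import re

# One data row; single spaces also occur *inside* the id and list cells
row = "0  bren  {00005, 0002,0003}  abc  [[1000, A, 90],[9000, S, 28],[5000, T, 48]]"

# \s+ splits on every whitespace run, so cells containing spaces are torn apart
naive = re.split(r"\s+", row)
# \s{2,} only splits on runs of two or more, keeping each cell whole
fields = re.split(r"\s{2,}", row)

print(len(naive))   # 12
print(len(fields))  # 5
print(fields)
```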
Of course, people asking questions on SO should ideally provide code that constructs the DataFrame (a proper constructor) to ensure reproducibility.
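One way to provide such a constructor is to post the output of DataFrame.to_dict(), which a reader can pass straight back to pd.DataFrame. A sketch under assumptions: the cell types below (sets of strings for id, lists for list) are guesses, since the pasted text cannot reveal the real dtypes, and wrapping row 1's list cell in outer brackets is likewise an assumption:

```python
import pandas as pd

# Hypothetical constructor for the example frame. The cell types are
# assumptions: the printed table alone does not reveal the real dtypes.
df = pd.DataFrame(
    {
        "Name": ["bren", "frenn", "kylie", "juke"],
        "id": [{"00005", "0002", "0003"}, {"00006", "0001"}, {"00007"}, {"00009"}],
        "other": ["abc", "gf", "jgj", "gg"],
        "list": [
            [[1000, "A", 90], [9000, "S", 28], [5000, "T", 48]],
            [[3000, "B", 80], [7000, "R", 98]],  # outer brackets assumed
            [600, "C", 55],
            [5000, "D", 88],
        ],
    }
)

# What the question author pastes: a plain dict literal
snippet = df.to_dict(orient="list")
print(snippet)

# What a reader does with it: rebuild the exact same frame
rebuilt = pd.DataFrame(snippet)
print(rebuilt)
```

Unlike a printed table, the dict keeps every cell's Python type, so readers do not have to guess separators or dtypes.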