replacing df.append with pd.concat when building a new dataframe from file read

Time:12-06

...

header = pd.DataFrame() 


for x in {0,7,8,9,10,11,12,13,14,15,18,19,21,23}:
    header = header.append({'col1':data1[x].split(':')[0],
                            'col2':data1[x].split(':')[1][:-1],
                            'col3':data2[x].split(':')[1][:-1],
                            'col4':data2[x]==data1[x],
                            'col5':'---'},
                          ignore_index=True)

...

I have some Jupyter Notebook code that reads two text files into data1 and data2. Using a fixed set of line indices, I pick out specific matching lines from both files and put them into a DataFrame for easy display and comparison in the notebook.
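The expressions in the loop suggest that each line looks like label:value followed by a newline, which is what readlines() returns. A minimal sketch under that assumption; the file contents below are made up, and io.StringIO stands in for open(...) so the example runs without files:

```python
import io

# Hypothetical file contents (assumption); in the notebook these would come
# from something like open(path).readlines().
file1 = io.StringIO("Model:ABC\nVersion:1.2\n")
file2 = io.StringIO("Model:ABC\nVersion:1.5\n")

data1 = file1.readlines()  # ['Model:ABC\n', 'Version:1.2\n']
data2 = file2.readlines()

x = 1
label = data1[x].split(':')[0]        # text before the colon
value1 = data1[x].split(':')[1][:-1]  # text after the colon, last char (newline) dropped
value2 = data2[x].split(':')[1][:-1]
same = data2[x] == data1[x]           # compares the whole line
print(label, value1, value2, same)    # Version 1.2 1.5 False
```

Note that [:-1] removes the last character unconditionally, so a final line without a trailing newline would lose a real character; rstrip('\n') avoids that.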

Since df.append is deprecated in favour of pd.concat, what's the tidiest way to do this?

Is it basically a matter of replacing the code inside the loop with:

header = pd.concat(header, {all the column code from above })

...
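For reference, a literal one-to-one replacement has to wrap each new row in a one-row DataFrame, because pd.concat expects a list (or other iterable) of DataFrame/Series objects as its first argument rather than a DataFrame plus a dict. The sketch below uses made-up data1/data2 lines. It works, but pd.concat copies all existing rows on every iteration, so the loop slows down as the table grows; collecting the rows first and building the DataFrame once is usually preferred.

```python
import pandas as pd

# Made-up stand-ins for the lines read from the two files (assumption).
data1 = ["A:1\n", "B:2\n", "C:3\n"]
data2 = ["A:1\n", "B:5\n", "C:3\n"]

header = None
for x in [0, 1, 2]:
    row = {'col1': data1[x].split(':')[0],
           'col2': data1[x].split(':')[1][:-1],
           'col3': data2[x].split(':')[1][:-1],
           'col4': data2[x] == data1[x],
           'col5': '---'}
    new_row = pd.DataFrame([row])  # wrap the dict in a one-row DataFrame
    # Start from the first row rather than an empty DataFrame, so no empty
    # frame takes part in the concatenation.
    header = new_row if header is None else pd.concat([header, new_row], ignore_index=True)

print(header)
```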

Additional input in response to the comment below: yes, sorry. For example, the next block of code does this:

for x in {4, 2, 5}:
    header = header.append({'col1':'SOMENEWROWNAME',
                            'col2':data1[x].split(':')[1][:-1],
                            'col3':data2[x].split(':')[1][:-1],
                            'col4':data2[x]==data1[x],
                            'col5':float(data2[x].split(':')[1][:-1]) - float(data1[x].split(':')[1][:-1])},
                           ignore_index=True)

This is repeated five times, each time with different data indices in the loop and a different SOMENEWROWNAME.

I inherited this notebook, and I now see it was written this way because the original author only wanted to compute a numerical (float) difference for the lines that contain numbers.
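Instead of hand-picking which blocks get a numeric difference, one possible approach is to try the float conversion and fall back to the '---' placeholder when a value is not numeric. A minimal sketch, assuming the label:value line format; diff_or_placeholder is a hypothetical helper name, not something from pandas:

```python
def diff_or_placeholder(line1: str, line2: str, placeholder: str = '---'):
    """Return value(line2) - value(line1) for 'label:value' lines (hypothetical helper),
    or the placeholder when either value cannot be parsed as a float."""
    try:
        # float() ignores surrounding whitespace, including the trailing newline
        return float(line2.split(':')[1]) - float(line1.split(':')[1])
    except ValueError:
        return placeholder


print(diff_or_placeholder("Temp:20.5\n", "Temp:21.0\n"))  # 0.5
print(diff_or_placeholder("Model:ABC\n", "Model:XYZ\n"))  # ---
```

Be aware that float() also accepts strings such as 'nan' and 'inf', so lines with those values would be treated as numeric.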

There are several such blocks, each covering different lines of the data, where the first value, SOMENEWROWNAME, is a different text label for the respective lines.

So I was primarily just trying to fix these append-to-concat deprecation warnings, but of course if the code can be written better, all good!

Answer:

Use a list comprehension and the DataFrame constructor:

import pandas as pd

# one dict per selected line; a list (not a set) keeps the rows in this order
data = [{'col1':data1[x].split(':')[0],
         'col2':data1[x].split(':')[1][:-1],
         'col3':data2[x].split(':')[1][:-1],
         'col4':data2[x]==data1[x],
         'col5':'---'} for x in [0,7,8,9,10,11,12,13,14,15,18,19,21,23]]
df = pd.DataFrame(data)
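A note on ordering: Python sets do not guarantee iteration order, so if the row order in the table matters, the indices are better written as a list. This runnable sketch uses made-up data1/data2 lines and a deliberately unsorted index list to show that the DataFrame rows follow the list order and the columns follow the dict key order:

```python
import pandas as pd

# Made-up stand-ins for the two files' lines (assumption).
data1 = ["A:1\n", "B:2\n", "C:3\n"]
data2 = ["A:1\n", "B:5\n", "C:3\n"]

indices = [2, 0, 1]  # rows come out in exactly this order
data = [{'col1': data1[x].split(':')[0],
         'col2': data1[x].split(':')[1][:-1],
         'col3': data2[x].split(':')[1][:-1],
         'col4': data2[x] == data1[x],
         'col5': '---'} for x in indices]
df = pd.DataFrame(data)  # one construction, no append/concat in a loop
print(df['col1'].tolist())  # ['C', 'A', 'B']
```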

EDIT:

out = []
# sample indices
for x in [1, 7, 30]:
    out.append({'col1':'SOMENEWROWNAME',
                'col2':data1[x].split(':')[1][:-1],
                'col3':data2[x].split(':')[1][:-1],
                'col4':data2[x]==data1[x],
                'col5':float(data2[x].split(':')[1][:-1]) - float(data1[x].split(':')[1][:-1])})

df1 = pd.DataFrame(out)

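out1 = []
# sample indices
for x in [1, 7, 30]:
    out1.append({...})  # placeholder: another dict with the same keys as above

df2 = pd.DataFrame(out1)

# ignore_index=True renumbers the rows 0..n-1, as append(..., ignore_index=True) did
df = pd.concat([df1, df2], ignore_index=True)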
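Why ignore_index matters when combining blocks: each DataFrame built separately starts its index at 0, and pd.concat keeps those labels unless ignore_index=True is passed. A small sketch with made-up rows:

```python
import pandas as pd

# Two independently built blocks (made-up rows), each indexed from 0.
df1 = pd.DataFrame([{'col1': 'a', 'col5': 1.0}, {'col1': 'b', 'col5': 2.0}])
df2 = pd.DataFrame([{'col1': 'c', 'col5': '---'}])

without = pd.concat([df1, df2])                       # keeps original labels
with_reset = pd.concat([df1, df2], ignore_index=True)  # fresh 0..n-1 index
print(without.index.tolist())     # [0, 1, 0]
print(with_reset.index.tolist())  # [0, 1, 2]
```

Duplicate labels like [0, 1, 0] make later lookups such as df.loc[0] return more than one row, which is easy to miss in a notebook.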

Or:

final = []
for x in [4, 2, 5]:
    final.append({'col1':'SOMENEWROWNAME',
                  'col2':data1[x].split(':')[1][:-1],
                  'col3':data2[x].split(':')[1][:-1],
                  'col4':data2[x]==data1[x],
                  'col5':float(data2[x].split(':')[1][:-1]) - float(data1[x].split(':')[1][:-1])})

for x in [4, 2, 5]:
    final.append({...})  # placeholder: another dict with the same keys as above

# a single DataFrame built from all collected rows
df = pd.DataFrame(final)
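Since the inherited notebook repeats the same dict for every block, changing only the indices, the col1 label and whether col5 is numeric, one possible tidy-up is a small row-building helper plus a table of block definitions. This is a sketch under stated assumptions: the label:value line format, the sample lines, and the names value, make_row and blocks are all hypothetical.

```python
import pandas as pd

# Made-up stand-ins for the lines read from the two files (assumption).
data1 = ["Name:foo\n", "Temp:20.5\n", "Load:3\n"]
data2 = ["Name:bar\n", "Temp:21.0\n", "Load:3\n"]


def value(line: str) -> str:
    """Text after the first colon, with the trailing newline removed."""
    return line.split(':')[1].rstrip('\n')


def make_row(data1, data2, x, label=None, numeric=False):
    """Build one comparison row for line x (hypothetical helper).

    label: text for col1; defaults to the label before the colon.
    numeric: if True, col5 holds the float difference, otherwise '---'.
    """
    v1, v2 = value(data1[x]), value(data2[x])
    return {'col1': label if label is not None else data1[x].split(':')[0],
            'col2': v1,
            'col3': v2,
            'col4': data2[x] == data1[x],
            'col5': float(v2) - float(v1) if numeric else '---'}


# Text-only rows first, then each numeric block as (col1 label, line indices).
rows = [make_row(data1, data2, x) for x in [0]]
blocks = [("SOMENEWROWNAME", [1, 2])]  # hypothetical block definitions
for label, indices in blocks:
    rows.extend(make_row(data1, data2, x, label, numeric=True) for x in indices)

df = pd.DataFrame(rows)  # built once, no append/concat inside the loops
print(df)
```

Adding another block then only means adding one (label, indices) tuple to blocks instead of copying the whole loop.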