creating dataframe from a function

Time:01-02

I have created the following function to create a DataFrame from its arguments:

import geopandas as gpd

gamle_dfs = []

def create_lines_df_2(Origin, Destination, line_, nodes_):
    # One-row record describing a single route
    dict_ = [{'Origin': Origin, 'Destination': Destination, 'geometry': line_,
              'length': line_.length,
              'osmid': [nodes_.index.values]}]
    # oslo_edges_proj is defined elsewhere in my script
    df = gpd.GeoDataFrame(dict_, geometry='geometry',
                          crs=oslo_edges_proj.crs).reset_index()
    gamle_dfs.append(df)

I will call this function exactly 289 times to get 17 DataFrames, one for each district's routes. However, the function adds each DataFrame as a separate element of the list, and I want them combined into one DataFrame. If I change the list to a GeoDataFrame, I get an empty DataFrame.

The result looks like this:

    [       Origin  Destination                                           geometry  \
 0  Gamle Oslo  Grünerløkka  LINESTRING (599408.712 6642638.038, 599353.853...   
 
         length                                              osmid  
 0  1960.743326  [[1485390119, 79624, 1485390291, 24935363, 345...  ,
        Origin Destination                                           geometry  \
 0  Gamle Oslo      Sagene  LINESTRING (599408.712 6642638.038, 599353.853...   
 
         length                                              osmid  
 0  3799.280637  [[1485390119, 79624, 1485390291, 24935363, 345...  ]

I can access each DataFrame by index, from gamle_dfs[0] to gamle_dfs[n].

How can I make the function append its output to a single DataFrame?

Edit: adding an example:

origin = ['a']
destinations = ['b','c','d','e']
line1 = ['shaprely.geometry.nodes from a to b']
line2 = ['shaprely.geometry.nodes from a to c']
line3 = ['shaprely.geometry.nodes from a to d']
line4 = ['shaprely.geometry.nodes from a to e']


import pandas as pd

gamle_dfs = []

def create_lines_df_2test(Origin, Destination, line_):
    dict_ = [{'Origin': Origin, 'Destination': Destination, 'geometry': line_,
              'length': len(line_)}]
    df = pd.DataFrame(dict_)
    gamle_dfs.append(df)

This gives me a list of DataFrames, when I need only one DataFrame combining all the elements of gamle_dfs.

Answer:

If you really need to generate the DataFrame in a loop, I would modify the function to return a DataFrame instead of updating a global variable, and then use pandas.concat to build the final DataFrame:

import pandas as pd

def create_lines_df_2test(Origin, Destination, line_):
    dict_ = [{'Origin': Origin, 'Destination': Destination, 'geometry': line_,
              'length': len(line_)}]
    return pd.DataFrame(dict_)

lines = (line1, line2, line3, line4)

# ignore_index=True renumbers the rows 0..n-1 instead of repeating index 0
df = pd.concat([create_lines_df_2test(origin, destinations, line) for line in lines],
               ignore_index=True)
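If you would rather keep the original pattern of appending to a global list, the list itself is not the problem: it only needs one pd.concat call after all the appends are done. One plausible cause of the empty result when the list is replaced with a GeoDataFrame: DataFrame.append (deprecated in pandas 1.4 and removed in 2.0) returns a new object instead of modifying the frame in place, so calling gamle_dfs.append(df) without reassigning the result leaves the original frame empty. A minimal sketch, assuming placeholder strings (the routes dict is hypothetical) in place of real shapely geometries:

```python
import pandas as pd

gamle_dfs = []  # global accumulator, as in the question

def create_lines_df_2test(Origin, Destination, line_):
    dict_ = [{'Origin': Origin, 'Destination': Destination,
              'geometry': line_, 'length': len(line_)}]
    gamle_dfs.append(pd.DataFrame(dict_))

# Hypothetical placeholder routes standing in for shapely LineStrings
routes = {'b': 'route a-b', 'c': 'route a-c', 'd': 'route a-d'}
for dest, line in routes.items():
    create_lines_df_2test('a', dest, line)

# Combine all one-row frames once, after the loop; ignore_index gives 0..n-1
combined = pd.concat(gamle_dfs, ignore_index=True)
print(combined)
```

pd.concat also accepts a list of GeoDataFrames, so the same pattern should apply to the gpd.GeoDataFrame version of the function; it is worth checking that the combined result still carries the expected CRS.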

If you have all the data from the beginning, just generate the dataframe directly:

df = pd.DataFrame({'Origin': [origin] * len(lines),
                   'Destination': [destinations] * len(lines),
                   'geometry': lines,
                   'length': [len(line) for line in lines],
                   })

output:

  Origin   Destination                               geometry  length
0    [a]  [b, c, d, e]  [shaprely.geometry.nodes from a to b]       1
1    [a]  [b, c, d, e]  [shaprely.geometry.nodes from a to c]       1
2    [a]  [b, c, d, e]  [shaprely.geometry.nodes from a to d]       1
3    [a]  [b, c, d, e]  [shaprely.geometry.nodes from a to e]       1
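When the rows are produced in a loop, as with the 289 calls in the question, a common alternative is to collect plain dicts, call the DataFrame constructor once, and then use groupby to get one DataFrame per district. A minimal sketch under these assumptions: make_record is a hypothetical helper, the three district names are a subset chosen for illustration, and placeholder strings stand in for shapely geometries. Grouping needs scalar keys such as strings; in the output above, Origin holds lists, which are unhashable and cannot be used as group keys.

```python
import pandas as pd

def make_record(origin, destination, line_):
    # Hypothetical helper: returns one route as a plain dict, not a DataFrame
    return {'Origin': origin, 'Destination': destination,
            'geometry': line_, 'length': len(line_)}

# A subset of the 17 districts, for illustration only
districts = ['Gamle Oslo', 'Grünerløkka', 'Sagene']

records = []
for origin in districts:
    for destination in districts:  # every pair, like the 17 x 17 = 289 calls
        # Placeholder string standing in for a shapely LineString
        line_ = f'{origin} -> {destination}'
        records.append(make_record(origin, destination, line_))

# Build the combined frame with a single constructor call
all_routes = pd.DataFrame(records)

# Split it into one DataFrame per origin district
district_dfs = {origin: group.reset_index(drop=True)
                for origin, group in all_routes.groupby('Origin')}

print(district_dfs['Sagene'])
```

For the GeoDataFrame case, the question's own constructor call, gpd.GeoDataFrame(records, geometry='geometry', crs=oslo_edges_proj.crs), accepts the full list of dicts in the same way it accepts the one-element list.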
