Home > Blockchain >  pandas read tree like data into one column
pandas read tree like data into one column

Time:04-08

I got a dataframe like this

    Country State   City    District    GeoID
0   US  NaN NaN NaN 1000000
1   US  California  NaN NaN 1010000
2   US  California  San Francisco   NaN 1011000
3   US  California  San Francisco   Hayes Valley    1011001
4   US  California  San Francisco   South of Market 1011002
5   US  California  San Francisco   Potrero Hill    1011003
6   US  California  San Francisco   Russian Hill    1011004
7   US  California  San Francisco   Noe Valley  1011005
8   US  California  San Diego   NaN 1012000
9   US  California  San Diego   Communities of Clairemont   1012001
10  US  California  San Diego   Linda Vista 1012002
11  US  California  San Diego   Chollas View    1012003
12  US  California  San Diego   North Bay Terrace   1012004
13  US  California  San Diego   Valencia Park   1012005
14  US  California  San Diego   Lomita Village  1012006
15  US  California  Los Angeles NaN 1013000

I want to match GeoID with related info, if there are district match it first, then match city,state, and country.

Something like

for i in range(len(df["GeoID"])):
    if df["District"]==None:
      if df["City"]==None:
        if df["State"]==None:
            df["newGeo"]=df["Country"]
        else:
            df["newGeo"]=df["State"]
      else:
          df["newGeo"]=df["City"]
    else:
        df["newGeo"]=df["City"]

Desired output

GeoID newGeo
1000000 US
1010000 California
1011000 San Francisco
1011001 Hayes Valley
1011002 South of Market
1011003 Potrero Hill
1011004 Russia Hill
...

CodePudding user response:

Would it be something like the following?

df['GeoID'] = df.fillna(method='ffill', axis=1).iloc[:,-2]

fillna with the method ffill propagates non-null values foward, and iloc will select the desired column. Let's say you have:

index 0 1 2 3
0 Country NaN NaN 101
1 Country State 2 NaN 1101
2 Country State 3 NaN 10101
3 Country State 4 City 4 101000
4 Country State 5 City 5 1011

Using df.fillna(method='ffill', axis=1) will get you the following:

index 0 1 2 3
0 Country Country Country 101
1 Country State 2 State 2 1101
2 Country State 3 State 3 10101
3 Country State 4 City 4 101000
4 Country State 5 City 5 1011

As you can see, it pushes the values that are not null forward. You can simply select that second to last column with .iloc[:,-2] and assign it to a new column with the full code above. The full result is:

index 0 1 2 3 GeoID
0 Country NaN NaN 101 Country
1 Country State 2 NaN 1101 State 2
2 Country State 3 NaN 10101 State 3
3 Country State 4 City 4 101000 City 4
4 Country State 5 City 5 1011 City 5

CodePudding user response:

Something like this can help your scenario: It allows matching multiple column values 'and' in single statement

my_df.loc[ (my_df['my_column_1'] == my_condition_1) & (my_df['my_column_2'] == my_condition_2) & (my_df['my_column_3'] == my_condition_3)] 

Please read about '.loc', '.iloc' in pandas https://www.adamsmith.haus/python/answers/how-to-filter-a-pandas-dataframe-by-multiple-columns-in-python

  • Related