pandas read tree like data into one column-CodePudding

I got a dataframe like this

    Country State   City    District    GeoID
0   US  NaN NaN NaN 1000000
1   US  California  NaN NaN 1010000
2   US  California  San Francisco   NaN 1011000
3   US  California  San Francisco   Hayes Valley    1011001
4   US  California  San Francisco   South of Market 1011002
5   US  California  San Francisco   Potrero Hill    1011003
6   US  California  San Francisco   Russian Hill    1011004
7   US  California  San Francisco   Noe Valley  1011005
8   US  California  San Diego   NaN 1012000
9   US  California  San Diego   Communities of Clairemont   1012001
10  US  California  San Diego   Linda Vista 1012002
11  US  California  San Diego   Chollas View    1012003
12  US  California  San Diego   North Bay Terrace   1012004
13  US  California  San Diego   Valencia Park   1012005
14  US  California  San Diego   Lomita Village  1012006
15  US  California  Los Angeles NaN 1013000

I want to match GeoID with related info, if there are district match it first, then match city,state, and country.

Something like

for i in range(len(df["GeoID"])):
    if df["District"]==None:
      if df["City"]==None:
        if df["State"]==None:
            df["newGeo"]=df["Country"]
        else:
            df["newGeo"]=df["State"]
      else:
          df["newGeo"]=df["City"]
    else:
        df["newGeo"]=df["City"]

Desired output

GeoID newGeo
1000000 US
1010000 California
1011000 San Francisco
1011001 Hayes Valley
1011002 South of Market
1011003 Potrero Hill
1011004 Russia Hill
...

CodePudding user response：

Would it be something like the following?

df['GeoID'] = df.fillna(method='ffill', axis=1).iloc[:,-2]

fillna with the method ffill propagates non-null values foward, and iloc will select the desired column. Let's say you have:

index	0	1	2	3
0	Country	NaN	NaN	101
1	Country	State 2	NaN	1101
2	Country	State 3	NaN	10101
3	Country	State 4	City 4	101000
4	Country	State 5	City 5	1011

Using df.fillna(method='ffill', axis=1) will get you the following:

index	0	1	2	3
0	Country	Country	Country	101
1	Country	State 2	State 2	1101
2	Country	State 3	State 3	10101
3	Country	State 4	City 4	101000
4	Country	State 5	City 5	1011

As you can see, it pushes the values that are not null forward. You can simply select that second to last column with .iloc[:,-2] and assign it to a new column with the full code above. The full result is:

index	0	1	2	3	GeoID
0	Country	NaN	NaN	101	Country
1	Country	State 2	NaN	1101	State 2
2	Country	State 3	NaN	10101	State 3
3	Country	State 4	City 4	101000	City 4
4	Country	State 5	City 5	1011	City 5

CodePudding user response：

Something like this can help your scenario: It allows matching multiple column values 'and' in single statement

my_df.loc[ (my_df['my_column_1'] == my_condition_1) & (my_df['my_column_2'] == my_condition_2) & (my_df['my_column_3'] == my_condition_3)]

Please read about '.loc', '.iloc' in pandas https://www.adamsmith.haus/python/answers/how-to-filter-a-pandas-dataframe-by-multiple-columns-in-python