I got a dataframe like this
Country State City District GeoID
0 US NaN NaN NaN 1000000
1 US California NaN NaN 1010000
2 US California San Francisco NaN 1011000
3 US California San Francisco Hayes Valley 1011001
4 US California San Francisco South of Market 1011002
5 US California San Francisco Potrero Hill 1011003
6 US California San Francisco Russian Hill 1011004
7 US California San Francisco Noe Valley 1011005
8 US California San Diego NaN 1012000
9 US California San Diego Communities of Clairemont 1012001
10 US California San Diego Linda Vista 1012002
11 US California San Diego Chollas View 1012003
12 US California San Diego North Bay Terrace 1012004
13 US California San Diego Valencia Park 1012005
14 US California San Diego Lomita Village 1012006
15 US California Los Angeles NaN 1013000
I want to match GeoID with related info, if there are district match it first, then match city,state, and country.
Something like
for i in range(len(df["GeoID"])):
if df["District"]==None:
if df["City"]==None:
if df["State"]==None:
df["newGeo"]=df["Country"]
else:
df["newGeo"]=df["State"]
else:
df["newGeo"]=df["City"]
else:
df["newGeo"]=df["City"]
Desired output
GeoID newGeo
1000000 US
1010000 California
1011000 San Francisco
1011001 Hayes Valley
1011002 South of Market
1011003 Potrero Hill
1011004 Russia Hill
...
CodePudding user response:
Would it be something like the following?
df['GeoID'] = df.fillna(method='ffill', axis=1).iloc[:,-2]
fillna with the method ffill propagates non-null values foward, and iloc will select the desired column. Let's say you have:
index | 0 | 1 | 2 | 3 |
---|---|---|---|---|
0 | Country | NaN | NaN | 101 |
1 | Country | State 2 | NaN | 1101 |
2 | Country | State 3 | NaN | 10101 |
3 | Country | State 4 | City 4 | 101000 |
4 | Country | State 5 | City 5 | 1011 |
Using df.fillna(method='ffill', axis=1)
will get you the following:
index | 0 | 1 | 2 | 3 |
---|---|---|---|---|
0 | Country | Country | Country | 101 |
1 | Country | State 2 | State 2 | 1101 |
2 | Country | State 3 | State 3 | 10101 |
3 | Country | State 4 | City 4 | 101000 |
4 | Country | State 5 | City 5 | 1011 |
As you can see, it pushes the values that are not null forward. You can simply select that second to last column with .iloc[:,-2]
and assign it to a new column with the full code above. The full result is:
index | 0 | 1 | 2 | 3 | GeoID |
---|---|---|---|---|---|
0 | Country | NaN | NaN | 101 | Country |
1 | Country | State 2 | NaN | 1101 | State 2 |
2 | Country | State 3 | NaN | 10101 | State 3 |
3 | Country | State 4 | City 4 | 101000 | City 4 |
4 | Country | State 5 | City 5 | 1011 | City 5 |
CodePudding user response:
Something like this can help your scenario: It allows matching multiple column values 'and' in single statement
my_df.loc[ (my_df['my_column_1'] == my_condition_1) & (my_df['my_column_2'] == my_condition_2) & (my_df['my_column_3'] == my_condition_3)]
Please read about '.loc', '.iloc' in pandas https://www.adamsmith.haus/python/answers/how-to-filter-a-pandas-dataframe-by-multiple-columns-in-python