I have a dataframe like this one below
Air Station Code Humidity Temperature Latitude Longitude
St.1 20 10 10.00 10.00
St.2 4 15 25.00 30.00
St.3 16 21 8.00 15.00
St.4 38 8 31.00 40.00
St.5 10 18 10.00 10.00
St.6 40 4 25.00 30.00
St.7 10 13 8.00 15.00
St.8 46 11 31.00 40.00
St.9 28 9 10.00 10.00
St.10 14 22 25.00 30.00
St.11 5 40 8.00 15.00
St.12 11 10 31.00 40.00
...
St.89 61 35 10.00 10.00
St.90 23 29 25.00 30.00
St.91 35 12 8.00 15.00
St.92 31 7 31.00 40.00
I want to change the station codes by matching the coordinates, substituing the codes by repeating the first 4 codes, obtaining this
Air Station Code Humidity Temperature Latitude Longitude
St.1 20 10 10.00 10.00
St.2 4 15 25.00 30.00
St.3 16 21 8.00 15.00
St.4 38 8 31.00 40.00
St.1 10 18 10.00 10.00
St.2 40 4 25.00 30.00
St.3 10 13 8.00 15.00
St.4 46 11 31.00 40.00
St.1 28 9 10.00 10.00
St.2 14 22 25.00 30.00
St.3 5 40 8.00 15.00
St.4 11 10 31.00 40.00
...
St.1 61 35 10.00 10.00
St.2 23 29 25.00 30.00
St.3 35 12 8.00 15.00
St.4 31 7 31.00 40.00
Is there some way to implement an "if/else" substitution on the whole dataframe without going manually over every observation in python?
CodePudding user response:
There may be a better way to do this, but this solves the problem. I create a second dataframe without the duplicates, which keeps the first occurrence of each lat/long. I make lat/long the index and drop the other columns. I then do a "join", adding a new column with the matching lat/long. I then overwrite the original station code with the looked up one.
import pandas as pd
data = [
["St.1", 20, 10, 10.00, 10.00],
["St.2", 4, 15, 25.00, 30.00],
["St.3", 16, 21, 8.00, 15.00],
["St.4", 38, 8, 31.00, 40.00],
["St.5", 10, 18, 10.00, 10.00],
["St.6", 40, 4, 25.00, 30.00],
["St.7", 10, 13, 8.00, 15.00],
["St.8", 46, 11, 31.00, 40.00],
["St.9", 28, 9, 10.00, 10.00],
["St.10", 14, 22, 25.00, 30.00],
["St.11", 5, 40, 8.00, 15.00],
["St.12", 11, 10, 31.00, 40.00],
["St.89", 61, 35, 10.00, 10.00],
["St.90", 23, 29, 25.00, 30.00],
["St.91", 35, 12, 8.00, 15.00],
["St.92", 31, 7, 31.00, 40.00],
]
column = "Air_Station_Code Humidity Temperature Latitude Longitude".split()
df = pd.DataFrame(data,columns=column)
print(df)
df1 = df.drop_duplicates(['Latitude','Longitude'])
df1 = df1[['Air_Station_Code','Latitude','Longitude']]
df1.set_index(['Latitude','Longitude'], inplace=True)
print(df1)
df2 = df.join( df1, on=['Latitude','Longitude'], rsuffix='R' )
print(df2)
df['Air_Station_Code'] = df2['Air_Station_CodeR']
print(df)
Output:
Air_Station_Code Humidity Temperature Latitude Longitude
0 St.1 20 10 10.0 10.0
1 St.2 4 15 25.0 30.0
2 St.3 16 21 8.0 15.0
3 St.4 38 8 31.0 40.0
4 St.5 10 18 10.0 10.0
5 St.6 40 4 25.0 30.0
6 St.7 10 13 8.0 15.0
7 St.8 46 11 31.0 40.0
8 St.9 28 9 10.0 10.0
9 St.10 14 22 25.0 30.0
10 St.11 5 40 8.0 15.0
11 St.12 11 10 31.0 40.0
12 St.89 61 35 10.0 10.0
13 St.90 23 29 25.0 30.0
14 St.91 35 12 8.0 15.0
15 St.92 31 7 31.0 40.0
Air_Station_Code
Latitude Longitude
10.0 10.0 St.1
25.0 30.0 St.2
8.0 15.0 St.3
31.0 40.0 St.4
Air_Station_Code Humidity ... Longitude Air_Station_CodeR
0 St.1 20 ... 10.0 St.1
1 St.2 4 ... 30.0 St.2
2 St.3 16 ... 15.0 St.3
3 St.4 38 ... 40.0 St.4
4 St.5 10 ... 10.0 St.1
5 St.6 40 ... 30.0 St.2
6 St.7 10 ... 15.0 St.3
7 St.8 46 ... 40.0 St.4
8 St.9 28 ... 10.0 St.1
9 St.10 14 ... 30.0 St.2
10 St.11 5 ... 15.0 St.3
11 St.12 11 ... 40.0 St.4
12 St.89 61 ... 10.0 St.1
13 St.90 23 ... 30.0 St.2
14 St.91 35 ... 15.0 St.3
15 St.92 31 ... 40.0 St.4
[16 rows x 6 columns]
Air_Station_Code Humidity Temperature Latitude Longitude
0 St.1 20 10 10.0 10.0
1 St.2 4 15 25.0 30.0
2 St.3 16 21 8.0 15.0
3 St.4 38 8 31.0 40.0
4 St.1 10 18 10.0 10.0
5 St.2 40 4 25.0 30.0
6 St.3 10 13 8.0 15.0
7 St.4 46 11 31.0 40.0
8 St.1 28 9 10.0 10.0
9 St.2 14 22 25.0 30.0
10 St.3 5 40 8.0 15.0
11 St.4 11 10 31.0 40.0
12 St.1 61 35 10.0 10.0
13 St.2 23 29 25.0 30.0
14 St.3 35 12 8.0 15.0
15 St.4 31 7 31.0 40.0
CodePudding user response:
df['Air Station Code'] = 'St.' pd.Series(df[['Latitude','Longitude']].astype(str).agg(sum, axis=1).factorize()[0] 1).astype(str)
df
Out[77]:
Air Station Code Humidity Temperature Latitude Longitude
0 St.1 20 10 10.0 10.0
1 St.2 4 15 25.0 30.0
2 St.3 16 21 8.0 15.0
3 St.4 38 8 31.0 40.0
4 St.1 10 18 10.0 10.0
5 St.2 40 4 25.0 30.0
6 St.3 10 13 8.0 15.0
7 St.4 46 11 31.0 40.0
8 St.1 28 9 10.0 10.0
9 St.2 14 22 25.0 30.0
10 St.3 5 40 8.0 15.0
11 St.4 11 10 31.0 40.0