I started with an ugly set of txt data of several thousand rows and have cleaned it up to resemble this:
import pandas as pd
import numpy as np
data = {'Town':['Chicago', 1.11, 1.45, 0.86, 2.68, np.nan, 'Philly', 1.35, 1.55, 3.7], 'lat':[41.878, np.nan, np.nan, np.nan, np.nan, np.nan, 39.9526, np.nan, np.nan, np.nan], 'long':[-87.6298, np.nan, np.nan, np.nan, np.nan, np.nan, -75.165, np.nan, np.nan, np.nan]}
df = pd.DataFrame(data)
I would like to get it in the form below but am having some trouble.
data_wanted = {'41.8780':[1.11, 1.45, 0.86, 2.68], '39.9526':[1.35, 1.55, 3.7, np.nan]}
df_wanted = pd.DataFrame(data_wanted)
Something to note is that each town has a different number of values (one might be 100 and the next 13).
Any help would be much appreciated.
CodePudding user response:
Try with
df['new_lat'] = df['lat'].ffill()
out = df.query('lat!=new_lat').assign(key = lambda x : x.groupby('new_lat').cumcount()).pivot('key','new_lat','Town')
Out[377]:
new_lat 39.9526 41.8780
key
0 1.35 1.11
1 1.55 1.45
2 3.7 0.86
3 NaN 2.68
4 NaN NaN