I have a data in a csv file here is a sample:
firstnb,secondnb,distance
901,19011,459.73618164837535
901,19017,492.5540450352788
901,19018,458.489289271722
903,13019,167.46632044684435
903,13020,353.16001204909657
the desired output:
901,19011,19017,19018
903,13019,13020
As you can see in the output I want to take firstnb column (901/903)
and put beside each one the secondnb I believe you can understand from the desired output better than my explanation :D
What I tried so far is the following:
import pandas as pd
import csv
df = pd.read_csv('test.csv')
with open('neighborList.csv','w',newline='') as file:
writer = csv.writer(file)
secondStation = []
for row in range(len(df)):
firstStation = df['firstnb'][row]
for x in range(len(df)):
if firstStation == df['firstnb'][x]:
secondStation.append(df['secondnb'][x])
# line = firstStation ,secondStation
# writer.writerow(line)
print(firstStation,secondStation)
secondStation = []
my code output this :
901 [19011, 19017, 19018]
901 [19011, 19017, 19018]
901 [19011, 19017, 19018]
903 [13019, 13020]
903 [13019, 13020]
CodePudding user response:
First, I grouped the data by firstnb
, creating a list
of the values in secondnb
using the aggregate
function.
df[['firstnb','secondnb']].groupby('firstnb').aggregate(func=list).to_dict()
By turning this into a dict
, we get:
{'secondnb': {901: [19011, 19017, 19018], 903: [13019, 13020]}}
I'm not entirely clear on what the final output should be (plain strings, lists, …), but from here on, it's easy to produce whatever you'd like.
For example, a list of lists:
intermediate = df[['firstnb','secondnb']].groupby('firstnb').aggregate(func=list).to_dict()
[[k] v for k,v in intermediate['secondnb'].items()]
Result:
[[901, 19011, 19017, 19018], [903, 13019, 13020]]
CodePudding user response:
def toList(a):
res = []
for r in a:
res.append(r)
return res
df.groupby('firstnb').agg(toList)
CodePudding user response:
Pandas has a built in function to do this, called groupby:
df = pd.read_csv(YOUR_CSV_FILE)
df_grouped = list(df.groupby(df['firstnb'])) # group by first column
# chain keys and values into merged list
for key, values in df_grouped:
print([key] values['secondnb'].tolist())
Here I just print the sublists; you can save them into a new csv in any format you'd like (strings, ints, etc)