Being new to Python, I am looking for some help reshaping this data. I already know how to do this in Excel but want a Python-specific solution.
I want it to be in this format.
The entire dataset is 70k rows with different vc_firm_name values. Any help would be great.
CodePudding user response:
If you care about performance, then I suggest you take a look at other methods (such as using numpy, or sorting the table):
- https://stackoverflow.com/a/42550516/17323241
- https://stackoverflow.com/a/66018377/17323241
- https://stackoverflow.com/a/22221675/17323241 (look at second comment)
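As a rough idea of what the sorting/numpy route can look like, here is a minimal sketch. It is not necessarily what the linked answers do; the sample rows and the helper name group_lists_sorted are made up for illustration, and it assumes a non-empty table. Whether it is actually faster on 70k rows would need to be measured.

```python
import numpy as np
import pandas as pd


def group_lists_sorted(df, key, value):
    """Collect `value` into one list per `key` by sorting once and splitting.

    Hypothetical helper; assumes `df` has at least one row.
    """
    # A stable sort keeps the original row order within each group.
    ordered = df.sort_values(key, kind="stable")
    keys = ordered[key].to_numpy()
    values = ordered[value].to_numpy()
    # Positions where the key changes mark the start of a new group.
    starts = np.flatnonzero(keys[1:] != keys[:-1]) + 1
    chunks = np.split(values, starts)
    first_keys = keys[np.r_[0, starts]]
    return pd.Series([chunk.tolist() for chunk in chunks],
                     index=first_keys, name=value)


# Hypothetical sample data; column names follow the question.
sample = pd.DataFrame({
    "vc_firm_name": ["B Capital", "A Ventures", "B Capital", "A Ventures"],
    "investment_industry": ["Fintech", "Health", "Biotech", "Energy"],
})
lists_by_firm = group_lists_sorted(sample, "vc_firm_name", "investment_industry")
```

The result is a Series indexed by firm name, with one list of industries per firm.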
Otherwise, you can do:
import pandas as pd

# load data from csv file
df = pd.read_csv("example.csv")
# aggregate: collect each firm's industries into a list
result = df.groupby("vc_firm_name")["investment_industry"].apply(list)
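To see what the aggregation produces without needing the CSV, here is a small sketch on made-up rows. The firm names and industries are invented; the column names vc_firm_name and investment_industry are assumed to match the real file.

```python
import pandas as pd

# Hypothetical rows standing in for example.csv.
df = pd.DataFrame({
    "vc_firm_name": ["A Ventures", "A Ventures", "B Capital"],
    "investment_industry": ["Health", "Energy", "Fintech"],
})

# One row per firm, with its industries gathered into a Python list.
industries = (
    df.groupby("vc_firm_name")["investment_industry"]
    .apply(list)
    .reset_index()
)
```

The reset_index() call turns the firm name back into an ordinary column, which is usually more convenient if the result is going to be written out again.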
CodePudding user response:
Assuming the original file is "original.csv" and you want to save the result as "new.csv", I would do:
import pandas as pd
pd.read_csv("original.csv").groupby(by=["vc_firm_name"], as_index=False).aggregate(lambda x: ",".join(x)).to_csv("new.csv", index=False)
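The same idea can be tried on a few in-memory rows first. This is only a sketch: the rows are made up, and the investment_industry column name is an assumption taken from the other answer. Selecting the column explicitly restricts the join to that one column. Note that ",".join raises a TypeError on non-string values, so missing values would need handling first.

```python
import pandas as pd

# Hypothetical rows standing in for original.csv.
df = pd.DataFrame({
    "vc_firm_name": ["A Ventures", "A Ventures", "B Capital"],
    "investment_industry": ["Health", "Energy", "Fintech"],
})

# Join each firm's industries into one comma-separated string.
joined = (
    df.groupby("vc_firm_name", as_index=False)["investment_industry"]
    .agg(",".join)
)
```

Calling joined.to_csv("new.csv", index=False) afterwards would write the reshaped table to disk.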