I have a CSV that contains multiple identical column names that I want to be split into their own rows instead while keeping the 'id' field intact. Essentially, things that should have been their own rows were placed at the end of a row instead.
I am struggling to put it into words so the example below should make things clear. You can see the 'road_name', 'length', and 'type' is repeated several times.
INPUT:
id,road_name,length,type,road_name,length,type,road_name,length,type,road_name,length,type
1,r22,22,c,r16,16,a,r17,13,a,r77,13,c
2,r16,5,c,,,,,,,,,
3,r12,7,a,r9,12,c,r3,5,c,,,
DESIRED OUTPUT:
id,road_name,length,type
1,r22,22,c
1,r16,16,a
1,r17,13,a
1,r77,13,c
2,r16,5,c
3,r12,7,a
3,r9,12,c
3,r3,5,c
It would be great to be pointed in the right direction. My inability to put this problem into words is making it hard for me to search relevant related examples.
CodePudding user response:
This work around will organize the data as you wanted. And you can save it as new csv file.
df = pd.read_csv("your_file.csv")
num_cols = df.shape[1]
df_out = pd.DataFrame(columns=["id", "road_name", "length", "type"])
for i in range(len(df)):
id_ = df.iloc[i]["id"]
for j in range(1, num_cols - 2, 3):
rn = df.iloc[i][j]
le = df.iloc[i][j 1]
ty = df.iloc[i][j 2]
if rn != "":
df_out = pd.concat([df_out, pd.DataFrame({"id": [id_], "road_name": [rn], "length": [le], "type": [ty]})], ignore_index=True)
df_out.dropna(inplace=True)
df_out.reset_index(drop=True, inplace=True)
df_out.to_csv("organized.csv", index=False)
print(df_out)
id road_name length type
0 1 r22 22 c
1 1 r16 16 a
2 1 r17 13 a
3 1 r77 13 c
4 2 r16 5 c
5 3 r12 7 a
6 3 r9 12 c
7 3 r3 5 c