Home > OS >  Split duplicate columns names in CSV into new row keeping first column intact
Split duplicate columns names in CSV into new row keeping first column intact

Time:01-17

I have a CSV that contains multiple identical column names that I want to be split into their own rows instead while keeping the 'id' field intact. Essentially, things that should have been their own rows were placed at the end of a row instead.

I am struggling to put it into words so the example below should make things clear. You can see the 'road_name', 'length', and 'type' is repeated several times.

INPUT:

id,road_name,length,type,road_name,length,type,road_name,length,type,road_name,length,type
1,r22,22,c,r16,16,a,r17,13,a,r77,13,c
2,r16,5,c,,,,,,,,,
3,r12,7,a,r9,12,c,r3,5,c,,,

DESIRED OUTPUT:

id,road_name,length,type
1,r22,22,c
1,r16,16,a
1,r17,13,a
1,r77,13,c
2,r16,5,c
3,r12,7,a
3,r9,12,c
3,r3,5,c

It would be great to be pointed in the right direction. My inability to put this problem into words is making it hard for me to search relevant related examples.

CodePudding user response:

This work around will organize the data as you wanted. And you can save it as new csv file.

df = pd.read_csv("your_file.csv")

num_cols = df.shape[1]
df_out = pd.DataFrame(columns=["id", "road_name", "length", "type"])

for i in range(len(df)):
    id_ = df.iloc[i]["id"]
    for j in range(1, num_cols - 2, 3):
        rn = df.iloc[i][j]
        le = df.iloc[i][j 1]
        ty = df.iloc[i][j 2]
        if rn != "":
            df_out = pd.concat([df_out, pd.DataFrame({"id": [id_], "road_name": [rn], "length": [le], "type": [ty]})], ignore_index=True)

df_out.dropna(inplace=True)
df_out.reset_index(drop=True, inplace=True)

df_out.to_csv("organized.csv", index=False)
print(df_out)

  id road_name length type
0  1       r22     22    c
1  1       r16     16    a
2  1       r17     13    a
3  1       r77     13    c
4  2       r16      5    c
5  3       r12      7    a
6  3        r9     12    c
7  3        r3      5    c
  • Related