Home > Back-end >  Pandas: Give (string numbered) name to unknown number of added columns
Pandas: Give (string numbered) name to unknown number of added columns

Time:07-19

I have this example CSV file:

Name,Dimensions,Color
Chair,!12:88:33!!9:10:50!!40:23:11!,Red
Table,!9:10:50!!40:23:11!,Brown
Couch,!40:23:11!!12:88:33!,Blue

I read it into a dataframe, then split Dimensions by ! and take the first value of each !..:..:..!-section. I append these as new columns to the dataframe, and delete Dimensions. (code for this below)

import pandas as pd

df = pd.read_csv("./data.csv")

df[["first","second","third"]] = (df['Dimensions']
               .str.strip('!')
               .str.split('!{1,}', expand=True)
               .apply(lambda x: x.str.split(':').str[0]))
               
df = df.drop("Dimensions", axis=1)

And I get this:

    Name  Color first second third
0  Chair    Red    12      9    40
1  Table  Brown     9     40  None
2  Couch   Blue    40     12  None

I named them ["first","second","third"] by manually here.
But what if there are more than 3 in the future, or only 2, or I don't know how many there will be, and I want them to be named using a string an enumerating number?
Like this:

    Name  Color data_0 data_1 data_2
0  Chair    Red     12      9     40
1  Table  Brown      9     40   None
2  Couch   Blue     40     12   None

Question:
How do I make the naming automatic, based on the string "data_" so it gives each column the name "data_" the number of the column? (So I don't have to type in names manually)

CodePudding user response:

Nevermind, hahah, I solved it.

import pandas as pd

df = pd.read_csv("./data.csv")

df2 = (df['Dimensions']
               .str.strip('!')
               .str.split('!{1,}', expand=True)
               .apply(lambda x: x.str.split(':').str[0]))

df[[ ("data_" str(i)) for i in range(len(df2.columns)) ]]  = df2
df = df.drop("Dimensions", axis=1)

CodePudding user response:

Use DataFrame.pop for use and drop column Dimensions, add DataFrame.add_prefix to default columnsnames and append to original DataFrame by DataFrame.join:

df = (df.join(df.pop('Dimensions')
               .str.strip('!')
               .str.split('!{1,}', expand=True)
               .apply(lambda x: x.str.split(':').str[0]).add_prefix('data_')))
print (df)
    Name  Color data_0 data_1 data_2
0  Chair    Red     12      9     40
1  Table  Brown      9     40   None
2  Couch   Blue     40     12   None
  • Related