I have this example CSV file:
Name,Dimensions,Color
Chair,!12:88:33!!9:10:50!!40:23:11!,Red
Table,!9:10:50!!40:23:11!,Brown
Couch,!40:23:11!!12:88:33!,Blue
I read it into a dataframe, then split Dimensions
by !
and take the first value of each !..:..:..!
-section. I append these as new columns to the dataframe, and delete Dimensions
. (code for this below)
import pandas as pd
df = pd.read_csv("./data.csv")
df[["first","second","third"]] = (df['Dimensions']
.str.strip('!')
.str.split('!{1,}', expand=True)
.apply(lambda x: x.str.split(':').str[0]))
df = df.drop("Dimensions", axis=1)
And I get this:
Name Color first second third
0 Chair Red 12 9 40
1 Table Brown 9 40 None
2 Couch Blue 40 12 None
I named them ["first","second","third"]
by manually here.
But what if there are more than 3 in the future, or only 2, or I don't know how many there will be, and I want them to be named using a string an enumerating number?
Like this:
Name Color data_0 data_1 data_2
0 Chair Red 12 9 40
1 Table Brown 9 40 None
2 Couch Blue 40 12 None
Question:
How do I make the naming automatic, based on the string "data_" so it gives each column the name "data_" the number of the column? (So I don't have to type in names manually)
CodePudding user response:
Nevermind, hahah, I solved it.
import pandas as pd
df = pd.read_csv("./data.csv")
df2 = (df['Dimensions']
.str.strip('!')
.str.split('!{1,}', expand=True)
.apply(lambda x: x.str.split(':').str[0]))
df[[ ("data_" str(i)) for i in range(len(df2.columns)) ]] = df2
df = df.drop("Dimensions", axis=1)
CodePudding user response:
Use DataFrame.pop
for use and drop column Dimensions
, add DataFrame.add_prefix
to default columnsnames and append to original DataFrame by DataFrame.join
:
df = (df.join(df.pop('Dimensions')
.str.strip('!')
.str.split('!{1,}', expand=True)
.apply(lambda x: x.str.split(':').str[0]).add_prefix('data_')))
print (df)
Name Color data_0 data_1 data_2
0 Chair Red 12 9 40
1 Table Brown 9 40 None
2 Couch Blue 40 12 None