I have a dataframe that looks something like this
Detail
--------------
0 Animal: Lion,
Weight: 600 kg,
Class: Mammal
1 Animal: Monkey,
Weight: 10 kg,
2 Animal: Snake,
Class: Reptile
3 Animal: Frog,
Class: Amphibian,
Weight: 1 kg
4 Animal: Hawk,
Class: Bird
I need to read each row to see if a particular string is present in each line, then strip the line and add it to a list. If the value is not present, I need it to be appended with a null value. Then, I want to export this list as a csv file.
My desired output here is
Animal
--------------
0 Animal: Lion,
1 Animal: Monkey,
2 Animal: Snake,
3 Animal: Frog,
4 Animal: Hawk,
Weight
--------------
0 Weight: 600 kg,
1 Weight: 10 kg,
2 Weight: NaN
3 Weight: 1 kg,
4 Weight: NaN
Class
--------------
0 Class: Mammal,
1 Class: NaN
2 Class: Reptile
3 Class: Amphibian
4 Class: NaN
I need these 3 dataframes to be exported as a csv file.
I have written something like this, but don't think it is the right approach
cols= ['Animal','Weight','Class']
for entry in cols:
values=[]
for content in df['Detail']:
if(df['Detail'].str.contains(entry)):
var=content.str.split('\n')
else:
var= np.nan
values.append(var)
values_df = pd.DataFrame(values)
values_df.to_csv('%s.csv' %entry,header=entry,index=False)
CodePudding user response:
Please check this. Splitting a pandas dataframe column by delimiter I haven't run it, but it should be something like:
df[['Animal','Weight', 'Class']] = df['Detail'].str.split(',',expand=True)
If you want then to convert the column "Animal" to a list. you can do it like so:
arr = df["Animal"].to_numpy()
CodePudding user response:
In a way, your trial is correct, the only part you have missed is filtering the correct column values(as Animal, Weight, Class will not be in order)
#splitted at comma and removed extra space if any in all values
df['Detail'] = [[y.strip() for y in x.split(',')] for x in df['Detail'].values]
vals = []
for idx, row in df.iterrows():
temp = {'Animal':np.nan,'Weight':np.nan,'Class':np.nan}
for x in row['details']:
for key in temp.keys():
if x.startswith(key):
temp[key]= x
vals.append(temp)
final_df = pd.DataFrame(vals)
#print('final_df==', final_df)
Now you final_df will have 3 columns, and you can easily create separate dfs out of it by
final_df[['columnName']].to_csv('filaname.csv', index=False)
Let me know if it helps or in case you need more clarity on code.