I have the following dictionary:
test = {'AAGUFU 60 (MDE).jpg': 0.2825904813711154,
'AAGUFU 60 (MCE).jpg': 0.27073007232248,
'AAGUFU 60 (MCA).jpg': 0.3736480594737323,
'AAGUFU 60 (MCP).jpg': 0.45155877307246917}
and the following initialized dataframe:
df = pd.DataFrame(columns = ["specimen", "MDE", "MCE", "MCA", "MCP"])
specimen MDE MCE MCA MCP
I want my code to do two things: 1) extract the catalog name from each filename (e.g. AAGUFU 60) and the abbreviation in parentheses (e.g. MDE); 2) store the catalog name (AAGUFU 60) in the specimen column of the dataframe, and store each dictionary value in the column matching its abbreviation, in the same row as that catalog name.
I wrote the following code, but it isn't working. I read somewhere that you shouldn't add values to the rows of a dataframe iteratively because it is computationally expensive. Is there an alternative? Maybe creating a list of dictionaries and applying from_dict() to it? Also, I think the nested for loops in my code aren't efficient, and I would like some hints to improve them.
for specimenCatalog in test:
    filename = specimenCatalog
    # get filename
    specimen = re.search(r'.+(?= \()', filename)
    specimen.group(0)
    # get abbreviation between parentheses
    muscle = filename[filename.find('(') + 1:filename.find(')')]
    muscle
    for measurement in test.values():
        for index, value in df.iterrows():
            if pd.isna(df.specimen[index]) == True:
                df.specimen[index] = specimen.group(0)
            else:
                continue
            df.at[index, muscle] = measurement
So my expected output would be a dataframe as follows. I will also need to add more rows to the dataframe from other similar dictionaries:
specimen MDE MCE MCA MCP
AAGUFU 60 0.282 0.270 0.373 0.451
Answer:
What you could do is:
import pandas as pd

test = {'AAGUFU 60 (MDE).jpg': 0.2825904813711154,
        'AAGUFU 60 (MCE).jpg': 0.27073007232248,
        'AAGUFU 60 (MCA).jpg': 0.3736480594737323,
        'AAGUFU 60 (MCP).jpg': 0.45155877307246917}
# turn the dictionary into a list of (filename, value) pairs
test = list(test.items())
# turn the pairs into a DataFrame
df = pd.DataFrame(
    columns=['specimen', 'value'],
    data=test
)
# define and use functions for transforming strings
def get_type(s):
    # abbreviation between the parentheses, e.g. 'MDE'
    return s[s.find('(') + 1:s.find(')')]
def get_specimen(s):
    # catalog name before the parenthesis, without the trailing space
    return s[:s.find('(')].strip()
df['type'] = df['specimen'].apply(get_type)
df['specimen'] = df['specimen'].apply(get_specimen)
# turn MDE, MCE etc. into columns (one row per specimen)
df = df.pivot(index='specimen', columns='type', values='value')
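The question also asks about building the rows from dictionaries and about adding more specimens later. A minimal sketch of that idea is to collect one inner dictionary per specimen in plain Python and construct the DataFrame once at the end with `pd.DataFrame.from_dict(..., orient='index')`, so no rows are written iteratively. The helper name `to_wide`, the `PATTERN` regex, and the second dictionary `other` (with made-up values) are assumptions for illustration, not part of the original post.

```python
import re
import pandas as pd

test = {'AAGUFU 60 (MDE).jpg': 0.2825904813711154,
        'AAGUFU 60 (MCE).jpg': 0.27073007232248,
        'AAGUFU 60 (MCA).jpg': 0.3736480594737323,
        'AAGUFU 60 (MCP).jpg': 0.45155877307246917}

# Hypothetical second specimen; these values are invented for the example.
other = {'AAGUFU 61 (MDE).jpg': 0.31,
         'AAGUFU 61 (MCE).jpg': 0.29,
         'AAGUFU 61 (MCA).jpg': 0.40,
         'AAGUFU 61 (MCP).jpg': 0.47}

# Assumed filename layout: "<specimen> (<abbreviation>).jpg"
PATTERN = re.compile(r'^(?P<specimen>.+?)\s*\((?P<muscle>[^)]+)\)')
COLUMNS = ['MDE', 'MCE', 'MCA', 'MCP']


def to_wide(*dicts):
    """Merge measurement dictionaries into one row per specimen."""
    rows = {}
    for d in dicts:
        for filename, value in d.items():
            match = PATTERN.match(filename)
            if match is None:
                continue  # skip filenames that do not follow the layout
            # rows[specimen][abbreviation] = value
            rows.setdefault(match['specimen'], {})[match['muscle']] = value
    # Build the DataFrame once: outer keys become the index.
    wide = pd.DataFrame.from_dict(rows, orient='index')
    wide = wide.reindex(columns=COLUMNS)  # fix the column order
    return wide.rename_axis('specimen').reset_index()


wide = to_wide(test, other)
print(wide.round(3))
```

Passing further dictionaries to `to_wide` adds further rows. Any abbreviation missing for a specimen shows up as NaN in that row.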