I have a pandas DataFrame as follows:
df =
   col_1   sum  value_3  value_5  value_7
0    4.0  45.0      NaN      0.9      9.3
1    4.0   NaN      NaN      4.5      NaN
2   49.2  10.8      3.4      NaN      NaN
I want to iterate through every row and, for each row, collect the following information in a dictionary; call it sum_dict.
For every row, the value in col_1 should by default be the key, and the value in the sum column should be its value. For example, for the first row the dictionary looks like this:
sum_dict = {4.0 : 45.0}
In addition, if there is a value in any of the columns value_3, value_5 or value_7, I want the column number to be extracted from the column name and added as a key, with the value in that column as the key's value. Again, for the first row, sum_dict looks like this:
sum_dict = {4.0 : 45.0, 5.0 : 0.9 , 7.0: 9.3}
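A minimal sketch of the extraction step described above, assuming every value column is named `<prefix>_<number>` (as value_3, value_5 and value_7 are). `column_number` is a hypothetical helper name, not part of pandas; it takes only the text after the last underscore:

```python
def column_number(name: str) -> float:
    """Return the number after the last underscore, e.g. 'value_5' -> 5.0.

    Hypothetical helper: assumes the column name ends in '_<number>'.
    """
    _, _, suffix = name.rpartition("_")
    return float(suffix)


value_columns = ["value_3", "value_5", "value_7"]
print({name: column_number(name) for name in value_columns})
# {'value_3': 3.0, 'value_5': 5.0, 'value_7': 7.0}
```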
For the second row, sum_dict should look like this:
sum_dict = {4.0: NaN, 5.0 : 4.5}
Is there a way to do this?
So far I have written the following:
for idx, row in df.iterrows():
    sum_dict = {}
    sum_dict[row['col_1']] = row['sum']
The above code only adds the values from col_1 and sum to the dictionary. Is there a way to do the rest as described above?
Answer:
Try:
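import pandas as pd

out = []
for _, x in df.iterrows():
    out.append(
        {
            # col_1 -> sum is always included
            x["col_1"]: x["sum"],
            # plus one entry per non-missing value_* column, keyed by its number
            **{
                k: v
                for k, v in zip(
                    (3.0, 5.0, 7.0), x[["value_3", "value_5", "value_7"]]
                )
                if pd.notna(v)
            },
        }
    )
print(out)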
Prints:
[{4.0: 45.0, 5.0: 0.9, 7.0: 9.3},
{4.0: nan, 5.0: 4.5},
{49.2: 10.8, 3.0: 3.4}]
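The same logic can also be written as an explicit loop, closer to the code in the question, which some readers may find easier to follow than the nested dict unpacking. This is a sketch that rebuilds the example DataFrame from the question and, like the answer above, hardcodes the three value_* column names; the key is taken from the text after the underscore:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "col_1": [4.0, 4.0, 49.2],
        "sum": [45.0, None, 10.8],
        "value_3": [None, None, 3.4],
        "value_5": [0.9, 4.5, None],
        "value_7": [9.3, None, None],
    }
)

sum_dicts = []
for _, row in df.iterrows():
    # Default entry: col_1 -> sum (kept even when sum is NaN, as in the second row).
    sum_dict = {row["col_1"]: row["sum"]}
    # Add one entry per non-missing value column, keyed by the number in its name.
    for col in ("value_3", "value_5", "value_7"):
        if pd.notna(row[col]):
            sum_dict[float(col.split("_")[1])] = row[col]
    sum_dicts.append(sum_dict)

print(sum_dicts)
```

With NumPy 2.x the printed values may appear as np.float64(...) instead of plain floats, because looking up a value in a row Series returns a NumPy scalar; the dictionary contents are the same.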
Answer:
Using an approach similar to the one @Andrej Kesely suggested, you can use float("".join(char for char in str(key) if char.isnumeric())) as the dictionary keys if you don't want to type the column numbers manually:
import pandas as pd

# == Example DataFrame ========================================
df = pd.DataFrame(
    {
        'col_1': [4.0, 4.0, 49.2],
        'sum': [45.0, None, 10.8],
        'value_3': [None, None, 3.4],
        'value_5': [0.9, 4.5, None],
        'value_7': [9.3, None, None],
    }
)

# == Solution =================================================
sum_dicts = []
for _, row in df.iterrows():
    sum_dict = {
        row["col_1"]: row["sum"],
        **{
            float("".join(char for char in str(key) if char.isnumeric())): value
            for key, value in row.items()
            if key not in ["col_1", "sum"] and pd.notna(value)
        },
    }
    sum_dicts.append(sum_dict)

print(sum_dicts)
# Prints:
#
# [{4.0: 45.0, 5.0: 0.9, 7.0: 9.3}, {4.0: nan, 5.0: 4.5}, {49.2: 10.8, 3.0: 3.4}]
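A possible variant, sketched under the assumption that all value columns start with "value_" and end in "_<number>": compute each column's key once from its name, then iterate over DataFrame.to_dict(orient="records") instead of iterrows (which builds a Series for every row). It also takes only the text after the last underscore, whereas joining every numeric character concatenates all digits in the name (a hypothetical column "value_1_2" would become 12.0):

```python
import pandas as pd

df = pd.DataFrame(
    {
        "col_1": [4.0, 4.0, 49.2],
        "sum": [45.0, None, 10.8],
        "value_3": [None, None, 3.4],
        "value_5": [0.9, 4.5, None],
        "value_7": [9.3, None, None],
    }
)

# Map each value_* column to the number after its last "_" (computed once).
value_keys = {
    col: float(col.rpartition("_")[2]) for col in df.columns if col.startswith("value_")
}

sum_dicts = []
for record in df.to_dict(orient="records"):  # one plain dict per row
    sum_dict = {record["col_1"]: record["sum"]}
    # Only add value columns that are not NaN in this row.
    sum_dict.update(
        {key: record[col] for col, key in value_keys.items() if pd.notna(record[col])}
    )
    sum_dicts.append(sum_dict)

print(sum_dicts)
```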