Extracting values from pandas dataframe

I have a pandas dataframe as follows.

df = 

     col_1   sum   value_3    value_5   value_7
0    4.0    45.0    NaN         0.9       9.3
1    4.0    NaN     NaN         4.5       NaN
2    49.2   10.8    3.4         NaN       NaN

I want to iterate through every row and build a dictionary from each one. Let sum_dict be the dictionary. For every row, by default, I want the value in col_1 to be the key and the value in the column sum to be the corresponding dictionary value, i.e., for the first row, the dictionary looks like this:

sum_dict = {4.0 : 45.0}

In addition, if a value is present in any of the columns value_3, value_5, value_7, I want the number extracted from the column name to be added as a key to the dictionary, with the value in that column as the key's value. Again, for the first row, sum_dict looks as follows:

sum_dict = {4.0 : 45.0, 5.0 : 0.9 , 7.0: 9.3}

From the second row, the sum_dict should look like:

sum_dict = {4.0: NaN, 5.0 : 4.5}

Is there a way to do this?

For now I have written the following:

for idx, row in df.iterrows():
    sum_dict = {}
    sum_dict[row['col_1']] = row['sum']

The above code only adds the values from col_1 and sum to the dictionary. Is there a way to do the rest as stated above?

CodePudding user response：

Try:

import pandas as pd

# One dictionary per row of df (df as defined in the question)
out = []
for _, x in df.iterrows():
    out.append(
        {
            x["col_1"]: x["sum"],
            **{
                k: v
                for k, v in zip(
                    (3.0, 5.0, 7.0), x[["value_3", "value_5", "value_7"]]
                )
                if pd.notna(v)
            },
        }
    )

print(out)

Prints:

[{4.0: 45.0, 5.0: 0.9, 7.0: 9.3}, 
 {4.0: nan, 5.0: 4.5}, 
 {49.2: 10.8, 3.0: 3.4}]
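
One thing to keep in mind with the loop above: the col_1 value and the column numbers share the same key space, so a row whose col_1 happens to equal 3.0, 5.0 or 7.0 can have its sum entry overwritten. A minimal sketch of that case, using a made-up row that is not part of the question's data:

```python
import pandas as pd

# Hypothetical row (an assumption for illustration): col_1 equals the
# number in the column name "value_5".
row = pd.Series(
    {
        "col_1": 5.0,
        "sum": 45.0,
        "value_3": float("nan"),
        "value_5": 0.9,
        "value_7": 9.3,
    }
)

sum_dict = {
    row["col_1"]: row["sum"],
    **{
        k: v
        for k, v in zip((3.0, 5.0, 7.0), row[["value_3", "value_5", "value_7"]])
        if pd.notna(v)
    },
}

# The key 5.0 appears twice, so the later entry (from value_5) wins
# and the value from the sum column is lost.
print(sum_dict)
```

If such collisions can occur in your data, you may need a different key scheme, for example keeping the col_1/sum pair separate from the value_ entries.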

CodePudding user response：

Using an approach similar to the one @Andrej Kesely suggested, you can use float("".join(char for char in str(key) if char.isnumeric())) as the dictionary key if you don't want to type the column numbers manually:


import pandas as pd

# == Example DataFrame ========================================
df = pd.DataFrame(
    {
        'col_1': [4.0, 4.0, 49.2],
        'sum': [45.0, None , 10.8],
        'value_3': [None, None, 3.4],
        'value_5': [0.9, 4.5, None],
        'value_7': [9.3, None, None],
    }
)


# == Solution =================================================
sum_dicts = []
for idx, row in df.iterrows():
    sum_dict = {
        row["col_1"]: row["sum"],
        **{
            float("".join(char for char in str(key) if char.isnumeric())): value
            for key, value in row.items()
            if key not in ["col_1", "sum"] and pd.notna(value)
        },
    }
    sum_dicts.append(sum_dict)

print(sum_dicts)
# Prints:
#
# [{4.0: 45.0, 5.0: 0.9, 7.0: 9.3}, {4.0: nan, 5.0: 4.5}, {49.2: 10.8, 3.0: 3.4}]
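
Note that joining every numeric character would turn a column name such as v2_value_3 into 23.0. If the column names might contain other digits, a possible variation is to take only the trailing number with a regular expression. The sketch below assumes the value columns all start with "value_"; the helper name column_number is made up for this example, and df.to_dict("records") is used instead of iterrows() purely as a matter of taste:

```python
import re

import pandas as pd

df = pd.DataFrame(
    {
        "col_1": [4.0, 4.0, 49.2],
        "sum": [45.0, None, 10.8],
        "value_3": [None, None, 3.4],
        "value_5": [0.9, 4.5, None],
        "value_7": [9.3, None, None],
    }
)


def column_number(name):
    """Return the trailing integer of a column name as a float, or None."""
    match = re.search(r"(\d+)$", name)
    return float(match.group(1)) if match else None


# Assumption: every column to scan is named "value_<number>".
value_cols = [col for col in df.columns if col.startswith("value_")]

sum_dicts = []
for row in df.to_dict("records"):
    # Start with the default col_1 -> sum entry (kept even when sum is NaN).
    sum_dict = {row["col_1"]: row["sum"]}
    # Add one entry per value_ column that actually holds a value.
    for col in value_cols:
        if pd.notna(row[col]):
            sum_dict[column_number(col)] = row[col]
    sum_dicts.append(sum_dict)

print(sum_dicts)
```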