How to create new columns from the keys of dictionaries inside a list

I have a DataFrame with a column that holds lists of dictionaries (the lists have unequal lengths). I want to create new columns from it, using each dictionary key as the column name and the dictionary value as the column value.

                                               criteria
0     [{'Seniority level': 'Entry level'}, {'Employm...
1                    [{'Employment type': 'Full-time'}]
2     [{'Seniority level': 'Associate'}, {'Employmen...
3                    [{'Employment type': 'Part-time'}]
4     [{'Seniority level': 'Mid-Senior level'}, {'Em...
...                                                 ...
2768  [{'Seniority level': 'Entry level'}, {'Employm...
2769  [{'Seniority level': 'Entry level'}, {'Employm...
2770  [{'Seniority level': 'Entry level'}, {'Employm...
2771  [{'Seniority level': 'Mid-Senior level'}, {'Em...
2772  [{'Seniority level': 'Entry level'}, {'Employm...

I want to create the new column like this

CodePudding user response：

I have a function that does something along those lines:

import pandas as pd 

def reformat_json_column(dataframe: pd.DataFrame, column_name: str) -> pd.DataFrame:
    """
    Expand a column that holds lists of JSON-like dictionaries.

    Each list element becomes its own row, and each dictionary key becomes a
    column holding the corresponding value.
    """
    data = dataframe.explode(column_name).reset_index(drop=True)
    data = pd.concat(
        [
            data.drop(column_name, axis=1),
            pd.json_normalize(data[column_name]),  # type: ignore
        ],
        axis=1,
    )
    return data

Here is a working example:

test_df = pd.DataFrame(
    {
        "a": [1, 2, 3],
        "b": [
            [{"c": 4, "d": 5}],
            [{"c": 6, "d": 7}],
            [{"c": 8, "d": 9}, {"c": 10, "d": 11}],
        ],
    }
)

assert_df = pd.DataFrame(
    {"a": [1, 2, 3, 3], "c": [4, 6, 8, 10], "d": [5, 7, 9, 11]}
)
pd.testing.assert_frame_equal(reformat_json_column(test_df, "b"), assert_df)
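Note that the function above returns one row per dictionary. For a column shaped like the question's `criteria`, where each dictionary carries a single key, it may be more convenient to merge each list into one dictionary first so that every original row stays a single row. Here is a minimal sketch of that idea; the sample values and the helper name `merge_dicts` are made up for illustration, since the question's data is truncated:

```python
import pandas as pd

# Hypothetical sample shaped like the question's `criteria` column.
df = pd.DataFrame(
    {
        "criteria": [
            [{"Seniority level": "Entry level"}, {"Employment type": "Full-time"}],
            [{"Employment type": "Full-time"}],
            [{"Seniority level": "Associate"}, {"Employment type": "Part-time"}],
        ]
    }
)


def merge_dicts(dicts):
    """Combine a list of dictionaries into one (later keys overwrite earlier ones)."""
    merged = {}
    for d in dicts:
        merged.update(d)
    return merged


# One merged dictionary per row; keys missing from a row become NaN.
expanded = pd.DataFrame(
    [merge_dicts(dicts) for dicts in df["criteria"]], index=df.index
)

# Attach the new columns next to the original one.
result = df.join(expanded)
print(result[["Seniority level", "Employment type"]])
```

Rows whose list lacks a key (the second row here has no "Seniority level") end up with NaN in that column.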

CodePudding user response：

To create a new column in a pandas DataFrame based on a dictionary, you can use the DataFrame.apply() method. It applies a function to each row (axis=1) or each column (axis=0) of the DataFrame and returns the results, which you can then assign to a new column.

Here is an example that builds a DataFrame from a list of dictionaries with unequal lengths and then derives a new column from one of the keys:

import pandas as pd

# Create a DataFrame from a list of dictionaries; keys missing from a dictionary become NaN
df = pd.DataFrame([{'col1': 1, 'col2': 2, 'col3': 3},
                   {'col1': 4, 'col3': 5},
                   {'col2': 6}])

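# Define a function that returns the 'col3' value of a row.
# Each row arrives as a Series indexed by column name, so 'col3' is always
# present here and dictionaries that lacked the key show up as NaN. The else
# branch only matters if the DataFrame has no 'col3' column at all.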
def get_col3_value(row):
  if 'col3' in row:
    return row['col3']
  else:
    return None

# Apply the function to each row of the DataFrame and add the result as a new column
df['col4'] = df.apply(get_col3_value, axis=1)

# Print the resulting DataFrame
print(df)
# Output:
#    col1  col2  col3  col4
# 0   1.0   2.0   3.0   3.0
# 1   4.0   NaN   5.0   5.0
# 2   NaN   6.0   NaN   NaN

In this code, DataFrame.apply() with axis=1 calls get_col3_value() once per row, passing the row as a Series indexed by column name. Because the DataFrame constructor already filled missing keys with NaN, every row has a col3 label, so the function returns NaN (rather than None) for rows whose dictionary lacked the key. The returned values are assigned to a new column, col4.

You can modify this approach to use a different key and function to create the new column in the DataFrame. Just make sure to adjust the function accordingly to extract the correct value from the dictionary.
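As a rough sketch of that adjustment for a column whose cells are lists of dictionaries, as in the question, the function can search the list for a given key and be applied to the column with Series.apply(). The sample values and the helper name `lookup` are assumptions made for illustration:

```python
import pandas as pd

# Hypothetical sample shaped like the question's `criteria` column.
df = pd.DataFrame(
    {
        "criteria": [
            [{"Seniority level": "Entry level"}, {"Employment type": "Full-time"}],
            [{"Employment type": "Part-time"}],
        ]
    }
)


def lookup(dicts, key):
    """Return the value stored under `key` in the first dictionary that has it."""
    for d in dicts:
        if key in d:
            return d[key]
    return None  # no dictionary in this list contains the key


# Create one new column per key of interest.
for key in ["Seniority level", "Employment type"]:
    df[key] = df["criteria"].apply(lookup, args=(key,))

print(df[["Seniority level", "Employment type"]])
```

The list of keys is written out by hand here; if the keys are not known in advance, they would have to be collected from the data first.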