Change values in pandas dataframe depending on value to subset of value

Time:10-13

I have a dataframe where some of the values are empty lists and others are lists of dicts, like this:

0   [{'text': 'Improvement in steam-engine side-va...   []  []  [{'text': '@einen tetes strut ffice. IMPROV...
1   [{'text': 'Gate.', 'language': 'en', 'truncate...   []  []  [{'text': 'No. 645,359. Patented Mar. 13, I900...
2   [{'text': 'Overseaming sewing-machine.', 'lang...   []  []  [{'text': 'No. 64 5,8l5. Patented Mar. 20, I90...

I want to change the values where they are lists of dicts to be just one value from the first dict of the list. I would have liked to do something like this:

df.loc[df!=[]] = df[0]['text']

Which obviously doesn't work.

CodePudding user response:

So, given this toy dataframe:

import pandas as pd

df = pd.DataFrame(
    [
        [
            [{"text": "Improvement ..."}],
            [],
            [],
            [{"text": "@einen tete..."}],
        ],
        [
            [{"text": "Overseaming..."}],
            [],
            [],
            [{"text": "No. 64 5,8l5..."}],
        ],
    ]
)
print(df)
# Outputs
                               0   1   2                              3
0  [{'text': 'Improvement ...'}]  []  []   [{'text': '@einen tete...'}]
1   [{'text': 'Overseaming...'}]  []  []  [{'text': 'No. 64 5,8l5...'}]

You could do this:

df = df.applymap(lambda x: x[0]["text"] if x != [] else x)

print(df)
# Outputs
                 0   1   2                3
0  Improvement ...  []  []   @einen tete...
1   Overseaming...  []  []  No. 64 5,8l5...
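
A note on versions: as far as I know, applymap is deprecated from pandas 2.1 onward in favour of DataFrame.map, which does the same element-wise job. The sketch below is one possible way to support both; the helper name extract_text is made up for illustration and is not part of pandas.

```python
import pandas as pd

df = pd.DataFrame(
    [
        [[{"text": "Improvement ..."}], [], [], [{"text": "@einen tete..."}]],
        [[{"text": "Overseaming..."}], [], [], [{"text": "No. 64 5,8l5..."}]],
    ]
)


def extract_text(x):
    # Hypothetical helper: same logic as the lambda in the answer above.
    return x[0]["text"] if x != [] else x


# Assumption: DataFrame.map exists on pandas 2.1+; older releases only have applymap.
if hasattr(pd.DataFrame, "map"):
    result = df.map(extract_text)
else:
    result = df.applymap(extract_text)

print(result)
```

Either branch returns a new dataframe and leaves df untouched, so the result has to be assigned if you want to keep it.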

Alternatively, you could iterate and update values like this:

for col in df.columns:
    for i in df.index:
        try:
            df.loc[i, col] = df.loc[i, col][0]["text"]
        except IndexError:
            continue

print(df)
# Outputs
                 0   1   2                3
0  Improvement ...  []  []   @einen tete...
1   Overseaming...  []  []  No. 64 5,8l5...
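
The loop above only catches IndexError, so a cell that is not a list, or whose first dict has no "text" key, would still raise. If your real data might contain such cells (an assumption, since the question only shows empty lists and lists of dicts), a variant with explicit checks could look like this:

```python
import pandas as pd

df = pd.DataFrame(
    [
        [[{"text": "Improvement ..."}], [], [], [{"text": "@einen tete..."}]],
        [[{"text": "Overseaming..."}], [], [], [{"text": "No. 64 5,8l5..."}]],
    ]
)

for col in df.columns:
    for i in df.index:
        cell = df.at[i, col]  # .at reads a single cell by label
        # Only touch non-empty lists whose first item is a dict with a "text" key.
        if isinstance(cell, list) and cell and isinstance(cell[0], dict) and "text" in cell[0]:
            df.at[i, col] = cell[0]["text"]

print(df)
```

Empty lists and anything else that does not match the check are left as they are.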

CodePudding user response:

Building on Laurent's great answer, this solves the problem in one line using dataframe functionality:

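# A conditional expression needs an else branch; here, empty lists are returned unchanged.
df = df.applymap(lambda x: x[0]["text"] if x != [] else x)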