Dynamic column assignment in Python Pandas

I have a pandas dataframe from which I'd like to create some text-related feature columns. I also have a class that calculates those features. Here's my code:

r = ReadabilityMetrics()
text_features = [['sentence_count', r.sentence_count], ['word_count', r.word_count], ['syllable_count', r.syllable_count], ['unique_words', r.unique_words],
                 ['reading_time', r.reading_time], ['speaking_time', r.speaking_time], ['flesch_reading_ease', r.flesch_reading_ease], ['flesch_kincaid_grade', r.flesch_kincaid_grade],
                 ['char_count', r.char_count]]

(df
 .assign(**{t:df['description'].apply(f) for t, f in text_features})
)

I iterate over text_features to dynamically create the columns.

My question: how can I remove reference to the methods and make text_features more concise?

For example, I'd like to have text_features = ['sentence_count', 'word_count', 'syllable_count', ...] and, since the column names are the same as the method names, look the methods up dynamically. A nested list that repeats every name doesn't seem DRY, so I'm looking for a more concise implementation.

CodePudding user response：

I think you're looking for this:

text_features = ['sentence_count', 'word_count', 'syllable_count', 'unique_words', 'reading_time', 'speaking_time', 'flesch_reading_ease', 'flesch_kincaid_grade', 'char_count']

df.assign(**{func_name: df['description'].apply(getattr(r, func_name)) for func_name in text_features})
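
To make the lookup concrete, here is a minimal runnable sketch. The ReadabilityMetrics class below is a hypothetical stand-in with only two methods, since the real class is not shown in the question; it assumes each method takes the text as its only argument.

```python
import pandas as pd


class ReadabilityMetrics:
    """Hypothetical stand-in; the real class from the question is not shown."""

    def word_count(self, text):
        return len(text.split())

    def char_count(self, text):
        return len(text)


r = ReadabilityMetrics()
text_features = ['word_count', 'char_count']

df = pd.DataFrame({'description': ['a short text', 'one more example here']})

# getattr(r, name) returns the bound method r.<name> without calling it,
# so apply() receives a callable, just as with the original nested list.
result = df.assign(**{name: df['description'].apply(getattr(r, name))
                      for name in text_features})
print(result)
```

Because assign returns a new frame, df itself still has only the description column afterwards.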

CodePudding user response：

for column_name, function in text_features:
    df[column_name] = df['description'].apply(function)

I think this is fine. I would probably define text_features as a list of tuples rather than a list of lists.

If you're sure that it has to be more concise, define text_features as a list of strings.

for column_name in text_features:
    df[column_name] = df['description'].apply(getattr(r, column_name))

I would not try to make it any more concise than this (for example, by using ** with a dict), so that the solution stays easy to read, but this is just a matter of opinion.
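
Beyond readability, the two styles differ in one practical way: the loop writes columns into the frame it is given, while assign returns a new frame. A small sketch of that difference, using a hypothetical dict of plain functions in place of the methods of r:

```python
import pandas as pd

df = pd.DataFrame({'description': ['a short text', 'one more example here']})

# Hypothetical feature functions standing in for the methods of `r`.
features = {'word_count': lambda s: len(s.split()), 'char_count': len}

# assign() leaves df untouched and returns a new frame with the extra columns.
with_features = df.assign(**{name: df['description'].apply(func)
                             for name, func in features.items()})

# The loop writes the columns into the frame it is given; a copy is used
# here only to keep the original df unchanged for comparison.
looped = df.copy()
for name, func in features.items():
    looped[name] = looped['description'].apply(func)
```

Both approaches produce the same columns, so the choice comes down to whether in-place modification is wanted.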

CodePudding user response：

In your case, try getattr, with text_features defined as a list of method names:

(df
 .assign(**{t: df['description'].apply(getattr(r, t)) for t in text_features})
)
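
One detail to watch with this approach: apply needs the method object itself, so getattr(r, t) should be passed without trailing parentheses. A minimal sketch of why, assuming the methods take the text as their only argument (the class here is a hypothetical stand-in for the real ReadabilityMetrics):

```python
class ReadabilityMetrics:
    """Hypothetical stand-in; the real class from the question is not shown."""

    def word_count(self, text):
        return len(text.split())


r = ReadabilityMetrics()

method = getattr(r, 'word_count')    # bound method, not yet called
count = method('three short words')  # the call apply() makes for each row

# Calling the method with no arguments fails, because `text` is required.
try:
    getattr(r, 'word_count')()
    called_without_args = True
except TypeError:
    called_without_args = False
```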