I have a pandas dataframe from which I'd like to create some text-related feature columns. I also have a class that calculates those features. Here's my code:
r = ReadabilityMetrics()
text_features = [['sentence_count', r.sentence_count], ['word_count', r.word_count],
                 ['syllable_count', r.syllable_count], ['unique_words', r.unique_words],
                 ['reading_time', r.reading_time], ['speaking_time', r.speaking_time],
                 ['flesch_reading_ease', r.flesch_reading_ease], ['flesch_kincaid_grade', r.flesch_kincaid_grade],
                 ['char_count', r.char_count]]
(df
.assign(**{t:df['description'].apply(f) for t, f in text_features})
)
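The ReadabilityMetrics class isn't shown in the question, so the sketch below uses a hypothetical stub with only three of its methods (sentence_count, word_count, char_count). Each is assumed to take one string and return a number. It reproduces the nested-list approach on a two-row DataFrame; the sample text and the naive counting rules are made up for illustration.

```python
import pandas as pd


class ReadabilityMetrics:
    # Hypothetical stand-in for the question's class; only three methods
    # are stubbed, each taking a single string.
    def sentence_count(self, text):
        # Naive heuristic: count sentence-ending punctuation marks.
        return sum(text.count(p) for p in ".!?")

    def word_count(self, text):
        # Count whitespace-separated tokens.
        return len(text.split())

    def char_count(self, text):
        # Count every character, including spaces and punctuation.
        return len(text)


df = pd.DataFrame({"description": ["Hello world. How are you?", "One sentence only."]})
r = ReadabilityMetrics()

# The question's pattern: each entry pairs a column name with a bound method.
text_features = [
    ["sentence_count", r.sentence_count],
    ["word_count", r.word_count],
    ["char_count", r.char_count],
]

# assign() returns a new DataFrame; each dict key becomes a column name.
result = df.assign(**{t: df["description"].apply(f) for t, f in text_features})
print(result)
```

Note that df itself is unchanged; the new columns exist only on result.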
I iterate over text_features to dynamically create the columns.
My question: how can I remove the references to the methods and make text_features more concise?
For example, I'd like to have text_features = ['sentence_count', 'word_count', 'syllable_count', ...] and, since the column names are the same as the function names, reference the functions dynamically. Having a nested list doesn't seem DRY, so I'm looking for a more efficient implementation.
Answer:
I think you're looking for this:
text_features = ['sentence_count', 'word_count', 'syllable_count', 'unique_words', 'reading_time', 'speaking_time', 'flesch_reading_ease', 'flesch_kincaid_grade', 'char_count']
df.assign(**{func_name: df['description'].apply(getattr(r, func_name)) for func_name in text_features})
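A runnable sketch of the getattr approach, again assuming a hypothetical ReadabilityMetrics stub with three methods that each take one string. getattr(r, name) returns the bound method without calling it, so apply() can call it once per row.

```python
import pandas as pd


class ReadabilityMetrics:
    # Hypothetical stand-in for the question's class (not shown there).
    def sentence_count(self, text):
        # Naive heuristic: count sentence-ending punctuation marks.
        return sum(text.count(p) for p in ".!?")

    def word_count(self, text):
        # Count whitespace-separated tokens.
        return len(text.split())

    def char_count(self, text):
        # Count every character, including spaces and punctuation.
        return len(text)


df = pd.DataFrame({"description": ["Hello world. How are you?", "One sentence only."]})
r = ReadabilityMetrics()

# Column names double as method names, so a flat list of strings is enough.
text_features = ["sentence_count", "word_count", "char_count"]

# getattr(r, name) looks up the bound method by name; apply() calls it per row.
# assign() returns a new DataFrame, so keep the result.
features = df.assign(
    **{name: df["description"].apply(getattr(r, name)) for name in text_features}
)
print(features)
```

Because assign() does not modify df in place, assign the result (here to features, or back to df) if you want to keep the new columns.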
Answer:
for column_name, function in text_features:
    df[column_name] = df['description'].apply(function)
I think this is fine. I would probably define text_features as a list of tuples rather than a list of lists.
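A sketch of the list-of-tuples variant with the plain for loop, using a hypothetical two-method ReadabilityMetrics stub (the real class is not shown). Each tuple is a fixed (name, function) pair, and the loop unpacks it directly.

```python
import pandas as pd


class ReadabilityMetrics:
    # Hypothetical stand-in for the question's class.
    def word_count(self, text):
        # Count whitespace-separated tokens.
        return len(text.split())

    def char_count(self, text):
        # Count every character, including spaces and punctuation.
        return len(text)


df = pd.DataFrame({"description": ["Hello world. How are you?", "One sentence only."]})
r = ReadabilityMetrics()

# Tuples signal a fixed-size (name, function) record.
text_features = [
    ("word_count", r.word_count),
    ("char_count", r.char_count),
]

# Tuple unpacking in the for statement; df[...] = ... adds each column to df in place.
for column_name, function in text_features:
    df[column_name] = df["description"].apply(function)

print(df)
```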
If you're sure that it has to be more concise, define text_features as a list of strings:
for column_name in text_features:
    df[column_name] = df['description'].apply(getattr(r, column_name))
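A runnable version of the string-list loop, assuming a hypothetical ReadabilityMetrics stub with two methods that each take one string. Pulling the getattr call onto its own line is just a readability choice; a misspelled name fails there with AttributeError before any column is computed for it.

```python
import pandas as pd


class ReadabilityMetrics:
    # Hypothetical stand-in for the question's class.
    def sentence_count(self, text):
        # Naive heuristic: count sentence-ending punctuation marks.
        return sum(text.count(p) for p in ".!?")

    def word_count(self, text):
        # Count whitespace-separated tokens.
        return len(text.split())


df = pd.DataFrame({"description": ["Hello world. How are you?", "One sentence only."]})
r = ReadabilityMetrics()

text_features = ["sentence_count", "word_count"]

for column_name in text_features:
    # Look up the method whose name matches the column name (not called yet).
    method = getattr(r, column_name)
    # apply() calls the method once per row with that row's text.
    df[column_name] = df["description"].apply(method)

print(df)
```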
I would not try to make it any more concise than this (for example, by unpacking a dict with ** into assign()), to keep the solution less esoteric, but that is just a matter of opinion.
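For readers unfamiliar with the ** idiom: unpacking a dict into DataFrame.assign() turns each key/value pair into a keyword argument, so it is equivalent to writing the column name as a keyword. A minimal sketch with made-up sample data:

```python
import pandas as pd

df = pd.DataFrame({"description": ["Hello world.", "Hi."]})
lengths = df["description"].str.len()

# Keyword form: the column name is hard-coded in the source.
explicit = df.assign(char_count=lengths)

# Dict-unpacking form: ** turns each key/value pair into a keyword argument,
# so the column names can come from data (e.g. a list of strings).
unpacked = df.assign(**{"char_count": lengths})

# Raises AssertionError if the two frames differ.
pd.testing.assert_frame_equal(explicit, unpacked)
print(unpacked)
```

The dict form only pays off when the names are computed, as in the comprehension answers; for a fixed set of columns the keyword form reads more plainly.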
Answer:
In your case, try getattr:
(df
 .assign(**{t: df['description'].apply(getattr(r, t)) for t in text_features})
)
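It matters whether the looked-up method is called. getattr(r, t) returns the method for apply() to call on each row, while getattr(r, t)() calls it once immediately with no text. Assuming, as the question's apply(f) usage implies, that each method takes the text as its argument, the second form raises TypeError. A minimal sketch with a hypothetical one-method stub:

```python
import pandas as pd


class ReadabilityMetrics:
    # Hypothetical stand-in; like the question's methods, it takes the text.
    def word_count(self, text):
        return len(text.split())


df = pd.DataFrame({"description": ["Hello world.", "Hi."]})
r = ReadabilityMetrics()

method = getattr(r, "word_count")  # bound method, not yet called

# Adding () calls the method right away with no text, which fails.
try:
    method()
    error = None
except TypeError as exc:
    error = exc
print("calling with no argument:", error)

# Passing the method itself lets apply() call it once per row with that row's text.
counts = df["description"].apply(method)
print(counts.tolist())
```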