I have a dataframe that looks like this:
import pandas as pd
a=['a','b','c','d']
b=['the','fox','the','then']
c=['quick','jumps','lazy','barks']
d=['brown','over','dog','loudly']
df=pd.DataFrame(zip(a,b,c,d),columns=['indexcol','col1','col2','col3'])
and a dictionary that looks like this:
keys=['a','b','c','d']
vals=[]
vals.append(['col1','col3'])
vals.append(['col1','col2'])
vals.append(['col1','col2','col3'])
vals.append(['col2','col3'])
newdict = {k: v for k, v in zip(keys, vals)}
What I'm trying to do is to create a new column in df which constructs a statement for each row. Taking the first row as an example, the sentence should look like so:
"col1 is 'the' | col3 is 'brown'"
Another example, using the 3rd row, just to make the task crystal clear: "col1 is 'the' | col2 is 'lazy' | col3 is 'dog'"
Essentially, I want to use each dictionary key to find the row whose indexcol matches it, and use that key's list of values as the columns of df to include in the sentence.
Thanks in advance.
CodePudding user response:
I'm not sure if I understand you correctly but you can try:
df = df.set_index("indexcol")
for k, v in newdict.items():
    # k is an indexcol value, v is the list of columns to describe for that row
    row = df.loc[k]
    df.loc[k, "new_column"] = " | ".join(f"{i} is '{row[i]}'" for i in v)

print(df.reset_index())
Prints:
  indexcol  col1   col2    col3                                      new_column
0        a   the  quick   brown                 col1 is 'the' | col3 is 'brown'
1        b   fox  jumps    over                 col1 is 'fox' | col2 is 'jumps'
2        c   the   lazy     dog  col1 is 'the' | col2 is 'lazy' | col3 is 'dog'
3        d  then  barks  loudly              col2 is 'barks' | col3 is 'loudly'
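If you would rather not change the index at all, something along these lines might also work. This is only a sketch: it rebuilds the df and newdict from the question so it runs on its own, and the name records is just my own helper variable. Because it walks the rows directly, it should not depend on the indexcol values being unique.

```python
import pandas as pd

# Same data as in the question, written out as a dict of columns.
df = pd.DataFrame(
    {
        "indexcol": ["a", "b", "c", "d"],
        "col1": ["the", "fox", "the", "then"],
        "col2": ["quick", "jumps", "lazy", "barks"],
        "col3": ["brown", "over", "dog", "loudly"],
    }
)
newdict = {
    "a": ["col1", "col3"],
    "b": ["col1", "col2"],
    "c": ["col1", "col2", "col3"],
    "d": ["col2", "col3"],
}

# One plain dict per row, e.g. {"indexcol": "a", "col1": "the", ...}.
# "records" is a hypothetical helper name, not something from the question.
records = df.to_dict("records")

# For each row, look up its column list by indexcol and join the pieces.
df["new_column"] = [
    " | ".join(f"{col} is '{rec[col]}'" for col in newdict[rec["indexcol"]])
    for rec in records
]

print(df)
```

Note that newdict[...] raises a KeyError if a row's indexcol value has no entry in the dictionary, which may or may not be what you want.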
CodePudding user response:
I guess this is what you're looking for
def func(df_row):
    # The columns to describe are looked up by the row's indexcol value
    return ' | '.join(
        f'"{col}" is "{df_row[col]}"'
        for col in newdict[df_row['indexcol']]
    )

df['new col'] = df.apply(func, axis=1)
indexcol | col1 | col2 | col3 | new col
---|---|---|---|---
a | the | quick | brown | "col1" is "the" \| "col3" is "brown"
b | fox | jumps | over | "col1" is "fox" \| "col2" is "jumps"
c | the | lazy | dog | "col1" is "the" \| "col2" is "lazy" \| "col3" is "dog"
d | then | barks | loudly | "col2" is "barks" \| "col3" is "loudly"
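One thing to watch with the apply approach: if some indexcol value has no entry in newdict, the lookup raises a KeyError. Below is a rough sketch of one way to tolerate that, assuming an empty string is an acceptable result for such rows. The row with key "e" and the function name describe_row are made up for illustration and are not part of the question's data.

```python
import pandas as pd

df = pd.DataFrame(
    {
        # "e" is a hypothetical key that is deliberately missing from newdict
        "indexcol": ["a", "b", "e"],
        "col1": ["the", "fox", "cat"],
        "col2": ["quick", "jumps", "sleeps"],
        "col3": ["brown", "over", "here"],
    }
)
newdict = {"a": ["col1", "col3"], "b": ["col1", "col2"]}


def describe_row(row):
    # Fall back to an empty column list when the key is not in newdict,
    # so the joined result for that row is an empty string.
    cols = newdict.get(row["indexcol"], [])
    return " | ".join(f'"{col}" is "{row[col]}"' for col in cols)


df["new col"] = df.apply(describe_row, axis=1)
print(df)
```

If a missing key should be treated as an error instead, keeping the plain newdict[...] lookup is probably the safer choice.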