I have the following list:
var_list = ["Apple", "Banana", "Orange"]
I also have a dataframe that looks like the following:
id var
0 H1 010
1 H2 110
2 H3 111
...
4443 H4443 101
I would like my new data to look like:
id var
0 H1 Banana
1 H2 Apple;Banana
2 H3 Apple;Banana;Orange
...
4443 H4443 Apple;Orange
Note that my dataframe has about 4000 rows, and the only solution I can think of so far involves looping over all 4000 rows, which I would like to avoid if possible :-/
Similarly, I would like to turn the same original dataframe into this:
id apple banana orange
0 H1 No Yes No
1 H2 Yes Yes No
2 H3 Yes Yes Yes
...
4443 H4443 Yes No Yes
EDIT: var is a FLOAT, not a binary string.
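Because var is a float, a value like 010 is stored as 10.0 and its leading zero is lost. The string-based approaches therefore need the digits restored first. The sketch below assumes every value is a non-negative whole number with at most len(var_list) digits and that the column has no missing values (astype(int) raises on NaN). The sample values are made up for illustration.

```python
import pandas as pd

var_list = ["Apple", "Banana", "Orange"]
# Hypothetical sample: "010" read in as a float becomes 10.0, "001" becomes 1.0
df = pd.DataFrame({"id": ["H1", "H2", "H3", "H4"], "var": [10.0, 110.0, 111.0, 1.0]})

# Cast to int (drops the ".0"), then to str, then left-pad with zeros so there
# is one digit per fruit. Assumes no NaNs: astype(int) raises on missing values.
df["var"] = df["var"].astype(int).astype(str).str.zfill(len(var_list))
print(df)
```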
Answer:
```python
import pandas as pd

var_list = ["Apple", "Banana", "Orange"]
df = pd.DataFrame([
    ['H1', '010'],
    ['H2', '110'],
    ['H3', '111']], columns=['id', 'var'])

# One column per digit (labelled 0, 1, 2), True where the digit is '1'
idx = pd.DataFrame(df['var'].map(list).tolist(), index=df.index) == '1'

var_list = pd.Series(var_list)

# axis=1 passes each row of idx; its boolean values select the matching names
df2 = pd.DataFrame({
    'id': df['id'],
    'var': idx.apply(lambda row: ';'.join(var_list[row.to_numpy()]), axis=1)
})
print(df2)

# The same flags rendered as Yes/No, one column per fruit
df3 = pd.concat([
    df[['id']], idx.replace({True: 'Yes', False: 'No'})
], axis=1)
df3.columns = ['id', *var_list]
print(df3)
```

prints:

index | id | var |
---|---|---|
0 | H1 | Banana |
1 | H2 | Apple;Banana |
2 | H3 | Apple;Banana;Orange |

index | id | Apple | Banana | Orange |
---|---|---|---|---|
0 | H1 | No | Yes | No |
1 | H2 | Yes | Yes | No |
2 | H3 | Yes | Yes | Yes |
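Building on the same idea, here is a sketch that avoids the row-wise apply. The flags are built once as a boolean DataFrame whose columns are the fruit names. np.where turns them into Yes/No, and DataFrame.dot against the "Name;" strings concatenates the selected names. This relies on Python evaluating True * "Apple;" as "Apple;" and False * "Apple;" as "". It also assumes every var string has exactly len(var_list) digits. The variable names flags, yes_no, joined and result are illustrative. numpy still performs the object-dtype multiplications element by element internally, so this removes the explicit Python loop rather than truly vectorizing the string work.

```python
import numpy as np
import pandas as pd

var_list = ["Apple", "Banana", "Orange"]
df = pd.DataFrame({"id": ["H1", "H2", "H3"], "var": ["010", "110", "111"]})

# One boolean column per fruit: True where the corresponding digit is "1".
# Assumes every string in df["var"] has exactly len(var_list) digits.
flags = pd.DataFrame(df["var"].map(list).tolist(), index=df.index, columns=var_list) == "1"

# Yes/No table: np.where picks "Yes" or "No" elementwise from the flags
yes_no = pd.concat(
    [df[["id"]], pd.DataFrame(np.where(flags, "Yes", "No"), index=df.index, columns=var_list)],
    axis=1,
)

# Joined names: dotting the bool matrix with "Name;" strings keeps the names
# whose flag is True, then rstrip removes the trailing ";"
joined = flags.dot(flags.columns + ";").str.rstrip(";")
result = pd.DataFrame({"id": df["id"], "var": joined})

print(result)
print(yes_no)
```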
Answer:
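Write a short function that checks whether each position holds a "0" or a "1", and use it in an apply statement with axis=1 so that the function receives one row at a time:

```python
def subst(row):
    # Collect the name for every position whose digit is "1"
    words = []
    for index, contains in enumerate(row["var"]):
        if contains == "1":
            words.append(var_list[index])
    return ";".join(words)

# axis=1 makes apply call subst once per row (the default, axis=0, passes columns)
resulting_col = df.apply(subst, axis=1)
```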
Or, since a for loop over enumerate with a filter is the natural form for a list comprehension (or, in this case, a generator expression), this can even be written as a one-line expression using a lambda:

```python
resulting_col = df.apply(
    lambda row: ";".join(var_list[index]
                         for index, contains in enumerate(row["var"]) if contains == "1"),
    axis=1)
```
Answer:
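You can first define a mapping function and then use the pd.Series.map(func) method to apply it to the whole column in one call. Note that map with a Python function still calls that function once per element, so it is a loop under the hood rather than true vectorization; for about 4000 rows it is still quick.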
```python
def binary_to_fruit(binary_str: str) -> str:
    flags = [bit == '1' for bit in binary_str]  # one boolean per digit
    out = ''
    for flag, fruit_str in zip(flags, var_list):
        out = out + fruit_str + ';' if flag else out
    out = out[:-1]  # drop the trailing ';' (no-op when out is empty)
    return out

my_df['var'] = my_df['var'].map(binary_to_fruit)
```
Note that pd.Series.apply(func) can be used for essentially the same thing.
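The concatenate-then-strip loop can also be written with str.join, which places ";" only between the selected names, so no trailing separator needs removing. The sketch below uses the sample rows from the question, with my_df as the same placeholder name used above. It also runs both map and apply to show they give the same result for a plain function like this one.

```python
import pandas as pd

var_list = ["Apple", "Banana", "Orange"]

def binary_to_fruit(binary_str: str) -> str:
    # zip pairs each digit with its fruit name; str.join only puts ";"
    # between the kept names, so there is no trailing ";" to strip
    return ";".join(fruit for bit, fruit in zip(binary_str, var_list) if bit == "1")

my_df = pd.DataFrame({"id": ["H1", "H2", "H3"], "var": ["010", "110", "111"]})

mapped = my_df["var"].map(binary_to_fruit)
applied = my_df["var"].apply(binary_to_fruit)  # same result for a plain function
my_df["var"] = mapped
print(my_df)
```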