Translate a Binary Float Using Lists/Dictionaries

I have the following list:

var_list = ["Apple", "Banana", "Orange"]

I also have a dataframe that looks like the following:

      id      var
0     H1      010
1     H2      110
2     H3      111
...
4443  H4443   101

I would like my new data to look like:

      id      var
0     H1      Banana
1     H2      Apple;Banana
2     H3      Apple;Banana;Orange
...
4443  H4443   Apple;Orange

Note that my dataframe has about 4000 rows, and the only solution I can think of so far involves iterating through a loop 4000 times, which I would like to avoid if possible :-/

Similarly, I would like to turn the same original dataframe into this:

      id      apple    banana   orange
0     H1      No       Yes      No
1     H2      Yes      Yes      No
2     H3      Yes      Yes      Yes
...
4443  H4443   Yes      No       Yes

EDIT: var is a FLOAT not a binary string.

CodePudding user response：

import pandas as pd

var_list = ["Apple", "Banana", "Orange"]
df = pd.DataFrame([
    ['H1', '010'],
    ['H2', '110'],
    ['H3' , '111']], columns=['id', 'var'])

idx = df['var'].str.split('', expand=True).iloc[:, 1:-1]
idx = idx.T.reset_index(drop=True).T.astype('int32').astype(bool)

var_list = pd.Series(var_list)
df2 = pd.DataFrame({
    'id': df['id'],
    # axis=1 applies the lambda to each row, so the mask picks that row's fruits
    'var': idx.apply(lambda i: ';'.join(var_list[i.to_numpy()]), axis=1)
})
print(df2)

df3 = pd.concat([
    df[['id']], idx.replace({True: 'Yes', False: 'No'})
], axis=1)
df3.columns = ['id', *var_list]
print(df3)

prints

index	id	var
0	H1	Banana
1	H2	Apple;Banana
2	H3	Apple;Banana;Orange

index	id	Apple	Banana	Orange
0	H1	No	Yes	No
1	H2	Yes	Yes	No
2	H3	Yes	Yes	Yes
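
The question's EDIT says var is stored as a float, while the code above builds it as strings. A minimal sketch of one way to get from floats back to fixed-width flag strings, assuming the floats hold the digits as a decimal number (so 010 was parsed as 10.0) and that there are no missing values; the sample frame is made up for illustration:

```python
import pandas as pd

var_list = ["Apple", "Banana", "Orange"]

# Assumption: the flags were parsed as decimal numbers, so 010 became 10.0.
df = pd.DataFrame({"id": ["H1", "H2", "H3"], "var": [10.0, 110.0, 111.0]})

# Float -> integer -> string, then restore the leading zeros that the numeric
# parse dropped, padding to one digit per entry in var_list.
df["var"] = (
    df["var"].round().astype("int64").astype(str).str.zfill(len(var_list))
)
print(df["var"].tolist())  # ['010', '110', '111']
```

After this step the column holds strings again, so the string-based approaches in this thread should apply unchanged. If the column can contain NaN, the integer cast will fail and those rows need handling first.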

CodePudding user response：

Write a short function that checks whether each position in var holds a "0" or a "1", and use it in a row-wise apply:

def subst(row):
    # Collect the name at each position where the flag character is "1".
    words = []
    for index, contains in enumerate(row["var"]):
        if contains == "1":
            words.append(var_list[index])
    return ";".join(words)

# axis=1 passes each row to subst; the default axis=0 would pass columns.
resulting_col = df.apply(subst, axis=1)

Or, since a for loop with enumerate plus a filter is the natural shape of a list comprehension (or, in this case, a generator expression), this can be written as a one-line expression using a lambda:

resulting_col = df.apply(
    lambda row: ";".join(var_list[index]
        for index, contains in enumerate(row["var"]) if contains == "1"),
    axis=1)
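
The same position-by-position check can also produce the Yes/No table from the question. A minimal sketch, assuming var is a string column with one character per entry in var_list; the names yes_no and wide are made up for illustration, and the columns take their names from var_list rather than the lowercase headers in the question:

```python
import pandas as pd

var_list = ["Apple", "Banana", "Orange"]
df = pd.DataFrame({"id": ["H1", "H2", "H3"], "var": ["010", "110", "111"]})

# One column per name: take the character at that position and translate it.
yes_no = pd.DataFrame(
    {
        name: df["var"].str[position].map({"1": "Yes", "0": "No"})
        for position, name in enumerate(var_list)
    }
)

# Put the id column back in front of the new Yes/No columns.
wide = pd.concat([df[["id"]], yes_no], axis=1)
print(wide)
```

This loops over the three names rather than over the rows, so the number of Python-level iterations does not grow with the length of the dataframe.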

CodePudding user response：

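You can first define a mapping function and then use the pd.Series.map(func) method to apply it to the whole column at once (see the pandas documentation for Series.map). Note that map still calls the function once per value, so it is a loop under the hood, but for about 4000 rows that should be fast enough.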


def binary_to_fruit(binary_str: str) -> str:
    flags = [bit == '1' for bit in binary_str]  # one boolean per position
    out = ''
    for flag, fruit_str in zip(flags, var_list):
        out = out + fruit_str + ';' if flag else out
    out = out[:-1]  # drop the trailing ';' (a no-op when out is empty)
    return out

my_df['var'] = my_df['var'].map(binary_to_fruit)

Note that pd.Series.apply(func) can also be used for basically the same thing.
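
As a quick check of that last point, here is a small sketch comparing the two calls on a made-up three-row frame. The binary_to_fruit below is a condensed stand-in for the function above, not a different method:

```python
import pandas as pd

var_list = ["Apple", "Banana", "Orange"]
df = pd.DataFrame({"id": ["H1", "H2", "H3"], "var": ["010", "110", "101"]})


def binary_to_fruit(binary_str: str) -> str:
    # Keep the fruit whose position holds a '1', joined with ';'.
    return ";".join(
        fruit for bit, fruit in zip(binary_str, var_list) if bit == "1"
    )


via_map = df["var"].map(binary_to_fruit)
via_apply = df["var"].apply(binary_to_fruit)

print(via_map.tolist())           # ['Banana', 'Apple;Banana', 'Apple;Orange']
print(via_map.equals(via_apply))  # True
```

For a plain function on a single column, either call should do; both invoke the function once per element.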