I have code like this:
import pandas as pd

df1 = pd.DataFrame(
    {
        "A": ["A0", "A1", "A2", "A3"],
        "B": ["B0", "B1", "B2", "B3"],
        "C": ["C0", "C1", "C2", "C3"],
        "D": ["D0", "D1", "D2", "D3"],
    }
)
df2 = pd.DataFrame(
    {
        "A": ["A4", "A5", "A6", "A7"],
        "B": ["B4", "B5", "B6", "B7"],
        "C": ["C4", "C5", "C6", "C7"],
        "D": ["D4", "D5", "D6", "D7"],
    }
)
def changeDF(df):
    df['Signal'] = 0

changeDF(df1)
changeDF(df2)
When I run the above, the changeDF function adds a column named 'Signal' with 0 values to both df1 and df2. But when I run changeDF through multiprocessing instead, like below, it doesn't change either DataFrame.
import multiprocessing

s = [df1, df2]
with multiprocessing.Pool(processes=2) as pool:
    res = pool.map(changeDF, s)
What's wrong with my code?
CodePudding user response:
Serializing df1 and df2 for multiprocessing means that you're making a copy.
Return your dataframe from the function and it'll work fine.
def changeDF(df):
    df['Signal'] = 0
    return df

with multiprocessing.Pool(processes=2) as pool:
    df1, df2 = pool.map(changeDF, [df1, df2])
I would warn you that the serialization costs of this will certainly be higher than the benefit you get from multiprocessing.
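For what it's worth, here is a minimal, self-contained sketch of the full fix (columns trimmed for brevity). The if __name__ == "__main__": guard is my addition; it is needed when the pool uses the spawn start method (the default on Windows and recent macOS) so the worker processes can re-import the module safely:

import multiprocessing

import pandas as pd

def changeDF(df):
    # Runs in a worker process on a pickled copy of the DataFrame,
    # so the modified copy has to be returned to the parent.
    df['Signal'] = 0
    return df

if __name__ == "__main__":
    df1 = pd.DataFrame({"A": ["A0", "A1", "A2", "A3"]})
    df2 = pd.DataFrame({"A": ["A4", "A5", "A6", "A7"]})

    with multiprocessing.Pool(processes=2) as pool:
        # pool.map returns the modified copies; rebind the names to them
        df1, df2 = pool.map(changeDF, [df1, df2])

    print(df1)  # now has the 'Signal' column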
CodePudding user response:
Change your function changeDF to be like this:
def changeDF(df):
    df['Signal'] = 0
    return df
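As in the first answer, you then need to capture the copies that pool.map returns and rebind your names to them, e.g.:

with multiprocessing.Pool(processes=2) as pool:
    # each worker returns its modified copy of the DataFrame
    df1, df2 = pool.map(changeDF, [df1, df2])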