Pandas merge not working as expected in streamlit

Time:01-01

Summary

When the pandas merge function is called inside a callback function, the resulting dataframe is not reflected in the app. The pandas drop function, called in the same callback, works as expected.

Note that although I have turned on st.cache, the same behavior occurs when the cache decorator is removed.

Steps to reproduce

Code snippet:

import streamlit as st
import pandas as pd


@st.cache(allow_output_mutation=True)
def read_df():
    df = pd.DataFrame({
        'col1':[1,2],
        'col2':['A','B']
    })
    return df

df = read_df()

def do_something():
    global df
    df_new = pd.DataFrame({
        'col1':[1,2],
        'col3':["X","Y"]
    })
    df.drop(['col2'], axis = 1, inplace = True)
    df = df.merge(df_new, on="col1")

st.button("Do Something", on_click=do_something, args =())

download_csv = df.to_csv().encode('utf-8')
st.download_button('Download', data = download_csv, file_name = 'download_csv.csv', mime='text/csv')

Steps to reproduce behavior

  • click on "Do Something" button
  • click on "Download" button

Expected behavior:

I would expect the downloaded CSV to contain:

   col1 col3
0     1    X
1     2    Y

Actual behavior:

However, I get the following output instead:

   col1 
0     1    
1     2   

Debug info

  • Streamlit version: 1.16.0
  • Python version: 3.8.15
  • Using Conda: Yes
  • OS version: Windows 11
  • Browser version: Edge v108.0.1462.54

User response:

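Streamlit reruns the script from top to bottom on every interaction, so df = read_df() rebinds df to the cached object after the callback has finished. The in-place drop mutates that cached object and therefore survives the rerun, while the new dataframe returned by merge is only assigned to the global name and is lost. I would store the dataframe in a session_state variable and read it back from there, so that you always work with the most up-to-date value.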

  • st.session_state['df'] = df - will set the 'df' session state variable as the current df
  • st.session_state['df'] = df1 - will update the session state variable with the merged df

Here is an example:

import streamlit as st
import pandas as pd

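@st.experimental_memo
def read_df():
    # Cached, so the source data is only built once.
    df = pd.DataFrame({
        'col1':[1,2],
        'col2':['A','B']
    })
    return df

df = read_df()

# Initialise session state outside the cached function: on a cache hit the
# function body is skipped, so a new session would otherwise never get 'df'.
if 'df' not in st.session_state:
    st.session_state['df'] = df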

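def do_something():
    # Work on the dataframe held in session state, not on a global.
    df1 = st.session_state['df']
    df_new = pd.DataFrame({
        'col1':[1,2],
        'col3':["X","Y"]
    })
    df1.drop(['col2'], axis = 1, inplace = True)
    # merge returns a new dataframe, so it has to be written back explicitly.
    df1 = df1.merge(df_new, on="col1")
    st.session_state['df'] = df1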

st.button("Do Something", on_click=do_something, args =())
df = st.session_state['df']
download_csv = df.to_csv().encode('utf-8')
st.download_button('Download', data = download_csv, file_name = 'download_csv.csv', mime='text/csv')

Output file:

file_name = 'download_csv.csv'

   col1 col3
0     1    X
1     2    Y

Note:

@st.experimental_memo - ensures that the df is only loaded once.
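
To see why the code in the question keeps the drop but loses the merge, here is a plain-pandas sketch of one click followed by a rerun. It assumes that st.cache(allow_output_mutation=True) hands back the same object on every run; _CACHE and simulate_click_then_rerun are hypothetical stand-ins for illustration, not Streamlit APIs.

```python
import pandas as pd

# Hypothetical stand-in for the Streamlit cache: the same object is
# returned on every call, as assumed for allow_output_mutation=True.
_CACHE = {}

def read_df():
    if "df" not in _CACHE:
        _CACHE["df"] = pd.DataFrame({"col1": [1, 2], "col2": ["A", "B"]})
    return _CACHE["df"]

def simulate_click_then_rerun():
    # First script run: df is bound to the cached object.
    df = read_df()

    # Callback body: drop mutates the cached object in place, while merge
    # returns a new dataframe and only rebinds the name df.
    df_new = pd.DataFrame({"col1": [1, 2], "col3": ["X", "Y"]})
    df.drop(["col2"], axis=1, inplace=True)
    df = df.merge(df_new, on="col1")
    merged_columns = list(df.columns)

    # Rerun: the script starts from the top and rebinds df to the cached
    # object, which never saw the merge.
    df = read_df()
    return merged_columns, list(df.columns)

merged_columns, rerun_columns = simulate_click_then_rerun()
print(merged_columns)  # ['col1', 'col3']
print(rerun_columns)   # ['col1']
```

The second list matches the CSV from the question: col2 is gone because the cached object was mutated, and col3 is missing because the merged dataframe was never stored anywhere that survives the rerun.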
