I have the excel file.
'Born Date Student Name Duplicate 2001-01-01 Tommy 2001-03-04 Annie 2001-05-13 Billy 2001-07-31 Bob 2001-03-04 Annie 2001-04-04 Janet 2001-11-05 Betty 2001-12-15 Lucy 2001-08-25 Nicky 2001-12-15 Lucy 2001-07-31 John
I want to use Python update the excel file like as below result
Born Date Student Name Duplicate 2001-01-01 Tommy 2001-03-04 Annie Yes 2001-05-13 Billy 2001-07-31 Bob 2001-03-04 Annie Yes 2001-04-04 Janet 2001-11-05 Betty 2001-12-15 Lucy Yes 2001-08-25 Nicky 2001-12-15 Lucy Yes 2001-07-31 John
Below is my code... I'm a beginner in Python please teach me how to coding, Thanks!
import pandas as pd df = pd.read_excel('C:/Users/Desktop/Studen List.xlsx') dc = df.duplicated() print(dc)
CodePudding user response:
This may help you to understand a bit further:
import pandas as pd
def insert_boolean_for_duplicates(excelfile):
df = pd.read_excel(excelfile)
dc = df.duplicated()
print(dc)
print('------------')
print(df)
print('------------')
print(dc.drop_duplicates())
print('------------')
print(dc.values)
.
if __name__ == '__main__':
insert_boolean_for_duplicates('StudentList.xlsx')
I recommend you to check out this link, It may help you out.
CodePudding user response:
What you did is not too far from answering your problem. All you need is to assign the pandas serie returned to a new column in your dataframe (How to add a new column to an existing DataFrame?). I would complete it as follows:
import pandas as pd
df = pd.read_excel('C:/Users/Desktop/Studen List.xlsx') #ideally you would prefere using relative paths (i.e. ./relative_path_to_file/Student_list.xlsx)
df['Duplicate'] = df.duplicated()
df['Duplicate'] = df['Duplicate'].replace({True: 'Yes', False: 'No'}) # eventually if you want Yes / No labels rather than True / False
df.to_excel('./relative_path/Modified_Student_list.xlsx')
Hope this helps