I am new to Python, and I am about to solve some simple questions.
I have two excel datasets. The first one contains the samples I have taken on different days, with a separate "dates" column and a "month" column, as below:
and the other excel, with the date I only wanted
and I want to filter the first excel with the dates that are equal to the second excel, and print out the rows that meet these conditions.
Here is my code: 1st attempt:
import datetime
import pandas
import pandas as pd
import openpyxl
from pandas import DataFrame
from datetime import datetime
import datetime as dt
import numpy as np
df_1=pd.read_excel('waterstationin29UMT.xlsx',sheet_name='Sheet1') #first excel
print(df_1)
df_2=pd.read_excel('29UMTdate.xlsx',sheet_name='Sheet1') #second excel
print(df_2)
minyan_d = df_2.iloc[1:11:,1:4] #only want the date information
print(type(minyan_d))
# obtain datetime series:
df_2_1=pd.to_datetime(minyan_d['system:time_start'],format='%Y/%m/%d')
print(type(df_2_1.dt.year))
water_date = df_1['SampDay']
water_month = df_1['SampMonth']
df_1.where((df_1['SampDay'] == df_2_1.dt.day) and (df_1['SampMonth'] == df_2_1.dt.month))
```
and get the bug said that 'Can only compare identically-labeled series objects'
second attempt:
```for x in water_date:
for y in water_month:
if [x == (df_2_1.dt.day)] and [y ==(df_2_1.dt.month)]:
print(1)
else:
print(2)
I know some of the data in the first excel can not be found in the second excel, but I do not know why I only get one. I am new to Python. Just forgive the naive problems I have made.
CodePudding user response:
Use merge()
.
Example:
merged = pd.merge(df_1, df_2, how='inner', on=['SampMonth'])
This will join the rows that have the same entry in the SampMonth
column across the dataframes. You can also use how="outer"
, which results in a different merge.
Read the documentation to learn how to use it for your needs - it is quite helpful.
Or, if you just want to filter one dataframe, you could do this:
df_1 = df_1[df_1["SampMonth" == <insert month here>]]
This will return the rows in the dataframe where the month matches what you want -- you can make the internal boolean a compound boolean, as in (condition 1) & (condition 2) & (condition 3)
to make the whole date match.
CodePudding user response:
guys. I think I found the solution, but still being very complicated, I have transferred the first excel's data to day/month/year format, and compare these two via this code:) minmin = df_1['value'].isin(df_2.iloc[0:11,5:8]) which give me False/True. Any efficient and time-saving codes are needed also!