I am just getting started with datetime analysis in Python. I have two columns in a dataframe:
import pandas as pd
a= [('2021-06-11 00:00:00'),
('2021-06-09 00:00:00'),
('2021-06-10 00:00:00'),
('2021-06-11 00:00:00'),
('2021-06-12 00:00:00'),
('2021-06-13 00:00:00'),
('2021-06-14 00:00:00'),
('2021-06-15 00:00:00'),
('2021-06-16 00:00:00'),
('2021-06-17 00:00:00'),
('2021-06-18 00:00:00'),
('2021-06-19 00:00:00'),
('2021-06-20 00:00:00')]
b = ["a", "a", "a", "a", "a", "a", "a", "b", "b","b" ,"b", "b"]
fecha = pd.DataFrame(list(zip(a,b)))
fecha[0] = pd.to_datetime(fecha[0])
I would like to select only the rows that contain the oldest date for "a" and for "b". I tried several methods but could not get it to work.
Thank you!
CodePudding user response:
A simpler method is to sort the dataframe by the date column and then keep the first rows where column b equals "a":
import pandas as pd
a= [('2021-06-11 00:00:00'),
('2021-06-09 00:00:00'),
('2021-06-10 00:00:00'),
('2021-06-11 00:00:00'),
('2021-06-12 00:00:00'),
('2021-06-13 00:00:00'),
('2021-06-14 00:00:00'),
('2021-06-15 00:00:00'),
('2021-06-16 00:00:00'),
('2021-06-17 00:00:00'),
('2021-06-18 00:00:00'),
('2021-06-19 00:00:00'),
('2021-06-20 00:00:00')]
b = ["a", "a", "a", "a", "a", "a", "a", "b", "b","b" ,"b", "b","b"]
fecha = pd.DataFrame({"a":a,"b":b})
fecha = fecha.sort_values(by=['a'])
fecha.head(3).where(fecha["b"] == "a")
The output is:
                     a  b
1  2021-06-09 00:00:00  a
2  2021-06-10 00:00:00  a
0  2021-06-11 00:00:00  a
Methods used: sort_values, where
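Note that head(3) only returns the three earliest rows overall, so letter "b" never shows up in that result. As a minimal sketch of one way to get a single oldest row per letter with the same sort-first idea, the sorted frame can be passed through drop_duplicates. The shortened sample data below is an assumption for illustration, and the column names "a" and "b" follow the snippet above.

```python
import pandas as pd

# Shortened sample data (assumed for illustration), same layout as above:
# column "a" holds the dates, column "b" holds the letters.
dates = ["2021-06-11", "2021-06-09", "2021-06-10",
         "2021-06-15", "2021-06-17", "2021-06-16"]
letters = ["a", "a", "a", "b", "b", "b"]
fecha = pd.DataFrame({"a": pd.to_datetime(dates), "b": letters})

# Sort oldest first, then keep only the first row seen for each letter.
oldest = fecha.sort_values(by="a").drop_duplicates(subset="b", keep="first")
print(oldest)
```

If several rows share the oldest date for a letter, this keeps only one of them.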
CodePudding user response:
I converted the list a to datetime objects and did it that way. Also, I had to add an extra item to your list b, as the lengths of a and b were different.
import datetime
a= [('2021-06-11 00:00:00'),
('2021-06-09 00:00:00'),
('2021-06-10 00:00:00'),
('2021-06-11 00:00:00'),
('2021-06-12 00:00:00'),
('2021-06-13 00:00:00'),
('2021-06-14 00:00:00'),
('2021-06-15 00:00:00'),
('2021-06-16 00:00:00'),
('2021-06-17 00:00:00'),
('2021-06-18 00:00:00'),
('2021-06-19 00:00:00'),
('2021-06-20 00:00:00')]
b = ["a", "a", "a", "a", "a", "a", "a", "b", "b", "b" ,"b", "b", "b"]
a0 = [datetime.datetime.strptime(i, "%Y-%m-%d %H:%M:%S") for i in a]
print("min a:", min([a0[i] for i in range(len(a0)) if b[i] == "a"]))
print("min b:", min([a0[i] for i in range(len(a0)) if b[i] == "b"]))
#min a: 2021-06-09 00:00:00
#min b: 2021-06-15 00:00:00
To do this in a dataframe, as you asked, this is one option:
import pandas as pd
df = pd.DataFrame(list(zip(a0,b)), columns = ["date", "letter"])
print(min(df.loc[df["letter"].eq("a")]["date"]))
print(min(df.loc[df["letter"].eq("b")]["date"]))
#2021-06-09 00:00:00
#2021-06-15 00:00:00
All that said, you don't have to use datetime, although I would recommend it. Strings in this year-month-day format sort in chronological order, so min works on them directly.
import pandas as pd
print("min a:", min([a[i] for i in range(len(a)) if b[i] == "a"]))
print("min b:", min([a[i] for i in range(len(a)) if b[i] == "b"]))
#min a: 2021-06-09 00:00:00
#min b: 2021-06-15 00:00:00
df = pd.DataFrame(list(zip(a,b)), columns = ["date", "letter"])
print(min(df.loc[df["letter"].eq("a")]["date"]))
print(min(df.loc[df["letter"].eq("b")]["date"]))
#2021-06-09 00:00:00
#2021-06-15 00:00:00
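The snippets above print the minimum dates themselves. If the goal is the complete rows, as the question asks, one possible sketch is to look up the index label of each group's earliest date with idxmin and then select those rows with loc. The shortened sample data is an assumption for illustration, and the column names "date" and "letter" follow the dataframe built above.

```python
import pandas as pd

# Shortened sample data (assumed for illustration).
df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2021-06-11", "2021-06-09", "2021-06-10", "2021-06-16", "2021-06-15"]
        ),
        "letter": ["a", "a", "a", "b", "b"],
    }
)

# For each letter, idxmin gives the index label of the row with the earliest date.
oldest_idx = df.groupby("letter")["date"].idxmin()

# Use those labels to pull the complete rows out of the original frame.
oldest_rows = df.loc[oldest_idx]
print(oldest_rows)
```

This avoids writing one filter per letter, so it still works if more letters appear later.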
CodePudding user response:
Another method is to group by the second column (labelled 1) and take the minimum of the first:
fecha.groupby([1]).agg('min')
The output is:
           0
1
a 2021-06-09
b 2021-06-15
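The grouped minimum returns one date per letter rather than rows of the original frame. As a hedged sketch of how to keep the original rows, including ties, the group minimum can be broadcast back onto every row with transform and used as a filter. The sample data is an assumption for illustration, and the integer column labels 0 and 1 match the fecha frame from the question.

```python
import pandas as pd

# Sample data (assumed for illustration); columns are labelled 0 and 1,
# as in the original fecha frame.
fecha = pd.DataFrame(
    [
        ("2021-06-09", "a"),
        ("2021-06-11", "a"),
        ("2021-06-09", "a"),  # ties with the first row
        ("2021-06-16", "b"),
        ("2021-06-15", "b"),
    ]
)
fecha[0] = pd.to_datetime(fecha[0])

# Broadcast each group's minimum date back onto every row of that group.
group_min = fecha.groupby(1)[0].transform("min")

# Keep every row whose date equals the minimum of its own group.
oldest = fecha[fecha[0] == group_min]
print(oldest)
```

Unlike a plain minimum, this keeps both tied rows for "a", which may or may not be what is wanted.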