I have this code to convert a date column into the format I want:
df['issue_d'] = df['issue_d'].replace({'Jan-':'1-', 'Feb-':'2-', 'Mar-': '3-', 'Apr-': '4-', 'May-': '5-', 'Jun-': '6-', 'Jul-': '7-', 'Aug-':'8-', 'Sep-': '9-', 'Oct-': '10-', 'Nov-': '11-', 'Dec-': '12-'}, regex=True).apply(lambda x:dt.strptime('01-' + x,'%d-%m-%y').date())
df['issue_d'] = pd.to_datetime(df['issue_d'], format = '%Y-%m-%d')
But when I run it, this error appears:
ValueError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_25672/2570429248.py in <module>
----> 1 df['issue_d'] = df['issue_d'].replace({'Jan-':'1-', 'Feb-':'2-', 'Mar-': '3-', 'Apr-': '4-', 'May-': '5-', 'Jun-': '6-', 'Jul-': '7-', 'Aug-':'8-', 'Sep-': '9-', 'Oct-': '10-', 'Nov-': '11-', 'Dec-': '12-'}, regex=True).apply(lambda x:dt.strptime('01-' + x,'%d-%m-%y').date())
2 df['issue_d'] = pd.to_datetime(df['issue_d'], format = '%Y-%m-%d')
~\anaconda3\lib\site-packages\pandas\core\series.py in apply(self, func, convert_dtype, args, **kwargs)
4355 dtype: float64
4356 """
-> 4357 return SeriesApply(self, func, convert_dtype, args, kwargs).apply()
4358
4359 def _reduce(
~\anaconda3\lib\site-packages\pandas\core\apply.py in apply(self)
1041 return self.apply_str()
1042
-> 1043 return self.apply_standard()
1044
1045 def agg(self):
~\anaconda3\lib\site-packages\pandas\core\apply.py in apply_standard(self)
1096 # List[Union[Callable[..., Any], str]]]]]"; expected
1097 # "Callable[[Any], Any]"
-> 1098 mapped = lib.map_infer(
1099 values,
1100 f, # type: ignore[arg-type]
~\anaconda3\lib\site-packages\pandas\_libs\lib.pyx in pandas._libs.lib.map_infer()
~\AppData\Local\Temp/ipykernel_25672/2570429248.py in <lambda>(x)
----> 1 df['issue_d'] = df['issue_d'].replace({'Jan-':'1-', 'Feb-':'2-', 'Mar-': '3-', 'Apr-': '4-', 'May-': '5-', 'Jun-': '6-', 'Jul-': '7-', 'Aug-':'8-', 'Sep-': '9-', 'Oct-': '10-', 'Nov-': '11-', 'Dec-': '12-'}, regex=True).apply(lambda x:dt.strptime('01-' + x,'%d-%m-%y').date())
2 df['issue_d'] = pd.to_datetime(df['issue_d'], format = '%Y-%m-%d')
~\anaconda3\lib\_strptime.py in _strptime_datetime(cls, data_string, format)
566 """Return a class cls instance based on the input string and the
567 format string."""
--> 568 tt, fraction, gmtoff_fraction = _strptime(data_string, format)
569 tzname, gmtoff = tt[-2:]
570 args = tt[:6] + (fraction,)
~\anaconda3\lib\_strptime.py in _strptime(data_string, format)
347 found = format_regex.match(data_string)
348 if not found:
--> 349 raise ValueError("time data %r does not match format %r" %
350 (data_string, format))
351 if len(data_string) != found.end():
ValueError: time data '01-15-Dec' does not match format '%d-%m-%y'
Update:
My issue_d column's info looks like this:
issue_d 1048563 non-null object
It contains two-digit years and abbreviated month names, for example:
15-Dec
16-Jan
and so on.
First, the month names (Jan, Feb, Mar, ...) should be changed to their numbers (01, 02, 03, ...), so that the column looks like:
15-12
16-01
and so on.
Then day 1 should be added, so that the dates are arranged like:
01-01-15
01-02-15
01-03-15
and so on.
The first part is the day 1 that I tried to add in the apply step, the second part is the month, and the third is the year.
You can see in my first line that I tried to do this:
df['issue_d'] = df['issue_d'].replace({'Jan-':'1-', 'Feb-':'2-', 'Mar-': '3-', 'Apr-': '4-', 'May-': '5-', 'Jun-': '6-', 'Jul-': '7-', 'Aug-':'8-', 'Sep-': '9-', 'Oct-': '10-', 'Nov-': '11-', 'Dec-': '12-'}, regex=True).apply(lambda x:dt.strptime('01-' + x,'%d-%m-%y').date())
I replaced the abbreviated month names in the first line because pandas could not recognize them as dates. In the second line, I tried to convert the column to datetime in %Y-%m-%d format so that I can do further work on my dataset. But unfortunately, that error appeared. I'd appreciate your help. Thank you.
CodePudding user response:
Thanks for helping me with the first line. Another error appeared when I tried to run more code after the code above. I'd like to check the number of loans issued by year, so this is my code:
df['issue_y'] = df['issue_d'].dt.year
issue_y = df.pivot_table(values='id', columns='Default Loan', index='issue_y', aggfunc='count')
issue_y.plot.bar(stacked = True, figsize=(12,5))
plt.xlabel('Year')
plt.xticks(rotation=0)
plt.ylabel('Number of loans issued')
plt.title('Number of loans issued by Year')
#plt.savefig('N_Loans_Year.png')
plt.show()
issue_y['% of Default'] = ((issue_y['Default'] / (issue_y['Default'] + issue_y['Non Default'])) * 100).round(1)
issue_y.sort_values('% of Default', ascending=False).head(10)
And I got this error:
AttributeError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_24600/1330149262.py in <module>
----> 1 df2['issue_y'] = df2['issue_d'].dt.year
2 issue_y = df2.pivot_table(values='id', columns='Default Loan', index='issue_y', aggfunc='count')
3 issue_y.plot.bar(stacked = True, figsize=(12,5))
4 plt.xlabel('Year')
5 plt.xticks(rotation=0)
~\anaconda3\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
5485 ):
5486 return self[name]
-> 5487 return object.__getattribute__(self, name)
5488
5489 def __setattr__(self, name: str, value) -> None:
~\anaconda3\lib\site-packages\pandas\core\accessor.py in __get__(self, obj, cls)
179 # we're accessing the attribute of the class, i.e., Dataset.geo
180 return self._accessor
--> 181 accessor_obj = self._accessor(obj)
182 # Replace the property with the accessor object. Inspired by:
183 # https://www.pydanny.com/cached-property.html
~\anaconda3\lib\site-packages\pandas\core\indexes\accessors.py in __new__(cls, data)
504 return PeriodProperties(data, orig)
505
--> 506 raise AttributeError("Can only use .dt accessor with datetimelike values")
AttributeError: Can only use .dt accessor with datetimelike values
CodePudding user response:
Okay, next try:
If you're not using an English locale, then you could try:
df = pd.DataFrame({"issue_d": ["15-Dec", "16-Jan", "21-Oct"]})
mapping = {"Jan": "1", "Feb": "2", "Mar": "3", "Apr": "4", "May": "5", "Jun": "6",
"Jul": "7", "Aug": "8", "Sep": "9", "Oct": "10", "Nov": "11", "Dec": "12"}
df["issue_d"] = pd.to_datetime(
df["issue_d"].str[:-3] + df["issue_d"].str[-3:].replace(mapping),
format="%y-%m"
).dt.strftime("%d-%m-%y")
Result:
issue_d
0 01-12-15
1 01-01-16
2 01-10-21
You get the first of the month automatically.
If you are using an English locale, then this gives the same result:
df = pd.DataFrame({"issue_d": ["15-Dec", "16-Jan", "21-Oct"]})
df["issue_d"] = pd.to_datetime(df["issue_d"], format="%y-%b").dt.strftime("%d-%m-%y")
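As a side note on why the original attempt failed, here is a minimal sketch, assuming the column holds strings like 15-Dec (the three sample values are only for illustration). The patterns in the question end with a hyphen (Dec-), but in the data the hyphen comes before the month name, so nothing gets replaced and strptime receives 01-15-Dec, which is the string shown in the error message.

```python
from datetime import datetime as dt

import pandas as pd

# Illustrative sample in the same "yy-Mon" layout as the question's column.
s = pd.Series(["15-Dec", "16-Jan", "21-Oct"])

# The question's patterns look for "Dec-", "Jan-", ... (month followed by a hyphen).
# In "15-Dec" the hyphen precedes the month, so no pattern matches.
unchanged = s.replace({"Jan-": "1-", "Oct-": "10-", "Dec-": "12-"}, regex=True)
print(unchanged.tolist())

# Prefixing "01-" therefore yields "01-15-Dec", not "01-12-15".
prefixed = "01-" + unchanged
print(prefixed.tolist())

# strptime then rejects the string, because "15" is not a valid month
# and "Dec" is not a two-digit year.
try:
    dt.strptime(prefixed.iloc[0], "%d-%m-%y")
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

Both snippets above avoid this problem by parsing the year first and the month second.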
Regarding your follow-up question: if the issue_d column needs to be datetime for further processing, then remove the .dt.strftime("%d-%m-%y") at the end (because it turns the datetimes into strings), do what you need to do, and convert the column to strings later. For example:
...
df["issue_d"] = pd.to_datetime(
df["issue_d"].str[:-3] + df["issue_d"].str[-3:].replace(mapping),
format="%y-%m"
)
df["issue_y"] = df["issue_d"].dt.year
df["issue_d"] = df["issue_d"].dt.strftime("%d-%m-%y")
results in
issue_d issue_y
0 01-12-15 2015
1 01-01-16 2016
2 01-10-21 2021
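If you want to see where the AttributeError comes from, the following sketch may help. It assumes the same mapping as above and a small made-up sample (the fourth value is added only so that one year appears twice). It checks the dtype before and after strftime, and then counts rows per year from the datetime version. The names parsed, as_text and counts are just illustrative.

```python
import pandas as pd

df = pd.DataFrame({"issue_d": ["15-Dec", "16-Jan", "21-Oct", "15-Nov"]})
mapping = {"Jan": "1", "Feb": "2", "Mar": "3", "Apr": "4", "May": "5", "Jun": "6",
           "Jul": "7", "Aug": "8", "Sep": "9", "Oct": "10", "Nov": "11", "Dec": "12"}

# Parse "yy-Mon" into real datetimes (the day defaults to the 1st).
parsed = pd.to_datetime(
    df["issue_d"].str[:-3] + df["issue_d"].str[-3:].replace(mapping),
    format="%y-%m"
)

# strftime returns plain strings, so the .dt accessor no longer works on them.
as_text = parsed.dt.strftime("%d-%m-%y")
print(pd.api.types.is_datetime64_any_dtype(parsed))   # datetime column
print(pd.api.types.is_datetime64_any_dtype(as_text))  # string column

# Derive the year while the values are still datetimes, then count per year.
df["issue_y"] = parsed.dt.year
counts = df["issue_y"].value_counts().sort_index()
print(counts)
```

The same check on your own column before calling .dt.year should tell you whether it still holds strings.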