Date Anomolies

As we all know, dates can be a harrowing/boarderline traumatic experience. Here I give a condensed and hopefully rapid overview of dates in R.

There are multiple date classes in R. The simplest is the Date class, but there are also POSIXt dates and a relatively recent Lubridate package.

Date Class

These are the `easy’ ones to deal with and are stored internally as integers from 1970-01-01.

as.Date("2017-01-12")
as.Date("2017-24-01", format = "%Y-%d-%m")
# Alternative origin - i.e. Excel
as.Date(400, origin = "1900-01-01")
difftime(as.Date("2018-01-01"), as.Date("2017-01-01"), units = "days")

POSIXt - datetime variants

Portable Operating System Interface (POSIX) is an interoperability standard (that not many people seem to have much faith in). In R there is POSIXlt and POSIXct for local and for calendar time respecitively. The former is a list, the latter is a number (seconds since origin):

> unclass(as.POSIXlt(Sys.time()))
$sec
[1] 30.34591

$min
[1] 35

$hour
[1] 12

$mday
[1] 25

$mon
[1] 3

$year
[1] 118

$wday
[1] 3

$yday
[1] 114

$isdst
[1] 0

$zone
[1] "AEST"

$gmtoff
[1] 36000

attr(,"tzone")
[1] ""     "AEST" "AEDT"



# POSIXct
unclass(Sys.time())
[1] 1524623797

You can display (or store) dates in different formats by using the format command or option:

as.POSIXct("101221 10:22", format = "%y%m%d %H:%M")
[1] "2010-12-21 10:22:00 AEDT"

as.character(as.POSIXct("101221 10:22", 
             format = "%y%m%d %H:%M"), format = "%m-%d-%y %H:%M")
[1] "12-21-10 10:22"

Data frames do weird things to dates. Specifically, when you store a POSIXlt variable in a data.frame it is converted to a POSIXct. However, if you assign a POSIXlt to a data.frame field using $ then the lt class is retained!

hotdate <- "20081101 01:20:00"
ct <- as.POSIXct(hotdate, format = "%Y%m%d %H:%M:%S")
lt <- as.POSIXlt(hotdate, format = "%Y%m%d %H:%M:%S")

df <- data.frame(orig = hotdate, 
                 ct = ct, 
                 lt = lt)

# POSIXlt converted to ct
> str(df)
'data.frame': 1 obs. of  3 variables:
 $ orig: Factor w/ 1 level "20081101 01:20:00": 1
 $ ct  : POSIXct, format: "2008-11-01 01:20:00"
 $ lt  : POSIXct, format: "2008-11-01 01:20:00"

# POSIXlt NOT converted to ct!!!!
df$lt2 <- as.POSIXlt(hotdate, format = "%Y%m%d %H:%M:%S")
str(df)
'data.frame': 1 obs. of  4 variables:
 $ orig: Factor w/ 1 level "20081101 01:20:00": 1
 $ ct  : POSIXct, format: "2008-11-01 01:20:00"
 $ lt  : POSIXct, format: "2008-11-01 01:20:00"
 $ lt2 : POSIXlt, format: "2008-11-01 01:20:00"


# Rounding
# NOPE
df[, "ct"] <- round(df[, "ct"], units = "hours")

# FORCE
df[, "ct"] <- as.POSIXct(round(ct, units = "hours"))

# NAH
df[, "lt"] <- round(lt, units = "hours")
Warning message:
  In `[<-.data.frame`(`*tmp*`, , "lt", 
                      value = list(sec = 0, min = 0L,  :
                      provided 11 variables to replace 1 variables

# MAGIC via $  !
df$lt <- round(lt, units = "hours")                     

Lubridate

Makes life a bit easier (sometimes) and represents time via different objects. Important concepts are:

  • instants
  • intervals
  • durations
  • periods
ymd_hms("2012-12-31 23:59:59")
## [1] "2012-12-31 23:59:59 UTC"

month(ldate) <- 8

# An _instant_ in time:
lubridate::now()

# Last day of the current month
lubridate::ceiling_date(now(), unit = "month") - lubridate::days(1)

# The time between two instants is an _interval_ 

# Interval constructed using _duration_ i.e. dyears
# dyears record the exact number of seconds in a time span and deals with
# leap years, daylight savings etc so you need to be careful in its use.
lubridate::ymd_hms(hotdate) - lubridate::dyears(1)

# versus
# Interval constructed using _period_ i.e. years
lubridate::ymd_hms(hotdate) - lubridate::years(1)


# Nice
now() %within% interval(ymd("2012-07-01"), ymd("2019-06-30"))

Painless?

Related