1.4 Date and Time

Date/time is by far the messiest data type. Handling it well is an important skill for statistical analysis, because you will run into it all the time!

R handles date/time in three classes:

Date class represents dates.
POSIXct and POSIXlt classes represent times.

POSIX stands for Portable Operating System Interface of UNIX, ct for calendar time and lt for local time.

Internally, R stores a date as the number of days since 1970-01-01 and a time as the number of seconds since 1970-01-01 00:00:00 UTC⁶ (except in the POSIXlt class). This is why 1 January 1970 is called the epoch. A POSIXlt object instead stores the date/time as a list of components (hour, min, sec, mon, etc.), which makes it easy to extract the parts⁷.
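A quick way to see this storage is to strip the class from a date or time and look at what is left. The sketch below uses only base R; the variable names are illustrative.

```r
# A Date is a count of days since 1970-01-01
d <- as.Date("1970-01-10")
unclass(d)
#> [1] 9

# A POSIXct time is a count of seconds since 1970-01-01 00:00:00 UTC
tm <- as.POSIXct("1970-01-01 00:01:00", tz = "UTC")
as.numeric(tm)
#> [1] 60

# A POSIXlt time is a list of named components instead
names(unclass(as.POSIXlt(tm)))[1:3]
#> [1] "sec"  "min"  "hour"
```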

1.4.1 Parsing Date and Time

Often, you will get date/time as strings. There are several approaches for parsing strings into date/times. Let’s try some of them. In this section, we will use 4 packages to explore them.

library("lubridate")
library("readr")
library("hms")
library("chron")

Date

Dates can be written in many different formats, and there are just as many ways to parse them.

With specified format

as.Date() and readr::parse_date() are good choices. Both accept a wide range of input formats through the format = argument. The default format is a 4-digit year, followed by a 1- or 2-digit month, then a 1- or 2-digit day, separated by either dashes (-) or forward slashes (/), i.e., "%Y-%m-%d" or "%Y/%m/%d".

## Default ones
as.Date("2021-3-14")
#> [1] "2021-03-14"
as.Date("2004/11/6")
#> [1] "2004-11-06"

# But parse_date() is stricter about the default format
parse_date("2004/11/6")
#> Warning: 1 parsing failure.
#> row col   expected    actual
#>   1  -- date like  2004/11/6
#> [1] NA
parse_date("2021-3-14")
#> Warning: 1 parsing failure.
#> row col   expected    actual
#>   1  -- date like  2021-3-14
#> [1] NA

# By default, parse_date() needs zero-padded months and days
parse_date("2004/11/06")
#> [1] "2004-11-06"
parse_date("2021-03-14")
#> [1] "2021-03-14"

For dates not in standard format, you need to specify the format string according to the below table.

Code	Value	Remark
`"%d"`	Day of the month (decimal number)
`"%e"`	Day of the month with optional leading space	Only for `readr::parse_date()`
`"%m"`	Month (decimal number)
`"%B"`	Month (full name)	Case doesn’t matter
`"%b"`	Month (3-letter abbreviated name)	Case doesn’t matter
`"%Y"`	Year (4 digits)
`"%y"`	Year (2 digits)	In readr: 00-69 -> 2000-2069, 70-99 -> 1970-1999 (base R maps 69 to 1969)

as.Date("2/8/2021", format = "%m/%d/%Y")
#> [1] "2021-02-08"

# When you specify the format,
# single-digit days and months don't need a leading 0
parse_date("7-1-71", format = "%d-%m-%y")
#> [1] "1971-01-07"

# Case doesn't matter
parse_date("SepteMBer 28, 2002", format = "%B %d, %Y")
#> [1] "2002-09-28"
as.Date("18AuG03", format = "%d%b%y")
#> [1] "2003-08-18"

You can parse non-English month names with parse_date() by passing a locale() object to its locale = argument.
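As a small illustration, the sketch below parses a French date. The language code "fr" and the sample string are chosen only for the example; any language supported by readr's locale() should work the same way.

```r
library(readr)

# French month name, parsed with a French locale
parse_date("1 janvier 2015", format = "%d %B %Y", locale = locale("fr"))
#> [1] "2015-01-01"
```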

Without specified format

You can also use helpers provided by the lubridate package. They are short and unambiguous!

Identify the order of the year, month and day in your dates.
Arrange y, m and d in that exact order. It will be the name of the parsing function in lubridate.

# Like as.Date() (and unlike parse_date()),
# single-digit days and months don't need a leading 0
ymd("2002-12-8")
#> [1] "2002-12-08"

# You may include "th" after day
mdy("January 7th, 1971")
#> [1] "1971-01-07"
mdy("January 7, 1971")
#> [1] "1971-01-07"

# Unquoted numbers are allowed
dym(28200209)
#> [1] "2002-09-28"

# Case doesn't matter
ydm("2002-28-SeP")
#> [1] "2002-09-28"

Check out the lubridate cheatsheet for more.

Parsing Times

Unlike dates, the time part of a time string has only two kinds of representation:

24 hour clock, i.e., hh:mm:ss (default one)
12 hour clock, hh:mm:ss followed by am or pm

A time usually comes attached to a specific date, which multiplies the number of possible representations. readr::parse_time(), readr::parse_datetime()⁸, as.POSIXct() and as.POSIXlt() can parse them once you specify the format. The default formats are

Format	Functions
`"%Y-%m-%d %H:%M:%OS"`	`as.POSIXct()`, `as.POSIXlt()` and `readr::parse_datetime()`
`"%Y-%m-%d %H:%M:%S"`	”
`"%Y/%m/%d %H:%M:%OS"`	”
`"%Y/%m/%d %H:%M:%S"`	”
`"%Y-%m-%d %H:%M"`	”
`"%Y/%m/%d %H:%M"`	”
`"%Y-%m-%d"`	`as.POSIXct()`, `as.POSIXlt()`, `readr::parse_datetime()` and `readr::parse_date()`
`"%Y/%m/%d"`	”

As you may have guessed, "%H" stands for hours, "%M" for minutes, "%S" for integer seconds and "%OS" for seconds including a fractional part.
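The difference between "%S" and "%OS" only shows up when the string carries fractional seconds. A minimal base R sketch, with a made-up timestamp:

```r
# Parse a time string that carries fractional seconds
x <- as.POSIXct("2023-07-24 23:55:26.75",
  format = "%Y-%m-%d %H:%M:%OS", tz = "UTC"
)

# The fraction is dropped when formatting with %S
format(x, "%H:%M:%S")
#> [1] "23:55:26"

# %OS2 prints two decimal places
format(x, "%H:%M:%OS2")
#> [1] "23:55:26.75"
```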

# Defaults
parse_datetime("2023-07-24 23:55:26")
#> [1] "2023-07-24 23:55:26 UTC"
time_1 <- as.POSIXct("2023-07-24 23:55:26")
time_1
#> [1] "2023-07-24 23:55:26 UTC"

# Specifying format
time_2 <- as.POSIXlt("25072023 08:32:07", format = "%d%m%Y %H:%M:%S")
time_2
#> [1] "2023-07-25 08:32:07 UTC"

# Don't forget to include dates!
as.POSIXct("08:05:06")
#> Error in as.POSIXlt.character(x, tz, ...): character string is not in a standard unambiguous format

# But parse_time() allows that
parse_time("08:05:06")
#> 08:05:06

# am/pm can be specified
parse_time("4:06 pm")
#> 16:06:00
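lubridate's order-based helpers extend to date-times as well: append the order of hours, minutes and seconds to the date order. The sketch below shows two of them on sample strings; both return times in UTC unless a tz = argument is given.

```r
library(lubridate)

# Order of components: year, month, day, hour, minute, second
ymd_hms("2023-07-24 23:55:26")
#> [1] "2023-07-24 23:55:26 UTC"

# Day first, and no seconds in the input
dmy_hm("25/07/2023 08:32")
#> [1] "2023-07-25 08:32:00 UTC"
```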

Specifying timezone

By default, as.POSIXct() interprets a time in the system’s time zone. You can change this with the tz = argument. readr::parse_datetime(), on the other hand, uses UTC (the same as GMT for practical purposes), which can be changed with locale = locale(tz = <TIME_ZONE>).

# In Asia/Singapore
parse_datetime("2020-01-01 11:42:03", locale = locale(tz = "Asia/Singapore"))
#> [1] "2020-01-01 11:42:03 +08"

# In GMT
as.POSIXct("2020-01-01 11:42:03", tz = "GMT")
#> [1] "2020-01-01 11:42:03 GMT"

# With system's tz
time_3 <- parse_datetime("2020-01-01 11:42:03",
  locale = locale(tz = Sys.timezone())
)
time_3
#> [1] "2020-01-01 11:42:03 UTC"

Sys.timezone() returns the time zone of your system.
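For a POSIXct object, the time zone changes how an instant is read and displayed, not the stored number of seconds. The following base R sketch shows one instant on two clocks; the zone names are just examples.

```r
# One instant, parsed as UTC
x <- as.POSIXct("2020-01-01 11:42:03", tz = "UTC")

# The same instant shown on a Singapore clock (UTC+8)
format(x, "%Y-%m-%d %H:%M:%S", tz = "Asia/Singapore")
#> [1] "2020-01-01 19:42:03"

# The underlying number of seconds is unchanged
as.numeric(x)
#> [1] 1577878923
```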

1.4.2 Extracting the components

When data span a long time frame, the year, month, weekday, week, quarter, day of the month, etc. are often useful for insights. Let’s extract them from the birthdays of some famous statisticians.

statisticians_bdays <- c(
  CRRao = as.Date("1920-09-10"),
  PCMahalanobis = as.Date("1893-06-29"),
  Cramer = as.Date("1893-09-25"),
  KRParthasarathy = as.Date("1936-06-25")
)

With inbuilt functions

Base R has months() and weekdays() built in, while year(), week(), quarter() and day() come from the lubridate package. The names of these functions are self-explanatory.

# vector of years
year(statisticians_bdays)
#>           CRRao   PCMahalanobis          Cramer KRParthasarathy 
#>            1920            1893            1893            1936

# vector of months
months(statisticians_bdays)
#>           CRRao   PCMahalanobis          Cramer KRParthasarathy 
#>     "September"          "June"     "September"          "June"

# vector of weekdays
weekdays(statisticians_bdays)
#>           CRRao   PCMahalanobis          Cramer KRParthasarathy 
#>        "Friday"      "Thursday"        "Monday"      "Thursday"

# vector of week numbers
week(statisticians_bdays)
#> [1] 37 26 39 26

# vector of quarters
quarter(statisticians_bdays)
#> [1] 3 2 3 2

# vector of days of the months
day(statisticians_bdays)
#> [1] 10 29 25 25
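If lubridate is not available, the same components can be obtained with base R alone by formatting the date with the codes from the parsing table. A minimal sketch on a single date:

```r
d <- as.Date("1920-09-10")

# Year and day of the month as numbers
as.integer(format(d, "%Y"))
#> [1] 1920
as.integer(format(d, "%d"))
#> [1] 10

# Day of the year (1-366)
as.integer(format(d, "%j"))
#> [1] 254

# Quarter as a label
quarters(d)
#> [1] "Q3"
```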

From default components

You can pull out the individual components of a POSIXlt object with the unclass() and unlist() functions.

# doesn't work for POSIXct objects: you only get the seconds since the epoch
unclass(time_1)
#> [1] 1690242926
#> attr(,"tzone")
#> [1] ""

# as a list of components
unclass(time_2)
#> $sec
#> [1] 7
#> 
#> $min
#> [1] 32
#> 
#> $hour
#> [1] 8
#> 
#> $mday
#> [1] 25
#> 
#> $mon
#> [1] 6
#> 
#> $year
#> [1] 123
#> 
#> $wday
#> [1] 2
#> 
#> $yday
#> [1] 205
#> 
#> $isdst
#> [1] 0
#> 
#> $zone
#> [1] "UTC"
#> 
#> $gmtoff
#> [1] 0
#> 
#> attr(,"tzone")
#> [1] "UTC"
#> attr(,"balanced")
#> [1] TRUE

# flattened to a named character vector
unlist(time_2)
#>    sec    min   hour   mday    mon   year   wday   yday  isdst   zone gmtoff 
#>    "7"   "32"    "8"   "25"    "6"  "123"    "2"  "205"    "0"  "UTC"    "0"

# extract seconds
time_2$sec
#> [1] 7

# extract weekday number
time_2$wday
#> [1] 2
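Some POSIXlt components are offsets rather than calendar values, which explains the 123 for the year and the 6 for July above. The sketch below rebuilds the same time (the variable name lt is illustrative) and converts the components back.

```r
lt <- as.POSIXlt("2023-07-25 08:32:07", tz = "UTC")

lt$year + 1900  # years are counted from 1900
#> [1] 2023
lt$mon + 1      # months run from 0 (January) to 11
#> [1] 7
lt$yday + 1     # days of the year start at 0
#> [1] 206
lt$wday         # 0 is Sunday, so 2 is Tuesday
#> [1] 2
```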

Truncate the output

# truncate to the day
trunc(time_2, "days")
#> [1] "2023-07-25 UTC"

# truncate to the minute
trunc(time_2, "mins")
#> [1] "2023-07-25 08:32:00 UTC"
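For coarser units such as months or years, lubridate's floor_date() is one option. This is a sketch of an alternative to trunc(), not a replacement for it, shown on a sample date:

```r
library(lubridate)

d <- as.Date("2023-07-25")

# First day of the month and of the year containing d
floor_date(d, unit = "month")
#> [1] "2023-07-01"
floor_date(d, unit = "year")
#> [1] "2023-01-01"
```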

1.4.3 Operations on date/time

date_f1 <- as.Date("04/08/2021", format = "%m/%d/%Y")
date_f2 <- as.Date("October 8, 2021", format = "%B %d, %Y")

Difference between 2 dates/times

The subtraction operator gives the difference between two dates in days, or between two times in automatically chosen units.

date_f1 - date_f2
#> Time difference of -183 days

time_2 - time_1
#> Time difference of 8.611389 hours

The built-in function difftime() returns the difference in the units you specify.

# in weeks
difftime(date_f1, date_f2, units = "weeks")
#> Time difference of -26.14286 weeks

# default units are chosen automatically (days here)
difftime(date_f1, date_f2)
#> Time difference of -183 days

# in seconds
difftime(time_1, as.POSIXct("1970-01-01 00:00:00", tz = "UTC"), units = "secs")
#> Time difference of 1690242926 secs

as.POSIXct("2021-03-10 08:32:07") - as.POSIXct("2023-03-09 23:55:26")
#> Time difference of -729.6412 days
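The results above are difftime objects, which print with their units. When a plain number is needed for further calculation, as.numeric() can convert it, optionally into other units. A minimal sketch with two sample dates:

```r
d1 <- as.Date("2021-04-08")
d2 <- as.Date("2021-10-08")
gap <- d2 - d1

# Plain number of days
as.numeric(gap)
#> [1] 183

# The same gap converted to hours
as.numeric(gap, units = "hours")
#> [1] 4392
```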

You can also apply diff() to a vector of dates. It returns the differences between consecutive elements.

three_days <- as.Date(c("2020-07-22", "2019-04-20", "2022-10-06"))

diff(three_days)
#> Time differences in days
#> [1] -459 1265

Addition and Subtraction of days and seconds

Any number added to or subtracted from a Date object is treated as days. For a time (POSIXct) object, the same number is treated as seconds.

# adding 10 days
date_f2 + 10
#> [1] "2021-10-18"

# subtracting 13 days
date_f1 - 13
#> [1] "2021-03-26"

# adding 30s
time_1 + 30
#> [1] "2023-07-24 23:55:56 UTC"

# subtracting 569s
time_1 - 569
#> [1] "2023-07-24 23:45:57 UTC"
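Plain numbers only cover days and seconds. For calendar units such as months, one option is lubridate's %m+% operator, sketched below on a sample date. It rolls back to the last valid day when the target month is shorter.

```r
library(lubridate)

jan31 <- as.Date("2021-01-31")

# 30 days later is not "one month later"
jan31 + 30
#> [1] "2021-03-02"

# %m+% adds calendar months and rolls back to the last valid day
jan31 %m+% months(1)
#> [1] "2021-02-28"
```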

Comparing with relational operators

All the usual relational operators (<, <=, >, >=, == and !=) work on dates and times. The logical AND (&& and &) and logical OR (|| and |) operators do not apply to them.

date_f1 > date_f2
#> [1] FALSE

date_f1 <= date_f2
#> [1] TRUE

time_2 != time_1
#> [1] TRUE
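Comparisons are element-wise, so they are handy for filtering a vector of dates. The sketch below uses made-up dates and a hypothetical cutoff:

```r
visits <- as.Date(c("2020-07-22", "2019-04-20", "2022-10-06"))
cutoff <- as.Date("2020-01-01")

# Element-wise comparison gives a logical vector
visits > cutoff
#> [1]  TRUE FALSE  TRUE

# Use it to keep only the dates after the cutoff
visits[visits > cutoff]
#> [1] "2020-07-22" "2022-10-06"

# range() works too, because dates are ordered
range(visits)
#> [1] "2019-04-20" "2022-10-06"
```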

Sequence of dates

You can create a sequence of dates with the seq() function by specifying the starting date, the number of dates and the step between them.

# 7 dates, 1 week apart
seq(date_f1, length.out = 7, by = "week")
#> [1] "2021-04-08" "2021-04-15" "2021-04-22" "2021-04-29" "2021-05-06"
#> [6] "2021-05-13" "2021-05-20"

# 7 dates, 14 days apart
seq(date_f1, length.out = 7, by = 14)
#> [1] "2021-04-08" "2021-04-22" "2021-05-06" "2021-05-20" "2021-06-03"
#> [6] "2021-06-17" "2021-07-01"

# 7 dates, 2 weeks apart
seq(date_f1, length.out = 7, by = "2 weeks")
#> [1] "2021-04-08" "2021-04-22" "2021-05-06" "2021-05-20" "2021-06-03"
#> [6] "2021-06-17" "2021-07-01"

# 7 dates, 7 months apart
seq(date_f1, length.out = 7, by = "7 months")
#> [1] "2021-04-08" "2021-11-08" "2022-06-08" "2023-01-08" "2023-08-08"
#> [6] "2024-03-08" "2024-10-08"

# 7 dates, 4 years apart
seq(date_f1, length.out = 7, by = "4 years")
#> [1] "2021-04-08" "2025-04-08" "2029-04-08" "2033-04-08" "2037-04-08"
#> [6] "2041-04-08" "2045-04-08"
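Instead of a fixed number of dates, seq() also accepts an end date through its to = argument. A minimal sketch with sample start and end dates:

```r
start <- as.Date("2021-01-01")
end <- as.Date("2021-04-01")

# First day of every month from start to end, inclusive
seq(from = start, to = end, by = "month")
#> [1] "2021-01-01" "2021-02-01" "2021-03-01" "2021-04-01"

# Number of days in the range, both ends included
length(seq(from = start, to = end, by = "day"))
#> [1] 91
```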

1.4.4 The `chron` package

chron date/time objects are different from the usual ones: the package stores and prints times in its own chron format.

Creating times in chron

# Defaults
time_1_ch <- as.chron("2013-07-24 23:55:26")
time_1_ch
#> [1] (07/24/13 23:55:26)

# Specifying format ("%y" is the 2-digit year; "%Y" would read "13" as the year 13)
time_2_ch <- as.chron("07/25/13", format = "%m/%d/%y")
time_2_ch
#> [1] (07/25/13 00:00:00)

Extracting the date with `dates()`

dates(time_1_ch)
#>     day  
#> 07/24/13

Arithmetic of `chron` objects

# comparison
time_2_ch > time_1_ch
#> [1] TRUE

# Adding 10 days
time_1_ch + 10
#> [1] (08/03/13 23:55:26)

# subtraction
time_2_ch - time_1_ch
#> [1] 00:04:34

# Difference in the specified units
difftime(time_2_ch, time_1_ch, units = "hours")
#> Time difference of 0.07611111 hours

# difference in the time
as.chron("2013-03-10 08:32:07") - as.chron("2013-03-09 23:55:26")
#> [1] 08:36:41

Remember that chron does not adjust for time zones.
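Since no time zone is involved, a chron object can also be built directly from separate date and time strings with chron(). The sketch below assumes the package's default input formats (m/d/y and h:m:s). The extractor calls are namespace-qualified because lubridate exports functions with the same names.

```r
library(chron)

# Separate date and time strings in the default m/d/y and h:m:s formats
x <- chron(dates. = "07/24/13", times. = "23:55:26")
x
#> [1] (07/24/13 23:55:26)

# Hour and minute parts of the time
chron::hours(x)
chron::minutes(x)
```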

Exercises

Exercise 1.16 Questions

What is the significance of January 1, 1970?
What is the difference between as.Date() and as.POSIXlt()?

References

Peng, Roger D. 2022. “R Programming for Data Science.” https://bookdown.org/rdpeng/rprogdatascience/dates-and-times.html.

Spector, Phil. 2011a. “Dates and Times in R.”

Wickham, Hadley, and Garrett Grolemund. 2017b. “R for Data Science.” https://r4ds.had.co.nz/dates-and-times.html.

Wickham, Hadley, and Garrett Grolemund. 2017c. “R for Data Science.” https://r4ds.had.co.nz/data-import.html#readr-datetimes.

⁶ Peng (2022)
⁷ Spector (2011a)
⁸ Wickham and Grolemund (2017b), Wickham and Grolemund (2017c)