1.4 Date and Time

Date/time is by far the messiest data type. Handling it well is a very important skill for statistical analysis, and you will face it often!

R handles date/time in three classes:

  • Date class represents dates.
  • POSIXct and POSIXlt classes represent times.

POSIX stands for Portable Operating System Interface (for UNIX), ct for calendar time and lt for local time.

Internally, R stores dates as the number of days since 1970-01-01 and times as the number of seconds since 1970-01-01 (except in the POSIXlt class) (Peng 2022). This reference point, 1 January 1970, is called the epoch. In the POSIXlt class, date/time values are stored as a list of components (hour, min, sec, mon etc.), which makes it easy to extract the parts (Spector 2011a).
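To see this internal storage for yourself, here is a minimal sketch in base R. The object names d, ct and lt are only for illustration.

```r
# A Date is the number of days since 1970-01-01
d <- as.Date("1970-01-10")
unclass(d)
#> [1] 9

# A POSIXct is the number of seconds since 1970-01-01 00:00:00 UTC
ct <- as.POSIXct("1970-01-02 00:00:00", tz = "UTC")
as.numeric(ct)
#> [1] 86400

# A POSIXlt is a list of named components
lt <- as.POSIXlt(ct)
is.list(unclass(lt))
#> [1] TRUE
lt$mday
#> [1] 2
```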

1.4.1 Parsing Date and Time

Often, you will get date/time values as strings. There are several approaches for parsing strings into dates/times. Let’s try some of them. In this section, we will use four packages to explore them.

library("lubridate")
library("readr")
library("hms")
library("chron")

Date

Dates can be written in many formats, and there are correspondingly many ways to parse them.

With specified format

as.Date() and readr::parse_date() are good choices. They allow a wide range of input formats through the format = argument. The default format is a 4-digit year, followed by a 1- or 2-digit month, then a 1- or 2-digit day, separated by either dashes (-) or forward slashes (/), i.e., "%Y-%m-%d" and "%Y/%m/%d".

# Default formats
as.Date("2021-3-14")
#> [1] "2021-03-14"
as.Date("2004/11/6")
#> [1] "2004-11-06"

# But parse_date() has a drawback: it rejects single-digit months and days
parse_date("2004/11/6")
#> Warning: 1 parsing failure.
#> row col   expected    actual
#>   1  -- date like  2004/11/6
#> [1] NA
parse_date("2021-3-14")
#> Warning: 1 parsing failure.
#> row col   expected    actual
#>   1  -- date like  2021-3-14
#> [1] NA

# You have to pad single-digit months and days with a leading 0
parse_date("2004/11/06")
#> [1] "2004-11-06"
parse_date("2021-03-14")
#> [1] "2021-03-14"

For dates not in the standard format, you need to specify the format string according to the table below.

Code   Value                                              Remark
"%d"   Day of the month (decimal number)
"%e"   Day of the month, with an optional leading space   Only for readr::parse_date()
"%m"   Month (decimal number)
"%B"   Month (full name)                                  Case doesn’t matter
"%b"   Month (3-letter abbreviated name)                  Case doesn’t matter
"%Y"   Year (4 digits)
"%y"   Year (2 digits)                                    readr: 00-69 -> 2000-2069, 70-99 -> 1970-1999 (base R reads 69 as 1969)

as.Date("2/8/2021", format = "%m/%d/%Y")
#> [1] "2021-02-08"

# When you specify the format,
# single-digit days and months do not need a leading 0
parse_date("7-1-71", format = "%d-%m-%y")
#> [1] "1971-01-07"

# Case doesn't matter
parse_date("SepteMBer 28, 2002", format = "%B %d, %Y")
#> [1] "2002-09-28"
as.Date("18AuG03", format = "%d%b%y")
#> [1] "2003-08-18"
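The format string matters most when a date is ambiguous. The sketch below uses only base R and reads the same string with two different formats, then prints a date back with format(), which accepts the same codes. The names us and eu are illustrative.

```r
# "2/8/2021" is ambiguous: the format string decides which number is the month
us <- as.Date("2/8/2021", format = "%m/%d/%Y")  # month first
eu <- as.Date("2/8/2021", format = "%d/%m/%Y")  # day first
us
#> [1] "2021-02-08"
eu
#> [1] "2021-08-02"

# format() uses the same codes to turn a date back into a string
format(us, "%d/%m/%Y")
#> [1] "08/02/2021"

# With an explicit format, an impossible date yields NA
as.Date("2021-02-30", format = "%Y-%m-%d")
#> [1] NA
```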

You can parse non-English month names with parse_date() by passing a locale() object to the locale = argument.
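For instance, here is a small sketch with French month names. The example string and the "fr" language code are illustrative choices; date_names_langs() lists the codes that readr knows about.

```r
library(readr)

# French month names: pass a locale() object to the locale = argument
parse_date("1 janvier 2015", format = "%d %B %Y", locale = locale("fr"))
#> [1] "2015-01-01"

# Language codes accepted by locale()
head(date_names_langs())
```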

Without specified format

You can also use helpers provided by the lubridate package. They are short and unambiguous!

  • Identify the order of the year, month and day in your dates.
  • Arrange y, m and d in that exact order. It will be the name of the parsing function in lubridate.

# Like as.Date(), and unlike parse_date(),
# single-digit months and days do not need a leading 0
ymd("2002-12-8")
#> [1] "2002-12-08"

# You may include "th" after day
mdy("January 7th, 1971")
#> [1] "1971-01-07"
mdy("January 7, 1971")
#> [1] "1971-01-07"

# Unquoted numbers are allowed
dym(28200209)
#> [1] "2002-09-28"

# Case doesn't matter
ydm("2002-28-SeP")
#> [1] "2002-09-28"
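These helpers are also vectorised. The sketch below assumes the strings in a vector share the same component order but not necessarily the same separators; the vector x and the invalid string are made up for illustration.

```r
library(lubridate)

# One helper copes with several separators (or none) in the same vector
x <- ymd(c("2002-12-8", "2002/12/08", "20021208"))
x
#> [1] "2002-12-08" "2002-12-08" "2002-12-08"

# A string that cannot be a valid date becomes NA (with a warning)
bad <- suppressWarnings(ymd("2002-13-08"))
is.na(bad)
#> [1] TRUE
```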

Check out the lubridate cheatsheet for more.

Parsing Times

Unlike dates, the time part of a date/time string has two kinds of representation:

  • 24 hour clock, i.e., hh:mm:ss (default one)
  • 12 hour clock, hh:mm:ss followed by am or pm

But a time usually has to be given together with a specific date, so there are many ways to write a specific date/time. The functions readr::parse_time(), readr::parse_datetime() (Wickham and Grolemund 2017b, 2017c), as.POSIXct() and as.POSIXlt() can be used to parse them by specifying their formats. The default formats are

Format                    Functions
"%Y-%m-%d %H:%M:%OS"      as.POSIXct(), as.POSIXlt() and readr::parse_datetime()
"%Y-%m-%d %H:%M:%S"       same as above
"%Y/%m/%d %H:%M:%OS"      same as above
"%Y/%m/%d %H:%M:%S"       same as above
"%Y-%m-%d %H:%M"          same as above
"%Y/%m/%d %H:%M"          same as above
"%Y-%m-%d"                as.POSIXct(), as.POSIXlt(), readr::parse_datetime() and readr::parse_date()
"%Y/%m/%d"                same as above

As you may have guessed, "%H" is for hours, "%M" for minutes, "%OS" for fractional seconds and "%S" for whole seconds.
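A short sketch of these codes in base R follows. The codes "%I" (hour on the 12-hour clock) and "%p" (AM/PM indicator) are not listed above; they come from the strptime documentation, and "%p" is assumed to be matched in an English-like locale. The names t12 and tfrac are illustrative.

```r
# 12-hour clock: "%I" is the hour (01-12) and "%p" the AM/PM indicator
# (assumes an English-like locale for AM/PM)
t12 <- as.POSIXct("2023-07-24 04:06:00 PM",
  format = "%Y-%m-%d %I:%M:%S %p", tz = "UTC"
)
format(t12, "%H:%M:%S")
#> [1] "16:06:00"

# "%OS" keeps the fractional part of the seconds;
# on output, "%OS2" prints two decimal places
tfrac <- as.POSIXct("2023-07-24 23:55:26.75",
  format = "%Y-%m-%d %H:%M:%OS", tz = "UTC"
)
format(tfrac, "%H:%M:%OS2")
#> [1] "23:55:26.75"
```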

# Defaults
parse_datetime("2023-07-24 23:55:26")
#> [1] "2023-07-24 23:55:26 UTC"
time_1 <- as.POSIXct("2023-07-24 23:55:26")
time_1
#> [1] "2023-07-24 23:55:26 UTC"

# Specifying format
time_2 <- as.POSIXlt("25072023 08:32:07", format = "%d%m%Y %H:%M:%S")
time_2
#> [1] "2023-07-25 08:32:07 UTC"

# Don't forget to include dates!
as.POSIXct("08:05:06")
#> Error in as.POSIXlt.character(x, tz, ...): character string is not in a standard unambiguous format

# But parse_time() allows that
parse_time("08:05:06")
#> 08:05:06

# am/pm can be specified
parse_time("4:06 pm")
#> 16:06:00
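The value returned by parse_time() is an hms object from the hms package loaded earlier. As a rough sketch (the name tm is illustrative), you can inspect it like this:

```r
library(readr)
library(hms)

tm <- parse_time("4:06 pm")
class(tm)
#> [1] "hms"      "difftime"

# Stored as the number of seconds since midnight
as.numeric(tm)
#> [1] 57960

# The same time built directly from its components
hms(hours = 16, minutes = 6)
#> 16:06:00
```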

Specifying timezone

By default, as.POSIXct() interprets a time in the system’s time zone, but you can change this with the tz = argument. readr::parse_datetime(), on the other hand, uses UTC (effectively the same as GMT) by default, which can be changed with locale = locale(tz = <TIME_ZONE>).

# In Asia/Singapore
parse_datetime("2020-01-01 11:42:03", locale = locale(tz = "Asia/Singapore"))
#> [1] "2020-01-01 11:42:03 +08"

# In GMT
as.POSIXct("2020-01-01 11:42:03", tz = "GMT")
#> [1] "2020-01-01 11:42:03 GMT"

# With system's tz
time_3 <- parse_datetime("2020-01-01 11:42:03",
  locale = locale(tz = Sys.timezone())
)
time_3
#> [1] "2020-01-01 11:42:03 UTC"

Sys.timezone() returns the time zone of your system.
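The time zone changes which instant a string refers to. The base R sketch below (t_sg and t_utc are illustrative names) reads the same clock time in two zones and then shows one instant on another zone's clock.

```r
# The same clock reading in two time zones is two different instants
t_sg  <- as.POSIXct("2020-01-01 11:42:03", tz = "Asia/Singapore")
t_utc <- as.POSIXct("2020-01-01 11:42:03", tz = "UTC")
difftime(t_utc, t_sg, units = "hours")
#> Time difference of 8 hours

# format() with tz = shows one instant on another zone's clock
format(t_sg, tz = "UTC", usetz = TRUE)
#> [1] "2020-01-01 03:42:03 UTC"
```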

1.4.2 Extracting the components

When data span a long period, the year, month, weekday, week, quarter, day of the month etc. of each observation are often useful for insights. Let’s extract them from some famous statisticians’ birthdays.

statisticians_bdays <- c(
  CRRao = as.Date("1920-09-10"),
  PCMahalanobis = as.Date("1893-06-29"),
  Cramer = as.Date("1893-09-25"),
  KRParthasarathy = as.Date("1936-06-25")
)

With helper functions

The base R functions months() and weekdays(), together with year(), week(), quarter() and day() from lubridate, are used to obtain them. The names of these functions are self-explanatory.

# vector of years
year(statisticians_bdays)
#>           CRRao   PCMahalanobis          Cramer KRParthasarathy 
#>            1920            1893            1893            1936

# vector of months
months(statisticians_bdays)
#>           CRRao   PCMahalanobis          Cramer KRParthasarathy 
#>     "September"          "June"     "September"          "June"

# vector of weekdays
weekdays(statisticians_bdays)
#>           CRRao   PCMahalanobis          Cramer KRParthasarathy 
#>        "Friday"      "Thursday"        "Monday"      "Thursday"

# vector of week numbers
week(statisticians_bdays)
#> [1] 37 26 39 26

# vector of quarters
quarter(statisticians_bdays)
#> [1] 3 2 3 2

# vector of days of the months
day(statisticians_bdays)
#> [1] 10 29 25 25
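If you prefer to stay in base R, format() with the codes from the parsing section, plus quarters(), gives similar results. This is only a sketch on two of the birthdays above; the vector name bdays is illustrative, and "%j" (day of the year) is a strptime code not used elsewhere in this section.

```r
bdays <- as.Date(c("1920-09-10", "1893-06-29"))

# Year and month number as integers
as.integer(format(bdays, "%Y"))
#> [1] 1920 1893
as.integer(format(bdays, "%m"))
#> [1] 9 6

# Day of the year
as.integer(format(bdays, "%j"))
#> [1] 254 180

# Quarter labels from base R
quarters(bdays)
#> [1] "Q3" "Q2"
```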

From default components

You can strip out the different components of a POSIXlt object with the unclass() and unlist() functions.

# doesn't work for POSIXct objects: you only get the number of seconds
unclass(time_1)
#> [1] 1690242926
#> attr(,"tzone")
#> [1] ""

# as a list of components
unclass(time_2)
#> $sec
#> [1] 7
#> 
#> $min
#> [1] 32
#> 
#> $hour
#> [1] 8
#> 
#> $mday
#> [1] 25
#> 
#> $mon
#> [1] 6
#> 
#> $year
#> [1] 123
#> 
#> $wday
#> [1] 2
#> 
#> $yday
#> [1] 205
#> 
#> $isdst
#> [1] 0
#> 
#> $zone
#> [1] "UTC"
#> 
#> $gmtoff
#> [1] 0
#> 
#> attr(,"tzone")
#> [1] "UTC"
#> attr(,"balanced")
#> [1] TRUE

# as a named character vector
unlist(time_2)
#>    sec    min   hour   mday    mon   year   wday   yday  isdst   zone gmtoff 
#>    "7"   "32"    "8"   "25"    "6"  "123"    "2"  "205"    "0"  "UTC"    "0"

# extract seconds
time_2$sec
#> [1] 7

# extract weekday number
time_2$wday
#> [1] 2
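Some of these components are offsets rather than calendar values, which explains the year 123 and month 6 above. A small sketch, rebuilding the same time with an explicit UTC time zone (the name lt is illustrative):

```r
lt <- as.POSIXlt("2023-07-25 08:32:07", tz = "UTC")

lt$year + 1900   # years are counted from 1900
#> [1] 2023
lt$mon + 1       # months run from 0 (January) to 11 (December)
#> [1] 7
lt$wday          # weekdays run from 0 (Sunday) to 6 (Saturday)
#> [1] 2
lt$yday + 1      # days of the year start at 0
#> [1] 206
```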

Truncate the output

# truncate date & time to the day
trunc(time_2, "days")
#> [1] "2023-07-25 UTC"

# truncate date & time to the minute
trunc(time_2, "mins")
#> [1] "2023-07-25 08:32:00 UTC"
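trunc() always moves back to the start of the unit. Base R also has round(), which moves to the nearest unit instead. A minimal sketch comparing the two (the name x is illustrative, and the results are formatted as hours and minutes for brevity):

```r
x <- as.POSIXct("2023-07-25 08:32:07", tz = "UTC")

# trunc() drops the minutes and seconds
format(trunc(x, "hours"), "%H:%M")
#> [1] "08:00"

# round() goes to the nearest hour: 08:32 is closer to 09:00
format(round(x, "hours"), "%H:%M")
#> [1] "09:00"
```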

1.4.3 Operations on date/time

date_f1 <- as.Date("04/08/2021", format = "%m/%d/%Y")
date_f2 <- as.Date("October 8, 2021", format = "%B %d, %Y")

Difference between 2 dates/times

The subtraction operator gives the difference between two dates in days; for two times, R picks a suitable unit automatically.

date_f1 - date_f2
#> Time difference of -183 days

time_2 - time_1
#> Time difference of 8.611389 hours

The built-in function difftime() returns the difference in the units you specify.

# in weeks
difftime(date_f1, date_f2, units = "weeks")
#> Time difference of -26.14286 weeks

# default is days
difftime(date_f1, date_f2)
#> Time difference of -183 days

# in seconds
difftime(time_1, as.POSIXct("1970-01-01 00:00:00", tz = "UTC"), units = "secs")
#> Time difference of 1690242926 secs

as.POSIXct("2021-03-10 08:32:07") - as.POSIXct("2023-03-09 23:55:26")
#> Time difference of -729.6412 days
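The result of a subtraction or of difftime() is a difftime object, not a plain number. When you need a number for further calculations, something along these lines should work (d1, d2 and gap are illustrative names):

```r
d1 <- as.Date("2021-04-08")
d2 <- as.Date("2021-10-08")
gap <- d2 - d1   # a difftime object
gap
#> Time difference of 183 days

# Plain number, in the unit you ask for
as.numeric(gap, units = "days")
#> [1] 183
as.numeric(gap, units = "hours")
#> [1] 4392

# Change the unit of an existing difftime
units(gap) <- "weeks"
gap
#> Time difference of 26.14286 weeks
```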

You can also apply diff() to a vector of dates; it returns the differences between consecutive elements.

three_days <- as.Date(c("2020-07-22", "2019-04-20", "2022-10-06"))

diff(three_days)
#> Time differences in days
#> [1] -459 1265

Addition and Subtraction of days and seconds

Any number added to or subtracted from a date object is treated as a number of days. For a time object, it is treated as a number of seconds.

# adding 10 days
date_f2 + 10
#> [1] "2021-10-18"

# subtracting 13 days
date_f1 - 13
#> [1] "2021-03-26"

# adding 30s
time_1 + 30
#> [1] "2023-07-24 23:55:56 UTC"

# subtracting 569s
time_1 - 569
#> [1] "2023-07-24 23:45:57 UTC"
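For larger steps you can either convert to seconds yourself or state the unit with as.difftime() from base R. A minimal sketch (t0 is an illustrative name):

```r
t0 <- as.POSIXct("2023-07-24 23:55:26", tz = "UTC")

# 2 hours = 2 * 60 * 60 seconds
t0 + 2 * 60 * 60
#> [1] "2023-07-25 01:55:26 UTC"

# as.difftime() states the unit explicitly
t0 + as.difftime(2, units = "hours")
#> [1] "2023-07-25 01:55:26 UTC"

# A Date plus one week
as.Date("2021-10-08") + as.difftime(1, units = "weeks")
#> [1] "2021-10-15"
```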

Comparing with logical operators

All the usual comparison operators (<, <=, >, >=, == and !=) can be used. The logical AND (&& and &) and logical OR (|| and |) operators are the exception: they do not apply to dates and times.

date_f1 > date_f2
#> [1] FALSE

date_f1 <= date_f2
#> [1] TRUE

time_2 != time_1
#> [1] TRUE
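Comparisons are vectorised, so they are handy for filtering. The sketch below reuses the birthdays from earlier as a plain vector (the name bdays is illustrative). Note that & is applied to the logical results of the comparisons, not to the dates themselves.

```r
bdays <- as.Date(c("1920-09-10", "1893-06-29", "1893-09-25", "1936-06-25"))

# A comparison returns one logical value per date
bdays > as.Date("1900-01-01")
#> [1]  TRUE FALSE FALSE  TRUE

# Combine two comparisons to select a window of dates
bdays[bdays >= as.Date("1893-01-01") & bdays <= as.Date("1893-12-31")]
#> [1] "1893-06-29" "1893-09-25"

# Earliest and latest date
range(bdays)
#> [1] "1893-06-29" "1936-06-25"
```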

Sequence of dates

You can create a sequence of dates with the seq() function by specifying the starting date, the number of dates and the step.

# 7 dates, 1 week apart
seq(date_f1, length.out = 7, by = "week")
#> [1] "2021-04-08" "2021-04-15" "2021-04-22" "2021-04-29" "2021-05-06"
#> [6] "2021-05-13" "2021-05-20"

# 7 dates, 14 days apart
seq(date_f1, length.out = 7, by = 14)
#> [1] "2021-04-08" "2021-04-22" "2021-05-06" "2021-05-20" "2021-06-03"
#> [6] "2021-06-17" "2021-07-01"

# 7 dates, 2 weeks apart
seq(date_f1, length.out = 7, by = "2 weeks")
#> [1] "2021-04-08" "2021-04-22" "2021-05-06" "2021-05-20" "2021-06-03"
#> [6] "2021-06-17" "2021-07-01"

# 7 dates, 7 months apart
seq(date_f1, length.out = 7, by = "7 months")
#> [1] "2021-04-08" "2021-11-08" "2022-06-08" "2023-01-08" "2023-08-08"
#> [6] "2024-03-08" "2024-10-08"

# 7 dates, 4 years apart
seq(date_f1, length.out = 7, by = "4 years")
#> [1] "2021-04-08" "2025-04-08" "2029-04-08" "2033-04-08" "2037-04-08"
#> [6] "2041-04-08" "2045-04-08"
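Instead of a number of dates, seq() also accepts an end date, and the step can be negative. A short sketch in base R (the dates are chosen only for illustration):

```r
# From a start date to an end date, one step per quarter
seq(as.Date("2021-01-01"), as.Date("2021-12-31"), by = "quarter")
#> [1] "2021-01-01" "2021-04-01" "2021-07-01" "2021-10-01"

# A negative step counts backwards
seq(as.Date("2021-04-08"), by = "-1 day", length.out = 3)
#> [1] "2021-04-08" "2021-04-07" "2021-04-06"
```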

1.4.4 The chron package

chron date/time objects are different from the usual ones: the functions in the chron package return date/times in chron's own format.
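To get a feel for that format, here is a small sketch using the chron() constructor, which takes the date and the time as separate strings. It assumes chron's default input formats ("m/d/y" and "h:m:s") and default origin of 1 January 1970; the name ch is illustrative.

```r
library(chron)

# Date and time are given as separate strings
ch <- chron(dates. = "01/02/70", times. = "12:00:00")
ch
#> [1] (01/02/70 12:00:00)

# Internally: days since 1970-01-01, with the time as a fraction of a day
as.numeric(ch)
#> [1] 1.5
```

This is why adding a number to a chron object adds that many days, even though the object also carries a time.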

Creating times in chron

# Defaults
time_1_ch <- as.chron("2013-07-24 23:55:26")
time_1_ch
#> [1] (07/24/13 23:55:26)

# Specifying format
# ("%y" is the 2-digit year; "%Y" would read "13" as the year 13)
time_2_ch <- as.chron("07/25/13", format = "%m/%d/%y")
time_2_ch
#> [1] (07/25/13 00:00:00)

Extracting the date with dates()

dates(time_1_ch)
#>     day  
#> 07/24/13

Arithmetic of chron objects

# comparison
time_2_ch > time_1_ch
#> [1] TRUE

# Adding 10 days
time_1_ch + 10
#> [1] (08/03/13 23:55:26)

# subtraction
time_2_ch - time_1_ch
#> [1] 00:04:34

# Difference in the specified unit
difftime(time_2_ch, time_1_ch, units = "hours")
#> Time difference of 0.07611111 hours

# difference in the time
as.chron("2013-03-10 08:32:07") - as.chron("2013-03-09 23:55:26")
#> [1] 08:36:41

Remember that chron does not adjust for time zones.

Exercises

Exercise 1.16 Questions

  • What is the significance of January 1, 1970?
  • What is the difference between as.Date() and as.POSIXlt()?

References


  1. Peng (2022)

  2. Spector (2011a)

  3. Wickham and Grolemund (2017b), Wickham and Grolemund (2017c)