25
Handling temporal data in SQL
One of the most involved data-type to deal with in a database is temporal data. Temporal data, by the way, is data that represents a state in time. Some of the complexity of dealing with temporal data is caused by the numerous ways to describe a single date-time. The other complex thing is manipulating this type of data. This article will give an overview of the data type and several functions to help you deal with this data type.
There are six date/time types in PostgreSQL, which are divided into four main categories: date, time, timestamp, and interval.
In PostgreSQL, the timestamp data type follows ISO-8601 format.
In SQL, you can save the difference between two date/time sources. The data type that holds this value is called the interval. You can easily see interval in action by subtracting two date or time values:
SELECT CAST('2021-12-16' AS timestamp) - CAST('2021-12-11' AS timestamp) as interval;
interval
---------------
5 days
You can define interval using the INTERVAL keyword followed by quantity, unit, and optionally a direction.
The following query shows several ways you can define an interval.
SELECT INTERVAL '1 day' as i1,
INTERVAL '2 days 2 hours' as i2,
INTERVAL '1D 2H' as i3;
i1 | i2 | i3
------------+-----------------+----------------
1 day | 2 days 02:00:00 | 1 day 02:00:00
Quantities of days, hours, minutes and seconds can be defined without unit markings. For example, '2 01:35:09' is read the same as '2 days 1 hour 35 min 10 sec'.
There are other ways to define an interval. For instance, a combination of years and months can be specified with a dash; for example, ' 1-2' is read the same as '1 year 2 months'.
SELECT INTERVAL '1-2',
INTERVAL '1-2' YEAR TO MONTH;
interval | interval
--------------------+---------------
1 year 2 mons | 1 year 2 mons
But I think you should stick with the interval '<quantity> <unit>'
format, and it is easier to read and understand.
Time zone is a complex topic, and they are not simply Earth geometry; there are political decisions involved. So I will be short here.
The world is divided into 24 imaginary sections, called time zones. Some regions shift their time by one hour twice a year (implementing what is known as daylight saving time), so the time difference between two points on Earth might be four hours for one-half of the year and five hours for the other half of the year. Even within a time zone, different regions may not conform to daylight saving time, causing different clocks in the same time zone to agree for one-half of the year but be one hour different for the rest of the year.
To help deal with this issue, people used a reference for timekeeping. Fifteenth-century navigators set their clocks to the time of day in Greenwich, England, known as Greenwich Mean Time or GMT.
Today, we use Coordinated Universal Time, or UTC, which is based on an atomic clock (or, more precisely, the average time of 200 atomic clocks in 50 locations worldwide, referred to as Universal Time). You can describe other time zones by the number of hours difference from UTC; for example, you can define the time zone for Jakarta, Indonesia, as UTC+7:00.
In SQL, the date type does not have time zone information, but the time and timestamp type both have time zone information. This can get confusing because time zones have little meaning without a date, especially when using daylight to save time. If you need to work with time zones, you should work with the timestamp values.
Internally, all timezone-aware dates and times are stored in UTC. They are converted to local time before being displayed. The local time is specified in the TimeZone configuration.
In SQL, you can apply arithmetic operators between date and integer, between date and another date, or even interval.
Apart from arithmetic operators above, you can also use comparison operators (<, >, <=, >=, =, <>, !=). For dates and timestamps, they are all comparable with or without a time zone. But, times (with or without time zone) and intervals can only be compared to other values of the same data type.
When comparing a timestamp without a time zone to timestamp with a time zone, the timestamp without a time zone will be given a time zone specified by the TimeZone configuration. It is then converted to UTC for comparison to the timestamp with time zone (which is internally stored as UTC).
Get the current timestamp
SELECT CURRENT_TIMESTAMP;
current_timestamp
------------------------------------
2021-11-12 13:02:56.338088+07
The syntax of CURRENT_TIMESTAMP does not require a pair of parentheses at the end. But you can call the function with a precision parameter, which rounded the fractional digits in the seconds field.
SELECT CURRENT_TIMESTAMP, CURRENT_TIMESTAMP(0);
current_timestamp | current_timestamp
------------------------------------+------------------------
2021-11-12 13:26:25.138509+07 | 2021-11-12 13:26:25+07
Oddly you can't call the function with parentheses but no parameter. You will be getting a syntax error message if you try this.
Get the current date.
SELECT CURRENT_DATE;
current_date
-------------------
2021-11-12
Get the current time.
SELECT CURRENT_TIME;
current_time
-------------------------
13:00:30.898362+07
Like CURRENT_TIMESTAMP, CURRENT_TIME can also accept a precision parameter.
SELECT CURRENT_TIME, CURRENT_TIME(0);
current_time | current_time
-------------------------+--------------
13:31:13.307465+07 | 13:31:13+07
Get the current timestamp.
SELECT NOW();
now
------------------------------------
2021-11-12 13:03:04.731898+07
NOW return value is the same with CURRENT_TIMESTAMP output.
SELECT NOW(), CURRENT_TIMESTAMP;
now | current_timestamp
-----------------+-------------------------------
2021-11-12 13:07:29.248457+07 | 2021-11-12 13:07:29.248457+07
To deal with time zone, we can utilize AT TIME ZONE operator. AT TIME ZONE converts timestamp to/from timestamp with a time zone. There are two formats to use with this operator.
The following query shows the AT TIME ZONE clause in action.
Format 1
SELECT TIMESTAMP '2021-11-16 15:00' AS time_no_tz,
TIMESTAMP '2021-11-16 15:00' AT TIME ZONE 'America/Montreal' AS local_time;
time_no_tz | local_time
-------------------+------------------------
2021-11-16 15:00:00 | 2021-11-17 03:00:00+07
Format 2
SELECT CURRENT_TIMESTAMP AS time_with_tz,
CURRENT_TIMESTAMP AT TIME ZONE 'America/Montreal' AS in_montreal_no_tz;
time_with_tz | in_montreal_no_tz
------------------------------------+----------------------------
2021-11-16 15:32:03.210247+07 | 2021-11-16 03:32:03.210247
Age function returns the difference between two timestamps, and it produces the difference in interval.
AGE(timestamp1, timestamp2)
SELECT AGE(timestamp '2021-11-12 00:00:00', timestamp '2020-09-09 12:00:00');
age
------------------------------------
1 year 2 mons 2 days 12:00:00
If you supply only one timestamp for the functions, AGE() will subtract that timestamp from the current date at midnight.
SELECT AGE(timestamp '2009-09-12 10:11:00');
age
--------------------------------------
12 years 1 mon 29 days 13:49:00
When dealing with data for analysis, columns that use timestamp precision are often not useful—the majority of the time, you need to extract parts of timestamps. Or you need to convert/truncate the timestamp to standardize the date/time value. In PostgreSQL you can achieve this using EXTRACT(), DATE_PART() and DATE_TRUNC() functions.
EXTRACT() and DATE_PART() functions retrieves parts of date/time values or in some cases interval.
The following is a query to get the total invoice payments grouped by year and then quarter which we extract from the invoice_date column.
SELECT
EXTRACT(year FROM invoice_date) AS year,
EXTRACT(quarter FROM invoice_date) AS quarter,
SUM(total) as total_payments
FROM invoice
GROUP BY 1, 2
ORDER BY 1, 2;
year | quarter | total_payments
-----------+---------+----------------
2009 | 1 | 110.88
2009 | 2 | 112.86
2009 | 3 | 112.86
2009 | 4 | 112.86
2010 | 1 | 143.86
2010 | 2 | 112.86
2010 | 3 | 111.87
2010 | 4 | 112.86
2011 | 1 | 112.86
2011 | 2 | 144.86
2011 | 3 | 112.86
2011 | 4 | 99.00
<---------- TRUNCATED ---------->
DATE_PART works similar to EXTRACT, except the part, needs to be a string, not a name like on EXTRACT. The valid part names for DATE_PART are the same for EXTRACT.
The following query is the same query from the previous section with EXTRACT expression replaced with DATE_PART.
SELECT
DATE_PART('year', invoice_date) AS year,
DATE_PART('quarter', invoice_date) AS quarter,
SUM(total) as total_payments
FROM invoice
GROUP BY 1, 2
ORDER BY 1, 2;
The results are identical.
The DATE_TRUNC() function will truncate timestamp or interval data types.
The field is the precision to truncate the source value. The return value is of type timestamp or interval. The return value that is less significant than the selected part will be set to zero or one (for day and month).
SELECT DATE_TRUNC('year', TIMESTAMP '2021-11-12 13:03:04') as year_truncated,
DATE_TRUNC('month', TIMESTAMP '2021-11-12 13:03:04') as month_truncated;
date_trunc | date_trunc
--------------------------+---------------------
2021-01-01 00:00:00 | 2021-11-01 00:00:0000
25