Dmitry Romanoff
Posted on April 5, 2024
There is the PostgreSQL table my_table that has the following structure:
postgres=# \d my_table
Table "public.my_table"
Column | Type | Collation | Nullable | Default
--------+-----------------------------+-----------+----------+--------------------------------------
id | integer | | not null | nextval('my_table_id_seq'::regclass)
a | character varying(100) | | |
b | timestamp without time zone | | not null |
c | smallint | | |
The table is populated with some random data:
select * from my_table limit 10;
id | a | b | c
-------+-----------------------------------------------------------------------------------------------------+----------------------------+------
20835 | lysurmsbdfwrrasnnlmzujwdtuw | 1978-09-05 01:45:25.148229 | 8642
20836 | spfgyjksbxxsbfyixbhygzqizeiisldfcjssvwdaxdpkrwjsmhwqvozfkcipissbgi | 1915-06-16 12:08:08.978399 | 6590
20837 | kxibatprlkckkotcckqfmmroflbhvakoxipuqwjmc | 1992-10-11 08:50:10.587357 | 606
20838 | ifjekeirgkodhqakselytruphlxsnnlblwdyjkhrjxibpftiexqrsdtomjhuuntozifcwahdsrekhfbaecwbvow | 1995-06-09 15:44:00.125167 | 8600
20839 | duwdzmvzwkcygyjqokctszlffcnpclojzvweseaibvemghlgentemvboyhszzlmdmkfrugkxkdkowoyeyogercckpygfuukluth | 2009-06-22 07:26:56.044023 | 3569
20840 | qvuvlftidrwitheqywzvcvhvmvmkxelkvxyufjfdkvybeyajnmoldqkuwxioyoiykeoibdvnbdbyyaufdqpjlrdgbbkvsy | 1900-11-18 17:56:54.171598 | 8399
20841 | bpxdytnu | 1981-11-06 08:09:35.857618 | 9508
20842 | phwsewlpmerayuovgakjtbzflggeqqqxsqetxufuoe | 1933-12-02 02:50:20.117185 | 5099
20843 | nwxjdoksxwsvkjmpfyvayvwqwckyeyqxrlagn | 1999-11-12 01:56:10.847324 | 7316
20844 | ksjvvwpisjfpsapwccvdpcfgchjyrhwqof | 1958-08-06 07:56:47.420669 | 5673
(10 rows)
Let’s check the distribution of the timestamp column data.
Distribution records in the table by year; let’s find the top years with the table my_table records based on the timestamp column:
select
(extract('year' from b)) the_year,
count(1) num_of_recs
from
my_table
group by 1
order by 2 desc, 1;
For example:
the_year | num_of_recs
----------+-------------
1977 | 205
2018 | 195
1929 | 191
1913 | 188
1935 | 185
1969 | 185
1997 | 185
1905 | 184
1964 | 184
1910 | 183
1966 | 183
...
Percentage of the total records by year in the table my_table based on the timestamp column:
select
to_char(b, 'YYYY') yyyy,
count(1) num_of_recs,
100 * round((count(1) / (sum(count(1)) over() )), 5) percentage
from
my_table
group by 1
order by 3 desc, 1;
For example:
yyyy | num_of_recs | percentage
------+-------------+------------
1977 | 205 | 1.01300
2018 | 195 | 0.96400
1929 | 191 | 0.94400
1913 | 188 | 0.92900
1935 | 185 | 0.91400
1969 | 185 | 0.91400
1997 | 185 | 0.91400
1905 | 184 | 0.90900
1964 | 184 | 0.90900
1910 | 183 | 0.90400
1966 | 183 | 0.90400
...
Distribution records in the table by (year-month); let’s find the top (year-month)-es with the table my_table records based on the timestamp column:
select
to_char(b, 'YYYY-MM') the_year_and_month,
count(1) num_of_recs
from
my_table
group by 1
order by 2 desc, 1;
For example:
the_year_and_month | num_of_recs
--------------------+-------------
1919-08 | 25
2000-10 | 25
2004-08 | 25
1917-06 | 24
1933-07 | 24
1949-06 | 24
1959-08 | 24
1977-03 | 24
1977-05 | 24
1980-08 | 24
1989-02 | 24
1997-03 | 24
1998-11 | 24
2010-05 | 24
2020-12 | 24
1901-10 | 23
...
Percentage of the total records by (year-month) in the table my_table based on the timestamp column:
select
to_char(b, 'YYYY-MM') yyyy_mm,
count(1) num_of_recs,
100 * round((count(1) / (sum(count(1)) over() )), 5) percentage
from
my_table
group by 1
order by 3 desc, 1;
For example:
yyyy_mm | num_of_recs | percentage
---------+-------------+------------
1919-08 | 25 | 0.12400
2000-10 | 25 | 0.12400
2004-08 | 25 | 0.12400
1917-06 | 24 | 0.11900
1933-07 | 24 | 0.11900
1949-06 | 24 | 0.11900
1959-08 | 24 | 0.11900
1977-03 | 24 | 0.11900
1977-05 | 24 | 0.11900
1980-08 | 24 | 0.11900
1989-02 | 24 | 0.11900
1997-03 | 24 | 0.11900
1998-11 | 24 | 0.11900
2010-05 | 24 | 0.11900
2020-12 | 24 | 0.11900
1901-10 | 23 | 0.11400
1903-09 | 23 | 0.11400
...
Distribution records in the table by (year-month-day); let’s find the top (year-month-day)s with the table my_table records based on the timestamp column:
select
to_char(b, 'YYYY-MM-DD') yyyy_mm_dd,
count(1) num_of_recs
from
my_table
group by 1
order by 2 desc, 1;
For example:
yyyy_mm_dd | num_of_recs
------------+-------------
1914-10-05 | 5
1929-01-09 | 5
1904-01-10 | 4
1906-04-06 | 4
1906-09-01 | 4
1907-02-26 | 4
1907-10-16 | 4
1911-06-02 | 4
1918-06-13 | 4
1924-07-30 | 4
1925-06-02 | 4
1926-03-31 | 4
1927-02-04 | 4
1928-08-17 | 4
1931-01-03 | 4
1931-04-07 | 4
1932-03-20 | 4
Conclusion
In this blog, I’ve demonstrated a few approaches how to check the distribution of timestamps in a PostgreSQL table.
ask_dima@yahoo.com
Posted on April 5, 2024
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.
Related
November 15, 2023