Using CTEs to generate ranges and/or fill holes
Common Table Expressions are a tool to express things in SQL without need to write long queries or use procedural SQL. The ability to write recursive queries is even nicer and yesterday I had an idea how to use it little bit differently.
Often I’m generating data for reports and I’m trying to do as much as I can on database server. One common problem I’m facing with these reports is the need to include some “zero” data even for, in this case, days, where no data were available.
Imagine, you’re doing something like this (Firebird syntax).
select data.d, sum(data.i) from ( select current_date as d, 1 as i from rdb$database union all select current_date, 1 from rdb$database union all select current_date+1, 2 from rdb$database union all select current_date+3, 5 from rdb$database union all select current_date+3, 2 from rdb$database ) data group by data.d
It’s OK, but you’ll get data only for date that are present in database. But because you’re generating report for whole month, as I did, it would be nice to have there the data too even with “zeros”. Because then the logic on application side is much simpler, you just need to deal how to present the report’s data.
If you have CTEs, it’s pretty easy to use the recursiveness and generate the sequence, join it with original data and do what you have to do.
select range.d, coalesce(sum(data.i), 0) from ( select current_date as d, 1 as i from rdb$database union all select current_date, 1 from rdb$database union all select current_date+1, 2 from rdb$database union all select current_date+3, 5 from rdb$database union all select current_date+3, 2 from rdb$database ) data right outer join ( with recursive padding as ( select current_date as d from rdb$database union all select dateadd(day, 1, d) from padding where d < dateadd(month, 1, current_date) ) select * from padding ) range on (data.d = range.d) group by range.d
I'm using here right outer join just to keep the original data above the sequence I'm generating. But you can swap the tables and use left outer join as it may be easier to read and maintain.
The code is pretty straightforward and if you used CTEs ever before, you'll be able to read. I'm simply generating the dates until I'm ready with the sequence (in this case one month) and then joining with original data. The outer join will ensure the holes are filled and nulls are there. These nulls are later processed with coalesce to "zero" values.
If you want to directly control the number of elements in sequence, simply use some counter.
with recursive padding as ( select current_date as d, 0 as cnt from rdb$database union all select dateadd(day, 1, d), cnt+1 from padding where cnt < 10 ) select * from padding



There's 2 Comments So Far
October 6th, 2010 at 12:14
Nice tip Jiri, thanks for this practical example on how to use CTE. I regret that you choosed to generate “useful” data by querying RDB$DATABASE instead of creating a table and filling it with some sample data. That makes the example a bit hard to understand.
October 6th, 2010 at 13:56
Yes, simple table would be probably easier to read, on the other hand with this example you’re ready to go without doing any cleanup after playing. I hope readers will get the idea behind.
Share your thoughts, leave a comment!