Skip to content

Window functions

Window functions allow you to calculate aggregates over portions of your result set. They are defined in SQL2003. Some databases use the alternative nomenclature analytic functions. They are expressed in SQL with the OVER clause. The Postgres documentation offers a good overview of window functions.

The withWindow_ function

When you want to add windows to a query, use the withWindow_ function to introduce your frames, and compute the projection. You may notice that this is a departure from SQL syntax, where you can define window expressions inline. Beam seeks to be type-safe. Queries with window functions follow slightly different rules. Wrapping such a query with a special function allows beam to enforce these rules.

For example, to get each invoice along with the average invoice total by each customer, use withWindow_ as follows.

withWindow_ (\i -> frame_ (partitionBy_ (invoiceCustomer i)) noOrder_ noBounds_)
            (\i w -> (i, avg_ (invoiceTotal i) `over_` w))
            (all_ (invoice chinookDb))
SELECT "t0"."InvoiceId" AS "res0",
       "t0"."CustomerId" AS "res1",
       "t0"."InvoiceDate" AS "res2",
       "t0"."BillingAddress" AS "res3",
       "t0"."BillingCity" AS "res4",
       "t0"."BillingState" AS "res5",
       "t0"."BillingCountry" AS "res6",
       "t0"."BillingPostalCode" AS "res7",
       "t0"."Total" AS "res8",
       AVG("t0"."Total") OVER (PARTITION BY "t0"."CustomerId") AS "res9"
FROM "Invoice" AS "t0"

Or to get each invoice along with the ranking of each invoice by total per customer and the overall ranking,

withWindow_ (\i -> ( frame_ noPartition_ (orderPartitionBy_ (asc_ (invoiceTotal i))) noBounds_
                   , frame_ (partitionBy_ (invoiceCustomer i)) (orderPartitionBy_ (asc_ (invoiceTotal i))) noBounds_ ))
            (\i (allInvoices, customerInvoices) -> (i, rank_ `over_` allInvoices, rank_ `over_` customerInvoices))
            (all_ (invoice chinookDb))
SELECT "t0"."InvoiceId" AS "res0",
       "t0"."CustomerId" AS "res1",
       "t0"."InvoiceDate" AS "res2",
       "t0"."BillingAddress" AS "res3",
       "t0"."BillingCity" AS "res4",
       "t0"."BillingState" AS "res5",
       "t0"."BillingCountry" AS "res6",
       "t0"."BillingPostalCode" AS "res7",
       "t0"."Total" AS "res8",
       RANK() OVER (
                    ORDER BY "t0"."Total" ASC) AS "res9",
                   RANK() OVER (PARTITION BY "t0"."CustomerId"
                                ORDER BY "t0"."Total" ASC) AS "res10"
FROM "Invoice" AS "t0"

Note

rank_ is only available in backends that implement the optional SQL2003 T611 feature "Elementary OLAP operations". Beam syntaxes that implement this functionality implement the IsSql2003ExpressionElementaryOLAPOperationsSyntax type class.

Notice that aggregates over the result of the window expression work as you'd expect. Beam automatically generates a subquery once a query has been windowed. For example, to get the sum of the totals of the invoices, by rank.

orderBy_ (\(rank, _) -> asc_ rank) $
aggregate_ (\(i, rank) -> (group_ rank, sum_ $ invoiceTotal i)) $
withWindow_ (\i -> frame_ (partitionBy_ (invoiceCustomer i)) (orderPartitionBy_ (asc_ (invoiceTotal i))) noBounds_)
            (\i w -> (i, rank_ `over_` w))
            (all_ (invoice chinookDb))
SELECT "t0"."res9" AS "res0",
       SUM("t0"."res8") AS "res1"
FROM
  (SELECT "t0"."InvoiceId" AS "res0",
          "t0"."CustomerId" AS "res1",
          "t0"."InvoiceDate" AS "res2",
          "t0"."BillingAddress" AS "res3",
          "t0"."BillingCity" AS "res4",
          "t0"."BillingState" AS "res5",
          "t0"."BillingCountry" AS "res6",
          "t0"."BillingPostalCode" AS "res7",
          "t0"."Total" AS "res8",
          RANK() OVER (PARTITION BY "t0"."CustomerId"
                       ORDER BY "t0"."Total" ASC) AS "res9"
   FROM "Invoice" AS "t0") AS "t0"
GROUP BY "t0"."res9"
ORDER BY "t0"."res9" ASC

Frame syntax

The frame_ function takes a partition, ordering, and bounds parameter, all of which are optional. To specify no partition, use noPartition_. For no ordering, use noOrder_. For no bounds, use noBounds_.

To specify a partition, use partitionBy_ with an expression or a tuple of expressions. To specify an ordering use orderPartitionBy_ with an ordering expression or a tuple of ordering expressions. Ordering expressions are scalar expressions passed to either asc_ or desc_. Finally, to specify bounds, use bounds_ or fromBound_. fromBound_ starts the window at the specified position, which can be unbounded_ (the default) to include all rows seen thus far. bounds_ lets you specify an optional ending bound, which can be Nothing (the default), Just unbounded_ (the semantic default, but producing an explicit bound syntactically), or Just (nrows_ x), where x is an integer expression, specifying the number of rows before or after to include in the calculation.

The following query illustrates some of these features. Along with each invoice, it returns

  • The average total of all invoices, given by the frame with no partition, ordering, and bounds.
  • The average total of all invoices, by customer.
  • The rank of each invoice over all the rows, when ordered by total.
  • The average of the totals of the invoices starting at the two immediately preceding and ending with the two immediately succeeding invoices, when ordered by date.

withWindow_ (\i -> ( frame_ noPartition_ noOrder_ noBounds_
                   , frame_ (partitionBy_ (invoiceCustomer i)) noOrder_ noBounds_
                   , frame_ noPartition_ (orderPartitionBy_ (asc_ (invoiceTotal i))) noBounds_
                   , frame_ noPartition_ (orderPartitionBy_ (asc_ (invoiceDate i))) (bounds_ (nrows_ 2) (Just (nrows_ 2)))))
            (\i (allRows_, sameCustomer_, totals_, fourInvoicesAround_) ->
                 ( i
                 , avg_ (invoiceTotal i) `over_` allRows_
                 , avg_ (invoiceTotal i) `over_` sameCustomer_
                 , rank_ `over_` totals_
                 , avg_ (invoiceTotal i) `over_` fourInvoicesAround_ ))
            (all_ (invoice chinookDb))
SELECT "t0"."InvoiceId" AS "res0",
       "t0"."CustomerId" AS "res1",
       "t0"."InvoiceDate" AS "res2",
       "t0"."BillingAddress" AS "res3",
       "t0"."BillingCity" AS "res4",
       "t0"."BillingState" AS "res5",
       "t0"."BillingCountry" AS "res6",
       "t0"."BillingPostalCode" AS "res7",
       "t0"."Total" AS "res8",
       AVG("t0"."Total") OVER () AS "res9",
                              AVG("t0"."Total") OVER (PARTITION BY "t0"."CustomerId") AS "res10",
                                                     RANK() OVER (
                                                                  ORDER BY "t0"."Total" ASC) AS "res11",
                                                                 AVG("t0"."Total") OVER (
                                                                                         ORDER BY "t0"."InvoiceDate" ASC ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING) AS "res12"
FROM "Invoice" AS "t0"