
- Apache Drill Tutorial
- Apache Drill - Home
- Apache Drill - Introduction
- Apache Drill - Fundamentals
- Apache Drill - Architecture
- Apache Drill - Installation
- Apache Drill - SQL Operations
- Apache Drill - Query using JSON
- Window Functions using JSON
- Querying Complex Data
- Data Definition Statements
- Apache Drill - Querying Data
- Querying Data using HBase
- Querying Data using Hive
- Apache Drill - Querying Parquet Files
- Apache Drill - JDBC Interface
- Apache Drill - Custom Function
- Apache Drill - Contributors
- Apache Drill Useful Resources
- Apache Drill - Quick Guide
- Apache Drill - Useful Resources
- Apache Drill - Discussion
Window Functions using JSON
Window functions execute on a set of rows and return a single value for each row from the query. The term window has the meaning of the set of rows for the function.
A Window function in a query, defines the window using the OVER() clause. This OVER() clause has the following capabilities −
- Defines window partitions to form groups of rows. (PARTITION BY clause)
- Orders rows within a partition. (ORDER BY clause)
Aggregate Window Functions
The Aggregate window function can be defined over a partition by and order by clause.
Avg()
The following program shows the query for this function −
Query
0: jdbc:drill:zk = local> select mark1,gender, avg(mark1) over (partition by gender ) as avgmark1 from dfs.`/Users/../workspace/Drill-samples/student_list.json`;
Result
mark1 gender avgmark1 70 female 83.0 100 female 83.0 90 female 83.0 75 female 83.0 80 female 83.0 70 male 55.0 40 male 55.0 60 male 55.0 50 male 55.0 55 male 55.0
This result shows that partition by clause is used for the gender column. So, it takes the average of mark1 from female gender which is 83.0 and then replaces that value to all the male and female gender. The mark1 avg result is now 55.0 and hence it replaces the same to all genders.
Count(*)
The following program shows the query for this function −
Query
0: jdbc:drill:zk = local> select name, gender, mark1, age, count(*) over(partition by age) as cnt from dfs.`/Users/../workspace/Drill-samples/student_list.json`;
Result
name gender mark1 age cnt Adam male 70 12 7 Amit male 40 12 7 Bob male 60 12 7 David male 50 12 7 Esha female 70 12 7 Ganga female 100 12 7 Leena female 90 12 7 Jack male 55 13 3 Mary female 75 13 3 Peter female 80 13 3
Here, there are two age groups 12 and 13. The age count of 12 is for 7 students and 13 age count is for 3 students. Hence count(*) over partition by age replaces 7 for 12 age group and 3 for 13 age group.
MAX()
The following program shows the query for this function −
Query
0: jdbc:drill:zk = local> select name,age,gender,mark3, max(mark3) over (partition by gender) as maximum from dfs.`/Users/../workspace/Drill-samples/student_list.json`;
Result
name age gender mark3 maximum Esha 12 female 65 98 Ganga 12 female 98 98 Leena 12 female 95 98 Mary 13 female 90 98 Peter 13 female 88 98 Adam 12 male 60 70 Amit 12 male 40 70 Bob 12 male 70 70 David 12 male 70 70 Jack 13 male 45 70
From the above query, maximum mark3 is partitioned by gender, hence female gender max mark 98 is replaced to all female students and male gender max mark 70 is replaced to all male students.
MIN()
The following program shows the query for this function −
Query
0: jdbc:drill:zk = local> select mark2,min(mark2) over (partition by age ) as minimum from dfs.`/Users/../workspace/Drill-samples/student_list.json`;
Result
mark2 minimum 50 50 50 50 80 50 70 50 60 50 95 50 85 50 45 45 85 45 85 45
SUM()
The following program shows the query for this function −
Query
0: jdbc:drill:zk = local> select name,age,sum(mark1+mark2) over (order by age ) as summation from dfs.`/Users/../workspace/Drill-samples/student_list.json`;
Result
name age summation Adam 12 970 Amit 12 970 Bob 12 970 David 12 970 Esha 12 970 Ganga 12 970 Leena 12 970 Jack 13 1395 Mary 13 1395 Peter 13 1395
Here mark1+mark2 result is replaced separately to each male and female student.
Ranking Window Functions
Following is the table listed out with ranking window functions.
Function | Return Type | Description |
---|---|---|
DOUBLE | Calculates the relative rank of the current row within a window partition (number of rows preceding or peer with current row) / (total rows in the window partition) | |
BIGINT | Rank of a value in a group of values based on the ORDER BY expression and the OVER clause. Each value is ranked within its partition. Rows with equal values receive the same rank. If two or more rows have the same rank then no gaps in the rows. | |
INTEGER | The NTILE window function divides the rows for each window partition, as equally as possible, into a specified number of ranked groups. | |
DOUBLE | Calculates the percent rank of the current row using the following formula: (x - 1) / (number of rows in window partition - 1) where x is the rank of the current row. | |
BIGINT | The RANK window function determines the rank of a value in a group of values. For example, if two rows are ranked 1, the next rank is 3. | |
BIGINT | Gives unique row numbers for the rows in a group. |