Oracle PLSQL Tutorial Point: September 2012

Saturday, 22 September 2012

SAVE EXCEPTIONS

Since version 9 it is possible to do bulk DML using FORALL and use the SAVE EXCEPTIONS clause. It makes sure that all invalid rows - the exceptions - are saved into the sql%bulk_exceptions array, while all valid rows are still processed. This array stores a record for each invalid row, containing an ERROR_INDEX which is the iteration number during the FORALL statement and an ERROR_CODE which corresponds with the Oracle error code.

The error text is not stored. The documentation says:

The individual error messages, or any substitution arguments, are not saved, but the error message text can looked up using ERROR_CODE with SQLERRM ...

Looks reasonable, but in our shop we validate lots of business rules with triggers. When a business rule is violated we do a RAISE_APPLICATION_ERROR(-20000,'APP-12345');

On the client side (Webforms), the error code is looked up in the messages table and a friendly message is shown. When coding a FORALL with SAVE EXCEPTIONS in such an environment, the error messages become useless, as can be seen in the next example:

rwijk@ORA11G> create table mytable
  2  ( id number(4)
  3  , name varchar2(30)
  4  )
  5  /

Table created.

rwijk@ORA11G> create trigger mytable_bri
  2  before insert on mytable
  3  for each row
  4  begin
  5    if :new.id = 2
  6    then
  7      raise_application_error(-20000,'APP-12345');
  8    elsif :new.id = 9
  9    then
 10      raise_application_error(-20000,'APP-98765');
 11    end if;
 12  end;
 13  /

Trigger created.

rwijk@ORA11G> alter table mytable add constraint mytable_ck1 check (id <> 6)
  2  /

Table altered.

rwijk@ORA11G> declare
  2    e_forall_error exception;
  3    pragma exception_init(e_forall_error,-24381)
  4    ;
  5    type t_numbers is table of mytable.id%type;
  6    l_numbers t_numbers := t_numbers(1,2,3,4,5,6,7,8,9,10)
  7    ;
  8  begin
  9    forall i in 1..l_numbers.count save exceptions
 10      insert into mytable
 11      ( id
 12      , name
 13      )
 14      values
 15      ( l_numbers(i)
 16      , 'Name' || to_char(l_numbers(i))
 17      )
 18    ;
 19  exception
 20  when e_forall_error then
 21    for i in 1..sql%bulk_exceptions.count
 22    loop
 23      dbms_output.put_line('SQLCODE: ' || sql%bulk_exceptions(i).error_code);
 24      dbms_output.put_line('SQLERRM: ' || sqlerrm(-sql%bulk_exceptions(i).error_code));
 25      dbms_output.new_line;
 26    end loop;
 27  end;
 28  /
SQLCODE: 20000
SQLERRM: ORA-20000:

SQLCODE: 2290
SQLERRM: ORA-02290: check constraint (.) violated

SQLCODE: 20000
SQLERRM: ORA-20000:


PL/SQL procedure successfully completed.

rwijk@ORA11G> select id, name from mytable
  2  /

        ID NAME
---------- ------------------------------
         1 Name1
         3 Name3
         4 Name4
         5 Name5
         7 Name7
         8 Name8
        10 Name10

7 rows selected.

Note how the SQLERRM message doesn't return anything useful and that the name of the check constraint has disappeared. This is really annoying and can't be circumvented easily in 9i. For better error messages we would have to go back to row by row processing. And that means: very slow.
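For comparison, here is a minimal sketch of the row-by-row fallback mentioned above, assuming the same mytable, trigger and check constraint as in the example. Each insert gets its own exception handler, so SQLERRM is called while the error is still current and should return the full message text, including the APP-codes. Note that SQLCODE is negative inside a handler, whereas the ERROR_CODE in sql%bulk_exceptions is positive.

```sql
declare
  type t_numbers is table of mytable.id%type;
  l_numbers t_numbers := t_numbers(1,2,3,4,5,6,7,8,9,10);
begin
  -- one INSERT per element: slow, but every failure has its own handler
  for i in 1..l_numbers.count
  loop
    begin
      insert into mytable
      ( id
      , name
      )
      values
      ( l_numbers(i)
      , 'Name' || to_char(l_numbers(i))
      );
    exception
      when others then
        -- SQLERRM without an argument returns the text of the current error
        dbms_output.put_line('SQLCODE: ' || to_char(sqlcode));
        dbms_output.put_line('SQLERRM: ' || sqlerrm);
        dbms_output.new_line;
    end;
  end loop;
end;
/
```

The price is one context switch between PL/SQL and SQL per row, which is exactly what FORALL avoids.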

However, version 10gR2 introduced a feature called DML error logging. I remember reading about it more than two years ago on Tom Kyte's blog. In that entry and in the documentation you only see examples using SQL, not PL/SQL examples using FORALL. But luckily this works as well:

rwijk@ORA11G> rollback
  2  /

Rollback complete.

rwijk@ORA11G> exec dbms_errlog.create_error_log('mytable')

PL/SQL procedure successfully completed.

rwijk@ORA11G> declare
  2    type t_numbers is table of mytable.id%type;
  3    l_numbers t_numbers := t_numbers(1,2,3,4,5,6,7,8,9,10)
  4    ;
  5  begin
  6    forall i in 1..l_numbers.count
  7      insert into mytable
  8      ( id
  9      , name
 10      )
 11      values
 12      ( l_numbers(i)
 13      , 'Name' || to_char(l_numbers(i))
 14      )
 15      log errors reject limit unlimited
 16    ;
 17    for r in
 18    ( select ora_err_number$
 19           , ora_err_mesg$
 20        from err$_mytable
 21    )
 22    loop
 23      dbms_output.put_line('SQLCODE: ' || to_char(r.ora_err_number$));
 24      dbms_output.put_line('SQLERRM: ' || r.ora_err_mesg$);
 25      dbms_output.new_line;
 26    end loop
 27    ;
 28  end;
 29  /
SQLCODE: 20000
SQLERRM: ORA-20000: APP-12345
ORA-06512: at "RWIJK.MYTABLE_BRI", line 4
ORA-04088: error during execution of trigger 'RWIJK.MYTABLE_BRI'


SQLCODE: 2290
SQLERRM: ORA-02290: check constraint (RWIJK.MYTABLE_CK1) violated


SQLCODE: 20000
SQLERRM: ORA-20000: APP-98765
ORA-06512: at "RWIJK.MYTABLE_BRI", line 7
ORA-04088: error during execution of trigger 'RWIJK.MYTABLE_BRI'



PL/SQL procedure successfully completed.

rwijk@ORA11G> select id, name from mytable
  2  /

        ID NAME
---------- ------------------------------
         1 Name1
         3 Name3
         4 Name4
         5 Name5
         7 Name7
         8 Name8
        10 Name10

7 rows selected.

And you do get to see the error messages (APP-12345 and APP-98765) and the name of the check constraint. Unfortunately, our shop still uses 9.2.0.7...
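The same clause in a plain SQL statement might look like the sketch below. It assumes the err$_mytable error log created by dbms_errlog.create_error_log above; the tag 'load-1' is a hypothetical label I made up to tell the rows of one run apart from those of another.

```sql
-- set-based insert of ids 1..10; failing rows go to the error log
insert into mytable
( id
, name
)
select level
     , 'Name' || to_char(level)
  from dual
connect by level <= 10
   log errors into err$_mytable ('load-1') reject limit unlimited
/

-- the log holds the control columns plus a copy of the offending column values
select ora_err_number$
     , ora_err_tag$
     , id
     , ora_err_mesg$
  from err$_mytable
 where ora_err_tag$ = 'load-1'
/
```

Keep in mind that, as far as I know, rows in the error log survive a rollback of the DML statement, so the log has to be cleaned up separately between runs.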

UPDATE

It is worth knowing that there are some restrictions when working with the DML error logging clause. From the 11g documentation (same as in 10gR2 documentation):

Restrictions on DML Error Logging

* The following conditions cause the statement to fail and roll back without invoking the error logging capability:

o Violated deferred constraints.

o Any direct-path INSERT or MERGE operation that raises a unique constraint or index violation.

o Any update operation UPDATE or MERGE that raises a unique constraint or index violation.

* You cannot track errors in the error logging table for LONG, LOB, or object type columns. However, the table that is the target of the DML operation can contain these types of columns.

o If you create or modify the corresponding error logging table so that it contains a column of an unsupported type, and if the name of that column corresponds to an unsupported column in the target DML table, then the DML statement fails at parse time.

o If the error logging table does not contain any unsupported column types, then all DML errors are logged until the reject limit of errors is reached. For rows on which errors occur, column values with corresponding columns in the error logging table are logged along with the control information.

Tuesday, 18 September 2012

SQL Queries

Create the following Tables:

LOCATION
Location_ID	Regional_Group
122	NEW YORK
123	DALLAS
124	CHICAGO
167	BOSTON

DEPARTMENT
Department_ID	Name	Location_ID
10	ACCOUNTING	122
20	RESEARCH	124
30	SALES	123
40	OPERATIONS	167

JOB
Job_ID	Function
667	CLERK
668	STAFF
669	ANALYST
670	SALESPERSON
671	MANAGER
672	PRESIDENT

EMPLOYEE
EMPLOYEE_ID	LAST_NAME	FIRST_NAME	MIDDLE_NAME	JOB_ID	MANAGER_ID	HIREDATE	SALARY	COMM	DEPARTMENT_ID
7369	SMITH	JOHN	Q	667	7902	17-DEC-84	800	NULL	20
7499	ALLEN	KEVIN	J	670	7698	20-FEB-85	1600	300	30
7505	DOYLE	JEAN	K	671	7839	04-APR-85	2850	NULL	30
7506	DENNIS	LYNN	S	671	7839	15-MAY-85	2750	NULL	30
7507	BAKER	LESLIE	D	671	7839	10-JUN-85	2200	NULL	40
7521	WARK	CYNTHIA	D	670	7698	22-FEB-85	1250	500	30
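The tables above can be created with DDL along the lines of the sketch below. The data types, lengths and constraints are assumptions on my part, since the listing only gives column names and values. The two-digit years are read as 19xx, and MANAGER_ID deliberately has no foreign key because the sample rows refer to managers that are not in the listing.

```sql
create table location
( location_id    number(3)    primary key
, regional_group varchar2(20) not null
);

create table department
( department_id number(2)    primary key
, name          varchar2(20) not null
, location_id   number(3)    references location
);

create table job
( job_id   number(3)    primary key
, function varchar2(20) not null
);

create table employee
( employee_id   number(4)    primary key
, last_name     varchar2(15) not null
, first_name    varchar2(15)
, middle_name   varchar2(1)
, job_id        number(3)    references job
, manager_id    number(4)    -- no foreign key: managers are missing from the sample
, hiredate      date
, salary        number(7,2)
, comm          number(7,2)
, department_id number(2)    references department
);

insert into location values (122, 'NEW YORK');
insert into location values (123, 'DALLAS');
insert into location values (124, 'CHICAGO');
insert into location values (167, 'BOSTON');

insert into department values (10, 'ACCOUNTING', 122);
insert into department values (20, 'RESEARCH',   124);
insert into department values (30, 'SALES',      123);
insert into department values (40, 'OPERATIONS', 167);

insert into job values (667, 'CLERK');
insert into job values (668, 'STAFF');
insert into job values (669, 'ANALYST');
insert into job values (670, 'SALESPERSON');
insert into job values (671, 'MANAGER');
insert into job values (672, 'PRESIDENT');

insert into employee values (7369, 'SMITH',  'JOHN',    'Q', 667, 7902, date '1984-12-17',  800, null, 20);
insert into employee values (7499, 'ALLEN',  'KEVIN',   'J', 670, 7698, date '1985-02-20', 1600,  300, 30);
insert into employee values (7505, 'DOYLE',  'JEAN',    'K', 671, 7839, date '1985-04-04', 2850, null, 30);
insert into employee values (7506, 'DENNIS', 'LYNN',    'S', 671, 7839, date '1985-05-15', 2750, null, 30);
insert into employee values (7507, 'BAKER',  'LESLIE',  'D', 671, 7839, date '1985-06-10', 2200, null, 40);
insert into employee values (7521, 'WARK',   'CYNTHIA', 'D', 670, 7698, date '1985-02-22', 1250,  500, 30);

commit;
```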

Queries based on the above tables:

Simple Queries:

1. List all the employee details.
2. List all the department details.
3. List all job details.
4. List all the locations.
5. List out first name, last name, salary, commission for all employees.
6. List out employee_id, last name, department id for all employees and rename employee id as "ID of the employee", last name as "Name of the employee", department id as "department ID".
7. List out the employees' annual salary with their names only.

Where Conditions:

8. List the details about "SMITH".
9. List out the employees who are working in department 20.
10. List out the employees who are earning a salary between 3000 and 4500.
11. List out the employees who are working in department 10 or 20.
12. Find out the employees who are not working in department 10 or 30.
13. List out the employees whose name starts with "S".
14. List out the employees whose name starts with "S" and ends with "H".
15. List out the employees whose name has length 4 and starts with "S".
16. List out the employees who are working in department 10 and draw a salary of more than 3500.
17. List out the employees who are not receiving commission.

Order By Clause:

18. List out the employee id, last name in ascending order based on the employee id.
19. List out the employee id, name in descending order based on the salary column.
20. List out the employee details according to their last_name in ascending order and salaries in descending order.
21. List out the employee details according to their last_name in ascending order and then on department_id in descending order.

Group By & Having Clause:

22. How many employees are working in each department of the organization?
23. List out the department-wise maximum salary, minimum salary, average salary of the employees.
24. List out the job-wise maximum salary, minimum salary, average salary of the employees.
25. List out the no. of employees who joined in every month, in ascending order.
26. List out the no. of employees for each month and year, in ascending order based on the year, month.
27. List out the department ids having at least four employees.
28. How many employees joined in January?
29. How many employees joined in January or September?
30. How many employees joined in 1985?
31. How many employees joined each month in 1985?
32. How many employees joined in March 1985?
33. Which department ids have 3 or more employees who joined in April 1985?

Sub-Queries

34. Display the employee who got the maximum salary.
35. Display the employees who are working in the Sales department.
36. Display the employees who are working as "Clerk".
37. Display the employees who are working in "New York".
38. Find out the no. of employees working in the "Sales" department.
39. Update the salaries of the employees who are working as Clerk, raising them by 10%.
40. Delete the employees who are working in the accounting department.
41. Display the details of the employee drawing the second highest salary.
42. Display the details of the employee drawing the Nth highest salary.

Sub-Query operators: (ALL,ANY,SOME,EXISTS)

43. List out the employees who earn more than every employee in department 30.
44. List out the employees who earn more than the lowest salary in department 30.
45. Find out the employees whose department does not exist in the department table.
46. Find out which department does not have any employees.

Co-Related Sub Queries:

47. Find out the employees who earn more than the average salary for their department.

Joins

Simple join

48. List out employees with their department names.

49. Display employees with their designations (jobs).

50. Display the employees with their department name and regional groups.

51. How many employees are working in each department? Display the count with the department name.

52. How many employees are working in the sales department?

53. Which departments have 5 or more employees? Display the department names in ascending order.

54. How many employees hold each job in the organization? Display the count with the designation.

55. How many employees are working in "New York"?

Non – Equi Join:

56. Display employee details with salary grades.

57. List out the no. of employees grade-wise.

58. Display the employee salary grades and no. of employees for the 2000 to 5000 salary range.
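The salary grade questions need a SALARY_GRADE table that is not part of the table listing above. The answers refer to its columns as grade_id, lower_bound and upper_bound, so a hypothetical definition could look like the sketch below. The data types and the grade boundaries are illustrative values of my own, not data from the exercise.

```sql
-- hypothetical table: only the column names come from the answers
create table salary_grade
( grade_id    number(1)   primary key
, lower_bound number(7,2) not null
, upper_bound number(7,2) not null
);

-- illustrative, non-overlapping salary bands
insert into salary_grade values (1,  700, 1200);
insert into salary_grade values (2, 1201, 1400);
insert into salary_grade values (3, 1401, 2000);
insert into salary_grade values (4, 2001, 3000);
insert into salary_grade values (5, 3001, 9999);

commit;
```

Because the bands do not overlap, a join on "salary between lower_bound and upper_bound" matches each employee to at most one grade.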

Self Join:

59. Display the employee details with their manager names.

60. Display the details of the employees who earn more than their managers.

61. Show the no. of employees working under every manager.

Outer Join:

62. Display employee details with all departments.

63. Display all employees in sales or operations departments.

Set Operators:

64. List out the distinct jobs in Sales and Accounting Departments.

65. List out ALL the jobs in Sales and Accounting Departments.

66. List out the common jobs in Research and Accounting Departments in ascending order.

Answers

1. SQL > Select * from employee;
2. SQL > Select * from department;
3. SQL > Select * from job;
4. SQL > Select * from location;
5. SQL > Select first_name, last_name, salary, comm from employee;
6. SQL > Select employee_id "ID of the employee", last_name "Name of the employee", department_id "department ID" from employee;
7. SQL > Select last_name, salary*12 "annual salary" from employee
8. SQL > Select * from employee where last_name='SMITH';
9. SQL > Select * from employee where department_id=20
10. SQL > Select * from employee where salary between 3000 and 4500
11. SQL > Select * from employee where department_id in (10,20)
12. SQL > Select last_name, salary, comm, department_id from employee where department_id not in (10,30)
13. SQL > Select * from employee where last_name like 'S%'
14. SQL > Select * from employee where last_name like 'S%H'
15. SQL > Select * from employee where last_name like 'S___'
16. SQL > Select * from employee where department_id=10 and salary>3500
17. SQL > Select * from employee where comm is Null
18. SQL > Select employee_id, last_name from employee order by employee_id
19. SQL > Select employee_id, last_name, salary from employee order by salary desc
20. SQL > Select employee_id, last_name, salary from employee order by last_name, salary desc
21. SQL > Select employee_id, last_name, salary from employee order by last_name, department_id desc
22. SQL > Select department_id, count(*) from employee group by department_id
23. SQL > Select department_id, count(*), max(salary), min(salary), avg(salary) from employee group by department_id
24. SQL > Select job_id, count(*), max(salary), min(salary), avg(salary) from employee group by job_id
25. SQL > Select to_char(hiredate,'month') month, count(*) from employee group by to_char(hiredate,'month') order by month
26. SQL > Select to_char(hiredate,'yyyy') Year, to_char(hiredate,'mon') Month, count(*) "No. of employees" from employee group by to_char(hiredate,'yyyy'), to_char(hiredate,'mon') order by Year, Month
27. SQL > Select department_id, count(*) from employee group by department_id having count(*)>=4
28. SQL > Select to_char(hiredate,'mon') month, count(*) from employee group by to_char(hiredate,'mon') having to_char(hiredate,'mon')='jan'
29. SQL > Select to_char(hiredate,'mon') month, count(*) from employee group by to_char(hiredate,'mon') having to_char(hiredate,'mon') in ('jan','sep')
30. SQL > Select to_char(hiredate,'yyyy') Year, count(*) from employee group by to_char(hiredate,'yyyy') having to_char(hiredate,'yyyy')='1985'
31. SQL > Select to_char(hiredate,'yyyy') Year, to_char(hiredate,'mon') Month, count(*) "No. of employees" from employee where to_char(hiredate,'yyyy')='1985' group by to_char(hiredate,'yyyy'), to_char(hiredate,'mon')
32. SQL > Select to_char(hiredate,'yyyy') Year, to_char(hiredate,'mon') Month, count(*) "No. of employees" from employee where to_char(hiredate,'yyyy')='1985' and to_char(hiredate,'mon')='mar' group by to_char(hiredate,'yyyy'), to_char(hiredate,'mon')
33. SQL > Select department_id, count(*) "No. of employees" from employee where to_char(hiredate,'yyyy')='1985' and to_char(hiredate,'mon')='apr' group by to_char(hiredate,'yyyy'), to_char(hiredate,'mon'), department_id having count(*)>=3
34. SQL > Select * from employee where salary=(select max(salary) from employee)
35. SQL > Select * from employee where department_id IN (select department_id from department where name='SALES')
36. SQL > Select * from employee where job_id in (select job_id from job where function='CLERK')
37. SQL > Select * from employee where department_id=(select department_id from department where location_id=(select location_id from location where regional_group='NEW YORK'))
38. SQL > Select count(*) from employee where department_id=(select department_id from department where name='SALES')
39. SQL > Update employee set salary=salary+salary*10/100 where job_id=(select job_id from job where function='CLERK')
40. SQL > delete from employee where department_id=(select department_id from department where name='ACCOUNTING')
41. SQL > Select * from employee where salary=(select max(salary) from employee where salary <(select max(salary) from employee))
42. SQL > Select * from employee e where &n-1=(select count(distinct salary) from employee where salary>e.salary)
43. SQL > Select * from employee where salary > all (Select salary from employee where department_id=30)
44. SQL > Select * from employee where salary > any (Select salary from employee where department_id=30)
45. SQL > Select employee_id, last_name, department_id from employee e where not exists (select department_id from department d where d.department_id=e.department_id)
46. SQL > Select name from department d where not exists (select last_name from employee e where d.department_id=e.department_id)
47. SQL > Select employee_id, last_name, salary, department_id from employee e where salary > (select avg(salary) from employee where department_id=e.department_id)
48. SQL > Select employee_id, last_name, name from employee e, department d where e.department_id=d.department_id
49. SQL > Select employee_id, last_name, function from employee e, job j where e.job_id=j.job_id
50. SQL > Select employee_id, last_name, name, regional_group from employee e, department d, location l where e.department_id=d.department_id and d.location_id=l.location_id
51. SQL > Select name, count(*) from employee e, department d where d.department_id=e.department_id group by name
52. SQL > Select name, count(*) from employee e, department d where d.department_id=e.department_id group by name having name='SALES'
53. SQL > Select name, count(*) from employee e, department d where d.department_id=e.department_id group by name having count (*)>=5 order by name
54. SQL > Select function, count(*) from employee e, job j where j.job_id=e.job_id group by function
55. SQL > Select regional_group, count(*) from employee e, department d, location l where e.department_id=d.department_id and d.location_id=l.location_id and regional_group='NEW YORK' group by regional_group
56. SQL > Select employee_id, last_name, grade_id from employee e, salary_grade s where salary between lower_bound and upper_bound order by last_name
57. SQL > Select grade_id, count(*) from employee e, salary_grade s where salary between lower_bound and upper_bound group by grade_id order by grade_id desc
58. SQL > Select grade_id, count(*) from employee e, salary_grade s where salary between lower_bound and upper_bound and lower_bound>=2000 and lower_bound<=5000 group by grade_id order by grade_id desc
59. SQL > Select e.last_name emp_name, m.last_name mgr_name from employee e, employee m where e.manager_id=m.employee_id
60. SQL > Select e.last_name emp_name, e.salary emp_salary, m.last_name mgr_name, m.salary mgr_salary from employee e, employee m where e.manager_id=m.employee_id and m.salary<e.salary
61. SQL > Select m.manager_id, count(*) from employee e, employee m where e.employee_id=m.manager_id group by m.manager_id
62. SQL > Select last_name, d.department_id, d.name from employee e, department d where e.department_id(+)=d.department_id
63. SQL > Select last_name, d.department_id, d.name from employee e, department d where e.department_id(+)=d.department_id and d.department_id in (select department_id from department where name IN ('SALES','OPERATIONS'))
64. SQL > Select function from job where job_id in (Select job_id from employee where department_id=(select department_id from department where name='SALES')) union Select function from job where job_id in (Select job_id from employee where department_id=(select department_id from department where name='ACCOUNTING'))
65. SQL > Select function from job where job_id in (Select job_id from employee where department_id=(select department_id from department where name='SALES')) union all Select function from job where job_id in (Select job_id from employee where department_id=(select department_id from department where name='ACCOUNTING'))
66. SQL > Select function from job where job_id in (Select job_id from employee where department_id=(select department_id from department where name='RESEARCH')) intersect Select function from job where job_id in (Select job_id from employee where department_id=(select department_id from department where name='ACCOUNTING')) order by function
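
The join answers use Oracle's traditional comma-separated FROM list and the (+) outer join operator. As a sketch of the alternative, the three-table join and the "all departments" outer join can also be written with ANSI join syntax, which Oracle supports from version 9i onwards. The table and column names are the ones from the exercise.

```sql
-- employees with their department name and regional group
select e.employee_id
     , e.last_name
     , d.name
     , l.regional_group
  from employee e
  join department d on d.department_id = e.department_id
  join location l on l.location_id = d.location_id;

-- all departments, including those without any employees
select e.last_name
     , d.department_id
     , d.name
  from department d
  left outer join employee e on e.department_id = d.department_id;
```

The second query keeps every department row and returns a null last_name where no employee matches, which is what the (+) on the employee side expresses.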


Keep clause

You may have seen an aggregate function like this in SQL queries:

max(value) keep (dense_rank first order by mydate)

or this analytic variant:

max(value) keep (dense_rank last order by mydate) over (partition by relation_nr)

Unfortunately, when you start searching for the "keep" clause, you won't find anything in the Oracle documentation (and hopefully because of this blogpost, people will now have a reference). Of course Oracle documents such functions. You only have to know that they are called FIRST and LAST in the SQL Language Reference.

Even though these functions were introduced back in version 9, I've seen lots of code that could have used them but didn't. That's a pity, because it's a missed opportunity to write shorter and faster code. The common use case I'm talking about is a detail table with a validity period, typically with a startdate column and optionally an enddate. For such a table, you often need the values of the currently valid row. An example: suppose we have a table RELATIONS and for each relation we want to know its address at a certain point in time:

SQL> create table relations
  2  ( id   number       not null primary key
  3  , name varchar2(30) not null
  4  )
  5  /

Table created.

SQL> insert into relations
  2  select 1, 'Oracle Nederland' from dual union all
  3  select 2, 'Ciber Nederland' from dual
  4  /

2 rows created.

SQL> create table relation_addresses
  2  ( relation_id number       not null
  3  , startdate   date         not null
  4  , address     varchar2(30) not null
  5  , postal_code varchar2(6)  not null
  6  , city        varchar2(30) not null
  7  , constraint ra_pk primary key (relation_id,startdate)
  8  , constraint ra_r_fk foreign key (relation_id) references relations(id)
  9  )
 10  /

Table created.

SQL> insert into relation_addresses
  2  select 1, date '1995-01-01', 'Rijnzathe 6', '3454PV', 'De Meern' from dual union all
  3  select 1, date '2011-01-01', 'Hertogswetering 163-167', '3543AS', 'Utrecht' from dual union all
  4  select 2, date '2000-01-01', 'Frankrijkstraat 128', '5622AH', 'Eindhoven' from dual union all
  5  select 2, date '2006-01-01', 'Meerkollaan 15', '5613BS', 'Eindhoven' from dual union all
  6  select 2, date '2010-01-01', 'Burgemeester Burgerslaan 40b', '5245NH', 'Den Bosch' from dual union all
  7  select 2, date '2015-01-01', 'Archimedesbaan 16', '3439ME', 'Nieuwegein' from dual
  8  /

6 rows created.

SQL> begin
  2    dbms_stats.gather_table_stats(user,'relations');
  3    dbms_stats.gather_table_stats(user,'relation_addresses');
  4  end;
  5  /

PL/SQL procedure successfully completed.

Relation "Oracle Nederland" has two addresses, with its current address at the Hertogswetering. Relation "Ciber Nederland" fictitiously has four addresses: the current one is in Den Bosch, and I've also recorded a future address in Nieuwegein. Note that, in real life, the latter three are all Ciber offices currently in use. To get the active relation addresses on October 1st, 2012, I can use this query:

SQL> var REFERENCE_DATE varchar2(10)
SQL> exec :REFERENCE_DATE:='2012-10-01'

PL/SQL procedure successfully completed.

SQL> select ra.relation_id
  2       , max(ra.startdate) startdate
  3    from relation_addresses ra
  4   where ra.startdate <= to_date(:REFERENCE_DATE,'yyyy-mm-dd')
  5   group by ra.relation_id
  6  /

RELATION_ID STARTDATE
----------- -------------------
          1 01-01-2011 00:00:00
          2 01-01-2010 00:00:00

2 rows selected.

But what if I want to retrieve the current address belonging to these rows? This is a frequently asked question in Oracle forums. Prior to Oracle8i, you would have used a query like the one below:

SQL> select ra.relation_id
  2       , ra.startdate
  3       , ra.address
  4       , ra.postal_code
  5       , ra.city
  6    from relation_addresses ra
  7   where ra.startdate <= to_date(:REFERENCE_DATE,'yyyy-mm-dd')
  8     and not exists
  9         ( select 'a relation_address with a more recent startdate'
 10             from relation_addresses ra2
 11            where ra2.relation_id = ra.relation_id
 12              and ra2.startdate <= to_date(:REFERENCE_DATE,'yyyy-mm-dd')
 13              and ra2.startdate > ra.startdate
 14         )
 15  /

RELATION_ID STARTDATE           ADDRESS                        POSTAL CITY
----------- ------------------- ------------------------------ ------ ------------------------------
          1 01-01-2011 00:00:00 Hertogswetering 163-167        3543AS Utrecht
          2 01-01-2010 00:00:00 Burgemeester Burgerslaan 40b   5245NH Den Bosch

2 rows selected.

This query uses a correlated subquery, so it accesses table RELATION_ADDRESSES (or an index belonging to it) twice. From Oracle8i onwards, this can be prevented by using an analytic function:

SQL> select relation_id
  2       , startdate
  3       , address
  4       , postal_code
  5       , city
  6    from ( select ra.relation_id
  7                , ra.startdate
  8                , ra.address
  9                , ra.postal_code
 10                , ra.city
 11                , row_number() over (partition by ra.relation_id order by ra.startdate desc) rn
 12             from relation_addresses ra
 13            where ra.startdate <= to_date(:REFERENCE_DATE,'yyyy-mm-dd')
 14         )
 15   where rn = 1
 16  /

RELATION_ID STARTDATE           ADDRESS                        POSTAL CITY
----------- ------------------- ------------------------------ ------ ------------------------------
          1 01-01-2011 00:00:00 Hertogswetering 163-167        3543AS Utrecht
          2 01-01-2010 00:00:00 Burgemeester Burgerslaan 40b   5245NH Den Bosch

2 rows selected.

Here you compute the row_number per relation_id, ordered by startdate in descending order. This means that, per relation_id, the row with the most recent startdate on or before the reference date gets row_number 1. By using an inline view, we can filter on the outcome of the analytic function and select only the rows with row_number 1. In forums, you'll often see this solution being advised. Compared to the correlated subquery, this query selects only once from table RELATION_ADDRESSES. However, you can do even better by just adding three "keep clause" functions to the original query:

SQL> select ra.relation_id
  2       , max(ra.startdate) startdate
  3       , max(ra.address) keep (dense_rank last order by ra.startdate) address
  4       , max(ra.postal_code) keep (dense_rank last order by ra.startdate) postal_code
  5       , max(ra.city) keep (dense_rank last order by ra.startdate) city
  6    from relation_addresses ra
  7   where ra.startdate <= to_date(:REFERENCE_DATE,'yyyy-mm-dd')
  8   group by ra.relation_id
  9  /

RELATION_ID STARTDATE           ADDRESS                        POSTAL CITY
----------- ------------------- ------------------------------ ------ ------------------------------
          1 01-01-2011 00:00:00 Hertogswetering 163-167        3543AS Utrecht
          2 01-01-2010 00:00:00 Burgemeester Burgerslaan 40b   5245NH Den Bosch

2 rows selected.

The three extra aggregate functions all do a "dense_rank last order by startdate", meaning "sort the rows by startdate, and pick only those rows which have the most recent startdate". If you have more rows with the same startdate, the max function at the start tells Oracle to pick the value with the maximum address/postal_code/city. However, (relation_id,startdate) is unique, so ties are impossible and thus the max function is a dummy. I also could have used min.
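To see when the leading aggregate does matter, here is a small sketch with made-up inline data (the names t, grp, d and val are just for illustration) in which two rows share the most recent date. With such a tie, min and max pick different values from the rows that the keep clause retains.

```sql
with t as
( select 1 grp, date '2012-01-01' d, 'A' val from dual union all
  select 1    , date '2012-06-01'  , 'B'     from dual union all
  select 1    , date '2012-06-01'  , 'C'     from dual
)
select grp
     , min(val) keep (dense_rank last order by d)  min_of_latest   -- B
     , max(val) keep (dense_rank last order by d)  max_of_latest   -- C
     , max(val) keep (dense_rank first order by d) val_of_earliest -- A
  from t
 group by grp
/
```

The keep clause first narrows the group down to the rows ranked last (or first) by d, and only then does the aggregate in front of it choose among those rows. With a unique sort key, as in RELATION_ADDRESSES, exactly one row remains and min and max return the same value.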

The query is shorter and -to me- clearer at first glance. However, the main reason for my enthusiasm for the aggregate functions FIRST and LAST is because it's just faster. To show this, let's execute those queries against a table with 300,000 rows, 100,000 relations with 3 addresses each:

SQL> create table relations
  2  ( id   number       not null primary key
  3  , name varchar2(30) not null
  4  )
  5  /

Table created.

SQL> create table relation_addresses
  2  ( relation_id number       not null
  3  , startdate   date         not null
  4  , address     varchar2(30) not null
  5  , postal_code varchar2(6)  not null
  6  , city        varchar2(30) not null
  7  , constraint ra_pk primary key (relation_id,startdate)
  8  , constraint ra_r_fk foreign key (relation_id) references relations(id)
  9  )
 10  /

Table created.

SQL> insert into relations
  2   select level
  3        , dbms_random.string('a',30)
  4     from dual
  5  connect by level <= 100000
  6  /

100000 rows created.

SQL> insert into relation_addresses
  2   select 1 + mod(level-1,100000)
  3        , date '2013-01-01' - numtodsinterval(level,'hour')
  4        , dbms_random.string('a',30)
  5        , dbms_random.string('a',6)
  6        , dbms_random.string('a',30)
  7     from dual
  8  connect by level <= 300000
  9  /

300000 rows created.

SQL> begin
  2    dbms_stats.gather_table_stats
  3    ( user
  4    , 'relations'
  5    , cascade=>true
  6    , method_opt=>'FOR ALL INDEXED COLUMNS SIZE 254'
  7    , estimate_percent=>100
  8    );
  9    dbms_stats.gather_table_stats
 10    ( user
 11    , 'relation_addresses'
 12    , cascade=>true
 13    , method_opt=>'FOR ALL INDEXED COLUMNS SIZE 254'
 14    , estimate_percent=>100
 15    );
 16  end;
 17  /

PL/SQL procedure successfully completed.

Note that I created histograms with 254 buckets just to make the optimizer see that it should full scan the table, despite the "startdate <= :REFERENCE_DATE" predicate. This next query should give a clue what's in the table:

SQL> select *
  2    from relation_addresses
  3   where relation_id in (1,2,99999,100000)
  4   order by relation_id
  5       , startdate
  6  /

RELATION_ID STARTDATE           ADDRESS                        POSTAL CITY
----------- ------------------- ------------------------------ ------ ------------------------------
          1 09-03-1990 15:00:00 tKgXePxuAIdhFBNJLIRRjodrlJzGOl vPIAbL pNkbFHTJPrVuDIYLxsCfUfetBsKJIE
          1 05-08-2001 07:00:00 LybVzfpzoQzXjpCAdkSZrkYrwUtZtL cWJwFe IczTRyjITWCJIOErccfITVvsqRVyMF
          1 31-12-2012 23:00:00 lNEwsdYhbwdqRxHTSCTCykgICxiXKL oXzHQF YfyKFmiboCWfmNLjVLZoKmUDoMFaDu
          2 09-03-1990 14:00:00 svOylQPkbyfympSXRMeyudfFErFvlO MLFdpG LTtAKdrpUmCwFgqEmoKxnUtWecwgcV
          2 05-08-2001 06:00:00 BsRCUviBiLHaAEjyRVnIedRAWzuVSe DlBlZW ErQmCkDgNDTMOdZzceFYrMXnZmmjxg
          2 31-12-2012 22:00:00 wqdFdXoBdmmCooLtGfWOMKukIMrDlI geRRHz DaPpWHOOdWgbjLaRkxfFDUIPgVgvEt
      99999 12-10-1978 01:00:00 FsXOjUdNIgjjGjnWpJjTTscbcuqsxa PdhVtm qOskmLwRlngSEihmlpYhmNHhvtrpBc
      99999 09-03-1990 17:00:00 sqoKYNeDntZtAUSmSDMtIQZloTSVeD uGPszi GIDctptEomcGzYGYhUGhKHgDRZJCmY
      99999 05-08-2001 09:00:00 fhHGwuGPIHSOaKdjDvDcqTzsbHZzqR tpaLAP rVYCmijzqJmhlnZZLXkHpgFmLAEiTS
     100000 12-10-1978 00:00:00 WwxfHcVfkFfItgcXfjPnKTiATlHjao nSOjSn vZNRsRySNPlmQKgCJjcpiEOhQIxzoy
     100000 09-03-1990 16:00:00 cGcVPMsFyxCBrnsZtMYBnaAflXiNff NVKRIr SseFWkWyUDgaPpbxdmENdLjurGbJPK
     100000 05-08-2001 08:00:00 dRfCmqdmbhcmaMvyYBpewPsFBCVdlG BMQWLY YPaAGnKKUkfdnAeAyLYeUBfXwezsEo

12 rows selected.

So there are a couple of rows that are filtered because they're in the future, but for most rows, the latest row is the current one. This is the plan of the first query with the correlated subquery:

SQL> select * from table(dbms_xplan.display_cursor(null,null,'iostats last'))
  2  /

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  d6p5uh67h65yb, child number 0
-------------------------------------
select ra.relation_id      , ra.startdate      , ra.address      ,
ra.postal_code      , ra.city   from relation_addresses ra  where
ra.startdate <= to_date(:REFERENCE_DATE,'yyyy-mm-dd')    and not exists
       ( select 'a relation_address with a more recent startdate'
     from relation_addresses ra2           where ra2.relation_id =
ra.relation_id             and ra2.startdate <=
to_date(:REFERENCE_DATE,'yyyy-mm-dd')             and ra2.startdate >
ra.startdate        )

Plan hash value: 3749094337

---------------------------------------------------------------------------------------------------------------
| Id  | Operation             | Name               | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
---------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                    |      1 |        |    100K|00:00:00.66 |   15071 |   3681 |
|*  1 |  HASH JOIN RIGHT ANTI |                    |      1 |   2978 |    100K|00:00:00.66 |   15071 |   3681 |
|*  2 |   INDEX FAST FULL SCAN| RA_PK              |      1 |    297K|    297K|00:00:00.05 |    1240 |     35 |
|*  3 |   TABLE ACCESS FULL   | RELATION_ADDRESSES |      1 |    297K|    297K|00:00:00.12 |   13831 |   3646 |
---------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("RA2"."RELATION_ID"="RA"."RELATION_ID")
       filter("RA2"."STARTDATE">"RA"."STARTDATE")
   2 - filter("RA2"."STARTDATE"<=TO_DATE(:REFERENCE_DATE,'yyyy-mm-dd'))
   3 - filter("RA"."STARTDATE"<=TO_DATE(:REFERENCE_DATE,'yyyy-mm-dd'))


30 rows selected.

The NOT EXISTS is executed as a HASH JOIN RIGHT ANTI, and the query takes .66 seconds in total. Next, the plan for the query with the analytic row_number function:

SQL> select * from table(dbms_xplan.display_cursor(null,null,'iostats last'))
  2  /

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  1zd4wqtxkc2vz, child number 0
-------------------------------------
select relation_id      , startdate      , address      , postal_code
   , city   from ( select ra.relation_id               , ra.startdate
            , ra.address               , ra.postal_code               ,
ra.city               , row_number() over (partition by ra.relation_id
order by ra.startdate desc) rn            from relation_addresses ra
       where ra.startdate <= to_date(:REFERENCE_DATE,'yyyy-mm-dd')
  )  where rn = 1

Plan hash value: 2795878473

------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name               | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |                    |      1 |        |    100K|00:00:00.97 |    7238 |   3646 |
|*  1 |  VIEW                    |                    |      1 |    297K|    100K|00:00:00.97 |    7238 |   3646 |
|*  2 |   WINDOW SORT PUSHED RANK|                    |      1 |    297K|    200K|00:00:00.93 |    7238 |   3646 |
|*  3 |    TABLE ACCESS FULL     | RELATION_ADDRESSES |      1 |    297K|    297K|00:00:00.09 |    7238 |   3646 |
------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("RN"=1)
   2 - filter(ROW_NUMBER() OVER ( PARTITION BY "RA"."RELATION_ID" ORDER BY
              INTERNAL_FUNCTION("RA"."STARTDATE") DESC )<=1)
   3 - filter("RA"."STARTDATE"<=TO_DATE(:REFERENCE_DATE,'yyyy-mm-dd'))


29 rows selected.

Note that this query takes longer than the correlated subquery above: .97 seconds versus .66 seconds. The HASH JOIN RIGHT ANTI itself took .49 seconds (.66 - .05 - .12), whereas computing the ROW_NUMBER took .84 seconds (.93 - .09). So here, on my laptop, I have avoided the .05 seconds for the INDEX FAST FULL SCAN, but spent .35 seconds (.84 - .49) more on the computation. Likely, when I/O is more expensive than on my laptop, the time of the first query will go up and the times will be closer to each other. Now the keep clause variant:

SQL> select * from table(dbms_xplan.display_cursor(null,null,'iostats last'))
  2  /

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  dcw8tyyqtu2kk, child number 0
-------------------------------------
select ra.relation_id      , max(ra.startdate) startdate      ,
max(ra.address) keep (dense_rank last order by ra.startdate) address
  , max(ra.postal_code) keep (dense_rank last order by ra.startdate)
postal_code      , max(ra.city) keep (dense_rank last order by
ra.startdate) city   from relation_addresses ra  where ra.startdate <=
to_date(:REFERENCE_DATE,'yyyy-mm-dd')  group by ra.relation_id

Plan hash value: 2324030966

------------------------------------------------------------------------------------------------------------
| Id  | Operation          | Name               | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |                    |      1 |        |    100K|00:00:00.55 |    7238 |   3646 |
|   1 |  SORT GROUP BY     |                    |      1 |    100K|    100K|00:00:00.55 |    7238 |   3646 |
|*  2 |   TABLE ACCESS FULL| RELATION_ADDRESSES |      1 |    297K|    297K|00:00:00.09 |    7238 |   3646 |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("RA"."STARTDATE"<=TO_DATE(:REFERENCE_DATE,'yyyy-mm-dd'))


24 rows selected.

The shortest query, the shortest plan and the fastest execution. The SORT GROUP BY immediately reduces the number of intermediate rows from 297K to 100K, whereas the WINDOW SORT PUSHED RANK had to compute the row_number for all 297K rows.
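
To see what the keep clause does in isolation, here is a minimal sketch on three made-up rows. The inline rows and city names are assumptions for illustration only; just the column names mirror relation_addresses. The expression max(city) keep (dense_rank last order by startdate) returns the city that belongs to the latest startdate within each group, not the alphabetically largest city:

```sql
-- Hypothetical sample rows, not data from the post.
with addr as
( select 1 relation_id, date '1990-03-09' startdate, 'Utrecht' city from dual union all
  select 1            , date '2001-08-05'          , 'Amsterdam'    from dual union all
  select 2            , date '1990-03-09'          , 'Zwolle'       from dual
)
select relation_id
     , max(startdate) startdate
     , max(city) keep (dense_rank last order by startdate) city
     , max(city) plain_max_city  -- for contrast: ignores startdate
  from addr
 group by relation_id
 order by relation_id
/
```

For relation 1 the keep variant returns Amsterdam, the city of the 2001 row, whereas the plain max(city) returns Utrecht.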

Much Ado About Nothing?

I was reading this presentation PDF of Hugh Darwen recently, called How To Handle Missing Information Without Using NULL. Several great thinkers and founders of the relational theory consider NULL as the thing that should not be. For example, one slide in the above mentioned PDF is titled SQL's Nulls Are A Disaster. And I found a paper with the amusing title The Final Null In The Coffin.

I can understand the critique. The introduction of NULL leads to three valued logic, which makes programs much more complex and harder to prove correct. All database professionals likely have been bitten by NULLs several times during their career, myself included. And a NULL can have several interpretations. By using NULL, you are not making clear what is meant. If the value for column hair_colour is NULL, does it mean the person is bald? Or do you know the person has hair, but you just don't know what colour? Or can the person be bald or have hair, but you just don't know which one applies? Or is the person in the midst of a hair colouring exercise and you only temporarily don't know the colour? If you're creative, I'm sure you can come up with other interpretations as well.
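
A small sketch of how three-valued logic bites in practice, using two made-up rows (an assumption for illustration) in which Boris' salary is NULL. Boris appears neither in the "equal to" result nor in the "not equal to" result, because both comparisons evaluate to unknown for him:

```sql
-- Hypothetical two-row extract: Boris' salary is unknown (NULL).
with p as
( select 'Anne' name, 100000 salary from dual union all
  select 'Boris'    , null          from dual
)
select 'salary = 100000' predicate, name from p where salary = 100000
union all
select 'salary <> 100000', name from p where salary <> 100000
union all
select 'salary is null', name from p where salary is null
/
```

Only the first branch (Anne) and the third branch (Boris) return a row; the second branch returns nothing.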

On the other hand, the theorists don't have to build database applications for end users who like reasonable response times, and I do. Avoiding nulls at all cost typically leads to a data model that has more tables than needed, requiring more joins and therefore making queries slower. So I have to make a trade-off. In general I try to avoid nullable columns as much as possible, for example by choosing subtype implementations instead of supertype implementations, and by modelling entity subtypes in the first place, but I will never let it noticeably slow down my application. At my current job, I'm making a data model right now. Having read all use cases, I know how the data will be used and so I know where in the model there is room to avoid an extra nullable column. One thing I'll never voluntarily do though, is make up strange outlier values just to get rid of the null.

Anyway, I was curious to see how Hugh Darwen handles missing information without using nulls. In his paper, he has a concise example, which I'll translate to Oracle syntax in this blog post to see what practically needs to happen to avoid nulls in his example. He starts with this table:

SQL> select *
  2    from pers_info
  3  /

        ID NAME       JOB            SALARY
---------- ---------- ---------- ----------
      1234 Anne       Lawyer         100000
      1235 Boris      Banker
      1236 Cindy                      70000
      1237 Davinder

4 rows selected.

This table contains four NULL values. The meaning of those NULL values can't be seen from this table, but this is what they are meant to be:

Boris earns something, but we don't know how much
Cindy does some job, but we don't know what it is
Davinder doesn't have a job
Davinder doesn't have a salary

So he applies a technique called vertical decomposition and on top of those results horizontal decomposition, to arrive at the seven tables below, where everything has a clear meaning.

SQL> select *
  2    from called
  3  /

        ID NAME
---------- --------
      1234 Anne
      1235 Boris
      1236 Cindy
      1237 Davinder

4 rows selected.

SQL> select *
  2    from does_job
  3  /

        ID JOB
---------- ------
      1234 Lawyer
      1235 Banker

2 rows selected.

SQL> select *
  2    from job_unk
  3  /

        ID
----------
      1236

1 row selected.

SQL> select *
  2    from unemployed
  3  /

        ID
----------
      1237

1 row selected.

SQL> select *
  2    from earns
  3  /

        ID     SALARY
---------- ----------
      1234     100000
      1236      70000

2 rows selected.

SQL> select *
  2    from salary_unk
  3  /

        ID
----------
      1235

1 row selected.

SQL> select *
  2    from unsalaried
  3  /

        ID
----------
      1237

1 row selected.

Here we have achieved a data model from which every NULL has been banished.
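
The post only shows the contents of the seven tables, not their definitions. A minimal sketch of DDL that would fit the data shown, assuming the column types and lengths (they are guesses, not taken from the post) and declaring every column NOT NULL, which is the whole point of the decomposition:

```sql
-- Column types and lengths are assumptions; the post only shows the data.
create table called     (id number(4) not null, name   varchar2(10) not null);
create table does_job   (id number(4) not null, job    varchar2(10) not null);
create table job_unk    (id number(4) not null);
create table unemployed (id number(4) not null);
create table earns      (id number(4) not null, salary number(6)    not null);
create table salary_unk (id number(4) not null);
create table unsalaried (id number(4) not null);
```

Keys are deliberately left out of this sketch, since the constraints are discussed separately further on.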

Now what if we'd like to simulate a query against the PERS_INFO table? Darwen uses this expression to transform the seven tables back to the PERS_INFO table:

WITH (EXTEND JOB_UNK ADD 'Job unknown' AS Job_info) AS T1,
     (EXTEND UNEMPLOYED ADD 'Unemployed' AS Job_info) AS T2,
     (DOES_JOB RENAME (Job AS Job_info)) AS T3,
     (EXTEND SALARY_UNK ADD 'Salary unknown' AS Sal_info) AS T4,
     (EXTEND UNSALARIED ADD 'Unsalaried' AS Sal_info) AS T5,
     (EXTEND EARNS ADD CHAR(Salary) AS Sal_info) AS T6,
     (T6 { ALL BUT Salary }) AS T7,
     (UNION ( T1, T2, T3 )) AS T8,
     (UNION ( T4, T5, T7 )) AS T9,
     (JOIN ( CALLED, T8, T9 )) AS PERS_INFO :
PERS_INFO

Translated to Oracle syntax, this becomes:

SQL> with t1 as
  2  ( select id
  3         , 'Job unknown' as job_info
  4      from job_unk
  5  )
  6  , t2 as
  7  ( select id
  8         , 'Unemployed' as job_info
  9      from unemployed
 10  )
 11  , t3 as
 12  ( select id
 13         , job as job_info
 14      from does_job
 15  )
 16  , t4 as
 17  ( select id
 18         , 'Salary unknown' as sal_info
 19      from salary_unk
 20  )
 21  , t5 as
 22  ( select id
 23         , 'Unsalaried' as sal_info
 24      from unsalaried
 25  )
 26  , t6 as
 27  ( select id
 28         , salary
 29         , to_char(salary,'fm999G999') as sal_info
 30      from earns
 31  )
 32  , t7 as
 33  ( select id
 34         , sal_info
 35      from t6
 36  )
 37  , t8 as
 38  ( select id
 39         , job_info
 40      from t1
 41     union all
 42    select id
 43         , job_info
 44      from t2
 45     union all
 46    select id
 47         , job_info
 48      from t3
 49  )
 50  , t9 as
 51  ( select id
 52         , sal_info
 53      from t4
 54     union all
 55    select id
 56         , sal_info
 57      from t5
 58     union all
 59    select id
 60         , sal_info
 61      from t7
 62  )
 63  , pers_info as
 64  ( select c.id
 65         , c.name
 66         , j.job_info
 67         , s.sal_info
 68      from called c
 69           inner join t8 j on (c.id = j.id)
 70           inner join t9 s on (c.id = s.id)
 71  )
 72  select *
 73    from pers_info
 74  /

        ID NAME     JOB_INFO    SAL_INFO
---------- -------- ----------- --------------
      1235 Boris    Banker      Salary unknown
      1237 Davinder Unemployed  Unsalaried
      1234 Anne     Lawyer      100,000
      1236 Cindy    Job unknown 70,000

4 rows selected.

Very elaborate, but the optimizer does a great job at simplifying the query under the covers, as can be seen in this execution plan:

SQL> select *
  2    from table(dbms_xplan.display_cursor(null,null,'allstats last'))
  3  /

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  bmrtdy0jad18p, child number 0
-------------------------------------
with t1 as ( select id        , 'Job unknown' as job_info     from
job_unk ) , t2 as ( select id        , 'Unemployed' as job_info
from unemployed ) , t3 as ( select id        , job as job_info     from
does_job ) , t4 as ( select id        , 'Salary unknown' as sal_info
 from salary_unk ) , t5 as ( select id        , 'Unsalaried' as
sal_info     from unsalaried ) , t6 as ( select id        , salary
  , to_char(salary,'fm999G999') as sal_info     from earns ) , t7 as (
select id        , sal_info     from t6 ) , t8 as ( select id        ,
job_info     from t1    union all   select id        , job_info
from t2    union all   select id        , job_info     from t3 ) , t9
as ( select id        , sal_info     from t4    union all   select id
     , sal_info     from t5    union all   select id        , sal_info
   from t7 ) , pers_info as ( select c.id        , c.name        ,
j.job_info        , s.sal_info     from called c          inner join t8
j on (c.id = j.id)

Plan hash value: 583520090

-------------------------------------------------------------------------------------------------------------------------
| Id  | Operation             | Name       | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |            |      1 |        |      4 |00:00:00.01 |      14 |       |       |          |
|*  1 |  HASH JOIN            |            |      1 |      4 |      4 |00:00:00.01 |      14 |  1011K|  1011K|  550K (0)|
|*  2 |   HASH JOIN           |            |      1 |      4 |      4 |00:00:00.01 |       8 |  1180K|  1180K|  548K (0)|
|   3 |    TABLE ACCESS FULL  | CALLED     |      1 |      4 |      4 |00:00:00.01 |       2 |       |       |          |
|   4 |    VIEW               |            |      1 |      4 |      4 |00:00:00.01 |       6 |       |       |          |
|   5 |     UNION-ALL         |            |      1 |        |      4 |00:00:00.01 |       6 |       |       |          |
|   6 |      TABLE ACCESS FULL| JOB_UNK    |      1 |      1 |      1 |00:00:00.01 |       2 |       |       |          |
|   7 |      TABLE ACCESS FULL| UNEMPLOYED |      1 |      1 |      1 |00:00:00.01 |       2 |       |       |          |
|   8 |      TABLE ACCESS FULL| DOES_JOB   |      1 |      2 |      2 |00:00:00.01 |       2 |       |       |          |
|   9 |   VIEW                |            |      1 |      4 |      4 |00:00:00.01 |       6 |       |       |          |
|  10 |    UNION-ALL          |            |      1 |        |      4 |00:00:00.01 |       6 |       |       |          |
|  11 |     TABLE ACCESS FULL | SALARY_UNK |      1 |      1 |      1 |00:00:00.01 |       2 |       |       |          |
|  12 |     TABLE ACCESS FULL | UNSALARIED |      1 |      1 |      1 |00:00:00.01 |       2 |       |       |          |
|  13 |     TABLE ACCESS FULL | EARNS      |      1 |      2 |      2 |00:00:00.01 |       2 |       |       |          |
-------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("C"."ID"="S"."ID")
   2 - access("C"."ID"="J"."ID")


45 rows selected.

If I had to build the PERS_INFO table back again with a query myself, I'd use this shorter query with six left outer joins:

SQL> select c.id
  2       , c.name
  3       , coalesce(j.job,nvl2(ju.id,'Job unknown',null),nvl2(ue.id,'Unemployed',null)) job_info
  4       , coalesce(to_char(e.salary,'fm999G999'),nvl2(su.id,'Salary unknown',null),nvl2(us.id,'Unsalaried',null)) salary_info
  5    from called c
  6         left outer join does_job j on (c.id = j.id)
  7         left outer join job_unk ju on (c.id = ju.id)
  8         left outer join unemployed ue on (c.id = ue.id)
  9         left outer join earns e on (c.id = e.id)
 10         left outer join salary_unk su on (c.id = su.id)
 11         left outer join unsalaried us on (c.id = us.id)
 12  /

        ID NAME     JOB_INFO    SALARY_INFO
---------- -------- ----------- --------------
      1234 Anne     Lawyer      100,000
      1236 Cindy    Job unknown 70,000
      1235 Boris    Banker      Salary unknown
      1237 Davinder Unemployed  Unsalaried

4 rows selected.

Although, as you can see below, the plan doesn't really improve:

SQL> select *
  2    from table(dbms_xplan.display_cursor(null,null,'allstats last'))
  3  /

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  6x45b27mvpb1m, child number 0
-------------------------------------
select c.id      , c.name      , coalesce(j.job,nvl2(ju.id,'Job
unknown',null),nvl2(ue.id,'Unemployed',null)) job_info      ,
coalesce(to_char(e.salary,'fm999G999'),nvl2(su.id,'Salary
unknown',null),nvl2(us.id,'Unsalaried',null)) salary_info   from called
c        left outer join does_job j on (c.id = j.id)        left outer
join job_unk ju on (c.id = ju.id)        left outer join unemployed ue
on (c.id = ue.id)        left outer join earns e on (c.id = e.id)
 left outer join salary_unk su on (c.id = su.id)        left outer join
unsalaried us on (c.id = us.id)

Plan hash value: 3398518218

---------------------------------------------------------------------------------------------------------------------------
| Id  | Operation               | Name       | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |            |      1 |        |      4 |00:00:00.01 |      15 |       |       |          |
|*  1 |  HASH JOIN OUTER        |            |      1 |      4 |      4 |00:00:00.01 |      15 |   955K|   955K|  528K (0)|
|*  2 |   HASH JOIN OUTER       |            |      1 |      4 |      4 |00:00:00.01 |      12 |  1000K|  1000K|  523K (0)|
|*  3 |    HASH JOIN OUTER      |            |      1 |      4 |      4 |00:00:00.01 |      10 |  1035K|  1035K|  536K (0)|
|*  4 |     HASH JOIN OUTER     |            |      1 |      4 |      4 |00:00:00.01 |       8 |  1063K|  1063K|  536K (0)|
|*  5 |      HASH JOIN OUTER    |            |      1 |      4 |      4 |00:00:00.01 |       6 |  1114K|  1114K|  537K (0)|
|*  6 |       HASH JOIN OUTER   |            |      1 |      4 |      4 |00:00:00.01 |       4 |  1180K|  1180K|  538K (0)|
|   7 |        TABLE ACCESS FULL| CALLED     |      1 |      4 |      4 |00:00:00.01 |       2 |       |       |          |
|   8 |        TABLE ACCESS FULL| JOB_UNK    |      1 |      1 |      1 |00:00:00.01 |       2 |       |       |          |
|   9 |       TABLE ACCESS FULL | UNEMPLOYED |      1 |      1 |      1 |00:00:00.01 |       2 |       |       |          |
|  10 |      TABLE ACCESS FULL  | SALARY_UNK |      1 |      1 |      1 |00:00:00.01 |       2 |       |       |          |
|  11 |     TABLE ACCESS FULL   | UNSALARIED |      1 |      1 |      1 |00:00:00.01 |       2 |       |       |          |
|  12 |    TABLE ACCESS FULL    | DOES_JOB   |      1 |      2 |      2 |00:00:00.01 |       2 |       |       |          |
|  13 |   TABLE ACCESS FULL     | EARNS      |      1 |      2 |      2 |00:00:00.01 |       3 |       |       |          |
---------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("C"."ID"="E"."ID")
   2 - access("C"."ID"="J"."ID")
   3 - access("C"."ID"="US"."ID")
   4 - access("C"."ID"="SU"."ID")
   5 - access("C"."ID"="UE"."ID")
   6 - access("C"."ID"="JU"."ID")


43 rows selected.

But the two plans above are really complex, compared with a simple query against the PERS_INFO table with nullable columns:

SQL> select *
  2    from pers_info
  3  /

        ID NAME       JOB            SALARY
---------- ---------- ---------- ----------
      1234 Anne       Lawyer         100000
      1235 Boris      Banker
      1236 Cindy                      70000
      1237 Davinder

4 rows selected.

SQL> select *
  2    from table(dbms_xplan.display_cursor(null,null,'allstats last'))
  3  /

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  016x9f106gj27, child number 1
-------------------------------------
select *   from pers_info

Plan hash value: 1584579034

-----------------------------------------------------------------------------------------
| Id  | Operation         | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |           |      1 |        |      4 |00:00:00.01 |       7 |
|   1 |  TABLE ACCESS FULL| PERS_INFO |      1 |      4 |      4 |00:00:00.01 |       7 |
-----------------------------------------------------------------------------------------


13 rows selected.

If queries like this are not very frequent in your database, you might want to accept this extra work and avoid the NULL. But you need to consider something else as well: the new schema requires many more constraints. Using just the PERS_INFO table, a single primary key constraint on the Id column is all you need. For the new model, Darwen lists 9 constraints, but since the ninth stands for seven more, there are really 15:

1. No two CALLED rows have the same Id. (Primary key)
2. Every row in CALLED has a matching row in either DOES_JOB, JOB_UNK, or UNEMPLOYED.
3. No row in DOES_JOB has a matching row in JOB_UNK.
4. No row in DOES_JOB has a matching row in UNEMPLOYED.
5. No row in JOB_UNK has a matching row in UNEMPLOYED.
6. Every row in DOES_JOB has a matching row in CALLED. (Foreign key)
7. Every row in JOB_UNK has a matching row in CALLED. (Foreign key)
8. Every row in UNEMPLOYED has a matching row in CALLED. (Foreign key)
9. Constraints 2 through 8 repeated, mutatis mutandis, for CALLED with respect to EARNS, SALARY_UNK and UNSALARIED.

Implementing constraint 1 is easy:

SQL> alter table called add primary key (id)
  2  /

Table altered.

And so are constraints 6, 7 and 8:

SQL> alter table does_job add foreign key (id) references called (id)
  2  /

Table altered.

SQL> alter table job_unk add foreign key (id) references called (id)
  2  /

Table altered.

SQL> alter table unemployed add foreign key (id) references called (id)
  2  /

Table altered.
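
The foreign key part of constraint 9 is not shown in the post. A sketch of what it would presumably look like for the salary side, following the same pattern as the three statements above and assuming the primary key on CALLED already exists:

```sql
-- Salary-side counterparts of constraints 6, 7 and 8 (an assumption:
-- these statements do not appear in the original post).
alter table earns add foreign key (id) references called (id)
/
alter table salary_unk add foreign key (id) references called (id)
/
alter table unsalaried add foreign key (id) references called (id)
/
```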

But constraint 2 says that the Id in table CALLED is a foreign distributed key, and constraints 3, 4 and 5 say that the Ids of tables DOES_JOB, JOB_UNK and UNEMPLOYED together form a distributed key. Oracle doesn't have declarative support for distributed keys or for foreign distributed keys. We could write database trigger code to implement this, which is very hard to do correctly, or we could use the materialized view trick to have the condition validated at the end of a transaction instead of at the end of the statement, which also has its downsides. Such deferred constraint checking is explicitly ruled out by The Third Manifesto as well. Nevertheless, here is how it can be done.

The distributed key (constraints 3, 4 and 5):

SQL> create materialized view log on does_job with rowid
  2  /

Materialized view log created.

SQL> create materialized view log on job_unk with rowid
  2  /

Materialized view log created.

SQL> create materialized view log on unemployed with rowid
  2  /

Materialized view log created.

SQL> create materialized view distributed_key_vw
  2    refresh fast on commit
  3  as
  4  select d.rowid rid
  5       , d.id    id
  6       , 'D'     umarker
  7    from does_job d
  8   union all
  9  select j.rowid
 10       , j.id
 11       , 'J'
 12    from job_unk j
 13   union all
 14  select u.rowid
 15       , u.id
 16       , 'U'
 17    from unemployed u
 18  /

Materialized view created.

SQL> alter table distributed_key_vw
  2    add constraint distributed_key_check
  3    primary key (id)
  4  /

Table altered.

And to show that the distributed key implementation works:

SQL> insert into job_unk values (1234)
  2  /

1 row created.

SQL> commit
  2  /
commit
*
ERROR at line 1:
ORA-12048: error encountered while refreshing materialized view "RWIJK"."DISTRIBUTED_KEY_VW"
ORA-00001: unique constraint (RWIJK.DISTRIBUTED_KEY_CHECK) violated
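
Independently of the materialized view, the distributed key rule (constraints 3, 4 and 5) can be verified with a plain query. This is only a sketch of a manual check, not part of the original post; it lists every Id that occurs in more than one of the three job tables and should return no rows as long as the rule holds:

```sql
-- Ids that violate the distributed key over DOES_JOB, JOB_UNK and UNEMPLOYED
select id
     , count(*) occurrences
  from ( select id from does_job
          union all
         select id from job_unk
          union all
         select id from unemployed
       )
 group by id
having count(*) > 1
/
```

Running such a query before adding the primary key to the materialized view shows up front whether existing data would make the constraint creation fail.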

And the foreign distributed key ("Every row in CALLED has a matching row in either DOES_JOB, JOB_UNK, or UNEMPLOYED.") can be implemented like this:

SQL> create materialized view log on does_job with rowid
  2  /

Materialized view log created.

SQL> create materialized view log on job_unk with rowid
  2  /

Materialized view log created.

SQL> create materialized view log on unemployed with rowid
  2  /

Materialized view log created.

SQL> create materialized view foreign_distributed_key_vw
  2    refresh fast on commit
  3  as
  4  select c.rowid  c_rowid
  5       , dj.rowid dj_rowid
  6       , ju.rowid ju_rowid
  7       , ue.rowid ue_rowid
  8       , c.id     id
  9       , dj.id    dj_id
 10       , ju.id    ju_id
 11       , ue.id    ue_id
 12    from called c
 13       , does_job dj
 14       , job_unk ju
 15       , unemployed ue
 16   where c.id = dj.id (+)
 17     and c.id = ju.id (+)
 18     and c.id = ue.id (+)
 19  /

Materialized view created.

SQL> alter table foreign_distributed_key_vw
  2    add constraint foreign_distributed_key_check
  3    check (coalesce(dj_id,ju_id,ue_id) is not null)
  4  /

Table altered.

And some proof that this implementation works:

SQL> insert into called values (1238,'Elise')
  2  /

1 row created.

SQL> commit
  2  /
commit
*
ERROR at line 1:
ORA-12008: error in materialized view refresh path
ORA-02290: check constraint (RWIJK.FOREIGN_DISTRIBUTED_KEY_CHECK) violated
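
The foreign distributed key rule (constraint 2) can likewise be checked by hand. The following is a sketch of such a check, not taken from the original post; it lists every CALLED row that has no matching row in any of the three job tables and should return no rows as long as the rule holds:

```sql
-- CALLED rows without a match in DOES_JOB, JOB_UNK or UNEMPLOYED
select c.id
     , c.name
  from called c
 where not exists (select null from does_job   dj where dj.id = c.id)
   and not exists (select null from job_unk    ju where ju.id = c.id)
   and not exists (select null from unemployed ue where ue.id = c.id)
/
```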

Would I go through the extra trouble of an implementation with 6 more tables, 14 extra constraints and worse performance like above? It depends. It depends on how often the data is queried, and on how often it is updated concurrently. And on whether the distinction between the possible multiple meanings of NULL is relevant in my case. And whether I have sufficient extra time to implement it. Using Oracle, probably most often, I won't.

Connect By Filtering

A hierarchical query is typically executed using a plan that starts with the operation CONNECT BY WITH FILTERING, which has two child operations. The first child operation implements the START WITH clause and the second child operation contains a step called CONNECT BY PUMP, implementing the recursive part of the query. Here is an example of such a plan using the well known hierarchical query on table EMP:

SQL>  select lpad(' ', 2 * level - 2, ' ') || ename as ename
  2        , level
  3        , job
  4        , deptno
  5     from emp
  6  connect by mgr = prior empno
  7    start with mgr is null
  8  /

ENAME                     LEVEL JOB                             DEPTNO
-------------------- ---------- --------------------------- ----------
KING                          1 PRESIDENT                           10
  JONES                       2 MANAGER                             20
    SCOTT                     3 ANALYST                             20
      ADAMS                   4 CLERK                               20
    FORD                      3 ANALYST                             20
      SMITH                   4 CLERK                               20
  BLAKE                       2 MANAGER                             30
    ALLEN                     3 SALESMAN                            30
    WARD                      3 SALESMAN                            30
    MARTIN                    3 SALESMAN                            30
    TURNER                    3 SALESMAN                            30
    JAMES                     3 CLERK                               30
  CLARK                       2 MANAGER                             10
    MILLER                    3 CLERK                               10

14 rows selected.

SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'))
  2  /

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  d2c7xqxbr112u, child number 0
-------------------------------------
 select lpad(' ', 2 * level - 2, ' ') || ename as ename       , level       , job       , deptno    from emp connect by
mgr = prior empno   start with mgr is null

Plan hash value: 1869448388

--------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                 | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------
|*  1 |  CONNECT BY WITH FILTERING|      |      1 |        |     14 |00:00:00.02 |      15 |      6 |  2048 |  2048 | 2048  (0)|
|*  2 |   TABLE ACCESS FULL       | EMP  |      1 |      1 |      1 |00:00:00.01 |       3 |      6 |       |       |          |
|*  3 |   HASH JOIN               |      |      4 |        |     13 |00:00:00.01 |      12 |      0 |  1452K|  1452K|  853K (0)|
|   4 |    CONNECT BY PUMP        |      |      4 |        |     14 |00:00:00.01 |       0 |      0 |       |       |          |
|   5 |    TABLE ACCESS FULL      | EMP  |      4 |      2 |     56 |00:00:00.01 |      12 |      0 |       |       |          |
--------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("MGR"=PRIOR NULL)
   2 - filter("MGR" IS NULL)

55   2 - filter("MGR" IS NULL)

56   3 - access("MGR"=PRIOR NULL)

57 

58 

5924 rows selected.

You can see a great and more detailed explanation of connect by with filtering here on Christian Antognini's blog.

When I was researching the new recursive subquery factoring clause one and a half years ago, and comparing a standard hierarchical query on EMP using recursive subquery factoring with a query using the good old connect by, I stumbled upon a new optimizer algorithm for implementing recursive queries:

SQL>  select lpad(' ', 2 * level - 2, ' ') || ename as ename
  2        , level
  3        , job
  4        , deptno
  5     from emp
  6  connect by mgr = prior empno
  7    start with mgr is null
  8  /

ENAME                     LEVEL JOB           DEPTNO
-------------------- ---------- --------- ----------
KING                          1 PRESIDENT         10
  JONES                       2 MANAGER           20
    SCOTT                     3 ANALYST           20
      ADAMS                   4 CLERK             20
    FORD                      3 ANALYST           20
      SMITH                   4 CLERK             20
  BLAKE                       2 MANAGER           30
    ALLEN                     3 SALESMAN          30
    WARD                      3 SALESMAN          30
    MARTIN                    3 SALESMAN          30
    TURNER                    3 SALESMAN          30
    JAMES                     3 CLERK             30
  CLARK                       2 MANAGER           10
    MILLER                    3 CLERK             10

14 rows selected.

SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'))
  2  /

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  d2c7xqxbr112u, child number 0
-------------------------------------
 select lpad(' ', 2 * level - 2, ' ') || ename as ename       , level
    , job       , deptno    from emp connect by mgr = prior empno
start with mgr is null

Plan hash value: 763482334

-------------------------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
-------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |      |      1 |        |     14 |00:00:00.02 |       6 |      6 |
|*  1 |  CONNECT BY NO FILTERING WITH START-WITH|      |      1 |        |     14 |00:00:00.02 |       6 |      6 |
|   2 |   TABLE ACCESS FULL                     | EMP  |      1 |     14 |     14 |00:00:00.02 |       6 |      6 |
-------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("MGR"=PRIOR NULL)
       filter("MGR" IS NULL)


22 rows selected.

You might wonder what I did to make two identical queries use different execution plans, but I'll address that later. First, I'd like to show that there are two optimizer hints available with which you can control which algorithm the optimizer uses:

SQL> select *
  2    from v$sql_hint
  3   where name like '%CONNECT_BY_FILTERING%'
  4  /

NAME                    SQL_FEATURE  CLASS
----------------------- ------------ -----------------------
INVERSE                 TARGET_LEVEL   PROPERTY VERSION    VERSION_OUTLINE
----------------------- ------------ ---------- ---------- ---------------
CONNECT_BY_FILTERING    QKSFM_ALL    CONNECT_BY_FILTERING
NO_CONNECT_BY_FILTERING            2         16 10.2.0.2   10.2.0.2

NO_CONNECT_BY_FILTERING QKSFM_ALL    CONNECT_BY_FILTERING
CONNECT_BY_FILTERING               2         16 10.2.0.2   10.2.0.2


2 rows selected.

And this was surprising to me. As the version column suggests, the no_connect_by_filtering hint and the accompanying new algorithm were already introduced in version 10.2.0.2! I checked with my old 10.2.0.4 database and it is indeed present and can be used there:

SQL> select version
  2    from v$instance
  3  /

VERSION
---------------------------------------------------
10.2.0.4.0

1 row selected.

SQL>  select /*+ no_connect_by_filtering gather_plan_statistics */
  2          lpad(' ', 2 * level - 2, ' ') || ename as ename
  3        , level
  4        , job
  5        , deptno
  6     from emp
  7  connect by mgr = prior empno
  8    start with mgr is null
  9  /

ENAME                     LEVEL JOB                             DEPTNO
-------------------- ---------- --------------------------- ----------
KING                          1 PRESIDENT                           10
  JONES                       2 MANAGER                             20
    SCOTT                     3 ANALYST                             20
      ADAMS                   4 CLERK                               20
    FORD                      3 ANALYST                             20
      SMITH                   4 CLERK                               20
  BLAKE                       2 MANAGER                             30
    ALLEN                     3 SALESMAN                            30
    WARD                      3 SALESMAN                            30
    MARTIN                    3 SALESMAN                            30
    TURNER                    3 SALESMAN                            30
    JAMES                     3 CLERK                               30
  CLARK                       2 MANAGER                             10
    MILLER                    3 CLERK                               10

14 rows selected.

SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'))
  2  /

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  39kr5s8dxz7j0, child number 0
-------------------------------------
 select /*+ no_connect_by_filtering gather_plan_statistics */         lpad(' ', 2 * level - 2, '
') || ename as ename       , level       , job       , deptno    from emp connect by mgr = prior
empno   start with mgr is null

Plan hash value: 763482334

----------------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------------------
|*  1 |  CONNECT BY NO FILTERING WITH START-WITH|      |      1 |        |     14 |00:00:00.01 |       3 |
|   2 |   TABLE ACCESS FULL                     | EMP  |      1 |     14 |     14 |00:00:00.01 |       3 |
----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("MGR"=PRIOR NULL)
       filter("MGR" IS NULL)


21 rows selected.

But you need the no_connect_by_filtering hint in version 10.2.0.4 for this query. If you do not provide the hint, this is the result:

SQL>  select /*+ gather_plan_statistics */
  2          lpad(' ', 2 * level - 2, ' ') || ename as ename
  3        , level
  4        , job
  5        , deptno
  6     from emp
  7  connect by mgr = prior empno
  8    start with mgr is null
  9  /

ENAME                     LEVEL JOB                             DEPTNO
-------------------- ---------- --------------------------- ----------
KING                          1 PRESIDENT                           10
  JONES                       2 MANAGER                             20
    SCOTT                     3 ANALYST                             20
      ADAMS                   4 CLERK                               20
    FORD                      3 ANALYST                             20
      SMITH                   4 CLERK                               20
  BLAKE                       2 MANAGER                             30
    ALLEN                     3 SALESMAN                            30
    WARD                      3 SALESMAN                            30
    MARTIN                    3 SALESMAN                            30
    TURNER                    3 SALESMAN                            30
    JAMES                     3 CLERK                               30
  CLARK                       2 MANAGER                             10
    MILLER                    3 CLERK                               10

14 rows selected.

SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'))
  2  /

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  6zhtnf720u0bm, child number 0
-------------------------------------
 select /*+ gather_plan_statistics */         lpad(' ', 2 * level - 2, ' ') || ename as ename       , level
   , job       , deptno    from emp connect by mgr = prior empno   start with mgr is null

Plan hash value: 1869448388

-----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                 | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------
|*  1 |  CONNECT BY WITH FILTERING|      |      1 |        |     14 |00:00:00.01 |      15 |  2048 |  2048 | 2048  (0)|
|*  2 |   TABLE ACCESS FULL       | EMP  |      1 |      1 |      1 |00:00:00.01 |       3 |       |       |          |
|*  3 |   HASH JOIN               |      |      4 |        |     13 |00:00:00.01 |      12 |  1452K|  1452K|  843K (0)|
|   4 |    CONNECT BY PUMP        |      |      4 |        |     14 |00:00:00.01 |       0 |       |       |          |
|   5 |    TABLE ACCESS FULL      | EMP  |      4 |      2 |     56 |00:00:00.01 |      12 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("MGR"=PRIOR NULL)
   2 - filter("MGR" IS NULL)
   3 - access("MGR"=PRIOR NULL)


24 rows selected.

Which explains why I didn't see the CONNECT BY NO FILTERING WITH START-WITH earlier. It seems that Oracle has adjusted the cost calculation of connect by queries somewhere between 10.2.0.4 and 11.2.0.1. Just look at the cost from both execution plans on 10.2.0.4, using a regular explain plan statement and a "select * from table(dbms_xplan.display)":

----------------------------------------------------------------------------------
| Id  | Operation                 | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |      |     2 |    50 |     3   (0)| 00:00:01 |
|*  1 |  CONNECT BY WITH FILTERING|      |       |       |            |          |
|*  2 |   TABLE ACCESS FULL       | EMP  |     1 |    29 |     3   (0)| 00:00:01 |
|*  3 |   HASH JOIN               |      |       |       |            |          |
|   4 |    CONNECT BY PUMP        |      |       |       |            |          |
|   5 |    TABLE ACCESS FULL      | EMP  |     2 |    50 |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------------

------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |      |    14 |   350 |     3   (0)| 00:00:01 |
|*  1 |  CONNECT BY NO FILTERING WITH START-WITH|      |       |       |            |          |
|   2 |   TABLE ACCESS FULL                     | EMP  |    14 |   350 |     3   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------

The cost of 3 is due to the full table scan of EMP, and no additional cost is added for the hierarchical query.
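The statements that produce plans like these are not shown. A minimal sketch of how they could be reproduced, assuming the standard EMP demo table, is to explain the query once per algorithm and display the plan right after each explain. Both hints are used explicitly here so that the outcome does not depend on which algorithm a given version picks by default:

```sql
-- Sketch only: assumes the standard EMP demo table and a PLAN_TABLE
-- that the session can write to.

-- Ask for the CONNECT BY WITH FILTERING algorithm.
explain plan
for
 select /*+ connect_by_filtering */
        lpad(' ', 2 * level - 2, ' ') || ename as ename
      , level
      , job
      , deptno
   from emp
connect by mgr = prior empno
  start with mgr is null
/

select * from table(dbms_xplan.display)
/

-- Ask for the CONNECT BY NO FILTERING WITH START-WITH algorithm.
explain plan
for
 select /*+ no_connect_by_filtering */
        lpad(' ', 2 * level - 2, ' ') || ename as ename
      , level
      , job
      , deptno
   from emp
connect by mgr = prior empno
  start with mgr is null
/

select * from table(dbms_xplan.display)
/
```

Without arguments, dbms_xplan.display shows the most recently explained statement, which is why each explain is followed directly by its display call.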

These are the plans from 11.2.0.2:

----------------------------------------------------------------------------------
| Id  | Operation                 | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |      |     3 |   156 |    15  (20)| 00:00:01 |
|*  1 |  CONNECT BY WITH FILTERING|      |       |       |            |          |
|*  2 |   TABLE ACCESS FULL       | EMP  |     1 |    25 |     4   (0)| 00:00:01 |
|*  3 |   HASH JOIN               |      |     2 |    76 |     9  (12)| 00:00:01 |
|   4 |    CONNECT BY PUMP        |      |       |       |            |          |
|*  5 |    TABLE ACCESS FULL      | EMP  |    13 |   325 |     4   (0)| 00:00:01 |
----------------------------------------------------------------------------------

------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |      |    14 |   728 |     5  (20)| 00:00:01 |
|*  1 |  CONNECT BY NO FILTERING WITH START-WITH|      |       |       |            |          |
|   2 |   TABLE ACCESS FULL                     | EMP  |    14 |   350 |     4   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------

The numbers from 11.2.0.2 show more sophistication than just the cost of the table scan. The optimizer can't know how many levels deep the data is, but version 10.2.0.4 apparently picked 1, and left the total cost unchanged from 3 to 3. I'm curious to know in which version between 10.2.0.4 and 11.2.0.2 this cost calculation changed. If anyone reading this has a version in between and would like to check, please let me know in the comments. My guess would be that 11.2.0.1 contained the cost change.

What does CONNECT BY NO FILTERING WITH START-WITH do?

Let's explore this, using this table:

SQL> create table t (id, parent_id, value, indicator)
  2  as
  3   select level - 1
  4        , case level when 1 then null else trunc((level-1)/10) end
  5        , round(dbms_random.value * 1000)
  6        , case mod(level,10) when 4 then 'N' else 'Y' end
  7     from dual
  8  connect by level <= 100000
  9  /

Table created.

SQL> alter table t
  2    add constraint cbt_pk
  3    primary key (id)
  4  /

Table altered.

SQL> create index i1 on t (parent_id,indicator)
  2  /

Index created.

SQL> exec dbms_stats.gather_table_stats(user,'t',cascade=>true)

The data is tree shaped: the root node (id 0) has 9 child nodes and every other parent node has exactly 10. One tenth of the data, the rows with an id that ends with the digit 3, has its indicator column set to 'N'. This select query will make it clearer what the data looks like:

SQL> select *
  2    from t
  3   where id < 24 or id > 99997
  4   order by id
  5  /

        ID  PARENT_ID      VALUE I
---------- ---------- ---------- -
         0                   656 Y
         1          0        289 Y
         2          0        365 Y
         3          0        644 N
         4          0        364 Y
         5          0        841 Y
         6          0        275 Y
         7          0        529 Y
         8          0        500 Y
         9          0        422 Y
        10          1        598 Y
        11          1        104 Y
        12          1        467 Y
        13          1        296 N
        14          1        105 Y
        15          1        220 Y
        16          1        692 Y
        17          1        793 Y
        18          1         29 Y
        19          1        304 Y
        20          2        467 Y
        21          2        716 Y
        22          2        837 Y
        23          2        432 N
     99998       9999        609 Y
     99999       9999         24 Y

26 rows selected.
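The sample rows only show the edges of the table. As a rough sanity check on the whole data set, a sketch like the following could be used; it assumes table T exactly as created above and only relies on the columns parent_id and indicator:

```sql
-- Sketch only: sanity checks on table T as created above.

-- How many rows carry each indicator value.
select indicator
     , count(*)
  from t
 group by indicator
/

-- Smallest and largest number of children found under any parent.
select min(cnt) as min_children
     , max(cnt) as max_children
  from ( select parent_id
              , count(*) as cnt
           from t
          where parent_id is not null
          group by parent_id
       )
/
```

The first query shows the split between 'Y' and 'N' rows, the second one how evenly the children are spread over the parent nodes.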

When hearing the word "filter", I almost immediately associate it with a WHERE clause. But a where clause in a connect by query is not what is meant by connect by filtering. The documentation states:

Oracle processes hierarchical queries as follows:

A join, if present, is evaluated first, whether the join is specified in the FROM clause or with WHERE clause predicates.

The CONNECT BY condition is evaluated.

Any remaining WHERE clause predicates are evaluated.

So a where clause predicate is evaluated AFTER the connect by has done its job. You can see that happening here:

SQL> explain plan
  2  for
  3   select id
  4        , parent_id
  5        , sys_connect_by_path(id,'->') scbp
  6     from t
  7    where indicator = 'N'
  8  connect by parent_id = prior id
  9    start with parent_id is null
 10  /

Explained.

SQL> select * from table(dbms_xplan.display)
  2  /

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2502271019

---------------------------------------------------------------------------------------
| Id  | Operation                      | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |      |    11 |   319 |   164   (3)| 00:00:02 |
|*  1 |  FILTER                        |      |       |       |            |          |
|*  2 |   CONNECT BY WITH FILTERING    |      |       |       |            |          |
|*  3 |    TABLE ACCESS FULL           | T    |     1 |    11 |    80   (2)| 00:00:01 |
|   4 |    NESTED LOOPS                |      |    10 |   240 |    82   (2)| 00:00:01 |
|   5 |     CONNECT BY PUMP            |      |       |       |            |          |
|   6 |     TABLE ACCESS BY INDEX ROWID| T    |    10 |   110 |     2   (0)| 00:00:01 |
|*  7 |      INDEX RANGE SCAN          | I1   |    10 |       |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("INDICATOR"='N')
   2 - access("PARENT_ID"=PRIOR "ID")
   3 - filter("PARENT_ID" IS NULL)
   7 - access("PARENT_ID"="connect$_by$_pump$_002"."prior id   ")

22 rows selected.

The "indicator = 'N'" predicate is at step 1, which is executed after the CONNECT BY WITH FILTERING at step 2. Note that although this query is executed in 11.2.0.2, the optimizer has chosen the old CONNECT BY WITH FILTERING.

Connect by filtering is done by using filters in your CONNECT BY clause. Here is an example using the predicate "indicator = 'N'" inside the CONNECT BY clause:

SQL>  select id
  2        , parent_id
  3        , sys_connect_by_path(id,'->') scbp
  4     from t
  5  connect by parent_id = prior id
  6      and indicator = 'N'
  7    start with parent_id is null
  8  /

        ID  PARENT_ID SCBP
---------- ---------- --------------------------------------------------
         0            ->0
         3          0 ->0->3
        33          3 ->0->3->33
       333         33 ->0->3->33->333
      3333        333 ->0->3->33->333->3333
     33333       3333 ->0->3->33->333->3333->33333

6 rows selected.

SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'))
  2  /

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  dzkjzrrzgnvd5, child number 0
-------------------------------------
 select id       , parent_id       , sys_connect_by_path(id,'->') scbp
  from t connect by parent_id = prior id     and indicator = 'N'
start with parent_id is null

Plan hash value: 3164577763

---------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |      |      1 |        |      6 |00:00:00.01 |     294 |       |       |          |
|*  1 |  CONNECT BY WITH FILTERING    |      |      1 |        |      6 |00:00:00.01 |     294 |  2048 |  2048 | 2048  (0)|
|*  2 |   TABLE ACCESS FULL           | T    |      1 |      1 |      1 |00:00:00.01 |     277 |       |       |          |
|   3 |   NESTED LOOPS                |      |      6 |      5 |      5 |00:00:00.01 |      17 |       |       |          |
|   4 |    CONNECT BY PUMP            |      |      6 |        |      6 |00:00:00.01 |       0 |       |       |          |
|   5 |    TABLE ACCESS BY INDEX ROWID| T    |      6 |      5 |      5 |00:00:00.01 |      17 |       |       |          |
|*  6 |     INDEX RANGE SCAN          | I1   |      6 |      5 |      5 |00:00:00.01 |      12 |       |       |          |
---------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("PARENT_ID"=PRIOR NULL)
   2 - filter("PARENT_ID" IS NULL)
   6 - access("PARENT_ID"="connect$_by$_pump$_002"."prior id     " AND "INDICATOR"='N')


27 rows selected.

In the A-Rows column, you can see that the connect by filtering was effective here. Only the necessary rows were read. And this is the key difference between the two connect by algorithms: with CONNECT BY WITH FILTERING, you can filter within each recursion, whereas CONNECT BY NO FILTERING WITH START-WITH has to read everything, does an in-memory operation, and returns the result. With this example, the latter is much less efficient:

SQL>  select /*+ no_connect_by_filtering */ id
  2        , parent_id
  3        , sys_connect_by_path(id,'->') scbp
  4     from t
  5  connect by parent_id = prior id
  6      and indicator = 'N'
  7    start with parent_id is null
  8  /

        ID  PARENT_ID SCBP
---------- ---------- --------------------------------------------------
         0            ->0
         3          0 ->0->3
        33          3 ->0->3->33
       333         33 ->0->3->33->333
      3333        333 ->0->3->33->333->3333
     33333       3333 ->0->3->33->333->3333->33333

6 rows selected.

SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'))
  2  /

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  3fcr31tp83by9, child number 0
-------------------------------------
 select /*+ no_connect_by_filtering */ id       , parent_id       ,
sys_connect_by_path(id,'->') scbp    from t connect by parent_id =
prior id     and indicator = 'N'   start with parent_id is null

Plan hash value: 2303479083

----------------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |      |      1 |        |      6 |00:00:00.14 |     277 |
|*  1 |  CONNECT BY NO FILTERING WITH START-WITH|      |      1 |        |      6 |00:00:00.14 |     277 |
|   2 |   TABLE ACCESS FULL                     | T    |      1 |    100K|    100K|00:00:00.01 |     277 |
----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("PARENT_ID"=PRIOR NULL)
       filter("PARENT_ID" IS NULL)


22 rows selected.

100K rows were read, and the A-Time was 0.14 seconds instead of 0.01 seconds. I wondered where those 0.14 seconds went, since the plan shows it's NOT for the full table scan. Using Tom Kyte's runstats_pkg reveals this:

SQL> declare
  2    cursor c1
  3    is
  4     select /*+ connect_by_filtering */ id
  5          , parent_id
  6          , sys_connect_by_path(id,'->') scbp
  7       from t
  8    connect by parent_id = prior id
  9        and indicator = 'N'
 10      start with parent_id is null
 11    ;
 12    cursor c2
 13    is
 14     select /*+ no_connect_by_filtering */ id
 15          , parent_id
 16          , sys_connect_by_path(id,'->') scbp
 17       from t
 18    connect by parent_id = prior id
 19        and indicator = 'N'
 20      start with parent_id is null
 21    ;
 22  begin
 23    runstats_pkg.rs_start;
 24    for r in c1 loop null; end loop;
 25    runstats_pkg.rs_middle;
 26    for r in c2 loop null; end loop;
 27    runstats_pkg.rs_stop;
 28  end;
 29  /
Run1 ran in 0 hsecs
Run2 ran in 10 hsecs
run 1 ran in 0% of the time

Name                                  Run1        Run2        Diff
STAT...HSC Heap Segment Block           16          15          -1
STAT...db block changes                 48          47          -1
STAT...consistent gets - exami           9           8          -1
STAT...db block gets from cach          32          33           1
STAT...db block gets                    32          33           1
STAT...redo subscn max counts            0           1           1
STAT...redo ordering marks               0           1           1
STAT...redo entries                     16          15          -1
STAT...calls to kcmgas                   0           1           1
STAT...calls to kcmgcs                  29          28          -1
STAT...free buffer requested             0           1           1
STAT...Heap Segment Array Inse          16          15          -1
STAT...consistent changes               32          31          -1
STAT...heap block compress               9           8          -1
STAT...parse time cpu                    1           0          -1
STAT...buffer is pinned count            1           0          -1
STAT...session cursor cache co           1           0          -1
STAT...sql area evicted                  1           0          -1
LATCH.undo global data                  11          10          -1
LATCH.SQL memory manager worka           3           5           2
LATCH.messages                           0           2           2
LATCH.OS process allocation              0           2           2
LATCH.simulator hash latch              20          23           3
LATCH.object queue header oper           4           1          -3
STAT...workarea executions - o          10           6          -4
STAT...table fetch by rowid             15          10          -5
STAT...index scans kdiixs1               6           0          -6
LATCH.row cache objects                280         274          -6
STAT...sorts (memory)                    8           2          -6
STAT...CPU used by this sessio           2          11           9
STAT...Elapsed Time                      1          11          10
STAT...recursive cpu usage               2          12          10
STAT...no work - consistent re         300         284         -16
STAT...buffer is not pinned co          36          20         -16
STAT...session logical reads           354         337         -17
STAT...consistent gets from ca         313         296         -17
STAT...consistent gets from ca         322         304         -18
LATCH.shared pool                      186         168         -18
STAT...consistent gets                 322         304         -18
LATCH.shared pool simulator             23           4         -19
LATCH.cache buffers chains             785         740         -45
STAT...undo change vector size       3,500       3,420         -80
STAT...redo size                     4,652       4,560         -92
STAT...session uga memory                0     -65,488     -65,488
STAT...session pga memory                0     -65,536     -65,536
STAT...sorts (rows)                     12     100,001      99,989

Run1 latches total versus runs -- difference and pct
Run1        Run2        Diff       Pct
1,467       1,384         -83    106.00%

PL/SQL procedure successfully completed

The major difference is the number of rows sorted! The CONNECT BY NO FILTERING WITH START-WITH sorts all 100K rows. This is a surprise: a sort normally uses memory from a PGA workarea, and that usage shows up in the memory statistics (OMem, 1Mem, Used-Mem) of the execution plan. The no filtering plan, however, did not show those statistics. I have no explanation for this phenomenon yet.

Let's zoom in on the sorting:

SQL> select sn.name
  2       , ms.value
  3    from v$mystat ms
  4       , v$statname sn
  5   where ms.statistic# = sn.statistic#
  6     and sn.name like '%sort%'
  7  /

NAME                         VALUE
----------------------- ----------
sorts (memory)                2278
sorts (disk)                     0
sorts (rows)               9425510

3 rows selected.

SQL>  select id
  2        , parent_id
  3        , sys_connect_by_path(id,'->') scbp
  4     from t
  5  connect by parent_id = prior id
  6      and indicator = 'N'
  7    start with parent_id is null
  8  /

        ID  PARENT_ID SCBP
---------- ---------- --------------------------------------------------
         0            ->0
         3          0 ->0->3
        33          3 ->0->3->33
       333         33 ->0->3->33->333
      3333        333 ->0->3->33->333->3333
     33333       3333 ->0->3->33->333->3333->33333

6 rows selected.

SQL> select sn.name
  2       , ms.value
  3    from v$mystat ms
  4       , v$statname sn
  5   where ms.statistic# = sn.statistic#
  6     and sn.name like '%sort%'
  7  /

NAME                         VALUE
----------------------- ----------
sorts (memory)                2286
sorts (disk)                     0
sorts (rows)               9425522

3 rows selected.

SQL>  select /*+ no_connect_by_filtering */ id
  2        , parent_id
  3        , sys_connect_by_path(id,'->') scbp
  4     from t
  5  connect by parent_id = prior id
  6      and indicator = 'N'
  7    start with parent_id is null
  8  /

        ID  PARENT_ID SCBP
---------- ---------- --------------------------------------------------
         0            ->0
         3          0 ->0->3
        33          3 ->0->3->33
       333         33 ->0->3->33->333
      3333        333 ->0->3->33->333->3333
     33333       3333 ->0->3->33->333->3333->33333

6 rows selected.

SQL> select sn.name
  2       , ms.value
  3    from v$mystat ms
  4       , v$statname sn
  5   where ms.statistic# = sn.statistic#
  6     and sn.name like '%sort%'
  7  /

NAME                         VALUE
----------------------- ----------
sorts (memory)                2288
sorts (disk)                     0
sorts (rows)               9525523

3 rows selected.

So CONNECT BY WITH FILTERING did 8 sorts (2286 - 2278) and sorted 12 rows (9425522 - 9425510), whereas CONNECT BY NO FILTERING WITH START-WITH did only 2 sorts (2288 - 2286) but sorted 100,001 rows (9525523 - 9425522). These deltas match the "sorts (memory)" and "sorts (rows)" lines in the runstats output above.
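The subtraction above can also be left to the database. What follows is only a sketch, not part of the original test: it assumes the same table t, select access to v$mystat and v$statname, and a SQL*Plus session (set serveroutput on is a SQL*Plus command). The names c_sorts, t_values, l_before and l_rows are made up for this illustration. Because the measuring queries run in the same session, the reported deltas may include a little of their own overhead.

```sql
set serveroutput on

declare
  -- sort-related statistics of the current session (same query as in the post)
  cursor c_sorts is
    select sn.name
         , ms.value
      from v$mystat ms
         , v$statname sn
     where ms.statistic# = sn.statistic#
       and sn.name like '%sort%';
  -- hypothetical helper type: statistic value keyed by statistic name
  type t_values is table of number index by varchar2(64);
  l_before t_values;
  l_rows   pls_integer := 0;
begin
  -- snapshot before the statement under test
  for r in c_sorts
  loop
    l_before(r.name) := r.value;
  end loop;

  -- statement under test; fetch every row so all the work is really done
  for r in ( select id
                  , parent_id
                  , sys_connect_by_path(id,'->') scbp
               from t
            connect by parent_id = prior id
                and indicator = 'N'
              start with parent_id is null
           )
  loop
    l_rows := l_rows + 1;
  end loop;

  -- snapshot after: print the difference per statistic
  dbms_output.put_line('rows fetched: ' || l_rows);
  for r in c_sorts
  loop
    dbms_output.put_line(rpad(r.name,24) || to_char(r.value - l_before(r.name)));
  end loop;
end;
/
```

To measure the other plan, add the /*+ no_connect_by_filtering */ hint to the query in the middle loop and run the block again.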

And finally, I promised to explain why the first two queries of this blogpost are identical, but show different execution plans. The reason is simple: the first one was executed on 10.2.0.4 and the second one on 11.2.0.2.

