Saturday 22 September 2012

SAVE EXCEPTIONS

Since version 9i it is possible to do bulk DML using FORALL with the SAVE EXCEPTIONS clause. It makes sure that all invalid rows - the exceptions - are saved into the sql%bulk_exceptions array, while all valid rows are still processed. This array stores a record for each invalid row, containing an ERROR_INDEX, which is the iteration number of the FORALL statement at which the error occurred, and an ERROR_CODE, which corresponds to the Oracle error code.

The error text is not stored. The documentation says:

The individual error messages, or any substitution arguments, are not saved, but the error message text can be looked up using ERROR_CODE with SQLERRM ...


This looks reasonable, but in our shop we validate lots of business rules with triggers. When a business rule is violated, we do a RAISE_APPLICATION_ERROR(-20000,'APP-12345');

On the client side (WebForms) the error message is looked up in a messages table and a friendly message is shown. When coding a FORALL with SAVE EXCEPTIONS in such an environment, the error messages become useless, as can be seen in the next example:

rwijk@ORA11G> create table mytable
  2  ( id number(4)
  3  , name varchar2(30)
  4  )
  5  /

Table created.

rwijk@ORA11G> create trigger mytable_bri
  2  before insert on mytable
  3  for each row
  4  begin
  5    if :new.id = 2
  6    then
  7      raise_application_error(-20000,'APP-12345');
  8    elsif :new.id = 9
  9    then
 10      raise_application_error(-20000,'APP-98765');
 11    end if;
 12  end;
 13  /

Trigger created.

rwijk@ORA11G> alter table mytable add constraint mytable_ck1 check (id <> 6)
  2  /

Table altered.

rwijk@ORA11G> declare
  2    e_forall_error exception;
  3    pragma exception_init(e_forall_error,-24381)
  4    ;
  5    type t_numbers is table of mytable.id%type;
  6    l_numbers t_numbers := t_numbers(1,2,3,4,5,6,7,8,9,10)
  7    ;
  8  begin
  9    forall i in 1..l_numbers.count save exceptions
 10      insert into mytable
 11      ( id
 12      , name
 13      )
 14      values
 15      ( l_numbers(i)
 16      , 'Name' || to_char(l_numbers(i))
 17      )
 18    ;
 19  exception
 20  when e_forall_error then
 21    for i in 1..sql%bulk_exceptions.count
 22    loop
 23      dbms_output.put_line('SQLCODE: ' || sql%bulk_exceptions(i).error_code);
 24      dbms_output.put_line('SQLERRM: ' || sqlerrm(-sql%bulk_exceptions(i).error_code));
 25      dbms_output.new_line;
 26    end loop;
 27  end;
 28  /
SQLCODE: 20000
SQLERRM: ORA-20000:

SQLCODE: 2290
SQLERRM: ORA-02290: check constraint (.) violated.

SQLCODE: 20000
SQLERRM: ORA-20000:


PL/SQL procedure successfully completed.

rwijk@ORA11G> select id, name from mytable
  2  /

        ID NAME
---------- ------------------------------
         1 Name1
         3 Name3
         4 Name4
         5 Name5
         7 Name7
         8 Name8
        10 Name10

7 rows selected.


Note how the SQLERRM message doesn't return anything useful and how the name of the check constraint has disappeared. This is really annoying and can't be circumvented easily in 9i. For better error messages we would have to go back to row-by-row processing. And that means: very slow.
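For completeness, such a row-by-row version could look like the sketch below. This is a minimal illustration against the mytable example above, not our actual production code; the point is only that each INSERT now fails individually, so SQLERRM carries the complete message:

```sql
declare
  type t_numbers is table of mytable.id%type;
  l_numbers t_numbers := t_numbers(1,2,3,4,5,6,7,8,9,10);
begin
  for i in 1 .. l_numbers.count
  loop
    begin
      insert into mytable (id, name)
      values (l_numbers(i), 'Name' || to_char(l_numbers(i)));
    exception
      when others then
        -- SQLERRM now contains the full text, including 'APP-12345'
        dbms_output.put_line('Row ' || to_char(i) || ': ' || sqlerrm);
    end;
  end loop;
end;
/
```

The price is a context switch per row, which is exactly the overhead FORALL was introduced to avoid.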

However, version 10gR2 introduced a feature called DML error logging. I remember reading about it more than two years ago on Tom Kyte's blog. In that entry and in the documentation you only see examples using plain SQL, not PL/SQL examples using FORALL. But luckily this works as well:

rwijk@ORA11G> rollback
  2  /

Rollback complete.

rwijk@ORA11G> exec dbms_errlog.create_error_log('mytable')

PL/SQL procedure successfully completed.

rwijk@ORA11G> declare
  2    type t_numbers is table of mytable.id%type;
  3    l_numbers t_numbers := t_numbers(1,2,3,4,5,6,7,8,9,10)
  4    ;
  5  begin
  6    forall i in 1..l_numbers.count
  7      insert into mytable
  8      ( id
  9      , name
 10      )
 11      values
 12      ( l_numbers(i)
 13      , 'Name' || to_char(l_numbers(i))
 14      )
 15      log errors reject limit unlimited
 16    ;
 17    for r in
 18    ( select ora_err_number$
 19           , ora_err_mesg$
 20        from err$_mytable
 21    )
 22    loop
 23      dbms_output.put_line('SQLCODE: ' || to_char(r.ora_err_number$));
 24      dbms_output.put_line('SQLERRM: ' || r.ora_err_mesg$);
 25      dbms_output.new_line;
 26    end loop
 27    ;
 28  end;
 29  /
SQLCODE: 20000
SQLERRM: ORA-20000: APP-12345
ORA-06512: at "RWIJK.MYTABLE_BRI", line 4
ORA-04088: error during execution of trigger 'RWIJK.MYTABLE_BRI'
.


SQLCODE: 2290
SQLERRM: ORA-02290: check constraint (RWIJK.MYTABLE_CK1) violated.


SQLCODE: 20000
SQLERRM: ORA-20000: APP-98765
ORA-06512: at "RWIJK.MYTABLE_BRI", line 7
ORA-04088: error during execution of trigger 'RWIJK.MYTABLE_BRI'
.



PL/SQL procedure successfully completed.

rwijk@ORA11G> select id, name from mytable
  2  /

        ID NAME
---------- ------------------------------
         1 Name1
         3 Name3
         4 Name4
         5 Name5
         7 Name7
         8 Name8
        10 Name10

7 rows selected.


And you do get to see the error messages (APP-12345 and APP-98765) and the name of the check constraint. Unfortunately, our shop still uses 9.2.0.7...



UPDATE



It is worth knowing that there are some restrictions when working with the DML error logging clause. From the 11g documentation (the same restrictions appear in the 10gR2 documentation):

Restrictions on DML Error Logging

* The following conditions cause the statement to fail and roll back without invoking the error logging capability:
  - Violated deferred constraints.
  - Any direct-path INSERT or MERGE operation that raises a unique constraint or index violation.
  - Any update operation (UPDATE or MERGE) that raises a unique constraint or index violation.

* You cannot track errors in the error logging table for LONG, LOB, or object type columns. However, the table that is the target of the DML operation can contain these types of columns.
  - If you create or modify the corresponding error logging table so that it contains a column of an unsupported type, and if the name of that column corresponds to an unsupported column in the target DML table, then the DML statement fails at parse time.
  - If the error logging table does not contain any unsupported column types, then all DML errors are logged until the reject limit of errors is reached. For rows on which errors occur, column values with corresponding columns in the error logging table are logged along with the control information.
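For example, the direct-path restriction means the same statement can behave differently with and without an APPEND hint. A hypothetical sketch (it assumes a unique constraint on mytable.id, which the example table above does not actually have):

```sql
-- Conventional path: the duplicate row is written to err$_mytable
-- and the statement itself succeeds.
insert into mytable (id, name)
select 1, 'Duplicate' from dual
log errors reject limit unlimited;

-- Direct path: the same duplicate raises ORA-00001 and the whole
-- statement rolls back, despite the error logging clause.
insert /*+ APPEND */ into mytable (id, name)
select 1, 'Duplicate' from dual
log errors reject limit unlimited;
```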

Tuesday 18 September 2012

SQL Queries

Create the following Tables:

LOCATION

Location_ID  Regional_Group
-----------  --------------
        122  NEW YORK
        123  DALLAS
        124  CHICAGO
        167  BOSTON

DEPARTMENT

Department_ID  Name        Location_ID
-------------  ----------  -----------
           10  ACCOUNTING          122
           20  RESEARCH            124
           30  SALES               123
           40  OPERATIONS          167


JOB

Job_ID  Function
------  -----------
   667  CLERK
   668  STAFF
   669  ANALYST
   670  SALESPERSON
   671  MANAGER
   672  PRESIDENT



EMPLOYEE

EMPLOYEE_ID  LAST_NAME  FIRST_NAME  MIDDLE_NAME  JOB_ID  MANAGER_ID  HIREDATE   SALARY  COMM  DEPARTMENT_ID
-----------  ---------  ----------  -----------  ------  ----------  ---------  ------  ----  -------------
       7369  SMITH      JOHN        Q               667        7902  17-DEC-84     800  NULL             20
       7499  ALLEN      KEVIN       J               670        7698  20-FEB-85    1600   300             30
       7505  DOYLE      JEAN        K               671        7839  04-APR-85    2850  NULL             30
       7506  DENNIS     LYNN        S               671        7839  15-MAY-85    2750  NULL             30
       7507  BAKER      LESLIE      D               671        7839  10-JUN-85    2200  NULL             40
       7521  WARK       CYNTHIA     D               670        7698  22-FEB-85    1250   500             30


Queries based on the above tables:

Simple Queries:

  1. List all the employee details
  2. List all the department details
  3. List all job details
  4. List all the locations
  5. List out first name, last name, salary, commission for all employees
  6. List out employee_id, last name, department id for all employees and rename employee id as "ID of the employee", last name as "Name of the employee", department id as "Department ID"
  7. List out the employees' annual salary with their names only.


Where Conditions:

  1. List the details about "SMITH"
  2. List out the employees who are working in department 20
  3. List out the employees who are earning a salary between 3000 and 4500
  4. List out the employees who are working in department 10 or 20
  5. Find out the employees who are not working in department 10 or 30
  6. List out the employees whose name starts with "S"
  7. List out the employees whose name starts with "S" and ends with "H"
  8. List out the employees whose name is 4 characters long and starts with "S"
  9. List out the employees who are working in department 10 and draw a salary of more than 3500
  10. List out the employees who are not receiving commission.

Order By Clause:

  1. List out the employee id and last name in ascending order based on the employee id.
  2. List out the employee id and name in descending order based on the salary column
  3. List out the employee details according to their last_name in ascending order and salaries in descending order
  4. List out the employee details according to their last_name in ascending order and then on department_id in descending order.


Group By & Having Clause:

  1. How many employees are working in each department of the organization?
  2. List out the department-wise maximum salary, minimum salary, and average salary of the employees
  3. List out the job-wise maximum salary, minimum salary, and average salaries of the employees.
  4. List out the number of employees who joined in every month, in ascending order.
  5. List out the number of employees for each month and year, in ascending order based on the year, month.
  6. List out the department ids having at least four employees.
  7. How many employees joined in the month of January?
  8. How many employees joined in January or September?
  9. How many employees joined in 1985?
  10. How many employees joined each month in 1985?
  11. How many employees joined in March 1985?
  12. Which department id has three or more employees who joined in April 1985?


Sub-Queries

  1. Display the employee who gets the maximum salary.
  2. Display the employees who are working in the Sales department
  3. Display the employees who are working as "Clerk".
  4. Display the employees who are working in "New York"
  5. Find out the number of employees working in the "Sales" department.
  6. Update the salaries of the employees working as Clerk with a 10% raise.
  7. Delete the employees who are working in the accounting department.
  8. Display the details of the employee drawing the second-highest salary.
  9. Display the details of the employee drawing the Nth-highest salary


Sub-Query operators: (ALL,ANY,SOME,EXISTS)

  1. List out the employees who earn more than every employee in department 30.
  2. List out the employees who earn more than the lowest salary in department 30.
  3. Find out the employees whose department id does not exist in the department table.
  4. Find out which departments do not have any employees.


Co-Related Sub Queries:

47. Find out the employees who earn more than the average salary of their department.

Joins

Simple join

48. List out employees with their department names
49. Display employees with their designations (jobs)
50. Display the employees with their department names and regional groups.
51. How many employees are working in each department? Display the count with the department name.
52. How many employees are working in the sales department?
53. Which departments have five or more employees? Display the department names in ascending order.
54. How many jobs are there in the organization? Display them with their designations.
55. How many employees are working in "New York"?

Non – Equi Join:

56. Display employee details with salary grades.
57. List out the number of employees grade-wise.
58. Display the employee salary grades and the number of employees in the 2000 to 5000 salary range.


Self Join:

59. Display the employee details with their manager names.
60. Display the details of the employees who earn more than their managers' salaries.
61. Show the number of employees working under every manager.

Outer Join:

62. Display employee details with all departments.
63. Display all employees in the sales or operations departments.


Set Operators:

64. List out the distinct jobs in the Sales and Accounting departments.
65. List out ALL the jobs in the Sales and Accounting departments.
66. List out the common jobs in the Research and Accounting departments in ascending order.



Answers

  1. SQL > Select * from employee;
  2. SQL > Select * from department;
  3. SQL > Select * from job;
  4. SQL > Select * from location;
  5. SQL > Select first_name, last_name, salary, commission from employee;
  6. SQL > Select employee_id "ID of the employee", last_name "Name of the employee", department_id "Department ID" from employee;
  7. SQL > Select last_name, salary*12 "annual salary" from employee
  8. SQL > Select * from employee where last_name='SMITH';
  9. SQL > Select * from employee where department_id=20
  10. SQL > Select * from employee where salary between 3000 and 4500
  11. SQL > Select * from employee where department_id in (20,30)
  12. SQL > Select last_name, salary, commission, department_id from employee where department_id not in (10,30)
  13. SQL > Select * from employee where last_name like 'S%'
  14. SQL > Select * from employee where last_name like 'S%H'
  15. SQL > Select * from employee where last_name like 'S___'
  16. SQL > Select * from employee where department_id=10 and salary>3500
  17. SQL > Select * from employee where commission is Null
  18. SQL > Select employee_id, last_name from employee order by employee_id
  19. SQL > Select employee_id, last_name, salary from employee order by salary desc
  20. SQL > Select employee_id, last_name, salary from employee order by last_name, salary desc
  21. SQL > Select employee_id, last_name, salary from employee order by last_name, department_id desc
  22. SQL > Select department_id, count(*) from employee group by department_id
  23. SQL > Select department_id, count(*), max(salary), min(salary), avg(salary) from employee group by department_id
  24. SQL > Select job_id, count(*), max(salary), min(salary), avg(salary) from employee group by job_id
  25. SQL > Select to_char(hire_date,'month')month, count(*) from employee group by to_char(hire_date,'month') order by month
  26. SQL > Select to_char(hire_date,'yyyy') Year, to_char(hire_date,'mon') Month, count(*) "No. of employees" from employee group by to_char(hire_date,'yyyy'), to_char(hire_date,'mon')
  27. SQL > Select department_id, count(*) from employee group by department_id having count(*)>=4
  28. SQL > Select to_char(hire_date,'mon') month, count(*) from employee group by to_char(hire_date,'mon') having to_char(hire_date,'mon')='jan'
  29. SQL > Select to_char(hire_date,'mon') month, count(*) from employee group by to_char(hire_date,'mon') having to_char(hire_date,'mon') in ('jan','sep')
  30. SQL > Select to_char(hire_date,'yyyy') Year, count(*) from employee group by to_char(hire_date,'yyyy') having to_char(hire_date,'yyyy')=1985
  31. SQL > Select to_char(hire_date,'yyyy')Year, to_char(hire_date,'mon') Month, count(*) "No. of employees" from employee where to_char(hire_date,'yyyy')=1985 group by to_char(hire_date,'yyyy'),to_char(hire_date,'mon')
  32. SQL > Select to_char(hire_date,'yyyy')Year, to_char(hire_date,'mon') Month, count(*) "No. of employees" from employee where to_char(hire_date,'yyyy')=1985 and to_char(hire_date,'mon')='mar' group by to_char(hire_date,'yyyy'),to_char(hire_date,'mon')
  33. SQL > Select department_id, count(*) "No. of employees" from employee where to_char(hire_date,'yyyy')=1985 and to_char(hire_date,'mon')='apr' group by to_char(hire_date,'yyyy'), to_char(hire_date,'mon'), department_id having count(*)>=3
  34. SQL > Select * from employee where salary=(select max(salary) from employee)
  35. SQL > Select * from employee where department_id IN (select department_id from department where name='SALES')
  36. SQL > Select * from employee where job_id in (select job_id from job where function='CLERK')
  37. SQL > Select * from employee where department_id=(select department_id from department where location_id=(select location_id from location where regional_group='New York'))
  38. SQL > Select count(*) from employee where department_id=(select department_id from department where name='SALES')
  39. SQL > Update employee set salary=salary*1.1 where job_id=(select job_id from job where function='CLERK')
  40. SQL > delete from employee where department_id=(select department_id from department where name='ACCOUNTING')
  41. SQL > Select * from employee where salary=(select max(salary) from employee where salary <(select max(salary) from employee))
  42. SQL > Select distinct salary from employee e where &N-1=(select count(distinct salary) from employee where salary>e.salary)
  43. SQL > Select * from employee where salary > all (Select salary from employee where department_id=30)
  44. SQL > Select * from employee where salary > any (Select salary from employee where department_id=30)
  45. SQL > Select employee_id, last_name, department_id from employee e where not exists (select department_id from department d where d.department_id=e.department_id)
  46. SQL > Select name from department d where not exists (select last_name from employee e where d.department_id=e.department_id)
  47. SQL > Select employee_id, last_name, salary, department_id from employee e where salary > (select avg(salary) from employee where department_id=e.department_id)
  48. SQL > Select employee_id, last_name, name from employee e, department d where e.department_id=d.department_id
  49. SQL > Select employee_id, last_name, function from employee e, job j where e.job_id=j.job_id
  50. SQL > Select employee_id, last_name, name, regional_group from employee e, department d, location l where e.department_id=d.department_id and d.location_id=l.location_id
  51. SQL > Select name, count(*) from employee e, department d where d.department_id=e.department_id group by name
  52. SQL > Select name, count(*) from employee e, department d where d.department_id=e.department_id group by name having name='SALES'
  53. SQL > Select name, count(*) from employee e, department d where d.department_id=e.department_id group by name having count (*)>=5 order by name
  54. SQL > Select function, count(*) from employee e, job j where j.job_id=e.job_id group by function
  55. SQL > Select regional_group, count(*) from employee e, department d, location l where e.department_id=d.department_id and d.location_id=l.location_id and regional_group='NEW YORK' group by regional_group
  56. SQL > Select employee_id, last_name, grade_id from employee e, salary_grade s where salary between lower_bound and upper_bound order by last_name
  57. SQL > Select grade_id, count(*) from employee e, salary_grade s where salary between lower_bound and upper_bound group by grade_id order by grade_id desc
  58. SQL > Select grade_id, count(*) from employee e, salary_grade s where salary between lower_bound and upper_bound and lower_bound>=2000 and lower_bound<=5000 group by grade_id order by grade_id desc
  59. SQL > Select e.last_name emp_name, m.last_name mgr_name from employee e, employee m where e.manager_id=m.employee_id
  60. SQL > Select e.last_name emp_name, e.salary emp_salary, m.last_name mgr_name, m.salary mgr_salary from employee e, employee m where e.manager_id=m.employee_id and e.salary>m.salary
  61. SQL > Select m.manager_id, count(*) from employee e, employee m where e.employee_id=m.manager_id group by m.manager_id
  62. SQL > Select last_name, d.department_id, d.name from employee e, department d where e.department_id(+)=d.department_id
  63. SQL > Select last_name, d.department_id, d.name from employee e, department d where e.department_id(+)=d.department_id and d.department_id in (select department_id from department where name IN ('SALES','OPERATIONS'))
  64. SQL > Select function from job where job_id in (Select job_id from employee where department_id=(select department_id from department where name='SALES')) union Select function from job where job_id in (Select job_id from employee where department_id=(select department_id from department where name='ACCOUNTING'))
  65. SQL > Select function from job where job_id in (Select job_id from employee where department_id=(select department_id from department where name='SALES')) union all Select function from job where job_id in (Select job_id from employee where department_id=(select department_id from department where name='ACCOUNTING'))
  66. SQL > Select function from job where job_id in (Select job_id from employee where department_id=(select department_id from department where name='RESEARCH')) intersect Select function from job where job_id in (Select job_id from employee where department_id=(select department_id from department where name='ACCOUNTING')) order by function
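As a side note on answers 41 and 42: on Oracle versions that support analytic functions, the second- and Nth-highest-salary questions are often answered with DENSE_RANK, which reads the table only once. A sketch in the same style as the answers above (&N is a substitution variable, as in answer 42):

```sql
Select *
  from ( select e.*, dense_rank() over (order by salary desc) rnk
           from employee e
       )
 where rnk = &N
```

With &N = 1 this returns the top earner(s), with &N = 2 the second-highest, and so on; ties share the same rank.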


Keep clause

You may have seen an aggregate function like this in SQL queries:
max(value) keep (dense_rank first order by mydate)
or this analytic variant:
max(value) keep (dense_rank last order by mydate) over (partition by relation_nr)
Unfortunately, when you start searching for the "keep" clause, you won't find anything in the Oracle documentation (hopefully, because of this blog post, people will now have a reference). Of course Oracle documents these functions; you only have to know that they are called FIRST and LAST in the SQL Language Reference.
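As a small made-up fragment to show how the two relate (mydate and value are placeholder names, not columns from any table in this post): taking the FIRST row in ascending order is the same as taking the LAST row in descending order, so these two expressions are equivalent:

```sql
max(value) keep (dense_rank first order by mydate)
max(value) keep (dense_rank last order by mydate desc)
```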

Even though these functions were already introduced in version 9, I've seen lots of code that could have used them, but didn't. And that's a pity, because it's a wasted opportunity to write shorter and faster code. The common use case I'm talking about is a detail table with a validity period: typically a column startdate, and optionally an enddate. For such a table, you often have to know the values of the currently valid row. An example: suppose we have a table RELATIONS and for each relation we want to know its address at a certain point in time:
SQL> create table relations
  2  ( id   number       not null primary key
  3  , name varchar2(30) not null
  4  )
  5  /

Table created.

SQL> insert into relations
  2  select 1, 'Oracle Nederland' from dual union all
  3  select 2, 'Ciber Nederland' from dual
  4  /

2 rows created.

SQL> create table relation_addresses
  2  ( relation_id number       not null
  3  , startdate   date         not null
  4  , address     varchar2(30) not null
  5  , postal_code varchar2(6)  not null
  6  , city        varchar2(30) not null
  7  , constraint ra_pk primary key (relation_id,startdate)
  8  , constraint ra_r_fk foreign key (relation_id) references relations(id)
  9  )
 10  /

Table created.

SQL> insert into relation_addresses
  2  select 1, date '1995-01-01', 'Rijnzathe 6', '3454PV', 'De Meern' from dual union all
  3  select 1, date '2011-01-01', 'Hertogswetering 163-167', '3543AS', 'Utrecht' from dual union all
  4  select 2, date '2000-01-01', 'Frankrijkstraat 128', '5622AH', 'Eindhoven' from dual union all
  5  select 2, date '2006-01-01', 'Meerkollaan 15', '5613BS', 'Eindhoven' from dual union all
  6  select 2, date '2010-01-01', 'Burgemeester Burgerslaan 40b', '5245NH', 'Den Bosch' from dual union all
  7  select 2, date '2015-01-01', 'Archimedesbaan 16', '3439ME', 'Nieuwegein' from dual
  8  /

6 rows created.

SQL> begin
  2    dbms_stats.gather_table_stats(user,'relations');
  3    dbms_stats.gather_table_stats(user,'relation_addresses');
  4  end;
  5  /

PL/SQL procedure successfully completed.
Relation "Oracle Nederland" has two addresses, its current address being at the Hertogswetering. And fictively, relation "Ciber Nederland" has four addresses. The current address is the one in Den Bosch, and I've also recorded a future address in Nieuwegein. Note that, in real life, the latter three are all Ciber offices currently in use. To get the active relation addresses on October 1st, 2012, I can use this query:
SQL> var REFERENCE_DATE varchar2(10)
SQL> exec :REFERENCE_DATE:='2012-10-01'

PL/SQL procedure successfully completed.

SQL> select ra.relation_id
  2       , max(ra.startdate) startdate
  3    from relation_addresses ra
  4   where ra.startdate <= to_date(:REFERENCE_DATE,'yyyy-mm-dd')
  5   group by ra.relation_id
  6  /

RELATION_ID STARTDATE
----------- -------------------
          1 01-01-2011 00:00:00
          2 01-01-2010 00:00:00

2 rows selected.
But what if I want to retrieve the current address belonging to these rows? In fact, this is asked frequently in Oracle forums. Prior to Oracle8, you would have used a query like the one below:
SQL> select ra.relation_id
  2       , ra.startdate
  3       , ra.address
  4       , ra.postal_code
  5       , ra.city
  6    from relation_addresses ra
  7   where ra.startdate <= to_date(:REFERENCE_DATE,'yyyy-mm-dd')
  8     and not exists
  9         ( select 'a relation_address with a more recent startdate'
 10             from relation_addresses ra2
 11            where ra2.relation_id = ra.relation_id
 12              and ra2.startdate <= to_date(:REFERENCE_DATE,'yyyy-mm-dd')
 13              and ra2.startdate > ra.startdate
 14         )
 15  /

RELATION_ID STARTDATE           ADDRESS                        POSTAL CITY
----------- ------------------- ------------------------------ ------ ------------------------------
          1 01-01-2011 00:00:00 Hertogswetering 163-167        3543AS Utrecht
          2 01-01-2010 00:00:00 Burgemeester Burgerslaan 40b   5245NH Den Bosch

2 rows selected.
This uses a correlated subquery, accessing table RELATION_ADDRESSES (or its index) twice. This can be avoided from Oracle8 onwards by using an analytic function:
SQL> select relation_id
  2       , startdate
  3       , address
  4       , postal_code
  5       , city
  6    from ( select ra.relation_id
  7                , ra.startdate
  8                , ra.address
  9                , ra.postal_code
 10                , ra.city
 11                , row_number() over (partition by ra.relation_id order by ra.startdate desc) rn
 12             from relation_addresses ra
 13            where ra.startdate <= to_date(:REFERENCE_DATE,'yyyy-mm-dd')
 14         )
 15   where rn = 1
 16  /

RELATION_ID STARTDATE           ADDRESS                        POSTAL CITY
----------- ------------------- ------------------------------ ------ ------------------------------
          1 01-01-2011 00:00:00 Hertogswetering 163-167        3543AS Utrecht
          2 01-01-2010 00:00:00 Burgemeester Burgerslaan 40b   5245NH Den Bosch

2 rows selected.
Here you compute the row_number, partitioning the result set per relation_id and ordering by startdate in descending order. This means the most recent startdate before the reference date gets row_number 1 per relation_id. By using an inline view, we can filter on the outcome of the analytic function and select only the rows with row_number 1. In forums, you'll often see this solution advised. Compared to the correlated subquery, this query selects only once from table RELATION_ADDRESSES. However, you can do even better by just adding three "keep clause" functions to the original query:
SQL> select ra.relation_id
  2       , max(ra.startdate) startdate
  3       , max(ra.address) keep (dense_rank last order by ra.startdate) address
  4       , max(ra.postal_code) keep (dense_rank last order by ra.startdate) postal_code
  5       , max(ra.city) keep (dense_rank last order by ra.startdate) city
  6    from relation_addresses ra
  7   where ra.startdate <= to_date(:REFERENCE_DATE,'yyyy-mm-dd')
  8   group by ra.relation_id
  9  /

RELATION_ID STARTDATE           ADDRESS                        POSTAL CITY
----------- ------------------- ------------------------------ ------ ------------------------------
          1 01-01-2011 00:00:00 Hertogswetering 163-167        3543AS Utrecht
          2 01-01-2010 00:00:00 Burgemeester Burgerslaan 40b   5245NH Den Bosch

2 rows selected.
The three extra aggregate functions all do a "dense_rank last order by startdate", meaning: sort the rows by startdate and pick only the rows with the most recent startdate. If there are more rows with the same startdate, the max function at the start tells Oracle to pick the value with the maximum address/postal_code/city. However, (relation_id,startdate) is unique, so ties are impossible and thus the max function is a dummy. I could also have used min.
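To illustrate what max would do if ties were possible, here is a small made-up fragment (not part of the RELATION_ADDRESSES example): two rows share the latest startdate, so dense_rank last keeps both of them, and max then decides between the tied city values:

```sql
select max(city) keep (dense_rank last order by startdate) city
  from ( select date '2011-01-01' startdate, 'X' city from dual union all
         select date '2012-01-01', 'A' from dual union all
         select date '2012-01-01', 'B' from dual
       )
/
-- The two rows with startdate 2012-01-01 form the "last" group;
-- max then picks 'B', the larger of the tied city values.
```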

The query is shorter and - to me - clearer at first glance. However, the main reason for my enthusiasm for the aggregate functions FIRST and LAST is that they are just faster. To show this, let's execute those queries against a table with 300,000 rows: 100,000 relations with 3 addresses each:
SQL> create table relations
  2  ( id   number       not null primary key
  3  , name varchar2(30) not null
  4  )
  5  /

Table created.

SQL> create table relation_addresses
  2  ( relation_id number       not null
  3  , startdate   date         not null
  4  , address     varchar2(30) not null
  5  , postal_code varchar2(6)  not null
  6  , city        varchar2(30) not null
  7  , constraint ra_pk primary key (relation_id,startdate)
  8  , constraint ra_r_fk foreign key (relation_id) references relations(id)
  9  )
 10  /

Table created.

SQL> insert into relations
  2   select level
  3        , dbms_random.string('a',30)
  4     from dual
  5  connect by level <= 100000
  6  /

100000 rows created.

SQL> insert into relation_addresses
  2   select 1 + mod(level-1,100000)
  3        , date '2013-01-01' - numtodsinterval(level,'hour')
  4        , dbms_random.string('a',30)
  5        , dbms_random.string('a',6)
  6        , dbms_random.string('a',30)
  7     from dual
  8  connect by level <= 300000
  9  /

300000 rows created.

SQL> begin
  2    dbms_stats.gather_table_stats
  3    ( user
  4    , 'relations'
  5    , cascade=>true
  6    , method_opt=>'FOR ALL INDEXED COLUMNS SIZE 254'
  7    , estimate_percent=>100
  8    );
  9    dbms_stats.gather_table_stats
 10    ( user
 11    , 'relation_addresses'
 12    , cascade=>true
 13    , method_opt=>'FOR ALL INDEXED COLUMNS SIZE 254'
 14    , estimate_percent=>100
 15    );
 16  end;
 17  /

PL/SQL procedure successfully completed.
Note that I created histograms with 254 buckets, just to make the optimizer see that it should full scan the table despite the "startdate <= :REFERENCE_DATE" predicate. The next query should give a clue about what's in the table:
SQL> select *
  2    from relation_addresses
  3   where relation_id in (1,2,99999,100000)
  4   order by relation_id
  5       , startdate
  6  /

RELATION_ID STARTDATE           ADDRESS                        POSTAL CITY
----------- ------------------- ------------------------------ ------ ------------------------------
          1 09-03-1990 15:00:00 tKgXePxuAIdhFBNJLIRRjodrlJzGOl vPIAbL pNkbFHTJPrVuDIYLxsCfUfetBsKJIE
          1 05-08-2001 07:00:00 LybVzfpzoQzXjpCAdkSZrkYrwUtZtL cWJwFe IczTRyjITWCJIOErccfITVvsqRVyMF
          1 31-12-2012 23:00:00 lNEwsdYhbwdqRxHTSCTCykgICxiXKL oXzHQF YfyKFmiboCWfmNLjVLZoKmUDoMFaDu
          2 09-03-1990 14:00:00 svOylQPkbyfympSXRMeyudfFErFvlO MLFdpG LTtAKdrpUmCwFgqEmoKxnUtWecwgcV
          2 05-08-2001 06:00:00 BsRCUviBiLHaAEjyRVnIedRAWzuVSe DlBlZW ErQmCkDgNDTMOdZzceFYrMXnZmmjxg
          2 31-12-2012 22:00:00 wqdFdXoBdmmCooLtGfWOMKukIMrDlI geRRHz DaPpWHOOdWgbjLaRkxfFDUIPgVgvEt
      99999 12-10-1978 01:00:00 FsXOjUdNIgjjGjnWpJjTTscbcuqsxa PdhVtm qOskmLwRlngSEihmlpYhmNHhvtrpBc
      99999 09-03-1990 17:00:00 sqoKYNeDntZtAUSmSDMtIQZloTSVeD uGPszi GIDctptEomcGzYGYhUGhKHgDRZJCmY
      99999 05-08-2001 09:00:00 fhHGwuGPIHSOaKdjDvDcqTzsbHZzqR tpaLAP rVYCmijzqJmhlnZZLXkHpgFmLAEiTS
     100000 12-10-1978 00:00:00 WwxfHcVfkFfItgcXfjPnKTiATlHjao nSOjSn vZNRsRySNPlmQKgCJjcpiEOhQIxzoy
     100000 09-03-1990 16:00:00 cGcVPMsFyxCBrnsZtMYBnaAflXiNff NVKRIr SseFWkWyUDgaPpbxdmENdLjurGbJPK
     100000 05-08-2001 08:00:00 dRfCmqdmbhcmaMvyYBpewPsFBCVdlG BMQWLY YPaAGnKKUkfdnAeAyLYeUBfXwezsEo

12 rows selected.
So there are a couple of rows that are filtered because they're in the future, but for most rows, the latest row is the current one. This is the plan of the first query with the correlated subquery:
SQL> select * from table(dbms_xplan.display_cursor(null,null,'iostats last'))
  2  /

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  d6p5uh67h65yb, child number 0
-------------------------------------
select ra.relation_id      , ra.startdate      , ra.address      ,
ra.postal_code      , ra.city   from relation_addresses ra  where
ra.startdate <= to_date(:REFERENCE_DATE,'yyyy-mm-dd')    and not exists
       ( select 'a relation_address with a more recent startdate'
     from relation_addresses ra2           where ra2.relation_id =
ra.relation_id             and ra2.startdate <=
to_date(:REFERENCE_DATE,'yyyy-mm-dd')             and ra2.startdate >
ra.startdate        )

Plan hash value: 3749094337

---------------------------------------------------------------------------------------------------------------
| Id  | Operation             | Name               | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
---------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |                    |      1 |        |    100K|00:00:00.66 |   15071 |   3681 |
|*  1 |  HASH JOIN RIGHT ANTI |                    |      1 |   2978 |    100K|00:00:00.66 |   15071 |   3681 |
|*  2 |   INDEX FAST FULL SCAN| RA_PK              |      1 |    297K|    297K|00:00:00.05 |    1240 |     35 |
|*  3 |   TABLE ACCESS FULL   | RELATION_ADDRESSES |      1 |    297K|    297K|00:00:00.12 |   13831 |   3646 |
---------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("RA2"."RELATION_ID"="RA"."RELATION_ID")
       filter("RA2"."STARTDATE">"RA"."STARTDATE")
   2 - filter("RA2"."STARTDATE"<=TO_DATE(:REFERENCE_DATE,'yyyy-mm-dd'))
   3 - filter("RA"."STARTDATE"<=TO_DATE(:REFERENCE_DATE,'yyyy-mm-dd'))


30 rows selected.
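For readability, the NOT EXISTS query that appears word-wrapped in the plan output above can be reconstructed as follows (a reconstruction from the wrapped SQL text, not a new query):

```sql
-- Correlated NOT EXISTS variant: keep each address row for which no other
-- address of the same relation has a more recent startdate (within the
-- reference date).
select ra.relation_id
     , ra.startdate
     , ra.address
     , ra.postal_code
     , ra.city
  from relation_addresses ra
 where ra.startdate <= to_date(:REFERENCE_DATE,'yyyy-mm-dd')
   and not exists
       ( select 'a relation_address with a more recent startdate'
           from relation_addresses ra2
          where ra2.relation_id = ra.relation_id
            and ra2.startdate <= to_date(:REFERENCE_DATE,'yyyy-mm-dd')
            and ra2.startdate > ra.startdate
       )
```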
A HASH JOIN ANTI for the not exists, and a total of .66 seconds. Next, the plan for the query with the analytic row_number function:
SQL> select * from table(dbms_xplan.display_cursor(null,null,'iostats last'))
  2  /

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  1zd4wqtxkc2vz, child number 0
-------------------------------------
select relation_id      , startdate      , address      , postal_code
   , city   from ( select ra.relation_id               , ra.startdate
            , ra.address               , ra.postal_code               ,
ra.city               , row_number() over (partition by ra.relation_id
order by ra.startdate desc) rn            from relation_addresses ra
       where ra.startdate <= to_date(:REFERENCE_DATE,'yyyy-mm-dd')
     )  where rn = 1

Plan hash value: 2795878473

------------------------------------------------------------------------------------------------------------------
| Id  | Operation                | Name               | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |                    |      1 |        |    100K|00:00:00.97 |    7238 |   3646 |
|*  1 |  VIEW                    |                    |      1 |    297K|    100K|00:00:00.97 |    7238 |   3646 |
|*  2 |   WINDOW SORT PUSHED RANK|                    |      1 |    297K|    200K|00:00:00.93 |    7238 |   3646 |
|*  3 |    TABLE ACCESS FULL     | RELATION_ADDRESSES |      1 |    297K|    297K|00:00:00.09 |    7238 |   3646 |
------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("RN"=1)
   2 - filter(ROW_NUMBER() OVER ( PARTITION BY "RA"."RELATION_ID" ORDER BY
              INTERNAL_FUNCTION("RA"."STARTDATE") DESC )<=1)
   3 - filter("RA"."STARTDATE"<=TO_DATE(:REFERENCE_DATE,'yyyy-mm-dd'))


29 rows selected.
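The analytic variant also appears wrapped in the plan output; reconstructed for readability (with the closing parenthesis of the inline view, lost in the wrapping, restored):

```sql
-- row_number() variant: rank each relation's addresses by descending
-- startdate in an inline view, then keep only the most recent one (rn = 1).
select relation_id
     , startdate
     , address
     , postal_code
     , city
  from ( select ra.relation_id
              , ra.startdate
              , ra.address
              , ra.postal_code
              , ra.city
              , row_number() over (partition by ra.relation_id
                                   order by ra.startdate desc) rn
           from relation_addresses ra
          where ra.startdate <= to_date(:REFERENCE_DATE,'yyyy-mm-dd')
       )
 where rn = 1
```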
Note that this query takes longer than the correlated subquery above: .97 seconds versus .66 seconds. The HASH JOIN ANTI took .49 seconds (.66 - .05 - .12), whereas computing the ROW_NUMBER took .84 seconds (.93 - .09). So here, on my laptop, I avoided the .05 seconds for the INDEX FAST FULL SCAN, but spent .35 seconds (.84 - .49) more on the computation. When I/O is more expensive than on my laptop, the time of the first query will likely go up and the two timings will be closer to each other. Now the keep clause variant:
SQL> select * from table(dbms_xplan.display_cursor(null,null,'iostats last'))
  2  /

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  dcw8tyyqtu2kk, child number 0
-------------------------------------
select ra.relation_id      , max(ra.startdate) startdate      ,
max(ra.address) keep (dense_rank last order by ra.startdate) address
  , max(ra.postal_code) keep (dense_rank last order by ra.startdate)
postal_code      , max(ra.city) keep (dense_rank last order by
ra.startdate) city   from relation_addresses ra  where ra.startdate <=
to_date(:REFERENCE_DATE,'yyyy-mm-dd')  group by ra.relation_id

Plan hash value: 2324030966

------------------------------------------------------------------------------------------------------------
| Id  | Operation          | Name               | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |                    |      1 |        |    100K|00:00:00.55 |    7238 |   3646 |
|   1 |  SORT GROUP BY     |                    |      1 |    100K|    100K|00:00:00.55 |    7238 |   3646 |
|*  2 |   TABLE ACCESS FULL| RELATION_ADDRESSES |      1 |    297K|    297K|00:00:00.09 |    7238 |   3646 |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("RA"."STARTDATE"<=TO_DATE(:REFERENCE_DATE,'yyyy-mm-dd'))


24 rows selected.
The shortest query, the shortest plan and the fastest execution. The SORT GROUP BY immediately reduces the number of intermediate rows from 297K to 100K, whereas the WINDOW SORT PUSHED RANK had to compute the row_number for all 297K rows.
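For reference, here is the keep-clause query reconstructed from the wrapped SQL text in the plan output above (a reconstruction; the parenthesis closing the to_date call, lost in the wrapping, is restored):

```sql
-- KEEP (DENSE_RANK LAST) variant: for each relation_id, the max(...) keep
-- expressions return the column values of the row with the highest startdate,
-- so the grouping and the "pick the latest row" happen in a single pass.
select ra.relation_id
     , max(ra.startdate) startdate
     , max(ra.address)     keep (dense_rank last order by ra.startdate) address
     , max(ra.postal_code) keep (dense_rank last order by ra.startdate) postal_code
     , max(ra.city)        keep (dense_rank last order by ra.startdate) city
  from relation_addresses ra
 where ra.startdate <= to_date(:REFERENCE_DATE,'yyyy-mm-dd')
 group by ra.relation_id
```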
Much Ado About Nothing?

I was recently reading a presentation PDF by Hugh Darwen called How To Handle Missing Information Without Using NULL. Several great thinkers and founders of relational theory consider NULL the thing that should not be. For example, one slide in the above-mentioned PDF is titled SQL's Nulls Are A Disaster. And I found a paper with the amusing title The Final Null In The Coffin.

I can understand the critique. The introduction of NULL leads to three-valued logic, which makes programs much more complex and harder to prove correct. All database professionals have likely been bitten by NULLs several times during their career, myself included. And a NULL can have several interpretations. By using NULL, you are not making clear what is meant. If the value for column hair_colour is NULL, does it mean the person is bald? Or do you know the person has hair, but you just don't know what colour? Or can the person be bald or have hair, but you just don't know which one applies? Or is the person in the midst of a hair colouring exercise and you only temporarily don't know the colour? If you're creative, I'm sure you can come up with other interpretations as well.

On the other hand, the theorists don't have to build database applications for end users who like reasonable response times, and I do. Avoiding nulls at all cost typically leads to a data model that has more tables than needed, requiring more joins and therefore making queries slower. So I have to make a trade-off. In general I try to avoid nullable columns as much as possible, for example by choosing subtype implementations instead of supertype implementations, and by modelling entity subtypes in the first place, but I will never let it noticeably slow down my application. At my current job, I'm making a data model right now. Having read all use cases, I know how the data will be used, and so I know where in the model there is room to avoid an extra nullable column. One thing I'll never voluntarily do, though, is make up strange outlier values just to get rid of the null.

Anyway, I was curious to see how Hugh Darwen handles missing information without using nulls. In his paper, he has a concise example, which I'll translate to Oracle syntax in this blogpost to see what practically needs to happen to avoid nulls in his example. He starts with this table:

SQL> select *
  2    from pers_info
  3  /

        ID NAME       JOB            SALARY
---------- ---------- ---------- ----------
      1234 Anne       Lawyer         100000
      1235 Boris      Banker
      1236 Cindy                      70000
      1237 Davinder

4 rows selected.
This table contains four NULL values. The meaning of those NULL values can't be seen from the table itself, but this is what they are meant to express:
  • Boris earns something, but we don't know how much
  • Cindy does some job, but we don't know what it is
  • Davinder doesn't have a job
  • Davinder doesn't have a salary
So he applies a technique called vertical decomposition and, on top of those results, horizontal decomposition, to arrive at the seven tables below, where everything has a clear meaning.
SQL> select *
  2    from called
  3  /

        ID NAME
---------- --------
      1234 Anne
      1235 Boris
      1236 Cindy
      1237 Davinder

4 rows selected.

SQL> select *
  2    from does_job
  3  /

        ID JOB
---------- ------
      1234 Lawyer
      1235 Banker

2 rows selected.

SQL> select *
  2    from job_unk
  3  /

        ID
----------
      1236

1 row selected.

SQL> select *
  2    from unemployed
  3  /

        ID
----------
      1237

1 row selected.

SQL> select *
  2    from earns
  3  /

        ID     SALARY
---------- ----------
      1234     100000
      1236      70000

2 rows selected.

SQL> select *
  2    from salary_unk
  3  /

        ID
----------
      1235

1 row selected.

SQL> select *
  2    from unsalaried
  3  /

        ID
----------
      1237

1 row selected.
Here we have achieved a data model from which every NULL has been banished.
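Darwen's paper doesn't show DDL for these seven tables; a minimal sketch deriving them from the original PERS_INFO table might look like the following. The create-table-as-select form and the predicates are my assumption; note in particular that the NULLs alone cannot tell "unknown" apart from "not applicable", so the knowledge that Davinder (id 1237) is unemployed and unsalaried has to be supplied manually:

```sql
-- Hypothetical sketch: vertical decomposition splits off JOB and SALARY;
-- horizontal decomposition then separates "value known", "value unknown"
-- and "not applicable" rows. The id 1237 below encodes out-of-band knowledge
-- that a NULL by itself cannot express.
create table called     as select id, name   from pers_info;
create table does_job   as select id, job    from pers_info where job is not null;
create table job_unk    as select id from pers_info
                          where job is null and id not in (1237);
create table unemployed as select id from pers_info where id in (1237);
create table earns      as select id, salary from pers_info where salary is not null;
create table salary_unk as select id from pers_info
                          where salary is null and id not in (1237);
create table unsalaried as select id from pers_info where id in (1237);
```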

Now what if we'd like to simulate a query against the PERS_INFO table? Darwen uses this expression to transform the seven tables back to the PERS_INFO table:
WITH (EXTEND JOB_UNK ADD 'Job unknown' AS Job_info) AS T1,
     (EXTEND UNEMPLOYED ADD 'Unemployed' AS Job_info) AS T2,
     (DOES_JOB RENAME (Job AS Job_info)) AS T3,
     (EXTEND SALARY_UNK ADD 'Salary unknown' AS Sal_info) AS T4,
     (EXTEND UNSALARIED ADD 'Unsalaried' AS Sal_info) AS T5,
     (EXTEND EARNS ADD CHAR(Salary) AS Sal_info) AS T6,
     (T6 { ALL BUT Salary }) AS T7,
     (UNION ( T1, T2, T3 )) AS T8,
     (UNION ( T4, T5, T7 )) AS T9,
     (JOIN ( CALLED, T8, T9 )) AS PERS_INFO :
PERS_INFO
Translated to Oracle syntax, this becomes:
SQL> with t1 as
  2  ( select id
  3         , 'Job unknown' as job_info
  4      from job_unk
  5  )
  6  , t2 as
  7  ( select id
  8         , 'Unemployed' as job_info
  9      from unemployed
 10  )
 11  , t3 as
 12  ( select id
 13         , job as job_info
 14      from does_job
 15  )
 16  , t4 as
 17  ( select id
 18         , 'Salary unknown' as sal_info
 19      from salary_unk
 20  )
 21  , t5 as
 22  ( select id
 23         , 'Unsalaried' as sal_info
 24      from unsalaried
 25  )
 26  , t6 as
 27  ( select id
 28         , salary
 29         , to_char(salary,'fm999G999') as sal_info
 30      from earns
 31  )
 32  , t7 as
 33  ( select id
 34         , sal_info
 35      from t6
 36  )
 37  , t8 as
 38  ( select id
 39         , job_info
 40      from t1
 41     union all
 42    select id
 43         , job_info
 44      from t2
 45     union all
 46    select id
 47         , job_info
 48      from t3
 49  )
 50  , t9 as
 51  ( select id
 52         , sal_info
 53      from t4
 54     union all
 55    select id
 56         , sal_info
 57      from t5
 58     union all
 59    select id
 60         , sal_info
 61      from t7
 62  )
 63  , pers_info as
 64  ( select c.id
 65         , c.name
 66         , j.job_info
 67         , s.sal_info
 68      from called c
 69           inner join t8 j on (c.id = j.id)
 70           inner join t9 s on (c.id = s.id)
 71  )
 72  select *
 73    from pers_info
 74  /

        ID NAME     JOB_INFO    SAL_INFO
---------- -------- ----------- --------------
      1235 Boris    Banker      Salary unknown
      1237 Davinder Unemployed  Unsalaried
      1234 Anne     Lawyer      100,000
      1236 Cindy    Job unknown 70,000

4 rows selected.
Very elaborate, but the optimizer does a great job at simplifying the query under the covers, as can be seen in this execution plan:
SQL> select *
  2    from table(dbms_xplan.display_cursor(null,null,'allstats last'))
  3  /

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  bmrtdy0jad18p, child number 0
-------------------------------------
with t1 as ( select id        , 'Job unknown' as job_info     from
job_unk ) , t2 as ( select id        , 'Unemployed' as job_info
from unemployed ) , t3 as ( select id        , job as job_info     from
does_job ) , t4 as ( select id        , 'Salary unknown' as sal_info
 from salary_unk ) , t5 as ( select id        , 'Unsalaried' as
sal_info     from unsalaried ) , t6 as ( select id        , salary
  , to_char(salary,'fm999G999') as sal_info     from earns ) , t7 as (
select id        , sal_info     from t6 ) , t8 as ( select id        ,
job_info     from t1    union all   select id        , job_info
from t2    union all   select id        , job_info     from t3 ) , t9
as ( select id        , sal_info     from t4    union all   select id
     , sal_info     from t5    union all   select id        , sal_info
   from t7 ) , pers_info as ( select c.id        , c.name        ,
j.job_info        , s.sal_info     from called c          inner join t8
j on (c.id = j.id)

Plan hash value: 583520090

-------------------------------------------------------------------------------------------------------------------------
| Id  | Operation             | Name       | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |            |      1 |        |      4 |00:00:00.01 |      14 |       |       |          |
|*  1 |  HASH JOIN            |            |      1 |      4 |      4 |00:00:00.01 |      14 |  1011K|  1011K|  550K (0)|
|*  2 |   HASH JOIN           |            |      1 |      4 |      4 |00:00:00.01 |       8 |  1180K|  1180K|  548K (0)|
|   3 |    TABLE ACCESS FULL  | CALLED     |      1 |      4 |      4 |00:00:00.01 |       2 |       |       |          |
|   4 |    VIEW               |            |      1 |      4 |      4 |00:00:00.01 |       6 |       |       |          |
|   5 |     UNION-ALL         |            |      1 |        |      4 |00:00:00.01 |       6 |       |       |          |
|   6 |      TABLE ACCESS FULL| JOB_UNK    |      1 |      1 |      1 |00:00:00.01 |       2 |       |       |          |
|   7 |      TABLE ACCESS FULL| UNEMPLOYED |      1 |      1 |      1 |00:00:00.01 |       2 |       |       |          |
|   8 |      TABLE ACCESS FULL| DOES_JOB   |      1 |      2 |      2 |00:00:00.01 |       2 |       |       |          |
|   9 |   VIEW                |            |      1 |      4 |      4 |00:00:00.01 |       6 |       |       |          |
|  10 |    UNION-ALL          |            |      1 |        |      4 |00:00:00.01 |       6 |       |       |          |
|  11 |     TABLE ACCESS FULL | SALARY_UNK |      1 |      1 |      1 |00:00:00.01 |       2 |       |       |          |
|  12 |     TABLE ACCESS FULL | UNSALARIED |      1 |      1 |      1 |00:00:00.01 |       2 |       |       |          |
|  13 |     TABLE ACCESS FULL | EARNS      |      1 |      2 |      2 |00:00:00.01 |       2 |       |       |          |
-------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("C"."ID"="S"."ID")
   2 - access("C"."ID"="J"."ID")


45 rows selected.
If I had to build the PERS_INFO table back again with a query myself, I'd use this shorter query with six left outer joins:
SQL> select c.id
  2       , c.name
  3       , coalesce(j.job,nvl2(ju.id,'Job unknown',null),nvl2(ue.id,'Unemployed',null)) job_info
  4       , coalesce(to_char(e.salary,'fm999G999'),nvl2(su.id,'Salary unknown',null),nvl2(us.id,'Unsalaried',null)) salary_info
  5    from called c
  6         left outer join does_job j on (c.id = j.id)
  7         left outer join job_unk ju on (c.id = ju.id)
  8         left outer join unemployed ue on (c.id = ue.id)
  9         left outer join earns e on (c.id = e.id)
 10         left outer join salary_unk su on (c.id = su.id)
 11         left outer join unsalaried us on (c.id = us.id)
 12  /

        ID NAME     JOB_INFO    SALARY_INFO
---------- -------- ----------- --------------
      1234 Anne     Lawyer      100,000
      1236 Cindy    Job unknown 70,000
      1235 Boris    Banker      Salary unknown
      1237 Davinder Unemployed  Unsalaried

4 rows selected.
Although, as you can see below, the plan doesn't really improve:
SQL> select *
  2    from table(dbms_xplan.display_cursor(null,null,'allstats last'))
  3  /

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  6x45b27mvpb1m, child number 0
-------------------------------------
select c.id      , c.name      , coalesce(j.job,nvl2(ju.id,'Job
unknown',null),nvl2(ue.id,'Unemployed',null)) job_info      ,
coalesce(to_char(e.salary,'fm999G999'),nvl2(su.id,'Salary
unknown',null),nvl2(us.id,'Unsalaried',null)) salary_info   from called
c        left outer join does_job j on (c.id = j.id)        left outer
join job_unk ju on (c.id = ju.id)        left outer join unemployed ue
on (c.id = ue.id)        left outer join earns e on (c.id = e.id)
 left outer join salary_unk su on (c.id = su.id)        left outer join
unsalaried us on (c.id = us.id)

Plan hash value: 3398518218

---------------------------------------------------------------------------------------------------------------------------
| Id  | Operation               | Name       | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |            |      1 |        |      4 |00:00:00.01 |      15 |       |       |          |
|*  1 |  HASH JOIN OUTER        |            |      1 |      4 |      4 |00:00:00.01 |      15 |   955K|   955K|  528K (0)|
|*  2 |   HASH JOIN OUTER       |            |      1 |      4 |      4 |00:00:00.01 |      12 |  1000K|  1000K|  523K (0)|
|*  3 |    HASH JOIN OUTER      |            |      1 |      4 |      4 |00:00:00.01 |      10 |  1035K|  1035K|  536K (0)|
|*  4 |     HASH JOIN OUTER     |            |      1 |      4 |      4 |00:00:00.01 |       8 |  1063K|  1063K|  536K (0)|
|*  5 |      HASH JOIN OUTER    |            |      1 |      4 |      4 |00:00:00.01 |       6 |  1114K|  1114K|  537K (0)|
|*  6 |       HASH JOIN OUTER   |            |      1 |      4 |      4 |00:00:00.01 |       4 |  1180K|  1180K|  538K (0)|
|   7 |        TABLE ACCESS FULL| CALLED     |      1 |      4 |      4 |00:00:00.01 |       2 |       |       |          |
|   8 |        TABLE ACCESS FULL| JOB_UNK    |      1 |      1 |      1 |00:00:00.01 |       2 |       |       |          |
|   9 |       TABLE ACCESS FULL | UNEMPLOYED |      1 |      1 |      1 |00:00:00.01 |       2 |       |       |          |
|  10 |      TABLE ACCESS FULL  | SALARY_UNK |      1 |      1 |      1 |00:00:00.01 |       2 |       |       |          |
|  11 |     TABLE ACCESS FULL   | UNSALARIED |      1 |      1 |      1 |00:00:00.01 |       2 |       |       |          |
|  12 |    TABLE ACCESS FULL    | DOES_JOB   |      1 |      2 |      2 |00:00:00.01 |       2 |       |       |          |
|  13 |   TABLE ACCESS FULL     | EARNS      |      1 |      2 |      2 |00:00:00.01 |       3 |       |       |          |
---------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("C"."ID"="E"."ID")
   2 - access("C"."ID"="J"."ID")
   3 - access("C"."ID"="US"."ID")
   4 - access("C"."ID"="SU"."ID")
   5 - access("C"."ID"="UE"."ID")
   6 - access("C"."ID"="JU"."ID")


43 rows selected.
But the two plans above are really complex, compared with a simple query against the PERS_INFO table with nullable columns:
SQL> select *
  2    from pers_info
  3  /

        ID NAME       JOB            SALARY
---------- ---------- ---------- ----------
      1234 Anne       Lawyer         100000
      1235 Boris      Banker
      1236 Cindy                      70000
      1237 Davinder

4 rows selected.

SQL> select *
  2    from table(dbms_xplan.display_cursor(null,null,'allstats last'))
  3  /

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  016x9f106gj27, child number 1
-------------------------------------
select *   from pers_info

Plan hash value: 1584579034

-----------------------------------------------------------------------------------------
| Id  | Operation         | Name      | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT  |           |      1 |        |      4 |00:00:00.01 |       7 |
|   1 |  TABLE ACCESS FULL| PERS_INFO |      1 |      4 |      4 |00:00:00.01 |       7 |
-----------------------------------------------------------------------------------------


13 rows selected.
If queries like this are not very frequent in your database, you might be willing to accept this extra work to avoid the NULLs. But you need to consider something else as well: the new schema requires many more constraints. Using just the PERS_INFO table, a single primary key constraint on the Id column is all you need. But for the new model, Darwen describes 9 constraints, which really amount to 15:
  1. No two CALLED rows have the same Id. (Primary key)
  2. Every row in CALLED has a matching row in either DOES_JOB, JOB_UNK, or UNEMPLOYED.
  3. No row in DOES_JOB has a matching row in JOB_UNK.
  4. No row in DOES_JOB has a matching row in UNEMPLOYED.
  5. No row in JOB_UNK has a matching row in UNEMPLOYED.
  6. Every row in DOES_JOB has a matching row in CALLED. (Foreign key)
  7. Every row in JOB_UNK has a matching row in CALLED. (Foreign key)
  8. Every row in UNEMPLOYED has a matching row in CALLED. (Foreign key)
  9. Constraints 2 through 8 repeated, mutatis mutandis, for CALLED with respect to EARNS, SALARY_UNK and UNSALARIED.
Implementing constraint 1 is easy:
SQL> alter table called add primary key (id)
  2  /

Table altered.
And so are constraints 6, 7 and 8:
SQL> alter table does_job add foreign key (id) references called (id)
  2  /

Table altered.

SQL> alter table job_unk add foreign key (id) references called (id)
  2  /

Table altered.

SQL> alter table unemployed add foreign key (id) references called (id)
  2  /

Table altered.
But constraint 2 says that the Id in table CALLED is a foreign distributed key, and constraints 3, 4 and 5 say that the Id's of tables DOES_JOB, JOB_UNK and UNEMPLOYED form a distributed key. Oracle doesn't have declarative support for distributed keys or foreign distributed keys. We could write database trigger code to implement them, which is very hard to do correctly, or we could use the materialized view trick to have the condition validated at the end of the transaction instead of at the end of the statement, which also has its downsides. Such deferred constraint checking is explicitly ruled out by The Third Manifesto as well. Nevertheless, here is how it can be done.

The distributed key (constraints 3, 4 and 5):
SQL> create materialized view log on does_job with rowid
  2  /

Materialized view log created.

SQL> create materialized view log on job_unk with rowid
  2  /

Materialized view log created.

SQL> create materialized view log on unemployed with rowid
  2  /

Materialized view log created.

SQL> create materialized view distributed_key_vw
  2    refresh fast on commit
  3  as
  4  select d.rowid rid
  5       , d.id    id
  6       , 'D'     umarker
  7    from does_job d
  8   union all
  9  select j.rowid
 10       , j.id
 11       , 'J'
 12    from job_unk j
 13   union all
 14  select u.rowid
 15       , u.id
 16       , 'U'
 17    from unemployed u
 18  /

Materialized view created.

SQL> alter table distributed_key_vw
  2    add constraint distributed_key_check
  3    primary key (id)
  4  /

Table altered.
And to show that the distributed key implementation works:
SQL> insert into job_unk values (1234)
  2  /

1 row created.

SQL> commit
  2  /
commit
*
ERROR at line 1:
ORA-12048: error encountered while refreshing materialized view "RWIJK"."DISTRIBUTED_KEY_VW"
ORA-00001: unique constraint (RWIJK.DISTRIBUTED_KEY_CHECK) violated
And the foreign distributed key ("Every row in CALLED has a matching row in either DOES_JOB, JOB_UNK, or UNEMPLOYED.") can be implemented like this:
SQL> create materialized view log on does_job with rowid
  2  /

Materialized view log created.

SQL> create materialized view log on job_unk with rowid
  2  /

Materialized view log created.

SQL> create materialized view log on unemployed with rowid
  2  /

Materialized view log created.

SQL> create materialized view foreign_distributed_key_vw
  2    refresh fast on commit
  3  as
  4  select c.rowid  c_rowid
  5       , dj.rowid dj_rowid
  6       , ju.rowid ju_rowid
  7       , ue.rowid ue_rowid
  8       , c.id     id
  9       , dj.id    dj_id
 10       , ju.id    ju_id
 11       , ue.id    ue_id
 12    from called c
 13       , does_job dj
 14       , job_unk ju
 15       , unemployed ue
 16   where c.id = dj.id (+)
 17     and c.id = ju.id (+)
 18     and c.id = ue.id (+)
 19  /

Materialized view created.

SQL> alter table foreign_distributed_key_vw
  2    add constraint foreign_distributed_key_check
  3    check (coalesce(dj_id,ju_id,ue_id) is not null)
  4  /

Table altered.
And some proof that this implementation works:
SQL> insert into called values (1238,'Elise')
  2  /

1 row created.

SQL> commit
  2  /
commit
*
ERROR at line 1:
ORA-12008: error in materialized view refresh path
ORA-02290: check constraint (RWIJK.FOREIGN_DISTRIBUTED_KEY_CHECK) violated
Would I go through the extra trouble of an implementation with 6 more tables, 14 extra constraints and worse performance, as shown above? It depends. It depends on how often the data is queried, and on how often it is updated concurrently. On whether the distinction between the possible meanings of NULL is relevant in my case. And on whether I have sufficient extra time to implement it. Using Oracle, most often, I probably won't.

Connect By Filtering

A hierarchical query is typically executed using a plan that starts with the operation CONNECT BY WITH FILTERING, which has two child operations. The first child operation implements the START WITH clause, and the second, which contains a step called CONNECT BY PUMP, implements the recursive part of the query. Here is an example of such a plan, using the well-known hierarchical query on table EMP:

SQL>  select lpad(' ', 2 * level - 2, ' ') || ename as ename
  2        , level
  3        , job
  4        , deptno
  5     from emp
  6  connect by mgr = prior empno
  7    start with mgr is null
  8  /

ENAME                     LEVEL JOB                             DEPTNO
-------------------- ---------- --------------------------- ----------
KING                          1 PRESIDENT                           10
  JONES                       2 MANAGER                             20
    SCOTT                     3 ANALYST                             20
      ADAMS                   4 CLERK                               20
    FORD                      3 ANALYST                             20
      SMITH                   4 CLERK                               20
  BLAKE                       2 MANAGER                             30
    ALLEN                     3 SALESMAN                            30
    WARD                      3 SALESMAN                            30
    MARTIN                    3 SALESMAN                            30
    TURNER                    3 SALESMAN                            30
    JAMES                     3 CLERK                               30
  CLARK                       2 MANAGER                             10
    MILLER                    3 CLERK                               10

14 rows selected.

SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'))
  2  /

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  d2c7xqxbr112u, child number 0
-------------------------------------
 select lpad(' ', 2 * level - 2, ' ') || ename as ename       , level       , job       , deptno    from emp connect by
mgr = prior empno   start with mgr is null

Plan hash value: 1869448388

--------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                 | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |
--------------------------------------------------------------------------------------------------------------------------------
|*  1 |  CONNECT BY WITH FILTERING|      |      1 |        |     14 |00:00:00.02 |      15 |      6 |  2048 |  2048 | 2048  (0)|
|*  2 |   TABLE ACCESS FULL       | EMP  |      1 |      1 |      1 |00:00:00.01 |       3 |      6 |       |       |          |
|*  3 |   HASH JOIN               |      |      4 |        |     13 |00:00:00.01 |      12 |      0 |  1452K|  1452K|  853K (0)|
|   4 |    CONNECT BY PUMP        |      |      4 |        |     14 |00:00:00.01 |       0 |      0 |       |       |          |
|   5 |    TABLE ACCESS FULL      | EMP  |      4 |      2 |     56 |00:00:00.01 |      12 |      0 |       |       |          |
--------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("MGR"=PRIOR NULL)
   2 - filter("MGR" IS NULL)
   3 - access("MGR"=PRIOR NULL)


24 rows selected.

You can find a great and more detailed explanation of connect by with filtering here, on Christian Antognini's blog.
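The shape of the filtering plan above can be sketched conceptually in Python. This is a simplified model, not Oracle's actual internals: the START WITH rows are produced first, and then each pass joins the previous level's rows (the CONNECT BY PUMP) back to the table.

```python
# (empno, ename, mgr): a subset of the classic EMP table, listed here by hand
emp = [
    (7839, "KING", None), (7566, "JONES", 7839), (7698, "BLAKE", 7839),
    (7782, "CLARK", 7839), (7788, "SCOTT", 7566), (7902, "FORD", 7566),
    (7369, "SMITH", 7902), (7876, "ADAMS", 7788),
]

def connect_by_with_filtering(rows):
    result, lvl = [], 1
    # START WITH: a scan keeping rows where mgr is null (plan step 2)
    level = [r for r in rows if r[2] is None]
    while level:                                   # one pass per hierarchy level
        result += [(ename, lvl) for _, ename, _ in level]
        pump = {empno for empno, _, _ in level}    # CONNECT BY PUMP (step 4)
        # join of the pump against another scan: mgr = prior empno (steps 3 and 5)
        level = [r for r in rows if r[2] in pump]
        lvl += 1
    return result

print(connect_by_with_filtering(emp))
```

For brevity this returns the rows level by level; the real operation additionally produces the familiar depth-first sibling order.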

When I was researching the new recursive subquery factoring clause a year and a half ago, and compared a standard hierarchical query on EMP using recursive subquery factoring with one using the good old connect by, I stumbled upon a new optimizer algorithm for implementing recursive queries:

SQL>  select lpad(' ', 2 * level - 2, ' ') || ename as ename
  2        , level
  3        , job
  4        , deptno
  5     from emp
  6  connect by mgr = prior empno
  7    start with mgr is null
  8  /

ENAME                     LEVEL JOB           DEPTNO
-------------------- ---------- --------- ----------
KING                          1 PRESIDENT         10
  JONES                       2 MANAGER           20
    SCOTT                     3 ANALYST           20
      ADAMS                   4 CLERK             20
    FORD                      3 ANALYST           20
      SMITH                   4 CLERK             20
  BLAKE                       2 MANAGER           30
    ALLEN                     3 SALESMAN          30
    WARD                      3 SALESMAN          30
    MARTIN                    3 SALESMAN          30
    TURNER                    3 SALESMAN          30
    JAMES                     3 CLERK             30
  CLARK                       2 MANAGER           10
    MILLER                    3 CLERK             10

14 rows selected.

SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'))
  2  /

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  d2c7xqxbr112u, child number 0
-------------------------------------
 select lpad(' ', 2 * level - 2, ' ') || ename as ename       , level
    , job       , deptno    from emp connect by mgr = prior empno
start with mgr is null

Plan hash value: 763482334

-------------------------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |
-------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |      |      1 |        |     14 |00:00:00.02 |       6 |      6 |
|*  1 |  CONNECT BY NO FILTERING WITH START-WITH|      |      1 |        |     14 |00:00:00.02 |       6 |      6 |
|   2 |   TABLE ACCESS FULL                     | EMP  |      1 |     14 |     14 |00:00:00.02 |       6 |      6 |
-------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("MGR"=PRIOR NULL)
       filter("MGR" IS NULL)


22 rows selected.

You might wonder what I did to make two identical queries use different execution plans; I'll address that later. First, I'd like to show that there are two optimizer hints available with which you can control which algorithm the optimizer uses:

SQL> select *
  2    from v$sql_hint
  3   where name like '%CONNECT_BY_FILTERING%'
  4  /

NAME                    SQL_FEATURE  CLASS
----------------------- ------------ -----------------------
INVERSE                 TARGET_LEVEL   PROPERTY VERSION    VERSION_OUTLINE
----------------------- ------------ ---------- ---------- ---------------
CONNECT_BY_FILTERING    QKSFM_ALL    CONNECT_BY_FILTERING
NO_CONNECT_BY_FILTERING            2         16 10.2.0.2   10.2.0.2

NO_CONNECT_BY_FILTERING QKSFM_ALL    CONNECT_BY_FILTERING
CONNECT_BY_FILTERING               2         16 10.2.0.2   10.2.0.2


2 rows selected.

And this surprised me. As the VERSION column suggests, the no_connect_by_filtering hint and the accompanying new algorithm were already introduced in version 10.2.0.2! I checked with my old 10.2.0.4 database, and the hint is indeed present and usable there:

SQL> select version
  2    from v$instance
  3  /

VERSION
---------------------------------------------------
10.2.0.4.0

1 row selected.

SQL>  select /*+ no_connect_by_filtering gather_plan_statistics */
  2          lpad(' ', 2 * level - 2, ' ') || ename as ename
  3        , level
  4        , job
  5        , deptno
  6     from emp
  7  connect by mgr = prior empno
  8    start with mgr is null
  9  /

ENAME                     LEVEL JOB                             DEPTNO
-------------------- ---------- --------------------------- ----------
KING                          1 PRESIDENT                           10
  JONES                       2 MANAGER                             20
    SCOTT                     3 ANALYST                             20
      ADAMS                   4 CLERK                               20
    FORD                      3 ANALYST                             20
      SMITH                   4 CLERK                               20
  BLAKE                       2 MANAGER                             30
    ALLEN                     3 SALESMAN                            30
    WARD                      3 SALESMAN                            30
    MARTIN                    3 SALESMAN                            30
    TURNER                    3 SALESMAN                            30
    JAMES                     3 CLERK                               30
  CLARK                       2 MANAGER                             10
    MILLER                    3 CLERK                               10

14 rows selected.

SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'))
  2  /

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  39kr5s8dxz7j0, child number 0
-------------------------------------
 select /*+ no_connect_by_filtering gather_plan_statistics */         lpad(' ', 2 * level - 2, '
') || ename as ename       , level       , job       , deptno    from emp connect by mgr = prior
empno   start with mgr is null

Plan hash value: 763482334

----------------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------------------
|*  1 |  CONNECT BY NO FILTERING WITH START-WITH|      |      1 |        |     14 |00:00:00.01 |       3 |
|   2 |   TABLE ACCESS FULL                     | EMP  |      1 |     14 |     14 |00:00:00.01 |       3 |
----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("MGR"=PRIOR NULL)
       filter("MGR" IS NULL)


21 rows selected.

But in version 10.2.0.4 you need the no_connect_by_filtering hint for this query. If you do not provide the hint, this is the result:

SQL>  select /*+ gather_plan_statistics */
  2          lpad(' ', 2 * level - 2, ' ') || ename as ename
  3        , level
  4        , job
  5        , deptno
  6     from emp
  7  connect by mgr = prior empno
  8    start with mgr is null
  9  /

ENAME                     LEVEL JOB                             DEPTNO
-------------------- ---------- --------------------------- ----------
KING                          1 PRESIDENT                           10
  JONES                       2 MANAGER                             20
    SCOTT                     3 ANALYST                             20
      ADAMS                   4 CLERK                               20
    FORD                      3 ANALYST                             20
      SMITH                   4 CLERK                               20
  BLAKE                       2 MANAGER                             30
    ALLEN                     3 SALESMAN                            30
    WARD                      3 SALESMAN                            30
    MARTIN                    3 SALESMAN                            30
    TURNER                    3 SALESMAN                            30
    JAMES                     3 CLERK                               30
  CLARK                       2 MANAGER                             10
    MILLER                    3 CLERK                               10

14 rows selected.

SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'))
  2  /

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  6zhtnf720u0bm, child number 0
-------------------------------------
 select /*+ gather_plan_statistics */         lpad(' ', 2 * level - 2, ' ') || ename as ename       , level
   , job       , deptno    from emp connect by mgr = prior empno   start with mgr is null

Plan hash value: 1869448388

-----------------------------------------------------------------------------------------------------------------------
| Id  | Operation                 | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-----------------------------------------------------------------------------------------------------------------------
|*  1 |  CONNECT BY WITH FILTERING|      |      1 |        |     14 |00:00:00.01 |      15 |  2048 |  2048 | 2048  (0)|
|*  2 |   TABLE ACCESS FULL       | EMP  |      1 |      1 |      1 |00:00:00.01 |       3 |       |       |          |
|*  3 |   HASH JOIN               |      |      4 |        |     13 |00:00:00.01 |      12 |  1452K|  1452K|  843K (0)|
|   4 |    CONNECT BY PUMP        |      |      4 |        |     14 |00:00:00.01 |       0 |       |       |          |
|   5 |    TABLE ACCESS FULL      | EMP  |      4 |      2 |     56 |00:00:00.01 |      12 |       |       |          |
-----------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("MGR"=PRIOR NULL)
   2 - filter("MGR" IS NULL)
   3 - access("MGR"=PRIOR NULL)


24 rows selected.

Which explains why I didn't see the CONNECT BY NO FILTERING WITH START-WITH operation earlier. It seems that Oracle adjusted the cost calculation of connect by queries somewhere between 10.2.0.4 and 11.2.0.2. Just look at the cost in both execution plans on 10.2.0.4, using a regular explain plan statement and a "select * from table(dbms_xplan.display)":

----------------------------------------------------------------------------------
| Id  | Operation                 | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |      |     2 |    50 |     3   (0)| 00:00:01 |
|*  1 |  CONNECT BY WITH FILTERING|      |       |       |            |          |
|*  2 |   TABLE ACCESS FULL       | EMP  |     1 |    29 |     3   (0)| 00:00:01 |
|*  3 |   HASH JOIN               |      |       |       |            |          |
|   4 |    CONNECT BY PUMP        |      |       |       |            |          |
|   5 |    TABLE ACCESS FULL      | EMP  |     2 |    50 |     3   (0)| 00:00:01 |
----------------------------------------------------------------------------------

------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |      |    14 |   350 |     3   (0)| 00:00:01 |
|*  1 |  CONNECT BY NO FILTERING WITH START-WITH|      |       |       |            |          |
|   2 |   TABLE ACCESS FULL                     | EMP  |    14 |   350 |     3   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------

The cost of 3 is due to the full table scan of EMP, and no additional cost is added for the hierarchical query.

These are the plans from 11.2.0.2:

----------------------------------------------------------------------------------
| Id  | Operation                 | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |      |     3 |   156 |    15  (20)| 00:00:01 |
|*  1 |  CONNECT BY WITH FILTERING|      |       |       |            |          |
|*  2 |   TABLE ACCESS FULL       | EMP  |     1 |    25 |     4   (0)| 00:00:01 |
|*  3 |   HASH JOIN               |      |     2 |    76 |     9  (12)| 00:00:01 |
|   4 |    CONNECT BY PUMP        |      |       |       |            |          |
|*  5 |    TABLE ACCESS FULL      | EMP  |    13 |   325 |     4   (0)| 00:00:01 |
----------------------------------------------------------------------------------

------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |      |    14 |   728 |     5  (20)| 00:00:01 |
|*  1 |  CONNECT BY NO FILTERING WITH START-WITH|      |       |       |            |          |
|   2 |   TABLE ACCESS FULL                     | EMP  |    14 |   350 |     4   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------


The 11.2.0.2 numbers show more sophistication than just the cost of the table scan. The optimizer can't know how many levels deep the data goes, but version 10.2.0.4 apparently assumed one level and left the total cost at 3. I'm curious in which version between 10.2.0.4 and 11.2.0.2 this cost calculation changed. If anyone reading this has a version in between and would like to check, please let me know in the comments. My guess would be that 11.2.0.1 introduced the cost change.

What does CONNECT BY NO FILTERING WITH START-WITH do?

Let's explore this, using this table:

SQL> create table t (id, parent_id, value, indicator)
  2  as
  3   select level - 1
  4        , case level when 1 then null else trunc((level-1)/10) end
  5        , round(dbms_random.value * 1000)
  6        , case mod(level,10) when 4 then 'N' else 'Y' end
  7     from dual
  8  connect by level <= 100000
  9  /

Table created.

SQL> alter table t
  2    add constraint cbt_pk
  3    primary key (id)
  4  /

Table altered.

SQL> create index i1 on t (parent_id,indicator)
  2  /

Index created.

SQL> exec dbms_stats.gather_table_stats(user,'t',cascade=>true)

The data is tree-shaped: node p has the nodes 10p through 10p+9 as its children, so the root (id 0) has nine child nodes (ids 1 through 9) and every other parent has ten. One tenth of the data, the rows whose id ends in the digit 3, has its indicator column set to 'N'. This select query shows what the data looks like:

SQL> select *
  2    from t
  3   where id < 24 or id > 99997
  4   order by id
  5  /

        ID  PARENT_ID      VALUE I
---------- ---------- ---------- -
         0                   656 Y
         1          0        289 Y
         2          0        365 Y
         3          0        644 N
         4          0        364 Y
         5          0        841 Y
         6          0        275 Y
         7          0        529 Y
         8          0        500 Y
         9          0        422 Y
        10          1        598 Y
        11          1        104 Y
        12          1        467 Y
        13          1        296 N
        14          1        105 Y
        15          1        220 Y
        16          1        692 Y
        17          1        793 Y
        18          1         29 Y
        19          1        304 Y
        20          2        467 Y
        21          2        716 Y
        22          2        837 Y
        23          2        432 N
     99998       9999        609 Y
     99999       9999         24 Y

26 rows selected.
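The generation rule can be restated in a few lines of Python (an illustrative check of the CREATE TABLE above, not anything the database runs): id n has parent n // 10, the root (id 0) has none, and the indicator is 'N' exactly when the id ends in the digit 3.

```python
from collections import Counter

# parent of id n is n // 10; the root (id 0) has no parent
parent = {n: (None if n == 0 else n // 10) for n in range(100000)}
children_per_parent = Counter(p for p in parent.values() if p is not None)

assert children_per_parent[0] == 9        # the root's children are ids 1..9
assert children_per_parent[1] == 10       # node 1's children are ids 10..19
flagged = sum(1 for n in parent if n % 10 == 3)
assert flagged == 10000                   # one tenth of the rows have 'N'
```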

When hearing the word "filter", I almost immediately associate it with a WHERE clause. But a where clause in a connect by query is not what connect by filtering means. The documentation states:

Oracle processes hierarchical queries as follows:

A join, if present, is evaluated first, whether the join is specified in the FROM clause or with WHERE clause predicates.

The CONNECT BY condition is evaluated.

Any remaining WHERE clause predicates are evaluated.

So a where clause predicate is evaluated AFTER the connect by has done its job. You can see that happening here:

SQL> explain plan
  2  for
  3   select id
  4        , parent_id
  5        , sys_connect_by_path(id,'->') scbp
  6     from t
  7    where indicator = 'N'
  8  connect by parent_id = prior id
  9    start with parent_id is null
 10  /

Explained.

SQL> select * from table(dbms_xplan.display)
  2  /

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------
Plan hash value: 2502271019

---------------------------------------------------------------------------------------
| Id  | Operation                      | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT               |      |    11 |   319 |   164   (3)| 00:00:02 |
|*  1 |  FILTER                        |      |       |       |            |          |
|*  2 |   CONNECT BY WITH FILTERING    |      |       |       |            |          |
|*  3 |    TABLE ACCESS FULL           | T    |     1 |    11 |    80   (2)| 00:00:01 |
|   4 |    NESTED LOOPS                |      |    10 |   240 |    82   (2)| 00:00:01 |
|   5 |     CONNECT BY PUMP            |      |       |       |            |          |
|   6 |     TABLE ACCESS BY INDEX ROWID| T    |    10 |   110 |     2   (0)| 00:00:01 |
|*  7 |      INDEX RANGE SCAN          | I1   |    10 |       |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - filter("INDICATOR"='N')
   2 - access("PARENT_ID"=PRIOR "ID")
   3 - filter("PARENT_ID" IS NULL)
   7 - access("PARENT_ID"="connect$_by$_pump$_002"."prior id   ")

22 rows selected.

The "indicator = 'N'" predicate is applied at step 1, which is executed after the CONNECT BY WITH FILTERING at step 2. Note that although this query was executed on 11.2.0.2, the optimizer chose the old CONNECT BY WITH FILTERING.
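The order of evaluation matters for the results, not just for performance. A toy model in Python (hypothetical data shaped like a scaled-down t, ids 0 through 99) shows the difference: a WHERE predicate filters rows after the whole hierarchy has been built, while the same predicate inside the CONNECT BY prunes branches during the walk.

```python
# (id, parent_id, indicator): parent of n is n // 10, 'N' when the id ends in 3
rows = [(n, None if n == 0 else n // 10, "N" if n % 10 == 3 else "Y")
        for n in range(100)]

def hierarchy(rows, connect_by=lambda r: True):
    kids = {}
    for r in rows:
        kids.setdefault(r[1], []).append(r)
    out = []
    def walk(r):
        out.append(r)
        for child in kids.get(r[0], []):
            if connect_by(child):          # the CONNECT BY condition
                walk(child)
    for root in kids.get(None, []):        # START WITH parent_id IS NULL
        walk(root)
    return out

# WHERE indicator = 'N': evaluated AFTER the connect by has done its job
where_n = [r for r in hierarchy(rows) if r[2] == "N"]
# ... AND indicator = 'N' inside the CONNECT BY: prunes while walking
cb_n = hierarchy(rows, connect_by=lambda r: r[2] == "N")

print(len(where_n))          # -> 10: every 'N' row, wherever it sits
print([r[0] for r in cb_n])  # -> [0, 3, 33]: only the all-'N' chain
```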

Connect by filtering means filtering inside your CONNECT BY clause. Here is an example with the predicate "indicator = 'N'" inside the CONNECT BY clause:

SQL>  select id
  2        , parent_id
  3        , sys_connect_by_path(id,'->') scbp
  4     from t
  5  connect by parent_id = prior id
  6      and indicator = 'N'
  7    start with parent_id is null
  8  /

        ID  PARENT_ID SCBP
---------- ---------- --------------------------------------------------
         0            ->0
         3          0 ->0->3
        33          3 ->0->3->33
       333         33 ->0->3->33->333
      3333        333 ->0->3->33->333->3333
     33333       3333 ->0->3->33->333->3333->33333

6 rows selected.

SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'))
  2  /

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  dzkjzrrzgnvd5, child number 0
-------------------------------------
 select id       , parent_id       , sys_connect_by_path(id,'->') scbp
  from t connect by parent_id = prior id     and indicator = 'N'
start with parent_id is null

Plan hash value: 3164577763

---------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
---------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |      |      1 |        |      6 |00:00:00.01 |     294 |       |       |          |
|*  1 |  CONNECT BY WITH FILTERING    |      |      1 |        |      6 |00:00:00.01 |     294 |  2048 |  2048 | 2048  (0)|
|*  2 |   TABLE ACCESS FULL           | T    |      1 |      1 |      1 |00:00:00.01 |     277 |       |       |          |
|   3 |   NESTED LOOPS                |      |      6 |      5 |      5 |00:00:00.01 |      17 |       |       |          |
|   4 |    CONNECT BY PUMP            |      |      6 |        |      6 |00:00:00.01 |       0 |       |       |          |
|   5 |    TABLE ACCESS BY INDEX ROWID| T    |      6 |      5 |      5 |00:00:00.01 |      17 |       |       |          |
|*  6 |     INDEX RANGE SCAN          | I1   |      6 |      5 |      5 |00:00:00.01 |      12 |       |       |          |
---------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("PARENT_ID"=PRIOR NULL)
   2 - filter("PARENT_ID" IS NULL)
   6 - access("PARENT_ID"="connect$_by$_pump$_002"."prior id     " AND "INDICATOR"='N')


27 rows selected.

In the A-Rows column, you can see that the connect by filtering was effective here: only the necessary rows were read. And this is the key difference between the two connect by algorithms: CONNECT BY WITH FILTERING can filter within each recursion step, whereas CONNECT BY NO FILTERING WITH START-WITH has to read everything, perform the hierarchical processing in memory, and only then return the result. With this example, the latter is much less efficient:
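The difference can be sketched in Python (a simplified model under my own assumptions, not Oracle internals). Both strategies return the same six rows, but "no filtering" must ingest all 100K rows before walking the hierarchy, while "filtering" probes an index per level for qualifying children only.

```python
# the 100K-row table t: parent of n is n // 10, indicator 'N' when n ends in 3
rows = {n: (None if n == 0 else n // 10, "N" if n % 10 == 3 else "Y")
        for n in range(100000)}

index_i1 = {}                              # stands in for index I1 (parent_id, indicator)
for n, (p, ind) in rows.items():
    index_i1.setdefault((p, ind), []).append(n)

def filtering():
    # after the START WITH scan, only qualifying children are ever fetched
    fetched, out = 0, []
    level = [n for n, (p, _) in rows.items() if p is None]   # START WITH
    while level:
        out += level
        nxt = []
        for n in level:                    # one index probe per pump row
            kids = index_i1.get((n, "N"), [])
            fetched += len(kids)
            nxt += kids
        level = nxt
    return out, fetched

def no_filtering():
    # every row is read into memory first; the predicate only helps during the walk
    read = len(rows)
    by_parent = {}
    for n, (p, _) in rows.items():
        by_parent.setdefault(p, []).append(n)
    out = []
    def walk(n):
        out.append(n)
        for kid in by_parent.get(n, []):
            if rows[kid][1] == "N":        # connect by ... and indicator = 'N'
                walk(kid)
    walk(0)                                # the single root
    return out, read

print(filtering())      # -> ([0, 3, 33, 333, 3333, 33333], 5)
print(no_filtering())   # -> ([0, 3, 33, 333, 3333, 33333], 100000)
```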

SQL>  select /*+ no_connect_by_filtering */ id
  2        , parent_id
  3        , sys_connect_by_path(id,'->') scbp
  4     from t
  5  connect by parent_id = prior id
  6      and indicator = 'N'
  7    start with parent_id is null
  8  /

        ID  PARENT_ID SCBP
---------- ---------- --------------------------------------------------
         0            ->0
         3          0 ->0->3
        33          3 ->0->3->33
       333         33 ->0->3->33->333
      3333        333 ->0->3->33->333->3333
     33333       3333 ->0->3->33->333->3333->33333

6 rows selected.

SQL> select * from table(dbms_xplan.display_cursor(null,null,'allstats last'))
  2  /

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------------------------------------------
SQL_ID  3fcr31tp83by9, child number 0
-------------------------------------
 select /*+ no_connect_by_filtering */ id       , parent_id       ,
sys_connect_by_path(id,'->') scbp    from t connect by parent_id =
prior id     and indicator = 'N'   start with parent_id is null

Plan hash value: 2303479083

----------------------------------------------------------------------------------------------------------
| Id  | Operation                               | Name | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |      |      1 |        |      6 |00:00:00.14 |     277 |
|*  1 |  CONNECT BY NO FILTERING WITH START-WITH|      |      1 |        |      6 |00:00:00.14 |     277 |
|   2 |   TABLE ACCESS FULL                     | T    |      1 |    100K|    100K|00:00:00.01 |     277 |
----------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("PARENT_ID"=PRIOR NULL)
       filter("PARENT_ID" IS NULL)


22 rows selected.

100K rows were read, and the A-Time was 0.14 seconds instead of 0.01 seconds. I wondered where those 0.14 seconds went, since the plan shows they were NOT spent on the full table scan. Using Tom Kyte's runstats_pkg reveals this:

SQL> declare
  2    cursor c1
  3    is
  4     select /*+ connect_by_filtering */ id
  5          , parent_id
  6          , sys_connect_by_path(id,'->') scbp
  7       from t
  8    connect by parent_id = prior id
  9        and indicator = 'N'
 10      start with parent_id is null
 11    ;
 12    cursor c2
 13    is
 14     select /*+ no_connect_by_filtering */ id
 15          , parent_id
 16          , sys_connect_by_path(id,'->') scbp
 17       from t
 18    connect by parent_id = prior id
 19        and indicator = 'N'
 20      start with parent_id is null
 21    ;
 22  begin
 23    runstats_pkg.rs_start;
 24    for r in c1 loop null; end loop;
 25    runstats_pkg.rs_middle;
 26    for r in c2 loop null; end loop;
 27    runstats_pkg.rs_stop;
 28  end;
 29  /
Run1 ran in 0 hsecs
Run2 ran in 10 hsecs
run 1 ran in 0% of the time

Name                                  Run1        Run2        Diff
STAT...HSC Heap Segment Block           16          15          -1
STAT...db block changes                 48          47          -1
STAT...consistent gets - exami           9           8          -1
STAT...db block gets from cach          32          33           1
STAT...db block gets                    32          33           1
STAT...redo subscn max counts            0           1           1
STAT...redo ordering marks               0           1           1
STAT...redo entries                     16          15          -1
STAT...calls to kcmgas                   0           1           1
STAT...calls to kcmgcs                  29          28          -1
STAT...free buffer requested             0           1           1
STAT...Heap Segment Array Inse          16          15          -1
STAT...consistent changes               32          31          -1
STAT...heap block compress               9           8          -1
STAT...parse time cpu                    1           0          -1
STAT...buffer is pinned count            1           0          -1
STAT...session cursor cache co           1           0          -1
STAT...sql area evicted                  1           0          -1
LATCH.undo global data                  11          10          -1
LATCH.SQL memory manager worka           3           5           2
LATCH.messages                           0           2           2
LATCH.OS process allocation              0           2           2
LATCH.simulator hash latch              20          23           3
LATCH.object queue header oper           4           1          -3
STAT...workarea executions - o          10           6          -4
STAT...table fetch by rowid             15          10          -5
STAT...index scans kdiixs1               6           0          -6
LATCH.row cache objects                280         274          -6
STAT...sorts (memory)                    8           2          -6
STAT...CPU used by this sessio           2          11           9
STAT...Elapsed Time                      1          11          10
STAT...recursive cpu usage               2          12          10
STAT...no work - consistent re         300         284         -16
STAT...buffer is not pinned co          36          20         -16
STAT...session logical reads           354         337         -17
STAT...consistent gets from ca         313         296         -17
STAT...consistent gets from ca         322         304         -18
LATCH.shared pool                      186         168         -18
STAT...consistent gets                 322         304         -18
LATCH.shared pool simulator             23           4         -19
LATCH.cache buffers chains             785         740         -45
STAT...undo change vector size       3,500       3,420         -80
STAT...redo size                     4,652       4,560         -92
STAT...session uga memory                0     -65,488     -65,488
STAT...session pga memory                0     -65,536     -65,536
STAT...sorts (rows)                     12     100,001      99,989

Run1 latches total versus runs -- difference and pct
Run1        Run2        Diff       Pct
1,467       1,384         -83    106.00%

PL/SQL procedure successfully completed

The major difference is the number of rows sorted: CONNECT BY NO FILTERING WITH START-WITH sorts all 100K rows. This is a surprise, because a sort normally uses memory from the PGA workarea, which shows up in the memory statistics of your execution plan (OMem, 1Mem, Used-Mem). But the no filtering plan did not show those statistics. I have no explanation for this phenomenon yet.

Let's zoom in on the sorting:

SQL> select sn.name
  2       , ms.value
  3    from v$mystat ms
  4       , v$statname sn
  5   where ms.statistic# = sn.statistic#
  6     and sn.name like '%sort%'
  7  /

NAME                         VALUE
----------------------- ----------
sorts (memory)                2278
sorts (disk)                     0
sorts (rows)               9425510

3 rows selected.

SQL>  select id
  2        , parent_id
  3        , sys_connect_by_path(id,'->') scbp
  4     from t
  5  connect by parent_id = prior id
  6      and indicator = 'N'
  7    start with parent_id is null
  8  /

        ID  PARENT_ID SCBP
---------- ---------- --------------------------------------------------
         0            ->0
         3          0 ->0->3
        33          3 ->0->3->33
       333         33 ->0->3->33->333
      3333        333 ->0->3->33->333->3333
     33333       3333 ->0->3->33->333->3333->33333

6 rows selected.

SQL> select sn.name
  2       , ms.value
  3    from v$mystat ms
  4       , v$statname sn
  5   where ms.statistic# = sn.statistic#
  6     and sn.name like '%sort%'
  7  /

NAME                         VALUE
----------------------- ----------
sorts (memory)                2286
sorts (disk)                     0
sorts (rows)               9425522

3 rows selected.

SQL>  select /*+ no_connect_by_filtering */ id
  2        , parent_id
  3        , sys_connect_by_path(id,'->') scbp
  4     from t
  5  connect by parent_id = prior id
  6      and indicator = 'N'
  7    start with parent_id is null
  8  /

        ID  PARENT_ID SCBP
---------- ---------- --------------------------------------------------
         0            ->0
         3          0 ->0->3
        33          3 ->0->3->33
       333         33 ->0->3->33->333
      3333        333 ->0->3->33->333->3333
     33333       3333 ->0->3->33->333->3333->33333

6 rows selected.

SQL> select sn.name
  2       , ms.value
  3    from v$mystat ms
  4       , v$statname sn
  5   where ms.statistic# = sn.statistic#
  6     and sn.name like '%sort%'
  7  /

NAME                         VALUE
----------------------- ----------
sorts (memory)                2288
sorts (disk)                     0
sorts (rows)               9525523

3 rows selected.

So CONNECT BY WITH FILTERING did 8 sorts (2286 - 2278) and sorted only 12 rows (9425522 - 9425510), whereas CONNECT BY NO FILTERING WITH START-WITH did 2 sorts (2288 - 2286) but sorted 100,001 rows (9525523 - 9425522).
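The deltas above come from snapshotting v$mystat before and after each query. For a single statistic, the same before/after measurement can be packaged in one anonymous block; a minimal sketch, assuming table T from earlier in this post and SERVEROUTPUT switched on (the exact delta will vary with recursive SQL in the session):

```sql
declare
  l_before number;
  l_after  number;
  l_dummy  number;
begin
  -- Capture the session's "sorts (rows)" statistic before the query.
  select ms.value
    into l_before
    from v$mystat ms
       , v$statname sn
   where ms.statistic# = sn.statistic#
     and sn.name = 'sorts (rows)';

  -- Run the query under test (wrapped in a count to fetch all rows).
  select count(*)
    into l_dummy
    from ( select id
             from t
          connect by parent_id = prior id
                 and indicator = 'N'
           start with parent_id is null
         );

  -- Capture the statistic again and report the difference.
  select ms.value
    into l_after
    from v$mystat ms
       , v$statname sn
   where ms.statistic# = sn.statistic#
     and sn.name = 'sorts (rows)';

  dbms_output.put_line('sorts (rows) delta: ' || to_char(l_after - l_before));
end;
/
```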

And finally, I promised to explain why the first two queries of this blog post, though identical, show different execution plans. The reason is simple: the first one was executed on 10.2.0.4 and the second one on 11.2.0.2.
