Example of complex SQL query to get as much data as possible from Database.

tyzia

Andrei Navumau

Posted on December 7, 2017

Example of complex SQL query to get as much data as possible from Database.

I'd like to share an example of a complex SQL query, which will return all possible data from Database.

DB Schema looks like this:

DB Schema

The task was "to displays all employees and their related info even if some info is missing. Get as much information as you can about the employees".

My final SQL query was like this:

SELECT
  e.employee_id AS "Employee #"
  , e.first_name || ' ' || e.last_name AS "Name"
  , e.email AS "Email"
  , e.phone_number AS "Phone"
  , TO_CHAR(e.hire_date, 'MM/DD/YYYY') AS "Hire Date"
  , TO_CHAR(e.salary, 'L99G999D99', 'NLS_NUMERIC_CHARACTERS = ''.,'' NLS_CURRENCY = ''$''') AS "Salary"
  , e.commission_pct AS "Comission %"
  , 'works as ' || j.job_title || ' in ' || d.department_name || ' department (manager: '
    || dm.first_name || ' ' || dm.last_name || ') and immediate supervisor: ' || m.first_name || ' ' || m.last_name AS "Current Job"
  , TO_CHAR(j.min_salary, 'L99G999D99', 'NLS_NUMERIC_CHARACTERS = ''.,'' NLS_CURRENCY = ''$''') || ' - ' ||
      TO_CHAR(j.max_salary, 'L99G999D99', 'NLS_NUMERIC_CHARACTERS = ''.,'' NLS_CURRENCY = ''$''') AS "Current Salary"
  , l.street_address || ', ' || l.postal_code || ', ' || l.city || ', ' || l.state_province || ', '
    || c.country_name || ' (' || r.region_name || ')' AS "Location"
  , jh.job_id AS "History Job ID"
  , 'worked from ' || TO_CHAR(jh.start_date, 'MM/DD/YYYY') || ' to ' || TO_CHAR(jh.end_date, 'MM/DD/YYYY') ||
    ' as ' || jj.job_title || ' in ' || dd.department_name || ' department' AS "History Job Title"
  
FROM employees e
-- to get title of current job_id
  JOIN jobs j 
    ON e.job_id = j.job_id
-- to get name of current manager_id
  LEFT JOIN employees m 
    ON e.manager_id = m.employee_id
-- to get name of current department_id
  LEFT JOIN departments d 
    ON d.department_id = e.department_id
-- to get name of manager of current department
-- (not equal to current manager and can be equal to the employee itself)
  LEFT JOIN employees dm 
    ON d.manager_id = dm.employee_id
-- to get name of location
  LEFT JOIN locations l
    ON d.location_id = l.location_id
  LEFT JOIN countries c
    ON l.country_id = c.country_id
  LEFT JOIN regions r
    ON c.region_id = r.region_id
-- to get job history of employee
  LEFT JOIN job_history jh
    ON e.employee_id = jh.employee_id
-- to get title of job history job_id
  LEFT JOIN jobs jj
    ON jj.job_id = jh.job_id
-- to get namee of department from job history
  LEFT JOIN departments dd
    ON dd.department_id = jh.department_id

ORDER BY e.employee_id;

Let's go through it step by step:

Before creating our final Select statement, let's see which tables require LEFT joins and which do not. We need this check because LEFT JOINS are slower than INNER JOINS, so for efficiency we will be using (INNER) JOINS were possible.

To check that every employee has a job_id. If there are results (id's of employees), so we need LEFT JOIN, because there are employees who do not have (IS NULL == true) job_id. Otherwise (IS NULL == false) we use only JOIN:

SELECT employee_id FROM employees WHERE job_id IS NULL;

Result: empty. Conclusion: we use JOIN for 'jobs' table.

The same check as above, this time checking department_id:

SELECT employee_id FROM employees WHERE department_id IS NULL;

Result: 178. Conclusion: we use LEFT JOIN for 'departments' table.

Because there is one employee without department_id, the whole chain of future joins (departments-loations-countries-regions) should be also LEFT Joined, despite the fact that these tables can be INNER joined between each other in a separate query.

The same check as above, this time checking manager_id:

SELECT employee_id FROM employees WHERE manager_id IS NULL;

Result: 100. Conclusion: we use LEFT JOIN (self join) for 'employees' table. It means, that one employee doesn't have a manager.

And some general comments on this final query:

AS
  • we use it to define alias, column names in the result table look more human-readable.
    ||
  • we use this to concatenate strings
    'text'
  • we enclose any text into single quotes
    TO_CHAR(e.hire_date, 'MM/DD/YYYY')
  • in this way we can set format of the date
    TO_CHAR(e.salary, 'L99G999D99', 'NLS_NUMERIC_CHARACTERS = ''.,'' NLS_CURRENCY = ''$''')
  • in this way we can output currency with separators. My detailed post about it can be found here.
    JOIN jobs j ...
  • this 'j' means alias to a table name. The same table can be joined to itself (self join) and for this reason this table alias for the same table can and should be different.

The query returned 110 results, though there are only 107 employees. Because out of 107 employees, 3 have two job history entries. The first couple of rows of the result look like this:

Complex query result

Complex query result continue

We could definitely go further and get all information about departments, where according to job history entries a couple of employees were working, but I decided that it would be too much.

Originally posted on my website.

I hope this example will be useful for newbies. And to those of you, who are experienced, I'd like to address some questions:

  1. Is it really important which JOIN to apply: LEFT or INNER? I mean is there any real gain in productivity?
  2. Is it a good practice to pack all possible information into one query? Or to split it into small queries is a better approach?
  3. What mistakes were made in this query, if any?

Take care!

💖 💪 🙅 🚩
tyzia
Andrei Navumau

Posted on December 7, 2017

Join Our Newsletter. No Spam, Only the good stuff.

Sign up to receive the latest update from our blog.

Related