Removing duplicate records in ORACLE SQL
Pranav Bakare
Posted on October 25, 2024
Removing duplicate records from the emp table using various methods, including creating the table, inserting sample data, backing up the table, and employing different SQL techniques. This will cover all solutions discussed previously.
Step 1: Create the Employee Table
First, create the emp table to store employee information.
CREATE TABLE emp (
emp_id NUMBER PRIMARY KEY,
first_name VARCHAR2(50),
last_name VARCHAR2(50),
email VARCHAR2(100),
department VARCHAR2(50),
salary NUMBER
);
Step 2: Insert Sample Data
Insert sample data into the emp table, which contains some duplicate records for demonstration purposes.
INSERT INTO emp (emp_id, first_name, last_name, email, department, salary)
VALUES (1, 'John', 'Doe', 'john.doe@example.com', 'IT', 60000);
INSERT INTO emp (emp_id, first_name, last_name, email, department, salary)
VALUES (2, 'Jane', 'Smith', 'jane.smith@example.com', 'HR', 65000);
INSERT INTO emp (emp_id, first_name, last_name, email, department, salary)
VALUES (3, 'John', 'Doe', 'john.doe@example.com', 'IT', 60000);
-- Duplicate
INSERT INTO emp (emp_id, first_name, last_name, email, department, salary)
VALUES (4, 'Mark', 'Johnson', 'mark.j@example.com', 'IT', 70000);
INSERT INTO emp (emp_id, first_name, last_name, email, department, salary)
VALUES (5, 'Jane', 'Smith', 'jane.smith@example.com', 'HR', 65000);
-- Duplicate
INSERT INTO emp (emp_id, first_name, last_name, email, department, salary)
VALUES (6, 'Emily', 'Davis', 'emily.d@example.com', 'Finance', 55000);
Current State of the Table
After inserting the records, the emp table will look like this:
Step 3: Back Up the Table
- Before removing duplicates, create a backup of the emp table to avoid any data loss.
CREATE TABLE emp_backup AS SELECT * FROM emp;
Step 4: Remove Duplicate Records Using Different Methods
Method 1: Using Unique Identifier (Email and MAX Function)
Identify and select duplicates based on the email and keep the maximum emp_id.
-- Identify duplicates
SELECT email, MAX(emp_id) AS max_emp_id, COUNT(*)
FROM emp
GROUP BY email
HAVING COUNT(*) > 1;
-- Delete duplicates
DELETE FROM emp
WHERE emp_id NOT IN (
SELECT MAX(emp_id)
FROM emp
GROUP BY email
);
Method 2: Using Self Join
Remove duplicates using a self-join on the email column.
DELETE FROM emp e1
WHERE EXISTS (
SELECT 1
FROM emp e2
WHERE e1.email = e2.email
AND e1.emp_id < e2.emp_id
);
Method 3: Using Window Functions
Utilize window functions and an inline view to identify duplicates and delete them.
DELETE FROM emp
WHERE emp_id IN (
SELECT emp_id
FROM (
SELECT emp_id,
ROW_NUMBER() OVER (PARTITION BY email ORDER BY emp_id)
AS row_num
FROM emp
)
WHERE row_num > 1
);
Method 4: Using MIN Function
Keep the minimum emp_id for each duplicate email and delete others.
DELETE FROM emp
WHERE emp_id NOT IN (
SELECT MIN(emp_id)
FROM emp
GROUP BY email
);
Step 5: Verify the Table After Deletion
After executing the delete operations, you can verify the remaining records in the emp table to ensure that duplicates have been successfully removed.
SELECT * FROM emp;
Step 6: Confirm Backup Data
You can also check the backup table to ensure that the original data is preserved.
SELECT * FROM emp_backup;
Final State of the emp Table
After removing duplicates, the emp table should look like this:
Conclusion
This comprehensive guide demonstrates various methods for removing duplicate records from the emp table, including the creation of the table, insertion of sample data, backing up the table, and utilizing different SQL techniques. Always ensure to back up your data before performing deletion operations to avoid accidental data loss.
Posted on October 25, 2024
Join Our Newsletter. No Spam, Only the good stuff.
Sign up to receive the latest update from our blog.