How do I delete duplicates from a MySQL table?
I need to DELETE duplicated rows for a specified sid in a MySQL table.
How can I do this with an SQL query?
DELETE (DUPLICATED TITLES) FROM table WHERE SID = "1"
Something like this, but I don't know how to do it.
This removes duplicates in place, without making a new table:
ALTER IGNORE TABLE `table_name` ADD UNIQUE (title, SID)
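Note: this only works well if the index fits in memory. ALTER IGNORE TABLE was removed in MySQL 5.7.4, so the statement only runs on older servers.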
Suppose you have a table employee with the following columns:
employee (id, first_name, last_name, start_date)
To delete the rows with a duplicate first_name column, run:
delete
from employee using employee,
employee e1
where employee.id > e1.id
and employee.first_name = e1.first_name
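The self-join delete above keeps the row with the lowest id for each first_name. Below is a minimal sketch of the same rule, run through Python's sqlite3 module so it can be tried without a MySQL server. SQLite has no multi-table DELETE, so a correlated EXISTS subquery stands in for the join; the sample rows and the helper name dedupe_by_first_name are assumptions made for illustration.

```python
import sqlite3


def dedupe_by_first_name(conn):
    """Delete every row that shares first_name with a row of smaller id."""
    cur = conn.execute(
        """
        DELETE FROM employee
        WHERE EXISTS (
            SELECT 1 FROM employee e1
            WHERE e1.first_name = employee.first_name
              AND e1.id < employee.id
        )
        """
    )
    return cur.rowcount  # number of rows removed


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    "id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT, start_date TEXT)"
)
# Hypothetical sample data: 'Ann' appears three times (ids 1, 3, 4).
conn.executemany(
    "INSERT INTO employee (first_name, last_name, start_date) VALUES (?, ?, ?)",
    [
        ("Ann", "Lee", "2020-01-01"),
        ("Bob", "Ray", "2020-01-02"),
        ("Ann", "Kim", "2020-01-03"),
        ("Ann", "Poe", "2020-01-04"),
        ("Cy", "Fox", "2020-01-05"),
    ],
)

deleted = dedupe_by_first_name(conn)
remaining = conn.execute(
    "SELECT id, first_name FROM employee ORDER BY id"
).fetchall()
print(deleted, remaining)
```

Because the lowest id in each group never satisfies the EXISTS condition, it is the row that survives, which matches the `employee.id > e1.id` condition in the MySQL statement.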
This removes duplicates for all SIDs, not only a single one.
With a temp table:
CREATE TABLE table_temp AS
SELECT * FROM table GROUP BY title, SID;
DROP TABLE table;
RENAME TABLE table_temp TO table;
Since table_temp is freshly created, it has no indexes. You will need to recreate them after removing the duplicates. You can check which indexes the table has with SHOW INDEXES IN table.
Without a temp table:
DELETE FROM `table` WHERE id IN (
SELECT all_duplicates.id FROM (
SELECT id FROM `table` WHERE (`title`, `SID`) IN (
SELECT `title`, `SID` FROM `table` GROUP BY `title`, `SID` having count(*) > 1
)
) AS all_duplicates
LEFT JOIN (
SELECT id FROM `table` GROUP BY `title`, `SID` having count(*) > 1
) AS grouped_duplicates
ON all_duplicates.id = grouped_duplicates.id
WHERE grouped_duplicates.id IS NULL
)
Delete duplicate rows in MySQL, a walkthrough
Create the table and insert some rows:
dev-db> create table penguins(foo int, bar varchar(15), baz datetime);
Query OK, 0 rows affected (0.07 sec)
dev-db> insert into penguins values(1, 'skipper', now());
dev-db> insert into penguins values(1, 'skipper', now());
dev-db> insert into penguins values(3, 'kowalski', now());
dev-db> insert into penguins values(3, 'kowalski', now());
dev-db> insert into penguins values(3, 'kowalski', now());
dev-db> insert into penguins values(4, 'rico', now());
Query OK, 6 rows affected (0.07 sec)
dev-db> select * from penguins;
+------+----------+---------------------+
| foo | bar | baz |
+------+----------+---------------------+
| 1 | skipper | 2014-08-25 14:21:54 |
| 1 | skipper | 2014-08-25 14:21:59 |
| 3 | kowalski | 2014-08-25 14:22:09 |
| 3 | kowalski | 2014-08-25 14:22:13 |
| 3 | kowalski | 2014-08-25 14:22:15 |
| 4 | rico | 2014-08-25 14:22:22 |
+------+----------+---------------------+
6 rows in set (0.00 sec)
Then remove the duplicates:
dev-db> delete a
-> from penguins a
-> left join(
-> select max(baz) maxtimestamp, foo, bar
-> from penguins
-> group by foo, bar) b
-> on a.baz = maxtimestamp and
-> a.foo = b.foo and
-> a.bar = b.bar
-> where b.maxtimestamp IS NULL;
Query OK, 3 rows affected (0.01 sec)
Result:
dev-db> select * from penguins;
+------+----------+---------------------+
| foo | bar | baz |
+------+----------+---------------------+
| 1 | skipper | 2014-08-25 14:21:59 |
| 3 | kowalski | 2014-08-25 14:22:15 |
| 4 | rico | 2014-08-25 14:22:22 |
+------+----------+---------------------+
3 rows in set (0.00 sec)
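What is that delete statement doing?
Pseudocode: group the rows by the two columns you want to deduplicate on. Pick the one row of each group to keep by using the max aggregate. A left join returns all rows from the left table, together with the matching rows from the right table. Here the left table holds every row in the table, while the right side matches only the one row per group that you want to keep, so every other row comes back with NULL on the right. Delete those rows and you are left with a single row per group.
A more technical explanation of how to read that SQL delete statement: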
Table penguins with alias 'a' is left joined on a subset of table penguins called alias 'b'. The right hand table 'b' which is a subset finds the max timestamp grouped by foo and bar. This is matched to left hand table 'a'. (foo,bar,baz) on left has every row in the table. The right hand subset 'b' has a (maxtimestamp,foo,bar) which is matched to left only on the one that IS the max.
Every row that is not that max has value maxtimestamp of NULL. Filter down on those NULL rows and you have a set of all rows grouped by foo and bar that isn't the latest timestamp baz. Delete those ones.
Make a backup of the table before you run this.
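One way to check that reading before deleting anything is to run the same join as a SELECT and look at the rows it would remove. The sketch below does that with Python's sqlite3 module and the penguins sample data from above; storing baz as TEXT and using SQLite instead of MySQL are assumptions made so the example is self-contained.

```python
import sqlite3

ROWS = [
    (1, "skipper", "2014-08-25 14:21:54"),
    (1, "skipper", "2014-08-25 14:21:59"),
    (3, "kowalski", "2014-08-25 14:22:09"),
    (3, "kowalski", "2014-08-25 14:22:13"),
    (3, "kowalski", "2014-08-25 14:22:15"),
    (4, "rico", "2014-08-25 14:22:22"),
]

# Rows of 'a' with no partner in 'b', i.e. rows that are not the newest
# row of their (foo, bar) group. These are the rows the DELETE would remove.
DOOMED_SQL = """
SELECT a.foo, a.bar, a.baz
FROM penguins a
LEFT JOIN (
    SELECT MAX(baz) AS maxtimestamp, foo, bar
    FROM penguins
    GROUP BY foo, bar
) b
  ON a.baz = b.maxtimestamp AND a.foo = b.foo AND a.bar = b.bar
WHERE b.maxtimestamp IS NULL
ORDER BY a.baz
"""

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE penguins (foo INTEGER, bar TEXT, baz TEXT)")
conn.executemany("INSERT INTO penguins VALUES (?, ?, ?)", ROWS)

doomed = conn.execute(DOOMED_SQL).fetchall()
for row in doomed:
    print(row)
```

The three rows it lists are the older skipper row and the two older kowalski rows, the same three rows the walkthrough reports as affected.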
Prevent this problem from ever happening again on this table:
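If this works and puts out your "duplicate rows" fire, great. Your work isn't done yet. Define a new composite unique key on the table (on those two columns) so that no more duplicates can be added in the first place. Like a good immune system, the table should not let bad rows in at insert time. The programs that were adding duplicates will then complain loudly, and once you fix them the problem never comes back.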
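A small sketch of what the composite unique key buys you, again using Python's sqlite3 module as a stand-in for MySQL. The index name uq_foo_bar and the helper try_insert are hypothetical; in MySQL the comparable statement would be ALTER TABLE penguins ADD UNIQUE KEY uq_foo_bar (foo, bar).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE penguins (foo INTEGER, bar TEXT, baz TEXT)")
# Composite unique key over the two columns that must not repeat together.
conn.execute("CREATE UNIQUE INDEX uq_foo_bar ON penguins (foo, bar)")


def try_insert(foo, bar, baz):
    """Return True if the row was stored, False if the unique key rejected it."""
    try:
        conn.execute("INSERT INTO penguins VALUES (?, ?, ?)", (foo, bar, baz))
        return True
    except sqlite3.IntegrityError:
        return False


first = try_insert(1, "skipper", "2014-08-25 14:21:54")
second = try_insert(1, "skipper", "2014-08-25 14:21:59")  # same (foo, bar)
other = try_insert(4, "rico", "2014-08-25 14:22:22")
stored = conn.execute("SELECT COUNT(*) FROM penguins").fetchone()[0]
print(first, second, other, stored)
```

The second insert fails loudly instead of silently creating a duplicate, which is the "protest" the offending programs will raise.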
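After running into this issue myself on a huge database, I was not impressed with the performance of any of the other answers. I want to keep only the latest duplicate row and delete the rest.
In a single-query statement, without a temp table, this worked best for me: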
DELETE e.*
FROM employee e
WHERE id IN
(SELECT id
FROM (SELECT MIN(id) as id
FROM employee e2
GROUP BY first_name, last_name
HAVING COUNT(*) > 1) x);
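The only caveat is that I have to run the query multiple times, because each run removes only the oldest row of every duplicate group. Even so, I found it worked better for me than the other options.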
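The "run it multiple times" caveat can be sketched as a loop that repeats the delete until a pass removes nothing. This is an illustration in Python's sqlite3 module, not the answer's own code: SQLite lets the subquery read the table being deleted from, so the extra derived table (x) that the MySQL statement wraps around the subquery is left out here, and the sample rows are made up.

```python
import sqlite3

# Each pass removes the oldest (lowest id) row of every group that still
# has more than one row.
DELETE_OLDEST_DUPLICATE = """
DELETE FROM employee
WHERE id IN (
    SELECT MIN(id) FROM employee
    GROUP BY first_name, last_name
    HAVING COUNT(*) > 1
)
"""


def dedupe_keep_latest(conn):
    """Repeat the delete until a pass removes nothing; return the pass count."""
    passes = 0
    while conn.execute(DELETE_OLDEST_DUPLICATE).rowcount > 0:
        passes += 1
    return passes


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)"
)
conn.executemany(
    "INSERT INTO employee (first_name, last_name) VALUES (?, ?)",
    [
        ("Ann", "Lee"), ("Ann", "Lee"), ("Ann", "Lee"),  # ids 1, 2, 3
        ("Bob", "Ray"), ("Bob", "Ray"),                  # ids 4, 5
        ("Cy", "Fox"),                                   # id 6
    ],
)

passes = dedupe_keep_latest(conn)
remaining_ids = [r[0] for r in conn.execute("SELECT id FROM employee ORDER BY id")]
print(passes, remaining_ids)
```

A group with n copies needs n - 1 productive passes, so the group with three copies is what drives the pass count here.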
This always seems to work for me:
CREATE TABLE NoDupeTable LIKE DupeTable;
INSERT NoDupeTable SELECT * FROM DupeTable group by CommonField1,CommonFieldN;
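This keeps the lowest ID of each set of dupes, along with all the non-dupe records.
I've also taken to doing the following so that the dupe issue no longer occurs after the removal: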
CREATE TABLE NoDupeTable LIKE DupeTable;
Alter table NoDupeTable Add Unique `Unique` (CommonField1,CommonField2);
INSERT IGNORE NoDupeTable SELECT * FROM DupeTable;
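In other words, I create a copy of the first table's structure, add a unique index on the fields I don't want duplicated, and then do an INSERT IGNORE. Unlike a normal INSERT, it does not fail the first time it tries to add a duplicate record based on the two fields; it simply skips such records.
Going forward, it becomes impossible to create any duplicate records based on those two fields.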
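A minimal sketch of that copy-with-unique-index approach, using Python's sqlite3 module. SQLite spells the statement INSERT OR IGNORE and has no CREATE TABLE ... LIKE, so the copy is declared by hand; both substitutions, the column types, and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DupeTable ("
    "id INTEGER PRIMARY KEY, CommonField1 TEXT, CommonField2 TEXT)"
)
conn.executemany(
    "INSERT INTO DupeTable (CommonField1, CommonField2) VALUES (?, ?)",
    [("a", "x"), ("a", "x"), ("b", "y"), ("a", "z"), ("b", "y")],
)

# Same columns as DupeTable plus the unique key over the two fields.
conn.execute(
    "CREATE TABLE NoDupeTable ("
    "id INTEGER PRIMARY KEY, CommonField1 TEXT, CommonField2 TEXT, "
    "UNIQUE (CommonField1, CommonField2))"
)

# Rows that would violate the unique key are skipped instead of raising.
conn.execute("INSERT OR IGNORE INTO NoDupeTable SELECT * FROM DupeTable ORDER BY id")
kept = conn.execute(
    "SELECT id, CommonField1, CommonField2 FROM NoDupeTable ORDER BY id"
).fetchall()

# A later attempt to add a duplicate is skipped as well.
conn.execute(
    "INSERT OR IGNORE INTO NoDupeTable (CommonField1, CommonField2) VALUES ('a', 'x')"
)
count_after = conn.execute("SELECT COUNT(*) FROM NoDupeTable").fetchone()[0]
print(kept, count_after)
```

Copying in id order means the first row of each duplicate set is the one that lands in the new table.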
Here is a simple answer:
delete a from target_table a left JOIN (select max(id_field) as id, field_being_repeated
from target_table GROUP BY field_being_repeated) b
on a.field_being_repeated = b.field_being_repeated
and a.id_field = b.id
where b.id is null;
The following works for any table:
CREATE TABLE `noDup` LIKE `Dup` ;
INSERT `noDup` SELECT DISTINCT * FROM `Dup` ;
DROP TABLE `Dup` ;
ALTER TABLE `noDup` RENAME `Dup` ;
This works for me to remove old records:
delete from table where id in
(select min(e.id)
from (select * from table) e
group by column1, column2
having count(*) > 1
);
You can replace min(e.id) with max(e.id) to remove the newest records instead.
This procedure will remove all duplicates (incl multiples) in a table, keeping the last duplicate. This is an extension of Retrieving last record in each group
Hope this is useful to someone.
DROP TABLE IF EXISTS UniqueIDs;
CREATE Temporary table UniqueIDs (id Int(11));
INSERT INTO UniqueIDs
(SELECT T1.ID FROM Table T1 LEFT JOIN Table T2 ON
(T1.Field1 = T2.Field1 AND T1.Field2 = T2.Field2 #Comparison Fields
AND T1.ID < T2.ID)
WHERE T2.ID IS NULL);
DELETE FROM Table WHERE id NOT IN (SELECT ID FROM UniqueIDs);
delete p from
product p
inner join (
select max(id) as id, url from product
group by url
having count(*) > 1
) unik on unik.url = p.url and unik.id != p.id;
Another easy way, using UPDATE IGNORE:
You have to use an index on one or more columns (type index). Create a new temporary reference column (not part of the index). In this column, you mark the unique rows by updating it with the IGNORE clause. Step by step:
Add a temporary reference column to mark the uniques:
ALTER TABLE `yourtable` ADD `unique` VARCHAR(3) NOT NULL AFTER `lastcolname`;
=> this will add a column to your table.
Update the table and try to mark everything as unique, but ignore errors caused by duplicate keys (those records will be skipped):
UPDATE IGNORE `yourtable` SET `unique` = 'Yes' WHERE 1;
=> you will find your duplicate records will not be marked as unique = 'Yes', in other words only one of each set of duplicate records will be marked as unique.
Delete everything that's not unique:
DELETE FROM `yourtable` WHERE `unique` <> 'Yes';
=> This will remove all duplicate records.
Drop the column...
ALTER TABLE `yourtable` DROP `unique`;
I find Werner's solution above to be the most convenient because it works regardless of the presence of a primary key, doesn't mess with tables, uses future-proof plain sql, is very understandable.
As I stated in my comment, that solution hasn't been properly explained though. So this is mine, based on it.
1) add a new boolean column
alter table mytable add tokeep boolean;
2) add a constraint on the duplicated columns AND the new column
alter table mytable add constraint preventdupe unique (mycol1, mycol2, tokeep);
3) set the boolean column to true. This will succeed only on one of the duplicated rows because of the new constraint
update ignore mytable set tokeep = true;
4) delete rows that have not been marked as tokeep
delete from mytable where tokeep is null;
5) drop the added column
alter table mytable drop tokeep;
I suggest that you keep the constraint you added, so that new duplicates are prevented in the future.
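Steps 1 to 4 above can be tried end to end with Python's sqlite3 module. This is a sketch under assumptions: SQLite's UPDATE OR IGNORE stands in for MySQL's UPDATE IGNORE, the constraint is created as a unique index, and the sample rows are invented. Like MySQL, SQLite treats NULL values in a unique index as distinct, which is why the unmarked duplicates do not collide before step 3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (mycol1 TEXT, mycol2 TEXT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [("a", "x"), ("a", "x"), ("b", "y"), ("a", "x")],
)

# 1) new flag column; it is NULL for every existing row
conn.execute("ALTER TABLE mytable ADD COLUMN tokeep INTEGER")
# 2) unique key over the duplicated columns plus the flag
conn.execute("CREATE UNIQUE INDEX preventdupe ON mytable (mycol1, mycol2, tokeep)")
# 3) try to flag every row; rows that would collide are skipped
conn.execute("UPDATE OR IGNORE mytable SET tokeep = 1")
marked = conn.execute(
    "SELECT COUNT(*) FROM mytable WHERE tokeep = 1"
).fetchone()[0]
# 4) delete the rows that could not be flagged
deleted = conn.execute("DELETE FROM mytable WHERE tokeep IS NULL").rowcount

rows = conn.execute(
    "SELECT mycol1, mycol2 FROM mytable ORDER BY mycol1, mycol2"
).fetchall()
print(marked, deleted, rows)
```

Step 5 (dropping the column) is left out of the sketch because the column is part of the unique index at that point and the index would have to be redefined first.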
Deleting duplicates on MySQL tables is a common issue, that usually comes with specific needs. In case anyone is interested, here (Remove duplicate rows in MySQL) I explain how to use a temporary table to delete MySQL duplicates in a reliable and fast way, also valid to handle big data sources (with examples for different use cases).
Ali, in your case, you can run something like this:
-- create a new temporary table
CREATE TABLE tmp_table1 LIKE table1;
-- add a unique constraint
ALTER TABLE tmp_table1 ADD UNIQUE(sid, title);
-- scan over the table to insert entries
INSERT IGNORE INTO tmp_table1 SELECT * FROM table1 ORDER BY sid;
-- rename tables
RENAME TABLE table1 TO backup_table1, tmp_table1 TO table1;
DELETE T2
FROM table_name T1
JOIN table_name T2 ON (T1.title = T2.title AND T1.ID < T2.ID);
delete from `table` where `table`.`SID` in
(
select t.SID from table t join table t1 on t.title = t1.title where t.SID > t1.SID
)
Love @eric's answer, but it doesn't seem to work if you have a really big table (I'm getting "The SELECT would examine more than MAX_JOIN_SIZE rows; check your WHERE and use SET SQL_BIG_SELECTS=1 or SET MAX_JOIN_SIZE=# if the SELECT is okay" when I try to run it). So I limited the join query to only consider the duplicate rows, and I ended up with:
DELETE a FROM penguins a
LEFT JOIN (SELECT COUNT(baz) AS num, MIN(baz) AS keepBaz, foo
FROM penguins
GROUP BY foo HAVING num > 1) b
ON a.baz != b.keepBaz
AND a.foo = b.foo
WHERE b.foo IS NOT NULL
The WHERE clause in this case allows MySQL to ignore any row that doesn't have a duplicate, and it also ignores the first instance of each duplicate, so only subsequent duplicates are deleted. Change MIN(baz) to MAX(baz) to keep the last instance instead of the first.
This works for large tables:
CREATE Temporary table duplicates AS select max(id) as id, url from links group by url having count(*) > 1;
DELETE l from links l inner join duplicates ld on ld.id = l.id WHERE ld.id IS NOT NULL;
To delete the oldest instead, change max(id) to min(id).
This will make the column column_name into a primary key and, in the meantime, ignore all errors. So it will delete the rows with a duplicate value for column_name.
ALTER IGNORE TABLE `table_name` ADD PRIMARY KEY (`column_name`);
I think this will work by copying the table, emptying it, and then putting only the distinct values back into it, but please double-check it before running it on large amounts of data.
Create a carbon copy of your table:
create table temp_table like oldtablename; insert temp_table select * from oldtablename;
Empty your original table:
DELETE FROM oldtablename;
Copy all distinct values from the copied table back to your original table:
INSERT oldtablename SELECT * from temp_table group by firstname,lastname,dob;
Delete your temp table:
DROP TABLE temp_table;
You need to group by all fields that you want to keep distinct.
Here is how I usually eliminate duplicates:
- add a temporary column and name it whatever you want (I'll refer to it as active)
- group by the fields that you think shouldn't be duplicated and set active to 1 for the selected rows; grouping selects only one row from each set of duplicates for those columns
- delete the rows where active is zero
- drop the column active
- optionally (if it fits your purposes), add a unique index on those columns so duplicates cannot come back
You could just use a DISTINCT clause to select the "cleaned up" list (and here is a very easy example on how to do that).
Could it work if you count them, and then add a limit to your delete query leaving just one?
For example, if you have two or more, write your query like this:
DELETE FROM table WHERE SID = 1 LIMIT 1;
There are just a few basic steps when removing duplicate data from your table:
- Back up your table!
- Find the duplicate rows
- Remove the duplicate rows
Here is the full tutorial: https://blog.teamsql.io/deleting-duplicate-data-3541485b3473
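The three steps can be sketched in a few statements. The example below uses Python's sqlite3 module with a hypothetical posts table keyed on (title, sid); the table name, the backup name posts_backup, and the rule of keeping the lowest id are all assumptions, not taken from the linked tutorial.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT, sid INTEGER)")
conn.executemany(
    "INSERT INTO posts (title, sid) VALUES (?, ?)",
    [("intro", 1), ("intro", 1), ("intro", 2), ("faq", 1), ("faq", 1), ("faq", 1)],
)

# 1) back up the table
conn.execute("CREATE TABLE posts_backup AS SELECT * FROM posts")

# 2) find the duplicate groups and how many copies each has
dupes = conn.execute(
    "SELECT title, sid, COUNT(*) FROM posts "
    "GROUP BY title, sid HAVING COUNT(*) > 1 ORDER BY title"
).fetchall()

# 3) remove the duplicates, keeping the lowest id of every group
conn.execute(
    "DELETE FROM posts WHERE id NOT IN ("
    "SELECT MIN(id) FROM posts GROUP BY title, sid)"
)

kept_ids = [r[0] for r in conn.execute("SELECT id FROM posts ORDER BY id")]
backup_rows = conn.execute("SELECT COUNT(*) FROM posts_backup").fetchone()[0]
print(dupes, kept_ids, backup_rows)
```

In MySQL the DELETE in step 3 would need its subquery wrapped in a derived table, as several answers above do, because MySQL does not allow selecting directly from the table being deleted from.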
Reference URL: https://stackoverflow.com/questions/2630440/how-to-delete-duplicates-on-a-mysql-table