sql - Remove duplicate rows when multiple fields collectively provide unique ID -

i have table historical census database in 4 fields collectively identify each individual. 4 fields represent different levels of governmental areas, making table this:

kommnr  kretsnr bostnr  persnr  fornvn      ettnvn [mncpl] [area]  [rsdnc] [prsn]  [firstn]    [lastn] ← english 0101    001     0001    001     john        doe 0101    001     0001    002     richard     doe 0101    001     0001    003     johnny      doe 0101    001     0002    001     jane        doe

as can seen, individuals identified combining 4 numbers (short text format) municipality, area, residence , person. personal id numbers have been added other databases, pending implementation in one.

the database working holds 900k posts, , have ended ~12k duplicates, need remove. example table [t3 3 clean]:

kommnr  kretsnr bostnr  persnr  fornvn          ettnvn [mncpl] [area]  [rsdnc] [prsn]  [firstn]        [lastn] ← english 0101    001a    0003    5       ole christian   elingsen 0101    001a    0003    5       ole christian   elingsen 1101    001a    0003    6       kristian        johannesen 1101    001a    0003    6       kristian        johannesen 2101    001a    0004    14      jens tøger      jensen 2101    001a    0004    14      jens tøger      jensen

i have read , reviewed answer provided here, couldn’t head around how apply structure working with, 2 reasons: our database combines multiple fields create quasi unique id each individual; , being there no single incremental number, believe not apply method provided in excellent answer.

in summary

what want accomplish remove every duplicate. working locally in ms access 2013.

update: data publicly available, hosted on our oracle server, important doing locally reproducible others later. not, however, believe matter solution works directly on oracle, wanting reproduce findings download data , work locally well.

note

i don’t know start writing code select 1 of each duplicate, thereafter deleting them, cannot provide sample. have tried running select distinct *, end access not responding; apparently there’s data handle such query, when running on reduced sample set of 12k.

here how selected duplicates:

select kommnr, kretsnr, bostnr, persnr, nyhush, fornvn, ettnvn, bostat, sedvbo, antopp, bygning, kjonn, famst, sivst, yrke, faar, fsted, fsted_kode, statsb, trossmf, sykdom, sykvar, bostnvn, fornvns, ettnvns, pid  [t3 3 faar rensket]  (     (([t3 3 faar rensket].kommnr) in (         select kommnr [t3 3 faar rensket] tmp       group kommnr, kretsnr, bostnr, persnr having count(*)>1            , kretsnr = [t3 3 faar rensket].kretsnr            , bostnr  = [t3 3 faar rensket].bostnr            , persnr  = [t3 3 faar rensket].persnr         )     ) )  order kommnr, kretsnr, bostnr, persnr;

this query:

select r.* [t3 3 faar rensket] r r.kommnr in (select kommnr                    [t3 3 faar rensket] r2                    group kommnr, kretsnr, bostnr, persnr                    having count(*) > 1 ,                           r2.kretsnr = r.kretsnr ,                           r2.bostnr  = r.bostnr ,                           r2.persnr  = r.persnr                  ) order kommnr, kretsnr, bostnr, persnr;

first advice: moving correlation clauses where clause:

select r.* [t3 3 faar rensket] r r.kommnr in (select r2.kommnr                    [t3 3 faar rensket] r2                    r2.kretsnr = r.kretsnr ,                          r2.bostnr  = r.bostnr ,                          r2.persnr  = r.persnr                    group kommnr, kretsnr, bostnr, persnr                    having count(*) > 1                  ) order kommnr, kretsnr, bostnr, persnr;

second, add index on [t3 3 faar rensket](kretsnr, bostnr, persnr, kommnr).

see if these performance.

Search This Blog

Alcombright

sql - Remove duplicate rows when multiple fields collectively provide unique ID -

in summary

note

Comments

Post a Comment

Popular posts from this blog

c# SetCompatibleTextRenderingDefault must be called before the first -

c++ - Fill runtime data at compile time with templates -

C#.NET Oracle.ManagedDataAccess ConfigSchema.xsd -