sql server - Find (and possibly remove) duplicate entries in not straight forward table structure -

problem description

i have following tables:

business (id uniqueidentifier, other columns)
address (id uniqueidentifier, businessid uniqueidentifier, type nvarchar, validfrom datetime, validto datetime, other columns)
addresscomponent (id uniqueidentifier, addressid uniqueidentifier, value nvarchar, type nvarchar, other columns)

an address unique if there no other addresses valid attached same business same address type, , and same number of components same values , types.

in production have managed several duplicate entries on years. i'm confident it's possible find duplicate values (and thereby remove them) tsql-query, exact details of solution keep eluding me.

what i've tried far

i've found lot of great answers come close (such examples found here finding duplicate values in sql table ). i've managed myself along lines of:

select     b.id, ac.text, ac.type, count(ac.value) amount, count(b.id) numbiz       businesse b inner join      address adr on adr.businessid = b.id inner join      addresscomponent ac on ac.addressid = adr.id      adr.validto > getdate() group     ac.text, ac.type, adr.type, b.id having      count(ac.text) > 1

it comes close, struggle how make duplicated combinations of components instead of duplicated components. business may instance have several addresses same postal code don't care (but query finds). want find duplicated combinations of components attached different addresses attached same business.

sample data

business(id uniqueidentifier, other fields)

1   'asdf' 2   'qwer' 3   'zxcv'

address(id uniqueidentifier, businessid uniqueidentifier, type nvarchar, validfrom datetime, validto datetime, other fields)

1   1   'electronic'    2016-01-01  2099-12-31  'hjkl' 2   1   'electronic'    2016-01-01  2099-12-31  'qwer' 3   1   'postal'    2016-01-01  2099-12-31  'xcvb' 4   1   'visiting'  2016-01-01  2099-12-31  'qwr' 5   1   'visiting'  2016-01-01  2099-12-31  'qwr' 6   2   'electronic'    2016-01-01  2099-12-31  'zxcv' 7   3   'electronic'    2016-01-01  2099-12-31  'fghj'

addresscomponent(id uniqueidentifier, addressid uniqueidentifier, value nvarchar, type nvarchar, other fields)

1   1   'example@example.com'   'email' 'asdf' 2   2   'example@example.com'   'email' 'qwer' 3   3   'road number 1' 'street'    'sdfg' 4   3   '1234'  'postal code'   'dgfh' 5   3   'place' 'postal area'   'cbvn' 6   4   'road number 2' 'street'    'sdfg' 7   4   '1234'  'postal code'   'dgfh' 8   4   'place' 'postal area'   'cbvn' 9   6   'example@example.com'   'email' 'xcvb' 10  7   'another-email@example.tld' 'email' 'dsvv' 11  5   'road number 2' 'street'    'sdfg' 12  5   '1234'  'postal code'   'dgfh' 13  5   'place' 'postal area'   'cbvn'

sample output

number of duplicated addresses: 4

the duplicates:

businessid  duplicatedaddressids  1           1,2 1           4,5

this selection want.

select b.id         , a.id aid , a.type atype         , ac.id acid , ac.type actype         , ac.text actext  business b join address on a.businessid = b.id  join addresscomponent ac on ac.addressid = a.id a.validto > current_date , exists ( select *     address ax     join addresscomponent acx on acx.addressid = ax.id     ax.businessid = a.businessid -- same business     , ax.validto > current_date     -- same selection     , ax.type = a.type              -- same address type     , acx.type = ac.type  -- same component     , acx.text = ac.text  -- same value     , ax.id <> a.id       -- *different* addres!!!     ) order b.id, a.id, ac.id     ;

result:

 id | aid |   atype    | acid |   actype    |       actext         ----+-----+------------+------+-------------+---------------------   1 |   1 | electronic |    1 | email       | example@example.com   1 |   2 | electronic |    2 | email       | example@example.com   1 |   4 | visiting   |    6 | street      | road number 2   1 |   4 | visiting   |    7 | postal code | 1234   1 |   4 | visiting   |    8 | postal area | place   1 |   5 | visiting   |   11 | street      | road number 2   1 |   5 | visiting   |   12 | postal code | 1234   1 |   5 | visiting   |   13 | postal area | place (8 rows)

the rest matter of aggregation (basically lose addresscomponents, 1:n related address)

note: current_date , current_timstamp supposed ansi sql. now() , getdate() implemention specific extentions.

Search This Blog

Alcombright

sql server - Find (and possibly remove) duplicate entries in not straight forward table structure -

Comments

Post a Comment

Popular posts from this blog

c# SetCompatibleTextRenderingDefault must be called before the first -

c++ - Fill runtime data at compile time with templates -

C#.NET Oracle.ManagedDataAccess ConfigSchema.xsd -