Matt Parsons talks about a few ways to encode sum types in relational databases in his blog.
Not so long ago I was thinking about the same problem and came up with slightly different solutions. Let’s start with an example Matt had used:
data Animal
= Cat Name Age FavoriteFood
| Dog Name OwnerId
| Bird Name Song
Matt shows us three ways of dealing with it. Let’s do a recap to have a better picture:
The “Shared Primary Key” approach: dependent tables (which have a 1-to-1 correspondence to constructors) refer to the primary table (which corresponds to the Haskell data type). To make sure it’s impossible to have multiple references to the same row of the primary table, we add a column containing the constructor name to the primary key of the primary table. The foreign keys of the dependent tables include a similar column, but in each table that column has only one possible value. The same identifiers are used in both tables.
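A minimal sketch of this layout (assuming PostgreSQL; table and tag names are chosen for illustration) could look like:

```sql
create table animal (
  id integer not null,
  tag text not null check (tag in ('cat', 'dog', 'bird')),
  primary key (id, tag)
);

create table cat (
  id integer not null,
  -- the tag can only ever be 'cat' in this table
  tag text not null default 'cat' check (tag = 'cat'),
  name text not null,
  age int not null,
  food text not null,
  primary key (id, tag),
  foreign key (id, tag) references animal (id, tag)
);
-- dog and bird follow the same pattern with tag = 'dog' / 'bird'
```

Because the tag is part of both keys and is fixed per dependent table, two dependent tables can never reference the same animal row.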
The persistent approach: the primary table refers to the dependent tables. The identifiers of these tables are generated separately and are distinct. Only one of the foreign keys can be non-null.
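A rough sketch of this layout (column names assumed; the check constraint expresses the “only one non-null” rule described above):

```sql
create table cat (id serial primary key, name text not null, age int not null, food text not null);
create table dog (id serial primary key, name text not null, owner_id int not null);
create table bird (id serial primary key, name text not null, song text not null);

create table animal (
  id serial primary key,
  cat_id int references cat (id),
  dog_id int references dog (id),
  bird_id int references bird (id),
  -- exactly one of the three references must be present
  check (num_nonnulls(cat_id, dog_id, bird_id) = 1)
);
```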
The denormalized approach: here a denormalized representation of the data is used. All data goes into the same table, the constructor is stored in one of the columns, and check constraints limit the range of values we can assign according to the constructor.
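A possible sketch of such a denormalized table (column names and the exact constraints are assumed from the description above):

```sql
create table animal (
  id integer primary key,
  tag text not null check (tag in ('cat', 'dog', 'bird')),
  name text not null,
  -- constructor-specific fields are nullable...
  age int,
  food text,
  owner_id int,
  song text,
  -- ...and check constraints tie their presence to the tag
  check (tag <> 'cat' or (age is not null and food is not null)),
  check (tag <> 'dog' or owner_id is not null),
  check (tag <> 'bird' or song is not null)
);
```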
I’m proposing two other designs to model sum types in relational databases.
In his example, Matt uses the sum-of-products ADTs we write in Haskell, rather than a pure sum type. Let’s take a look at a pure sum first:
data Cat = Cat Name Age FavoriteFood
data Dog = Dog Name OwnerId
data Bird = Bird Name Song
data Animal = ACat Cat | ADog Dog | ABird Bird
For this example, I argue that the persistent approach is a better fit overall, but I propose a few changes that give it some benefits over the persistent version:
create type unit as enum ('()');
create table cat (
id integer not null,
tag unit not null default '()',
name text not null,
age int not null,
food text not null,
primary key (id,tag)
);
create table dog (
id integer not null,
tag unit not null default '()',
name text not null,
owner_id int not null,
primary key (id,tag)
);
create table bird (
id integer not null,
tag unit not null default '()',
name text not null,
song text not null,
primary key (id,tag)
);
create table animal (
id integer primary key,
cat unit,
dog unit,
bird unit,
check
( (cat is not null and dog is null and bird is null)
or (cat is null and dog is not null and bird is null)
or (cat is null and dog is null and bird is not null) ),
constraint animal__cat foreign key (id,cat) references cat (id,tag),
constraint animal__dog foreign key (id,dog) references dog (id,tag),
constraint animal__bird foreign key (id,bird) references bird (id,tag));
When inserting, we have to insert the data into the dependent table first, and then into the primary table. When deleting, we do the opposite.
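For instance, with the schema above:

```sql
-- insertion: dependent table first, then the primary table
begin;
insert into cat (id, name, age, food) values (1, 'kitty', 1, 'milk');
insert into animal (id, cat) values (1, '()');
commit;

-- deletion goes in the opposite order
begin;
delete from animal where id = 1;
delete from cat where id = 1;
commit;
```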
So, what’s the difference between normalized and denormalized sums and what is unsatisfactory in the proposed implementation?
In the normalized implementation, rows in a dependent table can exist on their own, without anything referencing them from the primary table. It’s possible to remove data from the primary table and accidentally forget to remove the corresponding data from the dependent table. Bulk deletion scenarios are not obvious either. An even worse scenario can arise if we consider cascade deletion of some data in the primary table. This “disjointedness” of the dependent data is a very close approximation of the pure sum structure, but from the point of view of database design it’s not the best representation possible. The same problem is exhibited in the persistent approach. For the “Shared Primary Key” approach the situation is reversed: data in the primary table can exist on its own. In other words, in the persistent approach the identifiers in the primary table are a subset of the union of the identifiers in the dependent tables, and in the “Shared Primary Key” approach they are a superset.
Let’s just add two-way references! But how do we insert and delete the data in this scenario? Which table should we start with?
The answer lies in a fairly standard database practice called deferred constraints, which lets us delay the checks until the end of the transaction. This way we get a bijective relation between the primary table and the union of the dependent tables.
create type unit as enum ('()');
create table cat (
id integer not null,
tag unit not null default '()',
name text not null,
age int not null,
food text not null,
primary key (id,tag)
);
create table dog (
id integer not null,
tag unit not null default '()',
name text not null,
owner_id int not null,
primary key (id,tag)
);
create table bird (
id integer not null,
tag unit not null default '()',
name text not null,
song text not null,
primary key (id,tag)
);
create table animal (
id integer primary key,
cat unit,
dog unit,
bird unit,
check
( (cat is not null and dog is null and bird is null)
or (cat is null and dog is not null and bird is null)
or (cat is null and dog is null and bird is not null) ),
constraint animal__cat foreign key (id,cat) references cat (id,tag)
deferrable initially deferred, -- !!!
constraint animal__dog foreign key (id,dog) references dog (id,tag)
deferrable initially deferred, -- !!!
constraint animal__bird foreign key (id,bird) references bird (id,tag)
deferrable initially deferred); -- !!!
alter table cat add constraint cat__animal
foreign key (id) references animal (id) on delete cascade;
alter table dog add constraint dog__animal
foreign key (id) references animal (id) on delete cascade;
alter table bird add constraint bird__animal
foreign key (id) references animal (id) on delete cascade;
All the differences of this implementation are highlighted with comments.
In this implementation we have to insert data starting with the primary table (which, in my opinion, is more natural). To delete the data, we just delete from the primary table, and cascade deletion then removes the rows from the dependent tables.
begin;
insert into animal(id,cat) values (1,'()');
insert into cat(id,name,age,food) values (1,'kitty',1,'milk');
insert into animal(id,dog) values (2,'()');
insert into dog(id,name,owner_id) values (2,'dolly',5);
insert into animal(id,bird) values (3,'()');
insert into bird(id,name,song) values (3,'nightingale','twist');
commit;
delete from animal where id in (1,2);
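With deferred constraints, a transaction that inserts into the primary table but forgets the dependent row fails at commit time, which is exactly the guarantee we wanted. For example:

```sql
begin;
insert into animal (id, cat) values (4, '()');
commit;
-- the deferred check of animal__cat fires here: there is no row
-- (4, '()') in cat, so the commit fails and the transaction is
-- rolled back
```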
Here, if desired, we can also move the name column to the primary table.
Thanks for reading!
We can use the fact that PostgreSQL has support for compound foreign keys, parts of which can be null, in which case the entire key is considered null. This is called “MATCH SIMPLE” and it is used by default.↩︎