Q: Why are type-safe relational operations so difficult?

I was trying to code a relational problem in Haskell and found that doing this in a type-safe manner is far from obvious. E.g. a humble

select 1, a, b from T

already raises a number of questions:

  • What is the type of this function?
  • What is the type of the projection 1, a, b? What is the type of a projection in general?
  • What is the result type, and how do I express the relationship between the result type and the projection?
  • What is the type of a function that accepts any valid projection?
  • How can I detect invalid projections at compile time?
  • How would I add a column to a table or to a projection?

I believe even Oracle's PL/SQL language does not get this quite right. While invalid projections are mostly detected at compile time, there is a large number of type errors that only show up at runtime. Most other bindings to RDBMSs (e.g. Java's JDBC and Perl's DBI) use SQL contained in strings and thus give up type safety entirely.

Further research showed that there are some Haskell libraries (HList, vinyl and TRex) which provide type-safe extensible records and more. But these libraries all require Haskell extensions like DataKinds, FlexibleContexts and many more. Furthermore, these libraries are not easy to use and have a smell of trickery, at least to uninitiated observers like me.

This suggests that type-safe relational operations do not fit in well with the functional paradigm, at least not as it is implemented in Haskell.

My questions are the following:

  • What are the fundamental reasons that relational operations are so difficult to model in a type-safe way? Where does Hindley-Milner fall short? Or does the problem already originate in the typed lambda calculus?
  • Is there a paradigm in which relational operations are first-class citizens? And if so, is there a real-world implementation?

Answer 1:

Let's define a table indexed on some columns as a type with two type parameters:

data IndexedTable k v = ???

groupBy :: (v -> k) -> Table v -> IndexedTable k v

-- A table without an index just has an empty key
type Table = IndexedTable ()

k will be a (possibly nested) tuple of all columns that the table is indexed on. v will be a (possibly nested) tuple of all columns that the table is not indexed on.

So, for example, if we had the following table

| Id | First Name | Last Name |
|----|------------|-----------|
|  0 | Gabriel    | Gonzalez  |
|  1 | Oscar      | Boykin    |
|  2 | Edgar      | Codd      |

... and it were indexed on the first column, then the type would be:

type Id = Int
type FirstName = String
type LastName = String

IndexedTable Int (FirstName, LastName)

However, if it were indexed on the first and second column, then the type would be:

IndexedTable (Int, FirstName) LastName

IndexedTable k would implement the Functor, Applicative, and Alternative type classes. In other words:

instance Functor (IndexedTable k)

instance Applicative (IndexedTable k)

instance Alternative (IndexedTable k)

So joins would be implemented as:

join :: IndexedTable k v1 -> IndexedTable k v2 -> IndexedTable k (v1, v2)
join t1 t2 = liftA2 (,) t1 t2

leftJoin :: IndexedTable k v1 -> IndexedTable k v2 -> IndexedTable k (v1, Maybe v2)
leftJoin t1 t2 = liftA2 (,) t1 (optional t2)

rightJoin :: IndexedTable k v1 -> IndexedTable k v2 -> IndexedTable k (Maybe v1, v2)
rightJoin t1 t2 = liftA2 (,) (optional t1) t2
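
To make the join signatures above concrete, here is a minimal sketch that assumes an indexed table is just a finite map from key to row. That representation is my assumption for illustration, not the implementation the answer has in mind: Data.Map has no Applicative instance, so the sketch sidesteps liftA2 and writes the joins directly, which costs an extra Ord k constraint. The sample tables names and surnames are hypothetical.

```haskell
import qualified Data.Map.Strict as Map
import Data.Map.Strict (Map)

-- Assumption: a table indexed on k is modelled as a finite map from key to row.
newtype IndexedTable k v = IndexedTable (Map k v)
  deriving (Show, Eq)

fromRows :: Ord k => [(k, v)] -> IndexedTable k v
fromRows = IndexedTable . Map.fromList

-- Inner join: keep only the keys present in both tables.
join :: Ord k => IndexedTable k v1 -> IndexedTable k v2 -> IndexedTable k (v1, v2)
join (IndexedTable a) (IndexedTable b) =
  IndexedTable (Map.intersectionWith (,) a b)

-- Left join: keep every key of the left table; the right side may be missing.
leftJoin :: Ord k => IndexedTable k v1 -> IndexedTable k v2 -> IndexedTable k (v1, Maybe v2)
leftJoin (IndexedTable a) (IndexedTable b) =
  IndexedTable (Map.mapWithKey (\k v -> (v, Map.lookup k b)) a)

-- Hypothetical sample data, both indexed on an Int id.
names :: IndexedTable Int String
names = fromRows [(0, "Gabriel"), (1, "Oscar"), (2, "Edgar")]

surnames :: IndexedTable Int String
surnames = fromRows [(0, "Gonzalez"), (2, "Codd")]
```

With these definitions, join names surnames keeps ids 0 and 2, while leftJoin names surnames also keeps id 1 paired with Nothing. The point is that the result type of each join is fully determined by the types of its inputs.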

Then you would have a separate type that we will call a Select. This type will also have two type parameters:

data Select v r = ???

A Select would consume a bunch of rows of type v from the table and produce a result of type r. In other words, we should have a function of type:

selectIndexed :: IndexedTable k v -> Select v r -> r

Some example Selects that we might define would be:

count   :: Select v Integer
sum     :: Num a => Select a a
product :: Num a => Select a a
max     :: Ord a => Select a a

This Select type would implement the Applicative interface, so we could combine multiple Selects into a single Select. For example:

liftA2 (,) count sum :: Select Integer (Integer, Integer)

That would be analogous to this SQL:

SELECT COUNT(*), SUM(*)
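
As a rough illustration of how an Applicative Select can combine two aggregates in one pass, here is a sketch using only base. The encoding (a result so far plus a step function), runSelect, and the name total (used instead of sum to avoid clashing with the Prelude) are all my assumptions, not the answer's actual definitions.

```haskell
import Control.Applicative (liftA2)

-- Hypothetical encoding: the result so far, plus how to absorb one more row.
data Select v r = Select r (v -> Select v r)

instance Functor (Select v) where
  fmap f (Select r step) = Select (f r) (fmap f . step)

instance Applicative (Select v) where
  pure r = Select r (const (pure r))
  -- Both sides see every row, so the combined Select makes a single pass.
  Select f stepF <*> Select x stepX =
    Select (f x) (\v -> stepF v <*> stepX v)

-- Feed every row to the Select and read off the final result.
runSelect :: Select v r -> [v] -> r
runSelect (Select r _) [] = r
runSelect (Select _ step) (v : vs) = runSelect (step v) vs

count :: Select v Integer
count = go 0 where go n = Select n (\_ -> go (n + 1))

total :: Num a => Select a a
total = go 0 where go acc = Select acc (\v -> go (acc + v))

countAndTotal :: Select Integer (Integer, Integer)
countAndTotal = liftA2 (,) count total
```

Running runSelect countAndTotal [1, 2, 3, 4] yields (4, 10). Note that the result type (Integer, Integer) falls out of the ordinary tuple constructor, with no extension needed.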

However, often our table will have multiple columns, so we need a way to focus a Select onto a single column. Let's call this function focus:

focus :: Lens' a b -> Select b r -> Select a r

So that we can write things like:

liftA3 (,,) (focus _1 sum) (focus _2 product) (focus _3 max)
  :: (Num a, Num b, Ord c)
  => Select (a, b, c) (a, b, c)
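
To show what focus does without pulling in a lens library, here is a small sketch in which a plain getter function stands in for the Lens'. That substitution, the Select encoding, runSelect, total, largest and the sample rows are all assumptions made for illustration.

```haskell
-- Hypothetical encoding: the result so far, plus how to absorb one more row.
data Select v r = Select r (v -> Select v r)

runSelect :: Select v r -> [v] -> r
runSelect (Select r _) [] = r
runSelect (Select _ step) (v : vs) = runSelect (step v) vs

-- Assumption: a plain getter stands in for the Lens' used in the answer.
-- Each incoming row is narrowed to one column before the inner Select sees it.
focus :: (a -> b) -> Select b r -> Select a r
focus get (Select r step) = Select r (focus get . step . get)

total :: Num a => Select a a
total = go 0 where go acc = Select acc (\v -> go (acc + v))

-- Nothing for an empty table, otherwise the largest value seen.
largest :: Ord a => Select a (Maybe a)
largest = go Nothing
  where go best = Select best (\v -> go (Just (maybe v (max v) best)))

-- Hypothetical two-column table.
rows :: [(Int, String)]
rows = [(3, "x"), (7, "y"), (5, "z")]
```

Here runSelect (focus fst total) rows gives 15 and runSelect (focus snd largest) rows gives Just "z". Returning Maybe from largest is a design choice of this sketch: it makes the empty-table case explicit, which the signature max :: Ord a => Select a a cannot do.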

So if we wanted to write something like:

SELECT COUNT(*), MAX(firstName) FROM t

That would be equivalent to this Haskell code:

firstName :: Lens' Row String

table :: Table Row

select table (liftA2 (,) count (focus firstName max)) :: (Integer, String)

So you might wonder how one might implement Select and Table.

I describe how to implement Table in this post:

http://www.haskellforall.com/2014/12/a-very-general-api-for-relational-joins.html

... and you can implement Select as just:

type Select = Control.Foldl.Fold

-- (`pretraverse` was later renamed to `handles` in foldl)
focus = Control.Foldl.pretraverse

-- Assuming you define a `Foldable` instance for `IndexedTable`
select t s = Control.Foldl.fold s t

Also, keep in mind that these are not the only ways to implement Table and Select. They are just a simple implementation to get you started and you can generalize them as necessary.
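
The Foldable assumption mentioned above can be spelled out without depending on foldl. The following is only a sketch: the Map-backed IndexedTable, the hand-rolled Select encoding and the sample table people are my assumptions, chosen so the example needs nothing beyond base and containers.

```haskell
import qualified Data.Map.Strict as Map
import Data.Map.Strict (Map)

-- Assumption: a table indexed on k is modelled as a finite map from key to row.
newtype IndexedTable k v = IndexedTable (Map k v)

-- Folding a table visits its non-key columns in ascending key order.
instance Foldable (IndexedTable k) where
  foldr f z (IndexedTable m) = foldr f z m

-- Hypothetical encoding: the result so far, plus how to absorb one more row.
data Select v r = Select r (v -> Select v r)

-- Works for any Foldable container of rows, not just IndexedTable.
select :: Foldable t => t v -> Select v r -> r
select rows s = result (foldl (\(Select _ step) v -> step v) s rows)
  where result (Select r _) = r

count :: Select v Integer
count = go 0 where go n = Select n (\_ -> go (n + 1))

-- Hypothetical sample table indexed on an Int id.
people :: IndexedTable Int String
people = IndexedTable (Map.fromList [(0, "Gabriel"), (1, "Oscar"), (2, "Edgar")])
```

With this, select people count evaluates to 3, and the same select also works on a plain list of rows, since it only asks for Foldable.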

What about selecting columns from a table? Well, you can define:

column :: Select a [a]
column = Control.Foldl.list

So if you wanted to do:

SELECT col FROM t

... you would write:

field :: Lens' Row Field

table :: Table Row

select table (focus field column) :: [Field]

The important takeaway is that you can implement a relational API in Haskell just fine without any fancy type system extensions.

haskell  types  relational-database  theory  hindley-milner