Q:Per-value statistics over table in PostgreSQL

I'm trying to generate per-value statistics for a column in a table: the value itself, the number of occurrences of that value in the table, and its percentage of the total row count.

I have a table such as in the following example:

                Table "public.films"
       Column |         Type          | Modifiers
      --------+-----------------------+-----------
       code   | character(5)          |
       title  | character varying(40) |


      # select * from films;
       code  | title
      -------+-------
       a1123 | yo1
       a1124 | yo1
       a1125 | yo2
       a110  | yo3
       a110v | yo3
       a1a   | yo3
       a1az  | yo3
      (7 rows)

I tried using rank() and percent_rank() to accomplish this but it didn't work. Expected outcome for the above example would be:

      # select * from films;
       title | title_count | title_percent
      -------+-------------+-------------------
       yo1   | 2           | 28%
       yo2   | 1           | 14%
       yo3   | 4           | 57%

What's the most efficient query to achieve that, considering that the table will contain over 100 million rows? (The column is indexed.)
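
For anyone who wants to reproduce the example, here is a minimal setup sketch. The index name films_title_idx is an assumption made for illustration; all that is stated above is that the column is indexed.

```sql
-- Recreate the example table shown above.
CREATE TABLE films (
    code  character(5),
    title character varying(40)
);

-- Hypothetical index name; only the existence of an index on title is given.
CREATE INDEX films_title_idx ON films (title);

-- The seven sample rows from the listing above.
INSERT INTO films (code, title) VALUES
    ('a1123', 'yo1'),
    ('a1124', 'yo1'),
    ('a1125', 'yo2'),
    ('a110',  'yo3'),
    ('a110v', 'yo3'),
    ('a1a',   'yo3'),
    ('a1az',  'yo3');
```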

answer1:

This should help:

SELECT title,
       COUNT(*) AS title_count,
       ROUND(COUNT(*) / SUM(COUNT(*)) OVER () * 100) AS percent
  FROM films
 GROUP
    BY title
 ORDER
    BY title
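
A note on why this works: window functions are evaluated after GROUP BY, so SUM(COUNT(*)) OVER () adds up the per-title counts into the grand total without a separate pass over films. As a sketch of a possible variant, the query below keeps one decimal place and appends a percent sign; the title_percent alias and the one-decimal precision are choices made here for illustration.

```sql
SELECT title,
       COUNT(*) AS title_count,
       -- SUM(COUNT(*)) OVER () is the grand total across all groups.
       -- It is numeric, so the division is not truncated to an integer.
       ROUND(100 * COUNT(*) / SUM(COUNT(*)) OVER (), 1) || '%' AS title_percent
  FROM films
 GROUP BY title
 ORDER BY title;
```

With the seven sample rows from the question this should give 28.6%, 14.3% and 57.1% for yo1, yo2 and yo3.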

answer2:

And yet another one:

WITH code_cnt AS (
    SELECT title, count(*) AS title_count
    FROM films
    GROUP BY title),
  gt AS (
    SELECT sum(title_count) AS grand_total
    FROM code_cnt)
SELECT title, title_count, (100 * title_count / grand_total) AS title_percent
FROM code_cnt, gt
ORDER BY title;

This version avoids running a separate count(*) over the entire table, which is expensive when the table is large: the grand total is computed by summing the per-title counts instead. (Note that answers which compute the total with their own count(*) over films read the table twice: once for the total and once for the per-group counts.)
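
Rather than guessing, one way to check whether the extra pass matters on your data is to compare the plans. The following is only a sketch; keep in mind that EXPLAIN ANALYZE actually executes the statement, so on a table of this size each run takes as long as the query itself.

```sql
-- Plan and timing for the CTE version above.
EXPLAIN (ANALYZE, BUFFERS)
WITH code_cnt AS (
    SELECT title, count(*) AS title_count
    FROM films
    GROUP BY title),
  gt AS (
    SELECT sum(title_count) AS grand_total
    FROM code_cnt)
SELECT title, title_count, (100 * title_count / grand_total) AS title_percent
FROM code_cnt, gt
ORDER BY title;

-- Plan and timing for a version that counts the whole table separately.
EXPLAIN (ANALYZE, BUFFERS)
SELECT title,
       count(*) AS title_count,
       100 * count(*) / (SELECT count(*)::numeric FROM films) AS title_percent
FROM films
GROUP BY title
ORDER BY title;
```

In the output, look at how many scan nodes touch films (or its index) in each plan and compare the reported buffer counts and execution times.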

answer3:

Not sure if this is the most efficient query though:

with total (total_count) as (
  select count(*)::numeric
  from films
)
select title, 
       count(*) as title_count, 
       (count(*)::numeric / (select total_count from total)) * 100  as title_percent
from films
group by title
order by title;

answer4:

SELECT title
    ,title_count
    ,(
        (
            title_count / (
                SELECT count(*)::NUMERIC
                FROM films
                )
            ) * 100
        )::INT title_percent
FROM (
    SELECT title
        ,count(title)::NUMERIC title_count
    FROM films
    GROUP BY title
    ORDER BY title
    ) t;

Result :

title title_count title_percent 
----- ----------- ------------- 
yo1   2           29            
yo2   1           14            
yo3   4           57   

SELECT title
    ,title_count
    ,(
        (
            title_count / (
                SELECT count(*)::NUMERIC
                FROM films
                )
            ) * 100
        )::INT::TEXT || '%' title_percent
FROM (
    SELECT title
        ,count(title)::NUMERIC title_count
    FROM films
    GROUP BY title
    ORDER BY title
    ) t;

Result :

title title_count title_percent 
----- ----------- ------------- 
yo3   4           57%           
yo1   2           29%           
yo2   1           14%  
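
One caveat, visible in the second result where the rows come back as yo3, yo1, yo2: an ORDER BY inside a subquery does not guarantee the order of the outer query's output. A sketch of the same query with the sort moved to the outer level:

```sql
SELECT title
    ,title_count
    ,((title_count / (
            SELECT count(*)::NUMERIC
            FROM films
            )) * 100)::INT::TEXT || '%' AS title_percent
FROM (
    SELECT title
        ,count(title)::NUMERIC AS title_count
    FROM films
    GROUP BY title
    ) t
-- Sorting here, on the final result, is what determines the output order.
ORDER BY title;
```

Note also that count(title) skips NULL titles while count(*) counts every row, so if the column can contain NULLs the percentages will not add up to 100.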

Tags: sql, postgresql