Oracle统计分析函数

Oracle statistical analysis function

1. 环境与参考

1.1. 引用文档

Oracle sql language reference 11.2 (E41084) - (SQL_Language_Reference_11.2_e41084.pdf)

1.2. 样例数据

Table name: XXX

col_1
H3.1-K27(me2)K36(me0)
H3.1-K27(me3)K36(me0)
H3-LVR
H3K9(me1)K14(ac1)
H3.1-K27(me2)K36(me1)
H3.1-K27(me3)K36(me1)
H3.1-K27(me0)K36(me2)
H3.1-K27(me0)K36(me3)
H3K9(me1)K14(ac0)
H3.1-K27(ac1)K36(me1)
H3.1-K27(ac1)K36(me0)
H3.1-K27(me1)K36(me2)
H3.1-K27(me1)K36(me3)
H3.1-K27(me0)K36(me0)
H3-YRPGTVALR
H3.1-K27(me1)K36(me1)
H3.1-K27(me1)K36(me0)
H3.1-K27(me0)K36(me1)
H3K79(me2)
H3K79(me3)
H3K79(me1)

2. 正则表达式解析函数

2.1. 解析规则需求

 1# The PTM label format in general is:
 2Ha.b-Ac(Lx)Ay(Lz)
 3
 4# Where:
 5# a, b, c, x, y, z are all numbers, L in braces can be ‘ac’ or ‘me’, A can be K, R, or S
 6
 7----------------------------------------
 8|a: 3                |y: 0, 14, 36, 23 |
 9----------------------------------------
10|b: 0, 1, 3	         |z: 0, 1, 2, 3    |
11----------------------------------------
12|c: 4, 9 ,27, 79, 18 |L: ac, me        |
13----------------------------------------
14|x: 0, 1, 2, 3	     |A: K, R, S       |
15----------------------------------------
16
17# Constrains
18--------------------------------------------
19|c = 9, y = 14  |L = ac, x, z = 0, 1       |
20--------------------------------------------
21|c = 27, y = 36 |L = me, x, z = 0, 1, 2, 3 |
22--------------------------------------------
23|c = 18, y = 23 |                          |
24--------------------------------------------

2.2. SQL语句编写

 1# Oracle PL/SQL
 2
 3select REGEXP_SUBSTR(x.col_1, '^H3\.[0-3]{1}|^H3') as P1,
 4       REGEXP_SUBSTR(x.col_1, '(K|R|S)(4|9|27|79|18)|EIR|LVR') as P2,
 5       REGEXP_SUBSTR(x.col_1, '(\((ac|me))', 1, 1, 'i', 2) as P3,
 6       REGEXP_SUBSTR(x.col_1, '(\((ac|me)(\d*))', 1, 1, 'i', 3) as P4,
 7       REGEXP_SUBSTR(x.col_1, '(K|R|S)(0|14|36|23)') as P5,
 8       REGEXP_SUBSTR(x.col_1, '(\((ac|me))', 1, 2, 'i', 2) as P6,
 9       REGEXP_SUBSTR(x.col_1, '(\((ac|me)(\d*))', 1, 2, 'i', 3) as P7,
10       x.col_1 as type_name
11  from XXX x

2.3. 语法说明

REGEXP_SUBSTR(source_char, pattern, position, occurrence, match_parameter, subexpr)

(详细语法参见 E4108, 5-224页).

  • 基本语法
1REGEXP_SUBSTR(x.col_1, '^H3\.[0-3]{1}|^H3')
2REGEXP_SUBSTR(x.col_1, '(K|R|S)(4|9|27|79|18)|EIR|LVR')
3REGEXP_SUBSTR(x.col_1, '(K|R|S)(0|14|36|23)')

使用一般语法正则表达式匹配并提取字符串。

  • 进阶语法
1REGEXP_SUBSTR(x.col_1, '(\((ac|me))', 1, 1, 'i', 2)
2REGEXP_SUBSTR(x.col_1, '(\((ac|me)(\d*))', 1, 1, 'i', 3)
3REGEXP_SUBSTR(x.col_1, '(\((ac|me))', 1, 2, 'i', 2)
4REGEXP_SUBSTR(x.col_1, '(\((ac|me)(\d*))', 1, 2, 'i', 3)

使用一般语法正则表达式匹配,通过后续参数提取字符串。

  1. 参数position: 指定被匹配字符串的开始下标。
  2. 参数occurrence: 指定正则表达式匹配的次序,即第occurrence次被匹配后才认为是有效匹配。
  3. 参数match_parameter: 命中细则,本例中未使用该特性,均为默认的"i",即case-insensitive(大小写不敏感)。
  4. 参数subexpr: 指定使用正则表达式子查询的匹配次序来提取字符串,即第subexpr个子查询来提取字符串,如(((ac|me))’, 1, 1, ‘i’, 2)则表示,匹配后通过(ac|me)部分来提取字符串。

3. 窗口统计函数

3.1. 样例数据与SQL

 1 select distinct AA,
 2                  count(x.aa) OVER(PARTITION BY x.aa ORDER BY x.aa RANGE UNBOUNDED PRECEDING) AS P1C,
 3                  sum(x.aa) OVER(PARTITION BY x.aa ORDER BY x.aa RANGE UNBOUNDED PRECEDING) AS P1S,
 4                  BB,
 5                  count(x.bb) OVER(PARTITION BY x.bb ORDER BY x.bb RANGE UNBOUNDED PRECEDING) AS P2C,
 6                  sum(x.bb) OVER(PARTITION BY x.bb ORDER BY x.bb RANGE UNBOUNDED PRECEDING) AS P2S,
 7                  CC,
 8                  count(x.cc) OVER(PARTITION BY x.cc ORDER BY x.cc RANGE UNBOUNDED PRECEDING) AS P3C,
 9                  sum(x.cc) OVER(PARTITION BY x.cc ORDER BY x.cc RANGE UNBOUNDED PRECEDING) AS P3S,
10                  sum(x.cc) OVER(PARTITION BY x.aa ORDER BY x.aa RANGE UNBOUNDED PRECEDING) AS P3S_BY_AA,
11                  avg(x.cc) OVER(PARTITION BY x.aa ORDER BY x.aa RANGE UNBOUNDED PRECEDING) AS P3S_BY_AA
12    from (select 1 as aa, 10 as bb, 100 as cc from dual
13          union all
14          select 1 as aa, 10 as bb, 200 as cc from dual
15          union all
16          select 1 as aa, 20 as bb, 300 as cc from dual
17          union all
18          select 1 as aa, 20 as bb, 400 as cc from dual) x
19   order by bb, cc

Results:

AAP1CP1SBBP2CP2SCCP3CP3SP3S_BY_AAP3A_BY_AA
1441022010011001000250
1441022020012001000250
1442024030013001000250
1442024040014001000250

3.2. 语法说明

analytic_function OVER(PARTITION BY “Column define” ORDER BY “Column define” RANGE UNBOUNDED PRECEDING)

(详细语法参见 E4108, 5-11页).

  • 使用说明
  1. analytic_function: 一般统计函数皆可以使用,诸如SUM、AVG、COUNT等等。
  2. OVER()子句: 声明该函数为窗口调用函数,即该函数作用于fetch动作之后,因此若数据量巨大时应有选择的使用。
  3. PARTITION BY “Column define” 子句: 该子句定义统计分析函数将根据参数“Column define”指定的列来筛选数据。
  4. ORDER BY “Column define” 子句: 该子句定义统计分析函数执行时将根据参数“Column define”指定的列来排列数据。
  5. RANGE UNBOUNDED PRECEDING 子句: 该子句定义统计分析函数统计时对数据的边界认定原则。(本例中的UNBOUNDED PRECEDING表明,如统计数据变化将重新开始统计)
  6. 本例具有特定数据特定场景,统计函数详细语法与功能应参考E4108, 5-11页的语法说明。
作者|Author: RockSolid
发表日期|Publish Date: Nov 11, 2014
修改日期|Modified Date: Nov 11, 2014
版权许可|Copyright License: CC BY-NC-ND 3.0 CN