【Oracle】数据仓库

news2025/6/11 9:20:21

在这里插入图片描述

个人主页:Guiat
归属专栏:Oracle

在这里插入图片描述

文章目录

  • 1. 数据仓库概述
    • 1.1 为什么需要数据仓库
    • 1.2 Oracle数据仓库架构
    • 1.3 Oracle数据仓库关键技术
  • 2. 数据仓库建模
    • 2.1 维度建模基础
    • 2.2 星形模式设计
    • 2.3 雪花模式设计
    • 2.4 缓慢变化维度(SCD)处理
  • 3. ETL数据集成
    • 3.1 ETL流程设计
    • 3.2 数据抽取(Extract)
    • 3.3 数据转换(Transform)
    • 3.4 数据加载(Load)
  • 4. 物化视图优化
    • 4.1 物化视图基础
    • 4.2 聚合物化视图
    • 4.3 复杂聚合物化视图
    • 4.4 物化视图刷新策略
  • 5. 分区策略
    • 5.1 分区类型选择
    • 5.2 复合分区策略
    • 5.3 分区维护操作
  • 6. 索引策略
    • 6.1 数据仓库索引设计
    • 6.2 位图索引应用
    • 6.3 函数索引优化
  • 7. 查询优化
    • 7.1 星形转换优化
    • 7.2 并行查询优化
    • 7.3 分区消除优化
  • 8. 数据压缩
    • 8.1 表压缩策略
    • 8.2 索引压缩
    • 8.3 压缩效果监控
  • 9. 实时数据仓库
    • 9.1 实时ETL架构
    • 9.2 Oracle GoldenGate集成
    • 9.3 Lambda架构实现
  • 10. 数据仓库治理
    • 10.1 数据质量管理
    • 10.2 元数据管理
    • 10.3 数据血缘追踪

正文

1. 数据仓库概述

数据仓库就像是企业的"数据金库",把散落在各个系统中的数据统一收集起来,经过清洗、整理后,为决策分析提供统一的数据视图。Oracle数据仓库不仅仅是存储数据,更是一个完整的分析平台。

1.1 为什么需要数据仓库

  • 统一数据源,避免"数据孤岛"问题
  • 历史数据保存,支持趋势分析
  • 优化查询性能,专门为分析而设计
  • 提供一致的数据定义和业务规则
  • 支持复杂的分析和报表需求

1.2 Oracle数据仓库架构

Oracle数据仓库采用分层架构,就像盖房子一样,一层一层往上建:

数据源层
数据集成层
数据存储层
数据服务层
数据应用层
OLTP系统
外部数据
文件系统
第三方API
ODI/GoldenGate
ETL工具
数据清洗
数据转换
ODS操作数据存储
数据仓库核心
数据集市
元数据库
OLAP服务
报表服务
数据挖掘
API接口
BI工具
报表平台
自助分析
移动应用

1.3 Oracle数据仓库关键技术

技术组件作用应用场景
分区表提高查询性能和管理效率大表按时间、地区等维度分区
物化视图预计算聚合结果常用的汇总查询
位图索引优化低基数列查询性别、状态等字段
并行处理加速大数据量操作ETL加载和复杂查询
压缩技术节省存储空间历史数据压缩存储
Star Transformation优化星形查询典型的OLAP查询模式

2. 数据仓库建模

2.1 维度建模基础

维度建模就像是搭积木,把复杂的业务用简单的"事实"和"维度"来表达:

维度建模
事实表
维度表
度量值
外键
粒度
属性
层次结构
缓慢变化维度
销售额
数量
利润
产品名称
类别
品牌

2.2 星形模式设计

星形模式是数据仓库中最常用的建模方式:

-- 时间维度表
CREATE TABLE dim_time (
    time_key NUMBER PRIMARY KEY,
    date_value DATE NOT NULL,
    year_number NUMBER(4) NOT NULL,
    quarter_number NUMBER(1) NOT NULL,
    month_number NUMBER(2) NOT NULL,
    month_name VARCHAR2(20) NOT NULL,
    day_number NUMBER(2) NOT NULL,
    day_name VARCHAR2(20) NOT NULL,
    week_number NUMBER(2) NOT NULL,
    is_weekend CHAR(1) DEFAULT 'N',
    is_holiday CHAR(1) DEFAULT 'N',
    fiscal_year NUMBER(4),
    fiscal_quarter NUMBER(1),
    fiscal_month NUMBER(2)
) COMPRESS FOR OLTP;

-- 产品维度表
CREATE TABLE dim_product (
    product_key NUMBER PRIMARY KEY,
    product_id VARCHAR2(20) NOT NULL,
    product_name VARCHAR2(200) NOT NULL,
    brand VARCHAR2(100),
    category VARCHAR2(100) NOT NULL,
    subcategory VARCHAR2(100),
    unit_cost NUMBER(10,2),
    unit_price NUMBER(10,2),
    product_status VARCHAR2(20) DEFAULT 'ACTIVE',
    created_date DATE DEFAULT SYSDATE,
    updated_date DATE DEFAULT SYSDATE
) COMPRESS FOR OLTP;

-- 客户维度表
CREATE TABLE dim_customer (
    customer_key NUMBER PRIMARY KEY,
    customer_id VARCHAR2(20) NOT NULL,
    customer_name VARCHAR2(200) NOT NULL,
    customer_type VARCHAR2(50),
    segment VARCHAR2(50),
    region VARCHAR2(100),
    country VARCHAR2(100),
    city VARCHAR2(100),
    postal_code VARCHAR2(20),
    registration_date DATE,
    credit_limit NUMBER(12,2),
    customer_status VARCHAR2(20) DEFAULT 'ACTIVE'
) COMPRESS FOR OLTP;

-- 门店维度表
CREATE TABLE dim_store (
    store_key NUMBER PRIMARY KEY,
    store_id VARCHAR2(20) NOT NULL,
    store_name VARCHAR2(200) NOT NULL,
    store_type VARCHAR2(50),
    region VARCHAR2(100),
    district VARCHAR2(100),
    city VARCHAR2(100),
    address VARCHAR2(500),
    manager_name VARCHAR2(100),
    opening_date DATE,
    store_size NUMBER(10,2),
    store_status VARCHAR2(20) DEFAULT 'OPEN'
) COMPRESS FOR OLTP;

-- 销售事实表(分区表)
CREATE TABLE fact_sales (
    time_key NUMBER NOT NULL,
    product_key NUMBER NOT NULL,
    customer_key NUMBER NOT NULL,
    store_key NUMBER NOT NULL,
    transaction_id VARCHAR2(50),
    quantity NUMBER(10,2) NOT NULL,
    unit_price NUMBER(10,2) NOT NULL,
    total_amount NUMBER(12,2) NOT NULL,
    discount_amount NUMBER(12,2) DEFAULT 0,
    tax_amount NUMBER(12,2) DEFAULT 0,
    profit_amount NUMBER(12,2),
    cost_amount NUMBER(12,2),
    CONSTRAINT fk_sales_time FOREIGN KEY (time_key) REFERENCES dim_time(time_key),
    CONSTRAINT fk_sales_product FOREIGN KEY (product_key) REFERENCES dim_product(product_key),
    CONSTRAINT fk_sales_customer FOREIGN KEY (customer_key) REFERENCES dim_customer(customer_key),
    CONSTRAINT fk_sales_store FOREIGN KEY (store_key) REFERENCES dim_store(store_key)
)
COMPRESS FOR QUERY LOW
PARTITION BY RANGE (time_key) (
    PARTITION p2023q1 VALUES LESS THAN (20230401),
    PARTITION p2023q2 VALUES LESS THAN (20230701),
    PARTITION p2023q3 VALUES LESS THAN (20231001),
    PARTITION p2023q4 VALUES LESS THAN (20240101),
    PARTITION p2024q1 VALUES LESS THAN (20240401)
);

2.3 雪花模式设计

雪花模式是星形模式的规范化版本,减少数据冗余:

-- 产品类别表(雪花模式的一部分)
CREATE TABLE dim_product_category (
    category_key NUMBER PRIMARY KEY,
    category_id VARCHAR2(20) NOT NULL,
    category_name VARCHAR2(100) NOT NULL,
    parent_category_key NUMBER,
    category_level NUMBER(2),
    description VARCHAR2(500)
);

-- 产品品牌表
CREATE TABLE dim_product_brand (
    brand_key NUMBER PRIMARY KEY,
    brand_id VARCHAR2(20) NOT NULL,
    brand_name VARCHAR2(100) NOT NULL,
    country_of_origin VARCHAR2(100),
    brand_website VARCHAR2(200)
);

-- 修改后的产品维度表(引用类别和品牌表)
CREATE TABLE dim_product_snowflake (
    product_key NUMBER PRIMARY KEY,
    product_id VARCHAR2(20) NOT NULL,
    product_name VARCHAR2(200) NOT NULL,
    brand_key NUMBER,
    category_key NUMBER,
    unit_cost NUMBER(10,2),
    unit_price NUMBER(10,2),
    CONSTRAINT fk_product_brand FOREIGN KEY (brand_key) REFERENCES dim_product_brand(brand_key),
    CONSTRAINT fk_product_category FOREIGN KEY (category_key) REFERENCES dim_product_category(category_key)
);

2.4 缓慢变化维度(SCD)处理

缓慢变化维度是数据仓库中的经典问题,客户信息、产品信息会随时间变化:

-- SCD Type 1: 直接覆盖(不保留历史)
UPDATE dim_customer 
SET customer_name = '新公司名称',
    updated_date = SYSDATE
WHERE customer_id = 'CUST001';

-- SCD Type 2: 保留历史版本
CREATE TABLE dim_customer_scd2 (
    customer_key NUMBER PRIMARY KEY,
    customer_id VARCHAR2(20) NOT NULL,
    customer_name VARCHAR2(200) NOT NULL,
    customer_type VARCHAR2(50),
    region VARCHAR2(100),
    effective_date DATE NOT NULL,
    expiry_date DATE DEFAULT DATE '9999-12-31',
    is_current CHAR(1) DEFAULT 'Y',
    version_number NUMBER DEFAULT 1
);

-- 处理SCD Type 2的存储过程
CREATE OR REPLACE PROCEDURE update_customer_scd2(
    p_customer_id VARCHAR2,
    p_new_name VARCHAR2,
    p_new_type VARCHAR2,
    p_new_region VARCHAR2
) AS
    v_customer_key NUMBER;
BEGIN
    -- 关闭当前记录
    UPDATE dim_customer_scd2 
    SET expiry_date = SYSDATE - 1,
        is_current = 'N'
    WHERE customer_id = p_customer_id 
    AND is_current = 'Y';
    
    -- 生成新的键值
    SELECT customer_seq.NEXTVAL INTO v_customer_key FROM DUAL;
    
    -- 插入新记录
    INSERT INTO dim_customer_scd2 (
        customer_key, customer_id, customer_name, 
        customer_type, region, effective_date
    ) VALUES (
        v_customer_key, p_customer_id, p_new_name,
        p_new_type, p_new_region, SYSDATE
    );
    
    COMMIT;
END;
/

-- SCD Type 3: 保留有限历史(增加字段)
CREATE TABLE dim_customer_scd3 (
    customer_key NUMBER PRIMARY KEY,
    customer_id VARCHAR2(20) NOT NULL,
    current_name VARCHAR2(200),
    previous_name VARCHAR2(200),
    current_region VARCHAR2(100),
    previous_region VARCHAR2(100),
    last_updated DATE
);

3. ETL数据集成

3.1 ETL流程设计

ETL就像是数据的"流水线",把原始数据加工成高质量的分析数据:

Extract 抽取
Transform 转换
Load 加载
源系统连接
增量抽取
全量抽取
数据清洗
格式转换
业务规则
数据验证
维度加载
事实加载
索引重建
统计信息

3.2 数据抽取(Extract)

-- 创建数据库链接到源系统
CREATE DATABASE LINK source_db
CONNECT TO source_user IDENTIFIED BY source_password
USING 'source_db_tns';

-- 增量抽取策略
CREATE TABLE etl_control (
    table_name VARCHAR2(100) PRIMARY KEY,
    last_extract_time TIMESTAMP,
    last_extract_key NUMBER,
    extract_status VARCHAR2(20),
    updated_date DATE DEFAULT SYSDATE
);

-- 基于时间戳的增量抽取
CREATE OR REPLACE PROCEDURE extract_orders_incremental AS
    v_last_time TIMESTAMP;
    v_current_time TIMESTAMP := SYSTIMESTAMP;
BEGIN
    -- 获取上次抽取时间
    SELECT last_extract_time INTO v_last_time
    FROM etl_control
    WHERE table_name = 'ORDERS';
    
    -- 抽取增量数据
    INSERT INTO staging_orders
    SELECT * FROM orders@source_db
    WHERE last_modified > v_last_time;
    
    -- 更新控制表
    UPDATE etl_control
    SET last_extract_time = v_current_time,
        extract_status = 'SUCCESS',
        updated_date = SYSDATE
    WHERE table_name = 'ORDERS';
    
    COMMIT;
EXCEPTION
    WHEN OTHERS THEN
        UPDATE etl_control
        SET extract_status = 'FAILED'
        WHERE table_name = 'ORDERS';
        COMMIT;
        RAISE;
END;
/

-- 基于CDC(Change Data Capture)的抽取
CREATE TABLE cdc_orders (
    order_id NUMBER,
    customer_id NUMBER,
    order_date DATE,
    total_amount NUMBER,
    operation_type VARCHAR2(10), -- INSERT, UPDATE, DELETE
    change_timestamp TIMESTAMP
);

-- 使用Oracle Streams或GoldenGate进行实时数据捕获
-- 这里展示手工创建触发器的方式
CREATE OR REPLACE TRIGGER orders_cdc_trigger
    AFTER INSERT OR UPDATE OR DELETE ON orders
    FOR EACH ROW
BEGIN
    IF INSERTING THEN
        INSERT INTO cdc_orders VALUES (
            :NEW.order_id, :NEW.customer_id, :NEW.order_date,
            :NEW.total_amount, 'INSERT', SYSTIMESTAMP
        );
    ELSIF UPDATING THEN
        INSERT INTO cdc_orders VALUES (
            :NEW.order_id, :NEW.customer_id, :NEW.order_date,
            :NEW.total_amount, 'UPDATE', SYSTIMESTAMP
        );
    ELSIF DELETING THEN
        INSERT INTO cdc_orders VALUES (
            :OLD.order_id, :OLD.customer_id, :OLD.order_date,
            :OLD.total_amount, 'DELETE', SYSTIMESTAMP
        );
    END IF;
END;
/

3.3 数据转换(Transform)

-- 创建数据清洗和转换的存储过程
CREATE OR REPLACE PROCEDURE transform_customer_data AS
BEGIN
    -- 数据清洗:去除空格、统一大小写
    UPDATE staging_customers
    SET customer_name = TRIM(UPPER(customer_name)),
        email = TRIM(LOWER(email)),
        phone = REGEXP_REPLACE(phone, '[^0-9]', ''); -- 只保留数字
    
    -- 数据验证:标记无效记录
    UPDATE staging_customers
    SET data_quality_flag = 'INVALID'
    WHERE customer_name IS NULL
       OR LENGTH(customer_name) < 2
       OR email NOT LIKE '%@%.%'
       OR LENGTH(phone) NOT BETWEEN 10 AND 15;
    
    -- 业务规则转换:客户分级
    UPDATE staging_customers
    SET customer_segment = 
        CASE 
            WHEN total_purchases > 100000 THEN 'VIP'
            WHEN total_purchases > 50000 THEN 'PREMIUM'
            WHEN total_purchases > 10000 THEN 'STANDARD'
            ELSE 'BASIC'
        END;
    
    -- 地址标准化
    UPDATE staging_customers
    SET region = 
        CASE 
            WHEN UPPER(city) IN ('BEIJING', 'SHANGHAI', 'SHENZHEN', 'GUANGZHOU') THEN 'TIER1'
            WHEN UPPER(city) IN ('HANGZHOU', 'NANJING', 'SUZHOU', 'WUHAN') THEN 'TIER2'
            ELSE 'OTHER'
        END;
    
    COMMIT;
END;
/

-- 复杂的数据转换:销售数据聚合
CREATE OR REPLACE PROCEDURE transform_sales_summary AS
BEGIN
    -- 删除当天的汇总数据(重新计算)
    DELETE FROM fact_sales_daily
    WHERE date_key = TO_NUMBER(TO_CHAR(SYSDATE, 'YYYYMMDD'));
    
    -- 重新计算当天汇总
    INSERT INTO fact_sales_daily (
        date_key, product_key, store_key,
        total_quantity, total_amount, total_profit,
        avg_unit_price, transaction_count
    )
    SELECT 
        TO_NUMBER(TO_CHAR(s.sale_date, 'YYYYMMDD')) as date_key,
        s.product_key,
        s.store_key,
        SUM(s.quantity) as total_quantity,
        SUM(s.total_amount) as total_amount,
        SUM(s.profit_amount) as total_profit,
        AVG(s.unit_price) as avg_unit_price,
        COUNT(*) as transaction_count
    FROM staging_sales s
    WHERE TRUNC(s.sale_date) = TRUNC(SYSDATE)
    GROUP BY 
        TO_NUMBER(TO_CHAR(s.sale_date, 'YYYYMMDD')),
        s.product_key,
        s.store_key;
    
    COMMIT;
END;
/

3.4 数据加载(Load)

-- 维度表加载策略
CREATE OR REPLACE PROCEDURE load_dim_customer AS
    v_count NUMBER;
BEGIN
    -- 处理新增客户
    INSERT INTO dim_customer (
        customer_key, customer_id, customer_name, customer_type,
        segment, region, country, customer_status
    )
    SELECT 
        customer_seq.NEXTVAL,
        sc.customer_id,
        sc.customer_name,
        sc.customer_type,
        sc.customer_segment,
        sc.region,
        sc.country,
        'ACTIVE'
    FROM staging_customers sc
    WHERE sc.data_quality_flag != 'INVALID'
    AND NOT EXISTS (
        SELECT 1 FROM dim_customer dc
        WHERE dc.customer_id = sc.customer_id
    );
    
    v_count := SQL%ROWCOUNT;
    DBMS_OUTPUT.PUT_LINE('插入新客户: ' || v_count || ' 条记录');
    
    -- 处理客户信息更新(SCD Type 1)
    UPDATE dim_customer dc
    SET (customer_name, customer_type, segment, region, updated_date) = (
        SELECT sc.customer_name, sc.customer_type, sc.customer_segment, 
               sc.region, SYSDATE
        FROM staging_customers sc
        WHERE sc.customer_id = dc.customer_id
        AND sc.data_quality_flag != 'INVALID'
    )
    WHERE EXISTS (
        SELECT 1 FROM staging_customers sc
        WHERE sc.customer_id = dc.customer_id
        AND sc.data_quality_flag != 'INVALID'
    );
    
    v_count := SQL%ROWCOUNT;
    DBMS_OUTPUT.PUT_LINE('更新客户: ' || v_count || ' 条记录');
    
    COMMIT;
END;
/

-- 事实表批量加载
CREATE OR REPLACE PROCEDURE load_fact_sales AS
BEGIN
    -- 禁用索引以提高加载性能
    EXECUTE IMMEDIATE 'ALTER INDEX idx_fact_sales_time UNUSABLE';
    EXECUTE IMMEDIATE 'ALTER INDEX idx_fact_sales_product UNUSABLE';
    
    -- 批量插入事实数据
    INSERT /*+ APPEND PARALLEL(fact_sales, 4) */ INTO fact_sales (
        time_key, product_key, customer_key, store_key,
        transaction_id, quantity, unit_price, total_amount,
        discount_amount, tax_amount, profit_amount
    )
    SELECT /*+ PARALLEL(ss, 4) */
        TO_NUMBER(TO_CHAR(ss.sale_date, 'YYYYMMDD')) as time_key,
        dp.product_key,
        dc.customer_key,
        ds.store_key,
        ss.transaction_id,
        ss.quantity,
        ss.unit_price,
        ss.total_amount,
        ss.discount_amount,
        ss.tax_amount,
        ss.total_amount - ss.cost_amount as profit_amount
    FROM staging_sales ss
    JOIN dim_product dp ON ss.product_id = dp.product_id
    JOIN dim_customer dc ON ss.customer_id = dc.customer_id
    JOIN dim_store ds ON ss.store_id = ds.store_id
    WHERE ss.data_quality_flag != 'INVALID';
    
    COMMIT;
    
    -- 重建索引
    EXECUTE IMMEDIATE 'ALTER INDEX idx_fact_sales_time REBUILD PARALLEL 4';
    EXECUTE IMMEDIATE 'ALTER INDEX idx_fact_sales_product REBUILD PARALLEL 4';
    
    -- 收集统计信息
    DBMS_STATS.GATHER_TABLE_STATS(
        ownname => USER,
        tabname => 'FACT_SALES',
        estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,
        cascade => TRUE,
        degree => 4
    );
    
END;
/

4. 物化视图优化

4.1 物化视图基础

物化视图就像是"预制菜",把常用的复杂查询结果提前计算好存储起来:

物化视图类型
简单聚合视图
连接聚合视图
嵌套聚合视图
UNION ALL视图
SUM/COUNT/AVG
单表GROUP BY
星形连接
多表聚合
ROLLUP
CUBE
GROUPING SETS
分区视图
历史数据合并

4.2 聚合物化视图

-- 月度销售汇总物化视图
CREATE MATERIALIZED VIEW mv_monthly_sales
BUILD IMMEDIATE
REFRESH FAST ON COMMIT
ENABLE QUERY REWRITE
AS
SELECT 
    t.year_number,
    t.month_number,
    p.category,
    s.region,
    SUM(fs.quantity) as total_quantity,
    SUM(fs.total_amount) as total_sales,
    SUM(fs.profit_amount) as total_profit,
    COUNT(*) as transaction_count,
    COUNT(DISTINCT fs.customer_key) as unique_customers
FROM fact_sales fs
JOIN dim_time t ON fs.time_key = t.time_key
JOIN dim_product p ON fs.product_key = p.product_key
JOIN dim_store s ON fs.store_key = s.store_key
GROUP BY t.year_number, t.month_number, p.category, s.region;

-- 创建物化视图日志(支持快速刷新)
CREATE MATERIALIZED VIEW LOG ON fact_sales
WITH ROWID, SEQUENCE (time_key, product_key, store_key, customer_key, 
                      quantity, total_amount, profit_amount)
INCLUDING NEW VALUES;

CREATE MATERIALIZED VIEW LOG ON dim_time
WITH ROWID, SEQUENCE (time_key, year_number, month_number)
INCLUDING NEW VALUES;

CREATE MATERIALIZED VIEW LOG ON dim_product
WITH ROWID, SEQUENCE (product_key, category)
INCLUDING NEW VALUES;

CREATE MATERIALIZED VIEW LOG ON dim_store
WITH ROWID, SEQUENCE (store_key, region)
INCLUDING NEW VALUES;

4.3 复杂聚合物化视图

-- 多维度分析物化视图(支持ROLLUP和CUBE)
CREATE MATERIALIZED VIEW mv_sales_cube
BUILD IMMEDIATE
REFRESH COMPLETE ON DEMAND
ENABLE QUERY REWRITE
AS
SELECT 
    t.year_number,
    t.quarter_number,
    t.month_number,
    p.category,
    p.brand,
    c.segment,
    s.region,
    SUM(fs.quantity) as total_quantity,
    SUM(fs.total_amount) as total_sales,
    SUM(fs.profit_amount) as total_profit,
    AVG(fs.unit_price) as avg_unit_price,
    COUNT(*) as transaction_count
FROM fact_sales fs
JOIN dim_time t ON fs.time_key = t.time_key
JOIN dim_product p ON fs.product_key = p.product_key
JOIN dim_customer c ON fs.customer_key = c.customer_key
JOIN dim_store s ON fs.store_key = s.store_key
GROUP BY CUBE(
    (t.year_number, t.quarter_number, t.month_number),
    (p.category, p.brand),
    c.segment,
    s.region
);

-- 分区物化视图(提高管理效率)
CREATE MATERIALIZED VIEW mv_sales_by_quarter
PARTITION BY RANGE (year_number, quarter_number) (
    PARTITION p2023q1 VALUES LESS THAN (2023, 2),
    PARTITION p2023q2 VALUES LESS THAN (2023, 3),
    PARTITION p2023q3 VALUES LESS THAN (2023, 4),
    PARTITION p2023q4 VALUES LESS THAN (2024, 1),
    PARTITION p2024q1 VALUES LESS THAN (2024, 2)
)
BUILD IMMEDIATE
REFRESH FAST ON DEMAND
ENABLE QUERY REWRITE
AS
SELECT 
    t.year_number,
    t.quarter_number,
    p.category,
    SUM(fs.total_amount) as total_sales,
    COUNT(*) as transaction_count
FROM fact_sales fs
JOIN dim_time t ON fs.time_key = t.time_key
JOIN dim_product p ON fs.product_key = p.product_key
GROUP BY t.year_number, t.quarter_number, p.category;

4.4 物化视图刷新策略

-- 设置自动刷新作业
BEGIN
    DBMS_REFRESH.MAKE(
        name => 'SALES_MV_GROUP',
        list => 'MV_MONTHLY_SALES,MV_SALES_CUBE',
        next_date => SYSDATE + 1/24, -- 1小时后开始
        interval => 'SYSDATE + 1/24' -- 每小时刷新一次
    );
END;
/

-- 手动刷新物化视图
EXEC DBMS_MVIEW.REFRESH('MV_MONTHLY_SALES', 'F'); -- 快速刷新
EXEC DBMS_MVIEW.REFRESH('MV_SALES_CUBE', 'C');    -- 完全刷新

-- 监控物化视图刷新状态
SELECT 
    mview_name,
    last_refresh_date,
    refresh_method,
    build_mode,
    fast_refreshable,
    compile_state
FROM user_mviews;

-- 查看物化视图日志统计
SELECT 
    log_table,
    log_owner,
    rowids,
    primary_key,
    object_id,
    filter_columns
FROM user_mview_logs;

5. 分区策略

5.1 分区类型选择

分区就像是给数据分类整理,让查询更快,管理更方便:

-- 范围分区(最常用于时间维度)
CREATE TABLE fact_sales_range_partition (
    sale_id NUMBER,
    sale_date DATE,
    product_id NUMBER,
    amount NUMBER
)
PARTITION BY RANGE (sale_date) (
    PARTITION p2023_01 VALUES LESS THAN (DATE '2023-02-01'),
    PARTITION p2023_02 VALUES LESS THAN (DATE '2023-03-01'),
    PARTITION p2023_03 VALUES LESS THAN (DATE '2023-04-01'),
    PARTITION p2023_04 VALUES LESS THAN (DATE '2023-05-01'),
    PARTITION p_max VALUES LESS THAN (MAXVALUE)
);

-- 列表分区(适合离散值)
CREATE TABLE fact_sales_list_partition (
    sale_id NUMBER,
    region VARCHAR2(50),
    product_id NUMBER,
    amount NUMBER
)
PARTITION BY LIST (region) (
    PARTITION p_north VALUES ('NORTH', 'NORTHEAST'),
    PARTITION p_south VALUES ('SOUTH', 'SOUTHEAST'),
    PARTITION p_east VALUES ('EAST'),
    PARTITION p_west VALUES ('WEST', 'SOUTHWEST'),
    PARTITION p_default VALUES (DEFAULT)
);

-- 哈希分区(数据均匀分布)
CREATE TABLE fact_sales_hash_partition (
    sale_id NUMBER,
    customer_id NUMBER,
    product_id NUMBER,
    amount NUMBER
)
PARTITION BY HASH (customer_id) PARTITIONS 8;

-- 间隔分区(自动创建新分区)
CREATE TABLE fact_sales_interval_partition (
    sale_id NUMBER,
    sale_date DATE,
    product_id NUMBER,
    amount NUMBER
)
PARTITION BY RANGE (sale_date)
INTERVAL (NUMTOYMINTERVAL(1, 'MONTH')) (
    PARTITION p_start VALUES LESS THAN (DATE '2023-01-01')
);

5.2 复合分区策略

-- 范围-哈希复合分区
CREATE TABLE fact_sales_range_hash (
    sale_id NUMBER,
    sale_date DATE,
    customer_id NUMBER,
    product_id NUMBER,
    amount NUMBER
)
PARTITION BY RANGE (sale_date)
SUBPARTITION BY HASH (customer_id) SUBPARTITIONS 4 (
    PARTITION p2023q1 VALUES LESS THAN (DATE '2023-04-01'),
    PARTITION p2023q2 VALUES LESS THAN (DATE '2023-07-01'),
    PARTITION p2023q3 VALUES LESS THAN (DATE '2023-10-01'),
    PARTITION p2023q4 VALUES LESS THAN (DATE '2024-01-01')
);

-- 范围-列表复合分区
CREATE TABLE fact_sales_range_list (
    sale_id NUMBER,
    sale_date DATE,
    region VARCHAR2(50),
    product_id NUMBER,
    amount NUMBER
)
PARTITION BY RANGE (sale_date)
SUBPARTITION BY LIST (region) (
    PARTITION p2023 VALUES LESS THAN (DATE '2024-01-01') (
        SUBPARTITION p2023_north VALUES ('NORTH'),
        SUBPARTITION p2023_south VALUES ('SOUTH'),
        SUBPARTITION p2023_east VALUES ('EAST'),
        SUBPARTITION p2023_west VALUES ('WEST')
    ),
    PARTITION p2024 VALUES LESS THAN (DATE '2025-01-01') (
        SUBPARTITION p2024_north VALUES ('NORTH'),
        SUBPARTITION p2024_south VALUES ('SOUTH'),
        SUBPARTITION p2024_east VALUES ('EAST'),
        SUBPARTITION p2024_west VALUES ('WEST')
    )
);

5.3 分区维护操作

-- 添加新分区
ALTER TABLE fact_sales_range_partition
ADD PARTITION p2023_05 VALUES LESS THAN (DATE '2023-06-01');

-- 删除旧分区(数据归档)
ALTER TABLE fact_sales_range_partition
DROP PARTITION p2023_01;

-- 截断分区(清空数据但保留结构)
ALTER TABLE fact_sales_range_partition
TRUNCATE PARTITION p2023_02;

-- 分区交换(快速数据迁移)
-- 创建临时表
CREATE TABLE temp_partition_data AS
SELECT * FROM fact_sales_range_partition WHERE 1=0;

-- 加载数据到临时表
INSERT INTO temp_partition_data 
SELECT * FROM external_data_source;

-- 交换分区
ALTER TABLE fact_sales_range_partition
EXCHANGE PARTITION p2023_03 WITH TABLE temp_partition_data;

-- 分区合并
ALTER TABLE fact_sales_range_partition
MERGE PARTITIONS p2023_01, p2023_02 INTO PARTITION p2023_q1;

-- 分区拆分
ALTER TABLE fact_sales_range_partition
SPLIT PARTITION p_max AT (DATE '2024-01-01')
INTO (PARTITION p2023_12, PARTITION p_max);

6. 索引策略

6.1 数据仓库索引设计

数据仓库的索引策略与OLTP系统有很大不同:

数据仓库索引策略
位图索引
B树索引
函数索引
分区索引
低基数列
AND/OR查询优化
计数查询优化
高基数列
范围查询
排序操作
计算列索引
表达式索引
本地分区索引
全局分区索引

6.2 位图索引应用

-- 为低基数列创建位图索引
CREATE BITMAP INDEX bmp_idx_product_category 
ON fact_sales (product_key) 
LOCAL; -- 分区表使用本地索引

CREATE BITMAP INDEX bmp_idx_customer_segment 
ON fact_sales (customer_key)
LOCAL;

CREATE BITMAP INDEX bmp_idx_time_quarter 
ON fact_sales (time_key)
LOCAL;

-- 位图连接索引(直接索引维度属性)
CREATE BITMAP INDEX bmp_idx_product_category_join
ON fact_sales (p.category)
FROM fact_sales fs, dim_product p
WHERE fs.product_key = p.product_key
LOCAL;

-- 查看位图索引使用情况
SELECT 
    fs.product_key,
    COUNT(*) as sales_count,
    SUM(fs.total_amount) as total_sales
FROM fact_sales fs
WHERE fs.time_key BETWEEN 20230101 AND 20230331  -- 使用位图索引
  AND fs.product_key IN (
      SELECT product_key FROM dim_product 
      WHERE category = 'ELECTRONICS'              -- 位图连接索引
  )
GROUP BY fs.product_key;

6.3 函数索引优化

-- 为常用的日期计算创建函数索引
CREATE INDEX idx_sales_year_month 
ON fact_sales (EXTRACT(YEAR FROM TO_DATE(time_key, 'YYYYMMDD')),
               EXTRACT(MONTH FROM TO_DATE(time_key, 'YYYYMMDD')));

-- 为金额范围查询创建函数索引
CREATE INDEX idx_sales_amount_range 
ON fact_sales (
    CASE 
        WHEN total_amount < 100 THEN 'LOW'
        WHEN total_amount < 1000 THEN 'MEDIUM'
        ELSE 'HIGH'
    END
);

-- 为复杂计算创建函数索引
CREATE INDEX idx_profit_margin 
ON fact_sales ((profit_amount / total_amount * 100));

-- 这些索引支持相应的查询
SELECT product_key, SUM(total_amount)
FROM fact_sales
WHERE EXTRACT(YEAR FROM TO_DATE(time_key, 'YYYYMMDD')) = 2023
  AND EXTRACT(MONTH FROM TO_DATE(time_key, 'YYYYMMDD')) = 6
GROUP BY product_key;

SELECT COUNT(*)
FROM fact_sales
WHERE (profit_amount / total_amount * 100) > 20;

7. 查询优化

7.1 星形转换优化

星形转换是Oracle专门为数据仓库设计的优化技术:

-- 启用星形转换
ALTER SESSION SET star_transformation_enabled = TRUE;

-- 典型的星形查询
SELECT /*+ STAR_TRANSFORMATION */
    p.category,
    t.year_number,
    s.region,
    SUM(fs.total_amount) as total_sales,
    COUNT(*) as transaction_count
FROM fact_sales fs
JOIN dim_product p ON fs.product_key = p.product_key
JOIN dim_time t ON fs.time_key = t.time_key
JOIN dim_store s ON fs.store_key = s.store_key
WHERE p.category IN ('ELECTRONICS', 'CLOTHING')
  AND t.year_number = 2023
  AND s.region = 'NORTH'
GROUP BY p.category, t.year_number, s.region;

-- 查看执行计划
EXPLAIN PLAN FOR
SELECT p.category, SUM(fs.total_amount)
FROM fact_sales fs
JOIN dim_product p ON fs.product_key = p.product_key
WHERE p.category = 'ELECTRONICS';

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);

7.2 并行查询优化

-- 设置并行度
ALTER TABLE fact_sales PARALLEL 8;
ALTER TABLE dim_product PARALLEL 4;

-- 使用并行查询
SELECT /*+ PARALLEL(fs, 8) PARALLEL(p, 4) */
    p.category,
    SUM(fs.total_amount) as total_sales,
    AVG(fs.unit_price) as avg_price
FROM fact_sales fs
JOIN dim_product p ON fs.product_key = p.product_key
WHERE fs.time_key BETWEEN 20230101 AND 20231231
GROUP BY p.category;

-- 并行DML操作
ALTER SESSION ENABLE PARALLEL DML;

INSERT /*+ PARALLEL(fact_sales_summary, 4) */ INTO fact_sales_summary
SELECT /*+ PARALLEL(fs, 8) */
    time_key,
    product_key,
    SUM(quantity) as total_quantity,
    SUM(total_amount) as total_amount
FROM fact_sales fs
WHERE time_key BETWEEN 20230101 AND 20230131
GROUP BY time_key, product_key;

-- 监控并行执行
SELECT 
    px_servers_requested,
    px_servers_allocated,
    px_qcsid,
    px_server_group,
    px_server_set
FROM v$px_session
WHERE px_qcsid = (SELECT sid FROM v$session WHERE audsid = USERENV('SESSIONID'));

7.3 分区消除优化

-- 分区消除示例
SELECT 
    product_key,
    SUM(total_amount) as monthly_sales
FROM fact_sales
WHERE time_key BETWEEN 20230601 AND 20230630  -- 只访问6月分区
GROUP BY product_key;

-- 动态分区消除
SELECT 
    p.category,
    SUM(fs.total_amount) as total_sales
FROM fact_sales fs
JOIN dim_product p ON fs.product_key = p.product_key
JOIN dim_time t ON fs.time_key = t.time_key
WHERE t.year_number = :year_param  -- 根据参数动态消除分区
  AND t.quarter_number = :quarter_param
GROUP BY p.category;

-- 查看分区消除效果
SELECT 
    sql_id,
    plan_hash_value,
    executions,
    px_servers_executions,
    elapsed_time,
    cpu_time
FROM v$sql
WHERE sql_text LIKE '%fact_sales%'
ORDER BY elapsed_time DESC;

8. 数据压缩

8.1 表压缩策略

Oracle提供多种压缩技术来节省存储空间:

-- 基本表压缩
CREATE TABLE fact_sales_compressed (
    time_key NUMBER,
    product_key NUMBER,
    customer_key NUMBER,
    total_amount NUMBER
) COMPRESS FOR OLTP;

-- 高级压缩(需要Advanced Compression选项)
CREATE TABLE fact_sales_advanced_compressed (
    time_key NUMBER,
    product_key NUMBER,
    customer_key NUMBER,
    total_amount NUMBER
) COMPRESS FOR QUERY LOW;

-- 混合列压缩(Exadata环境)
CREATE TABLE fact_sales_hybrid_compressed (
    time_key NUMBER,
    product_key NUMBER,
    customer_key NUMBER,
    total_amount NUMBER
) COMPRESS FOR QUERY HIGH;

-- 为现有表启用压缩
ALTER TABLE fact_sales COMPRESS FOR QUERY LOW;

-- 压缩特定分区
ALTER TABLE fact_sales 
MODIFY PARTITION p2023q1 COMPRESS FOR QUERY HIGH;

8.2 索引压缩

-- B树索引压缩
CREATE INDEX idx_fact_sales_compressed 
ON fact_sales (time_key, product_key) COMPRESS 2;

-- 重建现有索引并压缩
ALTER INDEX idx_fact_sales_time REBUILD COMPRESS;

-- 查看压缩效果
SELECT 
    table_name,
    compression,
    compress_for,
    num_rows,
    blocks,
    avg_row_len
FROM user_tables
WHERE table_name LIKE 'FACT_SALES%';

8.3 压缩效果监控

-- 创建压缩监控脚本
CREATE OR REPLACE PROCEDURE monitor_compression AS
    v_uncompressed_size NUMBER;
    v_compressed_size NUMBER;
    v_compression_ratio NUMBER;
BEGIN
    -- 获取压缩前大小
    SELECT SUM(bytes)/1024/1024 INTO v_uncompressed_size
    FROM user_segments
    WHERE segment_name = 'FACT_SALES_UNCOMPRESSED';
    
    -- 获取压缩后大小
    SELECT SUM(bytes)/1024/1024 INTO v_compressed_size
    FROM user_segments
    WHERE segment_name = 'FACT_SALES_COMPRESSED';
    
    -- 计算压缩比
    v_compression_ratio := v_uncompressed_size / v_compressed_size;
    
    -- 记录结果
    INSERT INTO compression_log VALUES (
        SYSDATE, 'FACT_SALES', v_uncompressed_size, 
        v_compressed_size, v_compression_ratio
    );
    
    COMMIT;
END;
/

9. 实时数据仓库

9.1 实时ETL架构

现代数据仓库需要支持实时或近实时的数据处理:

实时数据源
消息队列
流处理引擎
实时数据仓库
实时分析
在线交易系统
物联网设备
日志文件
API接口
Oracle Streams
Kafka
OGG
Oracle GoldenGate
Stream Analytics
Spark Streaming
实时事实表
实时汇总表
缓存层
实时仪表板
告警系统
实时报表

9.2 Oracle GoldenGate集成

-- 配置GoldenGate复制
-- 在源端创建补充日志
ALTER DATABASE ADD SUPPLEMENTAL LOG DATA;
ALTER TABLE orders ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS;

-- 创建GoldenGate用户
CREATE USER ggs_user IDENTIFIED BY ggs_password;
GRANT DBA TO ggs_user;
GRANT EXECUTE ON DBMS_FLASHBACK TO ggs_user;

-- 在目标端创建实时加载表
CREATE TABLE real_time_sales (
    sale_id NUMBER,
    customer_id NUMBER,
    product_id NUMBER,
    sale_amount NUMBER,
    sale_timestamp TIMESTAMP,
    operation_type VARCHAR2(10),
    processed_flag CHAR(1) DEFAULT 'N'
);

-- 创建实时处理存储过程
CREATE OR REPLACE PROCEDURE process_real_time_sales AS
    CURSOR c_new_sales IS
        SELECT * FROM real_time_sales 
        WHERE processed_flag = 'N'
        ORDER BY sale_timestamp;
BEGIN
    FOR sale_rec IN c_new_sales LOOP
        -- 更新实时汇总表
        MERGE INTO real_time_sales_summary rss
        USING (SELECT 
                 TO_CHAR(sale_rec.sale_timestamp, 'YYYYMMDDHH24') as hour_key,
                 sale_rec.product_id,
                 sale_rec.sale_amount
               FROM DUAL) src
        ON (rss.hour_key = src.hour_key AND rss.product_id = src.product_id)
        WHEN MATCHED THEN
            UPDATE SET 
                total_amount = total_amount + src.sale_amount,
                transaction_count = transaction_count + 1,
                last_updated = SYSTIMESTAMP
        WHEN NOT MATCHED THEN
            INSERT (hour_key, product_id, total_amount, transaction_count, last_updated)
            VALUES (src.hour_key, src.product_id, src.sale_amount, 1, SYSTIMESTAMP);
        
        -- 标记为已处理
        UPDATE real_time_sales 
        SET processed_flag = 'Y'
        WHERE sale_id = sale_rec.sale_id;
        
        -- 每1000条提交一次
        IF MOD(c_new_sales%ROWCOUNT, 1000) = 0 THEN
            COMMIT;
        END IF;
    END LOOP;
    
    COMMIT;
END;
/

9.3 Lambda架构实现

-- 批处理层(传统数据仓库)
CREATE TABLE batch_sales_summary (
    date_key NUMBER,
    product_key NUMBER,
    total_amount NUMBER,
    transaction_count NUMBER,
    batch_timestamp TIMESTAMP
);

-- 实时处理层(速度层)
CREATE TABLE speed_sales_summary (
    hour_key NUMBER,
    product_key NUMBER,
    total_amount NUMBER,
    transaction_count NUMBER,
    real_time_timestamp TIMESTAMP
);

-- 服务层(合并批处理和实时结果)
CREATE OR REPLACE VIEW unified_sales_summary AS
SELECT 
    TRUNC(hour_key/100) as date_key,
    product_key,
    SUM(total_amount) as total_amount,
    SUM(transaction_count) as transaction_count,
    'REAL_TIME' as data_source
FROM speed_sales_summary
WHERE hour_key > (SELECT MAX(date_key) * 100 FROM batch_sales_summary)
GROUP BY TRUNC(hour_key/100), product_key
UNION ALL
SELECT 
    date_key,
    product_key,
    total_amount,
    transaction_count,
    'BATCH' as data_source
FROM batch_sales_summary;

10. 数据仓库治理

10.1 数据质量管理

数据质量是数据仓库成功的关键因素:

-- 创建数据质量规则表
CREATE TABLE data_quality_rules (
    rule_id NUMBER PRIMARY KEY,
    table_name VARCHAR2(100),
    column_name VARCHAR2(100),
    rule_type VARCHAR2(50), -- NOT_NULL, RANGE, FORMAT, REFERENCE
    rule_expression VARCHAR2(1000),
    rule_description VARCHAR2(500),
    is_active CHAR(1) DEFAULT 'Y',
    created_date DATE DEFAULT SYSDATE
);

-- 插入数据质量规则
INSERT INTO data_quality_rules VALUES 
(1, 'FACT_SALES', 'TOTAL_AMOUNT', 'RANGE', 'total_amount > 0', '销售金额必须大于0', 'Y', SYSDATE);

INSERT INTO data_quality_rules VALUES 
(2, 'DIM_CUSTOMER', 'EMAIL', 'FORMAT', 'email LIKE ''%@%.%''', '邮箱格式验证', 'Y', SYSDATE);

INSERT INTO data_quality_rules VALUES 
(3, 'FACT_SALES', 'CUSTOMER_KEY', 'REFERENCE', 'EXISTS (SELECT 1 FROM dim_customer WHERE customer_key = ?)', '客户外键验证', 'Y', SYSDATE);

-- 数据质量检查存储过程
CREATE OR REPLACE PROCEDURE check_data_quality(p_table_name VARCHAR2) AS
    v_sql VARCHAR2(4000);
    v_error_count NUMBER;
    v_total_count NUMBER;
BEGIN
    -- 获取表的总记录数
    v_sql := 'SELECT COUNT(*) FROM ' || p_table_name;
    EXECUTE IMMEDIATE v_sql INTO v_total_count;
    
    -- 检查每个质量规则
    FOR rule_rec IN (
        SELECT * FROM data_quality_rules 
        WHERE table_name = p_table_name AND is_active = 'Y'
    ) LOOP
        IF rule_rec.rule_type = 'NOT_NULL' THEN
            v_sql := 'SELECT COUNT(*) FROM ' || p_table_name || 
                     ' WHERE ' || rule_rec.column_name || ' IS NULL';
        ELSIF rule_rec.rule_type = 'RANGE' THEN
            v_sql := 'SELECT COUNT(*) FROM ' || p_table_name || 
                     ' WHERE NOT (' || rule_rec.rule_expression || ')';
        ELSIF rule_rec.rule_type = 'FORMAT' THEN
            v_sql := 'SELECT COUNT(*) FROM ' || p_table_name || 
                     ' WHERE NOT (' || rule_rec.rule_expression || ')';
        END IF;
        
        EXECUTE IMMEDIATE v_sql INTO v_error_count;
        
        -- 记录检查结果
        INSERT INTO data_quality_log VALUES (
            quality_log_seq.NEXTVAL,
            SYSDATE,
            p_table_name,
            rule_rec.rule_id,
            v_total_count,
            v_error_count,
            ROUND((v_total_count - v_error_count) / v_total_count * 100, 2)
        );
    END LOOP;
    
    COMMIT;
END;
/

10.2 元数据管理

-- 创建元数据管理表
CREATE TABLE metadata_tables (
    table_id NUMBER PRIMARY KEY,
    table_name VARCHAR2(100) NOT NULL,
    table_type VARCHAR2(20), -- FACT, DIMENSION, STAGING
    business_description VARCHAR2(1000),
    data_source VARCHAR2(200),
    refresh_frequency VARCHAR2(100),
    data_owner VARCHAR2(100),
    created_date DATE DEFAULT SYSDATE
);

CREATE TABLE metadata_columns (
    column_id NUMBER PRIMARY KEY,
    table_id NUMBER,
    column_name VARCHAR2(100),
    data_type VARCHAR2(50),
    is_nullable CHAR(1),
    business_description VARCHAR2(500),
    data_format VARCHAR2(100),
    FOREIGN KEY (table_id) REFERENCES metadata_tables(table_id)
);

-- 插入元数据信息
INSERT INTO metadata_tables VALUES (
    1, 'FACT_SALES', 'FACT', 
    '销售事实表,记录每笔销售交易的详细信息',
    'CRM系统、POS系统', '每日凌晨2点', 
    '销售部门', SYSDATE
);

INSERT INTO metadata_columns VALUES (
    1, 1, 'TIME_KEY', 'NUMBER', 'N',
    '时间维度键,格式YYYYMMDD', 'YYYYMMDD'
);

-- 自动收集表统计元数据
CREATE OR REPLACE PROCEDURE collect_table_metadata AS
BEGIN
    -- 收集表级元数据
    INSERT INTO table_statistics_log (
        log_date, table_name, num_rows, num_blocks, avg_row_len
    )
    SELECT 
        SYSDATE, table_name, num_rows, blocks, avg_row_len
    FROM user_tables
    WHERE table_name LIKE 'FACT_%' OR table_name LIKE 'DIM_%';
    
    -- 收集列级统计
    INSERT INTO column_statistics_log (
        log_date, table_name, column_name, num_distinct, density
    )
    SELECT 
        SYSDATE, table_name, column_name, num_distinct, density
    FROM user_tab_columns
    WHERE table_name LIKE 'FACT_%' OR table_name LIKE 'DIM_%';
    
    COMMIT;
END;
/

10.3 数据血缘追踪

-- 创建数据血缘跟踪表
CREATE TABLE data_lineage (
    lineage_id NUMBER PRIMARY KEY,
    source_table VARCHAR2(100),
    source_column VARCHAR2(100),
    target_table VARCHAR2(100),
    target_column VARCHAR2(100),
    transformation_rule VARCHAR2(1000),
    etl_job_name VARCHAR2(200),
    created_date DATE DEFAULT SYSDATE
);

-- 记录数据血缘关系
INSERT INTO data_lineage VALUES (
    lineage_seq.NEXTVAL,
    'ORDERS', 'ORDER_AMOUNT',
    'FACT_SALES', 'TOTAL_AMOUNT',
    'SUM(order_amount) GROUP BY date, product',
    'DAILY_SALES_ETL',
    SYSDATE
);

-- 查询数据血缘的视图
CREATE OR REPLACE VIEW v_data_lineage AS
SELECT 
    dl.source_table || '.' || dl.source_column as source,
    dl.target_table || '.' || dl.target_column as target,
    dl.transformation_rule,
    dl.etl_job_name,
    mt_source.business_description as source_description,
    mt_target.business_description as target_description
FROM data_lineage dl
LEFT JOIN metadata_tables mt_source ON dl.source_table = mt_source.table_name
LEFT JOIN metadata_tables mt_target ON dl.target_table = mt_target.table_name;

-- 影响分析:查找某个表的所有下游依赖
CREATE OR REPLACE FUNCTION get_downstream_tables(p_table_name VARCHAR2)
RETURN SYS_REFCURSOR AS
    v_cursor SYS_REFCURSOR;
BEGIN
    OPEN v_cursor FOR
        WITH downstream_tables AS (
            SELECT target_table, 1 as level_num
            FROM data_lineage
            WHERE source_table = p_table_name
            UNION ALL
            SELECT dl.target_table, dt.level_num + 1
            FROM data_lineage dl
            JOIN downstream_tables dt ON dl.source_table = dt.target_table
            WHERE dt.level_num < 10  -- 防止无限递归
        )
        SELECT DISTINCT target_table, MIN(level_num) as min_level
        FROM downstream_tables
        GROUP BY target_table
        ORDER BY min_level, target_table;
    
    RETURN v_cursor;
END;
/

这份Oracle数据仓库指南涵盖了从基础概念到高级应用的全方位内容。希望这个完整的指南能帮助你更好地理解和实施Oracle数据仓库项目!

结语
感谢您的阅读!期待您的一键三连!欢迎指正!

在这里插入图片描述

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.coloradmin.cn/o/2405887.html

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈,一经查实,立即删除!

相关文章

基于开源AI大模型AI智能名片S2B2C商城小程序源码的中等平台型社交电商运营模式研究

摘要&#xff1a;本文聚焦中等平台型社交电商&#xff0c;探讨其与传统微商及大型社交电商平台的差异&#xff0c;尤其关注产品品类管理对代理运营的影响。通过引入开源AI大模型、AI智能名片与S2B2C商城小程序源码技术&#xff0c;构建智能化运营体系。研究结果表明&#xff0c…

Vite 双引擎架构 —— Esbuild 概念篇

Vite 底层采用 双引擎架构&#xff0c;核心构建引擎是 Esbuild 和 Rollup&#xff0c;二者在开发和生产环境中分工协作&#xff0c;共同实现高性能构建。不可否认&#xff0c;作为 Vite 的双引擎之一&#xff0c;Esbuild 在很多关键的构建阶段(如依赖预编译、TS 语法转译、代码…

阿里云Alibaba Cloud安装Docker与Docker compose【图文教程】

个人记录 进入控制台&#xff0c;找到定时与自动化任务 进入‘安装/卸载扩展程序’ 点击‘安装扩展程序’ 选择docker社区版&#xff0c;点击下一步与确定&#xff0c;等待一会 安装成功 查询版本 查询docker sudo docker version查询docker compose sudo docker compo…

ADB识别手机系统弹授权框-如何处理多重弹框叠加和重叠问题

ADB识别手机系统弹授权框-如何处理多重弹框叠加和重叠问题 --蓝牙电话SDK自动部署 上一篇&#xff1a;手机App-插入USB时自动授权点击确定按钮-使系统弹出框自动消失 下一篇&#xff1a;编写中。 一、前言 我们在上一篇《手机App-插入USB时自动授权点击确定按钮-使系统弹出框…

uniapp+<script setup lang=“ts“>解决有数据与暂无数据切换显示,有数据加载时暂无数据闪现(先加载空数据)问题

声明showEmpty 为false&#xff0c;在接口返回处判断有数据时设置showEmpty 为false&#xff0c;接口返回数据为空则判断showEmpty 为true &#xff08;这样就解决有数据的时候会闪现暂无数据的问题啦&#xff09; <!--* Date: 2024-02-26 03:38:52* LastEditTime: 2025-06…

详解鸿蒙Next仓颉开发语言中的动画

大家上午好&#xff0c;今天来聊一聊仓颉开发语言中的动画开发。 仓颉中的动画通常有两种方式&#xff0c;分别是属性动画和显示动画&#xff0c;我们今天以下面的加载动画为例&#xff0c;使用显示动画和属性动画分别实现一下&#xff0c;看看他们有什么区别。 显示动画 显示…

Redis常见使用场景解析

1. 数据库缓存 Redis 作为典型的 Key-Value 型内存数据库,数据缓存是其最广为人知的应用场景。使用 Redis 缓存数据操作简便,通常将序列化后的对象以 string 类型存储。但在实际应用中,需注意以下关键要点: Key 设计:必须确保不同对象的 Key 具有唯一性,且尽量缩短长度,…

起重机指挥人员在工作中需要注意哪些安全事项?

起重机指挥人员在作业中承担着协调设备运行、保障作业安全的关键职责&#xff0c;其安全操作直接关系到整个起重作业的安全性。以下从作业前、作业中、作业后的全流程&#xff0c;详细说明指挥人员需注意的安全事项&#xff1a; 一、作业前的安全准备 资质与状态检查&#xff…

JAVA-springboot log日志

SpringBoot从入门到精通-第8章 日志的操作 一、Spring Boot默认的日志框架 SpringBoot支持很多种日志框架&#xff0c;通常情况下&#xff0c;这些日志框架都是由一个日志抽象层和一个日志实现层搭建而成的&#xff0c;日志抽象层是为记录日志提供的一套标准且规范的框架&…

1.springmvc基础入门(一)

1.Spring MVC概念 Spring MVC 是 Spring Framework 提供的 Web 组件&#xff0c;全称是 Spring Web MVC&#xff0c;是⽬前主流的实现 MVC 设计模式的框架&#xff0c;提供前端路由映射、视图解析等功能。 Java Web 开发者必须要掌握的技术框架。 2.Spring MVC 功能 MVC&am…

模块缝合-把A模块换成B模块(没写完)

把MLP Head替换为KAN 1.在model文件下新建一个python文件 2.把 模块文件里的整个KAN代码复制到新的python文件中 3.在开头导入 from model.KAN(新建文件名&#xff09; import KAN&#xff08;新建文件中的类名&#xff09; 4.sys.path.append(r"D: Icode(Kansformer"…

从零开始学Flink:揭开实时计算的神秘面纱

一、为什么需要Flink&#xff1f; 当你在电商平台秒杀商品时&#xff0c;1毫秒的延迟可能导致交易失败&#xff1b;当自动驾驶汽车遇到障碍物时&#xff0c;10毫秒的计算延迟可能酿成事故。这些场景揭示了一个残酷事实&#xff1a;数据的价值随时间呈指数级衰减。 传统批处理…

Appium如何支持ios真机测试

ios模拟器上UI自动化测试 以appiumwebdriverio为例&#xff0c;详细介绍如何在模拟器上安装和测试app。在使用ios模拟器前&#xff0c;需要安装xcode&#xff0c;创建和启动一个simulator。simulator创建好后&#xff0c;就可以使用xcrun simctl命令安装被测应用并开始测试了。…

JDK17 Http Request 异步处理 源码刨析

为什么可以异步&#xff1f; #调用起始源码 // 3. 发送异步请求并处理响应 CompletableFuture future client.sendAsync( request, HttpResponse.BodyHandlers.ofString() // 响应体转为字符串 ).thenApply(response -> { // 状态码检查&#xff08;非200系列抛出异常&…

【Zephyr 系列 8】构建完整 BLE 产品架构:状态机 + AT 命令 + 双通道通信实战

🧠关键词:Zephyr、BLE、状态机、双向透传、AT 命令、Buffer、主从共存、系统架构 📌适合人群:希望开发 BLE 产品(模块/标签/终端)具备可控、可测、可维护架构的开发者 🧭 引言:从“点功能”到“系统架构” 前面几篇我们已经逐步构建了 BLE 广播、连接、数据透传系统…

【Mac 从 0 到 1 保姆级配置教程 16】- Docker 快速安装配置、常用命令以及实际项目演示

文章目录 前言1. Docker 是什么&#xff1f;2. 为什么要使用 Docker&#xff1f; 安装 Docker1. 安装 Docker Desktop2. 安装 OrbStack3. Docker Desktop VS OrbStack5. 验证安装 使用 Docker 运行项目1. 克隆项目到本地2. 进入项目目录3. 启动容器: 查看运行效果1. OrbStack 中…

2025-05-01-决策树算法及应用

决策树算法及应用 参考资料 GitHub - zhaoyichanghong/machine_learing_algo_python: implement the machine learning algorithms by p(机器学习相关的 github 仓库)决策树实现与应用决策树 概述 机器学习算法分类 决策树算法 决策树是一种以树状结构对数据进行划分的分类…

Redis知识体系

1. 概述 本文总结了Redis基本的核心知识体系&#xff0c;在学习Redis的过程中&#xff0c;可以将其作为学习框架&#xff0c;以此更好的从整体的角度去理解和学习Redis的内容和设计思想。同时知识框架带来的好处是可以帮助我们更好的进行记忆&#xff0c;在大脑中形成相应的知识…

mysql-MySQL体系结构和存储引擎

1. MySQL体系结构和存储引擎 MySQL被设计成一个单进程多线程架构的数据库&#xff0c;MySQL数据库实例在系统上的表现就是一个进 程当启动实例时&#xff0c;读取配置文件&#xff0c;根据配置文件的参数来启动数据库实例&#xff1b;若没有&#xff0c;按编译时的默认 参数设…

黑马Javaweb Request和Response

一.介绍 在 Web 开发中&#xff0c;HttpServletRequest 和 HttpServletResponse 是两个非常重要的类&#xff0c;它们分别用于处理客户端的请求和服务器的响应。以下是它们的详细说明和使用方法&#xff1a; 1. HttpServletRequest HttpServletRequest 是一个接口&#xff0…