Hive SQL: JD.com Interview Question on Filling in Valid Values
- Requirements
- Preparing the data
- Analysis
- Implementation
- Closing
 
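Requirements
There is an inbound-cost table. Every time an item is received into the warehouse, one row is written, containing the item id, the inbound time (the itime column in the sample data), and the purchase cost of that receipt (the price column). For various reasons, the cost is missing on some rows.
 The rule is: when a cost is missing, there are two ways to derive it. Both values must be computed, and their average is used as the cost of that receipt. The values are derived as follows:
- 1. Take the most recent valid cost of the same item from an earlier receipt, that is, the nearest preceding row of the same item that has a valid cost.
- 2. Take the nearest valid cost of the same item from a later receipt, that is, the nearest following row of the same item that has a valid cost.
- 3. If either of these lookups still finds no valid value, record it as 0.
The data looks like this: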
 
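 In the screenshot, the item with id 2 is missing its inbound cost on 2022-12-02 and 2022-12-03. Applying the rules above produces two new fields, last_cost and next_cost (named last_price and next_price in the queries later on), where:
 last_cost is the nearest valid cost before the current missing-cost row;
 next_cost is the nearest valid cost after the current missing-cost row.
Staying with item id 2, for the missing-cost row on 2022-12-02:
 its last_cost is the previous valid cost for the same item id 2, which is 150 from 2022-12-01;
 its next_cost is the next valid cost for the same item id 2, which is 200 from 2022-12-04.
 This is the first row highlighted in red in the screenshot.
Likewise, the 2022-12-03 row of id 2 takes the valid cost before it and the valid cost after it, which again gives 150 and 200.
 For the last row, the missing-cost row of item id 4 on 2022-12-05, next_cost is 0 because there is no later valid cost (rule 3 above).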
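As a quick arithmetic check of the rules above, the two cases just discussed can be worked out by hand. This is only a sketch: the column aliases are illustrative and do not appear in the original, and the literal values are copied from the sample data.

```sql
-- Hand-worked check of the averaging rule, using values from the sample data.
select
    (150 + 200) / 2 as cost_id2_20221202,  -- last_cost 150, next_cost 200
    (200 + 0) / 2   as cost_id4_20221205;  -- last_cost 200, no later valid cost, so next_cost is 0
-- Expected: 175 and 100 (Hive's / operator returns a double, so they display as 175.0 and 100.0)
```

Note that when one side falls back to 0, the average is simply half of the other side, which is what rule 3 implies.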
Preparing the data
select '1' as id, '2022-12-01' as itime, 120 as price
    union all 
    select '2' as id, '2022-12-01' as itime, 150 as price
    union all 
    select '2' as id, '2022-12-02' as itime, null as price
    union all 
    select '2' as id, '2022-12-03' as itime, null as price
    union all 
    select '2' as id, '2022-12-04' as itime, 200 as price
    union all 
    select '2' as id, '2022-12-05' as itime, 210 as price
    union all 
    select '3' as id, '2022-12-06' as itime, 300 as price
    union all 
    select '3' as id, '2022-12-07' as itime, null as price
    union all 
    select '3' as id, '2022-12-08' as itime, 400 as price
    union all 
    select '4' as id, '2022-12-01' as itime, 140 as price
    union all 
    select '4' as id, '2022-12-02' as itime, null as price
    union all 
    select '4' as id, '2022-12-03' as itime, null as price
    union all 
    select '4' as id, '2022-12-04' as itime, 200 as price
    union all 
    select '4' as id, '2022-12-05' as itime, null as price

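Analysis
From the requirements above, filling in a missing-cost row only needs the nearest valid cost of the same item before and after that row, no matter how many consecutive rows are missing a cost. See the figure below:
 
 The idea is re-grouping: put each missing-cost row into the same group as its preceding valid cost, and separately into the same group as its following valid cost, then take the max within each group. Each group contains at most one non-null price, so the max is exactly that valid cost.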
Implementation
Step 1: Grouping
with tmp as (
    select '1' as id, '2022-12-01' as itime, 120 as price
    union all 
    select '2' as id, '2022-12-01' as itime, 150 as price
    union all 
    select '2' as id, '2022-12-02' as itime, null as price
    union all 
    select '2' as id, '2022-12-03' as itime, null as price
    union all 
    select '2' as id, '2022-12-04' as itime, 200 as price
    union all 
    select '2' as id, '2022-12-05' as itime, 210 as price
    union all 
    select '3' as id, '2022-12-06' as itime, 300 as price
    union all 
    select '3' as id, '2022-12-07' as itime, null as price
    union all 
    select '3' as id, '2022-12-08' as itime, 400 as price
    union all 
    select '4' as id, '2022-12-01' as itime, 140 as price
    union all 
    select '4' as id, '2022-12-02' as itime, null as price
    union all 
    select '4' as id, '2022-12-03' as itime, null as price
    union all 
    select '4' as id, '2022-12-04' as itime, 200 as price
    union all 
    select '4' as id, '2022-12-05' as itime, null as price
)
select 
    id,itime,price,
    sum(if(price is null, 0, 1)) over(partition by id order by itime) as last_index,
    sum(if(price is null, 0, 1)) over(partition by id order by itime desc) as next_index
from tmp
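First apply the re-grouping idea: treating every non-null price as a boundary, compute a windowed running sum once in ascending and once in descending order of itime. This places each missing-cost row in the same group as its corresponding previous valid cost (last_index) and next valid cost (next_index).
Grouping for last_cost: the missing-cost rows are now in the same group as the valid-cost row before them.
 
Grouping for next_cost: the missing-cost rows are now in the same group as the valid-cost row after them.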
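To make the two groupings concrete, here is a minimal sketch restricted to item 2 from the sample data, with the index values worked out by hand in the trailing comments. The cast(null as int) and the final order by are additions for clarity and are not part of the original query.

```sql
-- Subset of the sample data: item 2 only.
with tmp as (
    select '2' as id, '2022-12-01' as itime, 150 as price
    union all
    select '2' as id, '2022-12-02' as itime, cast(null as int) as price
    union all
    select '2' as id, '2022-12-03' as itime, cast(null as int) as price
    union all
    select '2' as id, '2022-12-04' as itime, 200 as price
    union all
    select '2' as id, '2022-12-05' as itime, 210 as price
)
select
    id, itime, price,
    -- running count of valid prices, oldest to newest
    sum(if(price is null, 0, 1)) over(partition by id order by itime) as last_index,
    -- running count of valid prices, newest to oldest
    sum(if(price is null, 0, 1)) over(partition by id order by itime desc) as next_index
from tmp
order by itime;
-- itime       price  last_index  next_index
-- 2022-12-01  150    1           3
-- 2022-12-02  NULL   1           2
-- 2022-12-03  NULL   1           2
-- 2022-12-04  200    2           2
-- 2022-12-05  210    3           1
```

The two null rows share last_index 1 with the 150 row before them, and next_index 2 with the 200 row after them. This relies on itime being unique per item; with duplicate timestamps the default window frame would treat tied rows as peers.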
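 Step 2: Take the maximum price within each group
 Partition by item id together with last_index, and by item id together with next_index, and take the maximum price within each group, treating a null price as 0.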
with tmp as (
    select '1' as id, '2022-12-01' as itime, 120 as price
    union all 
    select '2' as id, '2022-12-01' as itime, 150 as price
    union all 
    select '2' as id, '2022-12-02' as itime, null as price
    union all 
    select '2' as id, '2022-12-03' as itime, null as price
    union all 
    select '2' as id, '2022-12-04' as itime, 200 as price
    union all 
    select '2' as id, '2022-12-05' as itime, 210 as price
    union all 
    select '3' as id, '2022-12-06' as itime, 300 as price
    union all 
    select '3' as id, '2022-12-07' as itime, null as price
    union all 
    select '3' as id, '2022-12-08' as itime, 400 as price
    union all 
    select '4' as id, '2022-12-01' as itime, 140 as price
    union all 
    select '4' as id, '2022-12-02' as itime, null as price
    union all 
    select '4' as id, '2022-12-03' as itime, null as price
    union all 
    select '4' as id, '2022-12-04' as itime, 200 as price
    union all 
    select '4' as id, '2022-12-05' as itime, null as price
)
select 
    id,itime,price,
    max(if(price is null,0,price)) over(partition by id,last_index) as last_price,
    max(if(price is null,0,price)) over(partition by id,next_index) as next_price
from
    (select 
        id,itime,price,
        sum(if(price is null, 0, 1)) over(partition by id order by itime) as last_index,
        sum(if(price is null, 0, 1)) over(partition by id order by itime desc) as next_index
    from tmp
    ) t;
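The following sketch runs the same two steps on item 4 only, since it contains the case where no later valid cost exists. The expected values in the comments were worked out by hand from the sample data. The cast(null as int) and the order by are additions for clarity.

```sql
-- Subset of the sample data: item 4 only (ends with a missing cost).
with tmp as (
    select '4' as id, '2022-12-01' as itime, 140 as price
    union all
    select '4' as id, '2022-12-02' as itime, cast(null as int) as price
    union all
    select '4' as id, '2022-12-03' as itime, cast(null as int) as price
    union all
    select '4' as id, '2022-12-04' as itime, 200 as price
    union all
    select '4' as id, '2022-12-05' as itime, cast(null as int) as price
)
select
    id, itime, price,
    -- each group holds at most one valid price, so max picks it; an all-null group yields 0
    max(if(price is null, 0, price)) over(partition by id, last_index) as last_price,
    max(if(price is null, 0, price)) over(partition by id, next_index) as next_price
from
    (select
        id, itime, price,
        sum(if(price is null, 0, 1)) over(partition by id order by itime) as last_index,
        sum(if(price is null, 0, 1)) over(partition by id order by itime desc) as next_index
    from tmp
    ) t
order by itime;
-- itime       price  last_price  next_price
-- 2022-12-01  140    140         140
-- 2022-12-02  NULL   140         200
-- 2022-12-03  NULL   140         200
-- 2022-12-04  200    200         200
-- 2022-12-05  NULL   200         0
```

The 2022-12-05 row lands in next_index group 0, which contains no valid price, so its next_price falls back to 0 as rule 3 requires.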

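 Step 3: Use the average as the final cost
 For rows whose price is missing, take the average of last_price and next_price as the final cost. Rows that already have a valid price keep it.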
with tmp as (
    select '1' as id, '2022-12-01' as itime, 120 as price
    union all 
    select '2' as id, '2022-12-01' as itime, 150 as price
    union all 
    select '2' as id, '2022-12-02' as itime, null as price
    union all 
    select '2' as id, '2022-12-03' as itime, null as price
    union all 
    select '2' as id, '2022-12-04' as itime, 200 as price
    union all 
    select '2' as id, '2022-12-05' as itime, 210 as price
    union all 
    select '3' as id, '2022-12-06' as itime, 300 as price
    union all 
    select '3' as id, '2022-12-07' as itime, null as price
    union all 
    select '3' as id, '2022-12-08' as itime, 400 as price
    union all 
    select '4' as id, '2022-12-01' as itime, 140 as price
    union all 
    select '4' as id, '2022-12-02' as itime, null as price
    union all 
    select '4' as id, '2022-12-03' as itime, null as price
    union all 
    select '4' as id, '2022-12-04' as itime, 200 as price
    union all 
    select '4' as id, '2022-12-05' as itime, null as price
)
select 
    id,itime,price,
    -- final cost: average of the two lookups for missing rows, otherwise the original price
    case when price is null then (last_price + next_price) / 2 else price end as final_price
from
    (select 
        id,itime,price,
        max(if(price is null,0,price)) over(partition by id,last_index) as last_price,
        max(if(price is null,0,price)) over(partition by id,next_index) as next_price
    from
        (select 
            id,itime,price,
            sum(if(price is null, 0, 1)) over(partition by id order by itime) as last_index,
            sum(if(price is null, 0, 1)) over(partition by id order by itime desc) as next_index
        from tmp
        ) t
    ) t1;
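For comparison, the same result can probably be reached without re-grouping. This is an alternative sketch and not the approach used in this post: it assumes a Hive version where last_value and first_value accept a second boolean argument that skips nulls. The final_price alias and the cast(null as int) are illustrative choices.

```sql
-- Alternative sketch: skip-null window functions instead of re-grouping.
with tmp as (
    select '4' as id, '2022-12-01' as itime, 140 as price
    union all
    select '4' as id, '2022-12-02' as itime, cast(null as int) as price
    union all
    select '4' as id, '2022-12-03' as itime, cast(null as int) as price
    union all
    select '4' as id, '2022-12-04' as itime, 200 as price
    union all
    select '4' as id, '2022-12-05' as itime, cast(null as int) as price
)
select
    id, itime, price,
    case when price is null then (last_price + next_price) / 2 else price end as final_price
from
    (select
        id, itime, price,
        -- nearest non-null price at or before the current row, 0 if none
        coalesce(last_value(price, true) over(partition by id order by itime
            rows between unbounded preceding and current row), 0) as last_price,
        -- nearest non-null price at or after the current row, 0 if none
        coalesce(first_value(price, true) over(partition by id order by itime
            rows between current row and unbounded following), 0) as next_price
    from tmp
    ) t
order by itime;
-- Expected final_price for item 4, by date: 140, 170, 170, 200, 100
```

The re-grouping version in this post uses only sum and max, so it ports more easily to engines without a skip-null option. The version above needs one fewer level of nesting.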

Closing
If you found this useful, please like, follow, and bookmark. Your support is my biggest motivation to keep writing.


















