Personalized Portrait Generation: Deploying the Latest FaceChain Model

FaceChain is an open-source AI framework from Alibaba DAMO Academy for generating personal portraits and digital avatars.

FaceChain builds on Stable Diffusion's text-to-image capability and combines portrait-style LoRA model training with face-perception models: it trains on the input photos and then runs inference to produce personal portrait images.

The latest release introduces Face Adapter with deCoupled Training (FACT), which can generate a personal portrait within 10 seconds.

FaceChain can produce personal portraits in many styles, including Hanfu, professional headshot, school uniform, Christmas, gentleman, and comic styles, meeting users' diverse personalization needs.

FaceChain also supports super-resolution output with several resolution options (512x512, 768x768, 1024x1024, 2048x2048) and integrates multiple high-quality style LoRA models, so the generated portraits cover a wide stylistic range.

GitHub repository: https://github.com/modelscope/facechain

I. Environment Installation

1. Python environment

Python 3.10 or later is recommended.
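It is cleanest to install everything into a dedicated virtual environment. A minimal sketch using conda (the environment name facechain is arbitrary):

conda create -n facechain python=3.10 -y

conda activate facechain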

2. Installing pip packages

pip install torch==2.0.1+cu118 torchvision==0.15.2+cu118 torchaudio==2.0.2 --index-url https://download.pytorch.org/whl/cu118

pip install openmim gradio controlnet_aux python-slugify diffusers peft onnxruntime-gpu fsspec edge_tts opencv-python mediapipe datasets==2.18.0 modelscope==1.15.0 -i https://pypi.tuna.tsinghua.edu.cn/simple

mim install mmengine

mim install mmcv-full==1.7.2

mim install mmdet==2.26.0
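After the installs finish, a quick sanity check (a minimal sketch) confirms that the CUDA build of PyTorch is active and that the mm-series packages import cleanly:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"

python -c "import mmcv, mmdet; print(mmcv.__version__, mmdet.__version__)"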

3. Downloading the required models

(1) Face detection and keypoint model

git lfs install

git clone https://modelscope.cn/models/damo/cv_ddsar_face-detection_iclr23-damofd

(2) Human parsing model

git lfs install

git clone https://modelscope.cn/models/damo/cv_resnet101_image-multiple-human-parsing

(3) Skin retouching model

git lfs install

git clone https://www.modelscope.cn/models/damo/cv_unet_skin_retouching_torch

(4) Face fusion model

git lfs install

git clone https://www.modelscope.cn/models/damo/cv_unet_face_fusion_torch

(5) FaceChain FACT model

git lfs install

git clone https://www.modelscope.cn/models/yucheng1996/FaceChain-FACT

(6) Face attribute recognition model

git lfs install

git clone https://modelscope.cn/models/damo/cv_resnet34_face-attribute-recognition_fairface
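Cloning with git lfs is optional: the same models can also be fetched programmatically with modelscope's snapshot_download, which is how the inference code below resolves them at runtime. A minimal sketch, with revisions mirroring the ones pinned in the code (omit revision to fetch the latest):

from modelscope import snapshot_download

snapshot_download('damo/cv_ddsar_face-detection_iclr23-damofd')
snapshot_download('damo/cv_resnet101_image-multiple-human-parsing', revision='v1.0.1')
snapshot_download('damo/cv_unet_skin_retouching_torch', revision='v1.0.1.1')
snapshot_download('damo/cv_unet_face_fusion_torch', revision='v1.0.3')
snapshot_download('yucheng1996/FaceChain-FACT', revision='v1.0.0')
snapshot_download('damo/cv_resnet34_face-attribute-recognition_fairface', revision='v2.0.2')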

II. Function Testing

1. Command-line test

(1) Calling from Python

import os
import time
import cv2
import json
import numpy as np
import torch

from controlnet_aux import OpenposeDetector
from diffusers import (ControlNetModel, DPMSolverMultistepScheduler,
                       DPMSolverSinglestepScheduler,
                       StableDiffusionControlNetInpaintPipeline,
                       StableDiffusionControlNetPipeline,
                       StableDiffusionImg2ImgPipeline, StableDiffusionPipeline,
                       PNDMScheduler,
                       UniPCMultistepScheduler)
from facechain.merge_lora import merge_lora

from PIL import Image
from skimage import transform
from transformers import pipeline as tpipeline

from modelscope import snapshot_download
from modelscope.outputs import OutputKeys
from modelscope.pipelines import pipeline
from modelscope.utils.constant import Tasks

# face_adapter comes from the FaceChain repository; run this script from a FaceChain checkout
from face_adapter import FaceAdapter_v1, Face_Extracter_v1

# Stack images side by side on a black canvas sized to the tallest image.
def concatenate_images(images):
    heights = [img.shape[0] for img in images]
    max_width = sum([img.shape[1] for img in images])

    concatenated_image = np.zeros((max(heights), max_width, 3), dtype=np.uint8)
    x_offset = 0
    for img in images:
        concatenated_image[0:img.shape[0],
                           x_offset:x_offset + img.shape[1], :] = img
        x_offset += img.shape[1]
    return concatenated_image

# Detect faces, keep the largest, expand its box by crop_ratio, and return the
# clipped box plus its 5 landmarks.
def call_face_crop(det_pipeline, image, crop_ratio):
    det_result = det_pipeline(image)
    bboxes = det_result['boxes']
    keypoints = det_result['keypoints']
    area = 0
    idx = 0
    for i in range(len(bboxes)):
        bbox = bboxes[i]
        area_tmp = (float(bbox[2]) - float(bbox[0])) * (
            float(bbox[3]) - float(bbox[1]))
        if area_tmp > area:
            area = area_tmp
            idx = i
    bbox = bboxes[idx]
    keypoint = keypoints[idx]
    points_array = np.zeros((5, 2))
    for k in range(5):
        points_array[k, 0] = keypoint[2 * k]
        points_array[k, 1] = keypoint[2 * k + 1]
    w, h = image.size
    face_w = bbox[2] - bbox[0]
    face_h = bbox[3] - bbox[1]
    bbox[0] = np.clip(
        np.array(bbox[0], np.int32) - face_w * (crop_ratio - 1) / 2, 0, w - 1)
    bbox[1] = np.clip(
        np.array(bbox[1], np.int32) - face_h * (crop_ratio - 1) / 2, 0, h - 1)
    bbox[2] = np.clip(
        np.array(bbox[2], np.int32) + face_w * (crop_ratio - 1) / 2, 0, w - 1)
    bbox[3] = np.clip(
        np.array(bbox[3], np.int32) + face_h * (crop_ratio - 1) / 2, 0, h - 1)
    bbox = np.array(bbox, np.int32)
    return bbox, points_array

# Warp the source face (and its mask) onto the target image via a similarity
# transform estimated from the 5 facial landmarks, then composite the two.
def crop_and_paste(Source_image,
                   Source_image_mask,
                   Target_image,
                   Source_Five_Point,
                   Target_Five_Point,
                   Source_box,
                   use_warp=True):
    if use_warp:
        Source_Five_Point = np.reshape(Source_Five_Point, [5, 2]) - np.array(
            Source_box[:2])
        Target_Five_Point = np.reshape(Target_Five_Point, [5, 2])

        Crop_Source_image = Source_image.crop(np.int32(Source_box))
        Crop_Source_image_mask = Source_image_mask.crop(np.int32(Source_box))
        Source_Five_Point, Target_Five_Point = np.array(
            Source_Five_Point), np.array(Target_Five_Point)

        tform = transform.SimilarityTransform()
        tform.estimate(Source_Five_Point, Target_Five_Point)
        M = tform.params[0:2, :]

        warped = cv2.warpAffine(
            np.array(Crop_Source_image),
            M,
            np.shape(Target_image)[:2][::-1],
            borderValue=0.0)
        warped_mask = cv2.warpAffine(
            np.array(Crop_Source_image_mask),
            M,
            np.shape(Target_image)[:2][::-1],
            borderValue=0.0)

        mask = np.float16(warped_mask == 0)
        output = mask * np.float16(Target_image) + (
            1 - mask) * np.float16(warped)
    else:
        mask = np.float16(np.array(Source_image_mask) == 0)
        output = mask * np.float16(Target_image) + (
            1 - mask) * np.float16(Source_image)
    return output, mask

# Build a soft face mask from human-parsing results, with optional dilation
# (ksize/ksize1), neck merging, and an optional blurred whole-body mask.
def segment(segmentation_pipeline,
            img,
            ksize=0,
            eyeh=0,
            ksize1=0,
            include_neck=False,
            warp_mask=None,
            return_human=False):
    if True:
        result = segmentation_pipeline(img)
        masks = result['masks']
        scores = result['scores']
        labels = result['labels']
        if len(masks) == 0:
            return
        h, w = masks[0].shape
        mask_face = np.zeros((h, w))
        mask_hair = np.zeros((h, w))
        mask_neck = np.zeros((h, w))
        mask_human = np.zeros((h, w))
        for i in range(len(labels)):
            if scores[i] > 0.8:
                if labels[i] == 'Torso-skin':
                    if np.sum(masks[i]) > np.sum(mask_neck):
                        mask_neck = masks[i]
                elif labels[i] == 'Face':
                    if np.sum(masks[i]) > np.sum(mask_face):
                        mask_face = masks[i]
                elif labels[i] == 'Human':
                    if np.sum(masks[i]) > np.sum(mask_human):
                        mask_human = masks[i]
                elif labels[i] == 'Hair':
                    if np.sum(masks[i]) > np.sum(mask_hair):
                        mask_hair = masks[i]
        mask_face = np.clip(mask_face, 0, 1)
        mask_hair = np.clip(mask_hair, 0, 1)
        mask_neck = np.clip(mask_neck, 0, 1)
        mask_human = np.clip(mask_human, 0, 1)
        if np.sum(mask_face) > 0:
            soft_mask = np.clip(mask_face, 0, 1)
            if ksize1 > 0:
                kernel_size1 = int(np.sqrt(np.sum(soft_mask)) * ksize1)
                kernel1 = np.ones((kernel_size1, kernel_size1))
                soft_mask = cv2.dilate(soft_mask, kernel1, iterations=1)
            if ksize > 0:
                kernel_size = int(np.sqrt(np.sum(soft_mask)) * ksize)
                kernel = np.ones((kernel_size, kernel_size))
                soft_mask_dilate = cv2.dilate(soft_mask, kernel, iterations=1)
                if warp_mask is not None:
                    soft_mask_dilate = soft_mask_dilate * (
                        np.clip(soft_mask + warp_mask[:, :, 0], 0, 1))
                if eyeh > 0:
                    soft_mask = np.concatenate(
                        (soft_mask[:eyeh], soft_mask_dilate[eyeh:]), axis=0)
                else:
                    soft_mask = soft_mask_dilate
        else:
            if ksize1 > 0:
                # soft_mask is not yet defined on this branch; use mask_face
                kernel_size1 = int(np.sqrt(np.sum(mask_face)) * ksize1)
                kernel1 = np.ones((kernel_size1, kernel_size1))
                soft_mask = cv2.dilate(mask_face, kernel1, iterations=1)
            else:
                soft_mask = mask_face
        if include_neck:
            soft_mask = np.clip(soft_mask + mask_neck, 0, 1)

    if return_human:
        mask_human = cv2.GaussianBlur(mask_human, (21, 21), 0) * mask_human
        return soft_mask, mask_human
    else:
        return soft_mask

# Resize to the target width and crop the height down to a multiple of 32.
def crop_bottom(pil_file, width):
    if width == 512:
        height = 768
    else:
        height = 1152
    w, h = pil_file.size
    factor = w / width
    new_h = int(h / factor)
    pil_file = pil_file.resize((width, new_h))
    crop_h = min(int(new_h / 32) * 32, height)
    array_file = np.array(pil_file)
    array_file = array_file[:crop_h, :, :]
    output_file = Image.fromarray(array_file)
    return output_file

# Masked multi-ControlNet img2img: generates `num` images; with use_ori, the
# unmasked region is pasted back from the original image.
def img2img_multicontrol(img,
                         face_image,
                         control_image,
                         controlnet_conditioning_scale,
                         pipe,
                         mask,
                         pos_prompt,
                         neg_prompt,
                         strength,
                         num=1,
                         use_ori=False):
    image_mask = Image.fromarray(np.uint8(mask * 255))
    image_human = []
    for i in range(num):
        image_human.append(
            pipe.generate(
                image=img,
                face_image=face_image,
                mask_image=image_mask,
                control_image=control_image,
                prompt=pos_prompt,
                negative_prompt=neg_prompt,
                guidance_scale=5.0,
                strength=strength,
                num_inference_steps=50,
                controlnet_conditioning_scale=controlnet_conditioning_scale,
                num_images_per_prompt=1)[0])
        if use_ori:
            image_human[i] = Image.fromarray(
                (np.array(image_human[i]) * mask[:, :, None] + np.array(img) *
                 (1 - mask[:, :, None])).astype(np.uint8))
    return image_human

# Mask of the human body with the hair and face regions removed (3 channels).
def get_mask(result):
    masks = result['masks']
    scores = result['scores']
    labels = result['labels']
    h, w = masks[0].shape
    mask_hair = np.zeros((h, w))
    mask_face = np.zeros((h, w))
    mask_human = np.zeros((h, w))
    for i in range(len(labels)):
        if scores[i] > 0.8:
            if labels[i] == 'Face':
                if np.sum(masks[i]) > np.sum(mask_face):
                    mask_face = masks[i]
            elif labels[i] == 'Human':
                if np.sum(masks[i]) > np.sum(mask_human):
                    mask_human = masks[i]
            elif labels[i] == 'Hair':
                if np.sum(masks[i]) > np.sum(mask_hair):
                    mask_hair = masks[i]
    mask_rst = np.clip(mask_human - mask_hair - mask_face, 0, 1)
    mask_rst = np.expand_dims(mask_rst, 2)
    mask_rst = np.concatenate([mask_rst, mask_rst, mask_rst], axis=2)
    return mask_rst

# Core pipeline: pose-guided generation with the face adapter, face fusion,
# then two rounds of masked ControlNet (openpose + canny) refinement.
def main_diffusion_inference_inpaint(num_gen_images,
                                     inpaint_image,
                                     strength,
                                     output_img_size,
                                     times,
                                     pos_prompt,
                                     neg_prompt,
                                     input_img,
                                     segmentation_pipeline=None,
                                     image_face_fusion=None,
                                     openpose=None,
                                     controlnet=None,
                                     det_pipeline=None,
                                     pipe_pose=None,
                                     pipe_all=None,
                                     face_quality_func=None):

    dtype = torch.float16

    add_prompt_style = ''

    if isinstance(inpaint_image, str):
        inpaint_im = Image.open(inpaint_image)
    else:
        inpaint_im = inpaint_image
    inpaint_im = crop_bottom(inpaint_im, 512)

    st = time.time()
    openpose_image = openpose(
        np.array(inpaint_im, np.uint8), include_hand=True, include_face=False)
    w, h = inpaint_im.size
    et = time.time()
    print('inference_0 time: {:.4f}s'.format(et - st))

    st = time.time()

    image_faces = []
    for i in range(num_gen_images):
        image_face = pipe_pose.generate(
            prompt=add_prompt_style + pos_prompt,
            image=openpose_image,
            face_image=input_img,
            height=h,
            width=w,
            guidance_scale=5.0,
            negative_prompt=neg_prompt,
            num_inference_steps=50,
            num_images_per_prompt=1)[0]
        image_faces.append(image_face)
    et = time.time()
    print('inference_1 time: {:.4f}s'.format(et - st))

    st = time.time()
    selected_face = input_img
    swap_results = face_swap_fn(True, image_faces, selected_face,
                                image_face_fusion, segmentation_pipeline)
    torch.cuda.empty_cache()

    et = time.time()
    print('inference_2 time: {:.4f}s'.format(et - st))

    st = time.time()
    images_human = []
    images_auto = []
    inpaint_bbox, inpaint_keypoints = call_face_crop(det_pipeline, inpaint_im,
                                                     1.1)
    eye_height = int((inpaint_keypoints[0, 1] + inpaint_keypoints[1, 1]) / 2)
    canny_image = cv2.Canny(np.array(inpaint_im, np.uint8), 100, 200)[:, :,
                                                                      None]
    mask = segment(
        segmentation_pipeline, inpaint_im, ksize=0.05, eyeh=eye_height)
    canny_image = (canny_image * (1.0 - mask[:, :, None])).astype(np.uint8)
    canny_image = Image.fromarray(
        np.concatenate([canny_image, canny_image, canny_image], axis=2))
    et = time.time()
    print('inference_4 time: {:.4f}s'.format(et - st))
    st = time.time()
    # canny_image.save('canny.png')
    for i in range(num_gen_images):
        image_face = swap_results[i]
        image_face = Image.fromarray(image_face[:, :, ::-1])

        face_bbox, face_keypoints = call_face_crop(det_pipeline, image_face,
                                                   1.5)
        face_mask = segment(segmentation_pipeline, image_face)
        face_mask = np.expand_dims((face_mask * 255).astype(np.uint8), axis=2)
        face_mask = np.concatenate([face_mask, face_mask, face_mask], axis=2)
        face_mask = Image.fromarray(face_mask)
        replaced_input_image, warp_mask = crop_and_paste(
            image_face, face_mask, inpaint_im, face_keypoints,
            inpaint_keypoints, face_bbox)
        warp_mask = 1.0 - warp_mask

        st = time.time()
        openpose_image = openpose(
            np.array(replaced_input_image * warp_mask, np.uint8),
            include_hand=True,
            include_body=False,
            include_face=True)
        et = time.time()
        print('inference_5 time: {:.4f}s'.format(et - st))
        read_control = [openpose_image, canny_image]
        inpaint_mask, human_mask = segment(
            segmentation_pipeline,
            inpaint_im,
            ksize=0.1,
            ksize1=0.04,
            eyeh=eye_height,
            include_neck=False,
            warp_mask=warp_mask,
            return_human=True)
        inpaint_with_mask = ((1.0 - inpaint_mask[:, :, None])
                             * np.array(inpaint_im))[:, :, ::-1]
        print('Finishing segmenting images.')
        images_human.extend(
            img2img_multicontrol(
                inpaint_im,
                input_img,
                read_control, [1.0, 0.2],
                pipe_all,
                inpaint_mask,
                add_prompt_style + pos_prompt,
                neg_prompt,
                strength=strength))
        images_auto.extend(
            img2img_multicontrol(
                inpaint_im,
                input_img,
                read_control, [1.0, 0.2],
                pipe_all,
                np.zeros_like(inpaint_mask),
                add_prompt_style + pos_prompt,
                neg_prompt,
                strength=0.025))

        edge_add = np.array(inpaint_im).astype(np.int16) - np.array(
            images_auto[i]).astype(np.int16)
        edge_add = edge_add * (1 - human_mask[:, :, None])
        images_human[i] = Image.fromarray((np.clip(
            np.array(images_human[i]).astype(np.int16)
            + edge_add.astype(np.int16), 0, 255)).astype(np.uint8))

    st = time.time()
    images_rst = []
    for i in range(len(images_human)):
        im = images_human[i]
        canny_image = cv2.Canny(np.array(im, np.uint8), 100, 200)[:, :, None]
        canny_image = Image.fromarray(
            np.concatenate([canny_image, canny_image, canny_image], axis=2))
        st = time.time()
        openpose_image = openpose(
            np.array(im, np.uint8),
            include_hand=True,
            include_face=True,
            include_body=False)
        et = time.time()
        print('inference_6 time: {:.4f}s'.format(et - st))
        read_control = [openpose_image, canny_image]
        inpaint_mask, human_mask = segment(
            segmentation_pipeline,
            images_human[i],
            ksize=0.02,
            return_human=True)
        print('Finishing segmenting images.')
        image_rst = img2img_multicontrol(
            im,
            input_img,
            read_control, [0.8, 0.8],
            pipe_all,
            inpaint_mask,
            add_prompt_style + pos_prompt,
            neg_prompt,
            strength=0.1,
            num=1)[0]
        image_auto = img2img_multicontrol(
            im,
            input_img,
            read_control, [0.8, 0.8],
            pipe_all,
            np.zeros_like(inpaint_mask),
            add_prompt_style + pos_prompt,
            neg_prompt,
            strength=0.025,
            num=1)[0]
        edge_add = np.array(im).astype(np.int16) - np.array(image_auto).astype(
            np.int16)
        edge_add = edge_add * (1 - human_mask[:, :, None])
        image_rst = Image.fromarray((np.clip(
            np.array(image_rst).astype(np.int16) + edge_add.astype(np.int16),
            0, 255)).astype(np.uint8))
        images_rst.append(image_rst)

    return images_rst, False

def stylization_fn(use_stylization, rank_results):
    if use_stylization:
        #  TODO
        pass
    else:
        return rank_results

def main_model_inference(num_gen_images,
                         inpaint_image,
                         strength,
                         output_img_size,
                         pos_prompt,
                         neg_prompt,
                         use_main_model,
                         input_img=None,
                         segmentation_pipeline=None,
                         image_face_fusion=None,
                         openpose=None,
                         controlnet=None,
                         det_pipeline=None,
                         pipe_pose=None,
                         pipe_all=None,
                         face_quality_func=None):
    # inpaint_image = compress_image(inpaint_image, 1024 * 1024)
    if use_main_model:
        return main_diffusion_inference_inpaint(
            num_gen_images,
            inpaint_image,
            strength,
            output_img_size,
            1,
            pos_prompt,
            neg_prompt,
            input_img,
            segmentation_pipeline=segmentation_pipeline,
            image_face_fusion=image_face_fusion,
            openpose=openpose,
            controlnet=controlnet,
            det_pipeline=det_pipeline,
            pipe_pose=pipe_pose,
            pipe_all=pipe_all,
            face_quality_func=face_quality_func)

# Pick the image with the highest face-quality score from the labeled folder.
def select_high_quality_face(input_img_dir, face_quality_func):
    input_img_dir = str(input_img_dir) + '_labeled'
    quality_score_list = []
    abs_img_path_list = []
    #  TODO
    for img_name in os.listdir(input_img_dir):
        if img_name.endswith('jsonl') or img_name.startswith(
                '.ipynb') or img_name.startswith('.safetensors'):
            continue

        if img_name.endswith('jpg') or img_name.endswith('png'):
            abs_img_name = os.path.join(input_img_dir, img_name)
            face_quality_score = face_quality_func(abs_img_name)[
                OutputKeys.SCORES]
            if face_quality_score is None:
                quality_score_list.append(0)
            else:
                quality_score_list.append(face_quality_score[0])
            abs_img_path_list.append(abs_img_name)

    sort_idx = np.argsort(quality_score_list)[::-1]
    print('Selected face: ' + abs_img_path_list[sort_idx[0]])

    return Image.open(abs_img_path_list[sort_idx[0]])

# Fuse the user's face into each generated image and blend inside a soft face
# mask; returns BGR numpy arrays in both branches.
def face_swap_fn(use_face_swap, gen_results, template_face, image_face_fusion,
                 segmentation_pipeline):
    if use_face_swap:
        #  TODO
        out_img_list = []
        for img in gen_results:
            result = image_face_fusion(dict(
                template=img, user=template_face))[OutputKeys.OUTPUT_IMG]
            face_mask = segment(segmentation_pipeline, img, ksize=0.1)
            result = (result * face_mask[:, :, None]
                      + np.array(img)[:, :, ::-1] *
                      (1 - face_mask[:, :, None])).astype(np.uint8)
            out_img_list.append(result)
        return out_img_list
    else:
        ret_results = []
        for img in gen_results:
            ret_results.append(cv2.cvtColor(np.array(img), cv2.COLOR_RGB2BGR))
        return ret_results

# Rank candidate images by face-embedding similarity to the selected face and
# keep the top num_gen_images.
def post_process_fn(use_post_process, swap_results_ori, selected_face,
                    num_gen_images):
    if use_post_process:
        sim_list = []
        #  TODO
        face_recognition_func = pipeline(
            Tasks.face_recognition,
            'damo/cv_ir_face-recognition-ood_rts',
            model_revision='v2.5')
        face_det_func = pipeline(
            task=Tasks.face_detection,
            model='damo/cv_ddsar_face-detection_iclr23-damofd',
            model_revision='v1.1')
        swap_results = swap_results_ori

        select_face_emb = face_recognition_func(selected_face)[
            OutputKeys.IMG_EMBEDDING][0]

        for img in swap_results:
            emb = face_recognition_func(img)[OutputKeys.IMG_EMBEDDING]
            if emb is None or select_face_emb is None:
                sim_list.append(0)
            else:
                sim = np.dot(emb, select_face_emb)
                sim_list.append(sim.item())
        sort_idx = np.argsort(sim_list)[::-1]

        return np.array(swap_results)[
            sort_idx[:min(int(num_gen_images), len(swap_results))]]
    else:
        return np.array(swap_results_ori)

def process_inpaint_img(inpaint_img, resize_size=(1024, 1024)):
    if isinstance(inpaint_img, str):
        #inpaint_img = compress_image(inpaint_img, 1024 * 1024)
        inpaint_img = Image.open(inpaint_img)
    else:
        inpaint_img = Image.fromarray(inpaint_img[:, :, ::-1])
    ori_img = np.array(inpaint_img)

    h, w, _ = ori_img.shape
    ns = max(h, w)
    ori_img_square = cv2.copyMakeBorder(ori_img, int((ns - h) / 2),
                                        (ns - h) - int((ns - h) / 2),
                                        int((ns - w) / 2), (ns - w) - int(
                                            (ns - w) / 2), cv2.BORDER_DEFAULT)
    ori_img_square_resized = cv2.resize(ori_img_square, resize_size)
    return Image.fromarray(ori_img_square_resized)

def postprocess_inpaint_img(img2img_res, output_size=(768, 1024)):
    resized = cv2.resize(
        np.array(img2img_res), (output_size[1], output_size[1]))
    croped = resized[:, (output_size[1] - output_size[0])
                     // 2:(output_size[1] - output_size[0]) // 2
                     + output_size[0], :]
    return Image.fromarray(croped)

# Wires up all ModelScope pipelines (detection, parsing, fusion, skin
# retouching, quality, attributes) plus the FACT face-adapter pipelines.
class GenPortrait_inpaint:

    def __init__(self):
        cfg_face = True
        
        fact_model_path = snapshot_download('yucheng1996/FaceChain-FACT', revision='v1.0.0')
        adapter_path = os.path.join(fact_model_path, 'adapter_maj_mask_large_new_reg001_faceshuffle_00290001.ckpt')

        self.segmentation_pipeline = pipeline(
            Tasks.image_segmentation,
            'damo/cv_resnet101_image-multiple-human-parsing',
            model_revision='v1.0.1')
        self.image_face_fusion = pipeline('face_fusion_torch',
                                     model='damo/cv_unet_face_fusion_torch', model_revision='v1.0.3')

        model_dir = snapshot_download(
            'damo/face_chain_control_model', revision='v1.0.1')
        self.openpose = OpenposeDetector.from_pretrained(
            os.path.join(model_dir, 'model_controlnet/ControlNet')).to('cuda')
        self.depth_estimator = tpipeline(
            'depth-estimation',
            os.path.join(model_dir, 'model_controlnet/dpt-large'))

        self.face_quality_func = pipeline(
            Tasks.face_quality_assessment,
            'damo/cv_manual_face-quality-assessment_fqa',
            model_revision='v2.0')
        self.face_detection = pipeline(
            task=Tasks.face_detection,
            model='damo/cv_ddsar_face-detection_iclr23-damofd',
            model_revision='v1.1')

        dtype = torch.float16
        model_dir1 = snapshot_download(
            'ly261666/cv_wanx_style_model', revision='v1.0.3')
        self.controlnet = [
            ControlNetModel.from_pretrained(
                os.path.join(model_dir,
                             'model_controlnet/control_v11p_sd15_openpose'),
                torch_dtype=dtype),
            ControlNetModel.from_pretrained(
                os.path.join(model_dir1, 'contronet-canny'), torch_dtype=dtype)
        ]

        model_dir = snapshot_download(
            'ly261666/cv_wanx_style_model', revision='v1.0.2')

        self.face_adapter_path = adapter_path
        self.cfg_face = cfg_face
        
        fr_weight_path = snapshot_download('yucheng1996/FaceChain-FACT', revision='v1.0.0')
        fr_weight_path = os.path.join(fr_weight_path, 'ms1mv2_model_TransFace_S.pt')
        
        self.face_extracter = Face_Extracter_v1(fr_weight_path=fr_weight_path, fc_weight_path=self.face_adapter_path)
        self.face_detection0 = pipeline(task=Tasks.face_detection, model='damo/cv_resnet50_face-detection_retinaface')
        self.skin_retouching = pipeline(
            'skin-retouching-torch',
            model=snapshot_download('damo/cv_unet_skin_retouching_torch', revision='v1.0.1.1'))
        self.fair_face_attribute_func = pipeline(Tasks.face_attribute_recognition,
            snapshot_download('damo/cv_resnet34_face-attribute-recognition_fairface', revision='v2.0.2'))
        
        base_model_path = snapshot_download('YorickHe/majicmixRealistic_v6', revision='v1.0.0')
        base_model_path = os.path.join(base_model_path, 'realistic')
        
        pipe_pose = StableDiffusionControlNetPipeline.from_pretrained(
            base_model_path,
            safety_checker=None,
            controlnet=self.controlnet[0],
            torch_dtype=torch.float16)
        pipe_all = StableDiffusionControlNetInpaintPipeline.from_pretrained(
            base_model_path,
            safety_checker=None,
            controlnet=self.controlnet,
            torch_dtype=torch.float16)
        pipe_pose.scheduler = PNDMScheduler.from_config(
                pipe_pose.scheduler.config)
        pipe_all.scheduler = PNDMScheduler.from_config(
                pipe_all.scheduler.config)
        
        face_adapter_path = self.face_adapter_path
        self.face_adapter_pose = FaceAdapter_v1(pipe_pose, self.face_detection0, self.segmentation_pipeline, self.face_extracter, face_adapter_path, 'cuda', self.cfg_face) 
        self.face_adapter_all = FaceAdapter_v1(pipe_all, self.face_detection0, self.segmentation_pipeline, self.face_extracter, face_adapter_path, 'cuda', self.cfg_face) 
        self.face_adapter_pose.set_scale(0.5)
        self.face_adapter_all.set_scale(0.55)
        
        self.face_adapter_pose.pipe.to('cpu')
        self.face_adapter_all.pipe.to('cpu')
                   

    def __call__(self,
                 inpaint_img,
                 strength,
                 output_img_size,
                 num_faces,
                 selected_face,
                 pos_prompt,
                 neg_prompt,
                 input_img_path=None,
                 num_gen_images=1):
        st = time.time()
        self.use_main_model = True
        self.use_face_swap = True
        self.use_post_process = False
        self.use_stylization = False
        self.neg_prompt = neg_prompt
        self.inpaint_img = inpaint_img
        self.strength = strength
        self.num_faces = num_faces
        self.selected_face = selected_face
        self.output_img_size = output_img_size
        self.pos_prompt = pos_prompt

        if isinstance(self.inpaint_img, str):
            self.inpaint_img = Image.open(self.inpaint_img)
        else:
            self.inpaint_img = Image.fromarray(self.inpaint_img[:, :, ::-1])
        result_det = self.face_detection(self.inpaint_img)
        bboxes = result_det['boxes']
        assert len(bboxes) > self.num_faces - 1
        bboxes = np.array(bboxes).astype(np.int16)
        if len(bboxes) > self.num_faces:
            areas = np.zeros(len(bboxes))
            for i in range(len(bboxes)):
                bbox = bboxes[i]
                areas[i] = (float(bbox[2]) - float(bbox[0])) * (
                    float(bbox[3]) - float(bbox[1]))
            top_idxs = np.argsort(areas)[::-1][:self.num_faces]
            bboxes = bboxes[top_idxs]
            assert len(bboxes) == self.num_faces
        lefts = []
        for bbox in bboxes:
            lefts.append(bbox[0])
        idxs = np.argsort(lefts)

        if input_img_path is not None:
            face_box = bboxes[idxs[self.selected_face - 1]]
            inpaint_img_large = np.copy(np.array(self.inpaint_img)[:, :, ::-1])
            mask_large = np.ones_like(inpaint_img_large)
            mask_large1 = np.zeros_like(inpaint_img_large)
            h, w, _ = inpaint_img_large.shape
            for i in range(len(bboxes)):
                if i != idxs[self.selected_face - 1]:
                    bbox = bboxes[i]
                    inpaint_img_large[bbox[1]:bbox[3], bbox[0]:bbox[2]] = 0
                    mask_large[bbox[1]:bbox[3], bbox[0]:bbox[2]] = 0

            face_ratio = 0.7
            cropl = int(
                max(face_box[3] - face_box[1], face_box[2] - face_box[0])
                / face_ratio / 2)
            cx = int((face_box[2] + face_box[0]) / 2)
            cy = int((face_box[1] + face_box[3]) / 2)
            cropup = min(cy, cropl)
            cropbo = min(h - cy, cropl)
            crople = min(cx, cropl)
            cropri = min(w - cx, cropl)
            inpaint_img = np.pad(
                inpaint_img_large[cy - cropup:cy + cropbo,
                                  cx - crople:cx + cropri],
                ((cropl - cropup, cropl - cropbo),
                 (cropl - crople, cropl - cropri), (0, 0)), 'constant')
            inpaint_img = cv2.resize(inpaint_img, (512, 512))
            inpaint_img = Image.fromarray(inpaint_img[:, :, ::-1])
            mask_large1[cy - cropup:cy + cropbo, cx - crople:cx + cropri] = 1
            mask_large = mask_large * mask_large1
            
            input_img = Image.open(input_img_path).convert('RGB')
            w, h = input_img.size
            if max(w, h) > 2000:
                scale = 2000 / max(w, h)
                input_img = input_img.resize((int(w * scale), int(h * scale)))

            result = self.skin_retouching(np.array(input_img)[:,:,::-1])
            input_img = result[OutputKeys.OUTPUT_IMG]

            input_img = Image.fromarray(input_img[:, :, ::-1])
            
            attribute_result = self.fair_face_attribute_func(input_img)
            score_gender = np.array(attribute_result['scores'][0])
            score_age = np.array(attribute_result['scores'][1])
            gender = np.argmax(score_gender)
            age = np.argmax(score_age)
            if age < 2:
                if gender == 0:
                    attr_idx = 0
                else:
                    attr_idx = 1
            elif age > 4:
                if gender == 0:
                    attr_idx = 4
                else:
                    attr_idx = 5
            else:
                if gender == 0:
                    attr_idx = 2
                else:
                    attr_idx = 3
            use_age_prompt = True
            if attr_idx == 3 or attr_idx == 5:
                use_age_prompt = False

            age_prompts = ['20-year-old, ', '25-year-old, ', '35-year-old, ']

            if age > 1 and age < 5 and use_age_prompt:
                self.pos_prompt = age_prompts[age - 2] + self.pos_prompt
            
            trigger_styles = [
                'a boy, children, ', 'a girl, children, ',
                'a handsome man, ', 'a beautiful woman, ',
                'a mature man, ', 'a mature woman, '
            ]
            trigger_style = trigger_styles[attr_idx]
            if attr_idx == 2 or attr_idx == 4:
                self.neg_prompt += ', children'
            
            self.pos_prompt = trigger_style + self.pos_prompt
            
            self.face_adapter_pose.pipe.to('cuda')
            self.face_adapter_all.pipe.to('cuda')

            gen_results, is_old = main_model_inference(
                num_gen_images,
                inpaint_img,
                self.strength,
                self.output_img_size,
                self.pos_prompt,
                self.neg_prompt,
                self.use_main_model,
                input_img=input_img,
                segmentation_pipeline=self.segmentation_pipeline,
                image_face_fusion=self.image_face_fusion,
                openpose=self.openpose,
                controlnet=self.controlnet,
                det_pipeline=self.face_detection,
                pipe_pose=self.face_adapter_pose,
                pipe_all=self.face_adapter_all,
                face_quality_func=self.face_quality_func)
            mt = time.time()
            self.face_adapter_pose.pipe.to('cpu')
            self.face_adapter_all.pipe.to('cpu')

            selected_face = input_img

            swap_results = face_swap_fn(self.use_face_swap, gen_results,
                                        selected_face, self.image_face_fusion,
                                        self.segmentation_pipeline)
            # stylization
            final_gen_results = swap_results
            
            final_gen_results_new = []
            inpaint_img_large = np.copy(np.array(self.inpaint_img)[:, :, ::-1])
            ksize = int(10 * cropl / 256)
            for i in range(len(final_gen_results)):
                print('Start cropping.')
                rst_gen = cv2.resize(final_gen_results[i],
                                     (cropl * 2, cropl * 2))
                rst_crop = rst_gen[cropl - cropup:cropl + cropbo,
                                   cropl - crople:cropl + cropri]
                print(rst_crop.shape)
                inpaint_img_rst = np.zeros_like(inpaint_img_large)
                print('Start pasting.')
                inpaint_img_rst[cy - cropup:cy + cropbo,
                                cx - crople:cx + cropri] = rst_crop
                print('Finish pasting.')
                print(inpaint_img_rst.shape, mask_large.shape,
                      inpaint_img_large.shape)
                mask_large = mask_large.astype(np.float32)
                kernel = np.ones((ksize * 2, ksize * 2))
                mask_large1 = cv2.erode(mask_large, kernel, iterations=1)
                mask_large1 = cv2.GaussianBlur(
                    mask_large1,
                    (int(ksize * 1.8) * 2 + 1, int(ksize * 1.8) * 2 + 1), 0)
                mask_large1[face_box[1]:face_box[3],
                            face_box[0]:face_box[2]] = 1
                mask_large = mask_large * mask_large1
                final_inpaint_rst = (
                    inpaint_img_rst.astype(np.float32)
                    * mask_large.astype(np.float32)
                    + inpaint_img_large.astype(np.float32) *
                    (1.0 - mask_large.astype(np.float32))).astype(np.uint8)
                print('Finish masking.')
                final_gen_results_new.append(final_inpaint_rst)
                print('Finish generating.')

        et = time.time()
        print('Inference Time: {:.4f}s'.format(et - st))
        print('Inference Time Process: {:.4f}s'.format(et - mt))
        torch.cuda.empty_cache()
        return final_gen_results_new

# Re-encode an image as JPEG, lowering quality until it fits target_size bytes.
def compress_image(input_path, target_size):
    output_path = change_extension_to_jpg(input_path)

    image = cv2.imread(input_path)

    quality = 95
    # Step the JPEG quality down until the encoded size fits, but never below 5.
    while quality > 5 and cv2.imencode(
            '.jpg', image,
            [cv2.IMWRITE_JPEG_QUALITY, quality])[1].size > target_size:
        quality -= 5

    compressed_image = cv2.imencode(
        '.jpg', image, [cv2.IMWRITE_JPEG_QUALITY, quality])[1].tobytes()

    with open(output_path, 'wb') as f:
        f.write(compressed_image)
    return output_path

def change_extension_to_jpg(image_path):

    base_name = os.path.basename(image_path)
    new_base_name = os.path.splitext(base_name)[0] + '.jpg'

    directory = os.path.dirname(image_path)

    new_image_path = os.path.join(directory, new_base_name)
    return new_image_path

if __name__ == "__main__":
    num_faces = 1
    selected_face = 1
    strength = 0.6
    inpaint_img = 'poses/man/pose1.png'
    input_img_path = 'poses/man/pose2.png'
    num_generate = 1
    output_dir = './generated'
    
    pos_prompt = 'raw photo, masterpiece, simple background, solo, medium shot, high detail face, photorealistic, best quality, wearing T-shirt'
    neg_prompt = 'nsfw, paintings, sketches, (worst quality:2), (low quality:2), lowres, normal quality, ((monochrome)), ((grayscale)), logo, word, character'
    output_img_size = 512

    gen_portrait = GenPortrait_inpaint()

    outputs = gen_portrait(inpaint_img, strength, output_img_size, num_faces, selected_face, pos_prompt, neg_prompt, input_img_path, num_generate)
    os.makedirs(output_dir, exist_ok=True)

    for i, out_img in enumerate(outputs):
        cv2.imwrite(os.path.join(output_dir, f'{i}.png'), out_img)
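A few notes on this driver script: inpaint_img is the pose/template photo, num_faces and selected_face choose which detected face (counted left to right among the largest num_faces boxes) gets replaced, and input_img_path supplies the identity face. The sample paths under poses/man/ come with the FaceChain repository; substitute your own photos as needed. Results come back as BGR numpy arrays, which is why they are written with cv2.imwrite. A CUDA GPU is required; the adapter pipelines are shuttled between CPU and GPU around each generation call to limit peak VRAM.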

(2) Web UI test
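FaceChain also ships a Gradio web interface. Assuming the entry script is app.py at the repository root (treat the exact filename as an assumption and verify it against your checkout), it can be launched from the FaceChain directory with:

python app.py

Gradio then prints a local URL (http://127.0.0.1:7860 by default) where you can upload a photo, pick styles, and generate portraits in the browser.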

To be continued...

