* feat/yoloface (#334)

* added yolov8 to face_detector (#323)

* added yolov8 to face_detector

* added yolov8 to face_detector

* Initial cleanup and renaming

* Update README

* refactored detect_with_yoloface (#329)

* refactored detect_with_yoloface

* apply review

* Change order again

* Restore working code

* modified code (#330)

* refactored detect_with_yoloface

* apply review

* use temp_frame in detect_with_yoloface

* reorder

* modified

* reorder models

* Tiny cleanup

---------

Co-authored-by: tamoharu <133945583+tamoharu@users.noreply.github.com>

* include audio file functions (#336)

* Add testing for audio handlers

* Change order

* Fix naming

* Use correct typing in choices

* Update help message for arguments, Notation based wording approach (#347)

* Update help message for arguments, Notation based wording approach

* Fix installer

* Audio functions (#345)

* Update ffmpeg.py

* Create audio.py

* Update ffmpeg.py

* Update audio.py

* Update audio.py

* Update typing.py

* Update ffmpeg.py

* Update audio.py

* Rename Frame to VisionFrame (#346)

* Minor tidy up

* Introduce audio testing

* Add more todo for testing

* Add more todo for testing

* Fix indent

* Enable venv on the fly

* Enable venv on the fly

* Revert venv on the fly

* Revert venv on the fly

* Force Gradio to shut up

* Force Gradio to shut up

* Clear temp before processing

* Reduce terminal output

* include audio file functions

* Enforce output resolution on merge video

* Minor cleanups

* Add age and gender to face debugger items (#353)

* Add age and gender to face debugger items

* Rename like suggested in the code review

* Fix the output framerate vs. time

* Lip Sync (#356)

* Cli implementation of wav2lip

* - create get_first_item()
- remove non gan wav2lip model
- implement video memory strategy
- implement get_reference_frame()
- implement process_image()
- rearrange crop_mask_list
- implement test_cli

* Simplify testing

* Rename to lip syncer

* Fix testing

* Fix testing

* Minor cleanup

* Cuda 12 installer (#362)

* Make cuda nightly (12) the default

* Better keep legacy cuda just in case

* Use CUDA and ROCM versions

* Remove MacOS options from installer (CoreML include in default package)

* Add lip-syncer support to source component

* Add lip-syncer support to source component

* Fix the check in the source component

* Add target image check

* Introduce more helpers to suit the lip-syncer needs

* Downgrade onnxruntime as of buggy 1.17.0 release

* Revert "Downgrade onnxruntime as of buggy 1.17.0 release"

This reverts commit f4a7ae6824fed87f0be50906bbc7e2d61d00617b.

* More testing and add todos

* Fix the frame processor API to at least not throw errors

* Introduce dict based frame processor inputs (#364)

* Introduce dict based frame processor inputs

* Forgot to adjust webcam

* create path payloads (#365)

* create index payload to paths for process_frames

* rename to payload_paths

* This code now is poetry

* Fix the terminal output

* Make lip-syncer work in the preview

* Remove face debugger test for now

* Reorder reference_faces, Fix testing

* Use inswapper_128 on buggy onnxruntime 1.17.0

* Undo inswapper_128_fp16 due to broken onnxruntime 1.17.0

* Undo inswapper_128_fp16 due to broken onnxruntime 1.17.0

* Fix lip_syncer occluder & region mask issue

* Fix preview once in case there was no output video fps

* fix lip_syncer custom fps

* remove unused import

* Add 68 landmark functions (#367)

* Add 68 landmark model

* Add landmark to face object

* Re-arrange and modify typing

* Rename function

* Rearrange

* Rearrange

* ignore type

* ignore type

* change type

* ignore

* name

* Some cleanup

* Some cleanup

* Oops, I broke something

* Feat/face analyser refactoring (#369)

* Restructure face analyser and start TDD

* YoloFace and Yunet testing are passing

* Remove offset from yoloface detection

* Cleanup code

* Tiny fix

* Fix get_many_faces()

* Tiny fix (again)

* Use 320x320 fallback for retinaface

* Fix merging mashup

* Upload wav2lip model

* Upload 2dfan2 model and rename internal to face_predictor

* Downgrade onnxruntime for most cases

* Update for the face debugger to render landmark 68

* Try to make detect_face_landmark_68() and detect_gender_age() more uniform

* Enable retinaface testing for 320x320

* Make detect_face_landmark_68() and detect_gender_age() as uniform as … (#370)

* Make detect_face_landmark_68() and detect_gender_age() as uniform as possible

* Revert landmark scale and translation

* Make box-mask for lip-syncer adjustable

* Add create_bbox_from_landmark()

* Remove currently unused code

* Feat/uniface (#375)

* add uniface (#373)

* Finalize UniFace implementation

---------

Co-authored-by: Harisreedhar <46858047+harisreedhar@users.noreply.github.com>

* My approach how todo it

* edit

* edit

* replace vertical blur with gaussian

* remove region mask

* Rebase against next and restore method

* Minor improvements

* Minor improvements

* rename & add forehead padding

* Adjust and host uniface model

* Use 2dfan4 model

* Rename to face landmarker

* Feat/replace bbox with bounding box (#380)

* Add landmark 68 to 5 conversion

* Add landmark 68 to 5 conversion

* Keep 5, 5/68 and 68 landmarks

* Replace kps with landmark

* Replace bbox with bounding box

* Reshape face_landmark5_list different

* Make yoloface the default

* Move convert_face_landmark_68_to_5 to face_helper

* Minor spacing issue

* Dynamic detector sizes according to model (#382)

* Dynamic detector sizes according to model

* Dynamic detector sizes according to model

* Undo falsely committed files

* Add lip syncer model to the UI

* fix halo (#383)

* Bump to 2.3.0

* Update README and wording

* Update README and wording

* Fix spacing

* Apply _vision suffix

* Apply _vision suffix

* Apply _vision suffix

* Apply _vision suffix

* Apply _vision suffix

* Apply _vision suffix

* Apply _vision suffix, Move mouth mask to face_masker.py

* Apply _vision suffix

* Apply _vision suffix

* increase forehead padding

---------

Co-authored-by: tamoharu <133945583+tamoharu@users.noreply.github.com>
Co-authored-by: Harisreedhar <46858047+harisreedhar@users.noreply.github.com>
This commit is contained in:
Henry Ruhs
2024-02-14 14:08:29 +01:00
committed by GitHub
parent 122da0545b
commit c77493ff9a
66 changed files with 1893 additions and 884 deletions

@@ -1,63 +1,21 @@
WORDING =\
from typing import Any, Dict, Optional
WORDING : Dict[str, Any] =\
{
'python_not_supported': 'Python version is not supported, upgrade to {version} or higher',
'ffmpeg_not_installed': 'FFMpeg is not installed',
'install_dependency_help': 'select the variant of {dependency} to install',
'skip_venv_help': 'skip the virtual environment check',
'source_help': 'select a source image',
'target_help': 'select a target image or video',
'output_help': 'specify the output file or directory',
'frame_processors_help': 'choose from the available frame processors (choices: {choices}, ...)',
'frame_processor_model_help': 'choose the model for the frame processor',
'frame_processor_blend_help': 'specify the blend amount for the frame processor',
'face_debugger_items_help': 'specify the face debugger items (choices: {choices})',
'ui_layouts_help': 'choose from the available ui layouts (choices: {choices}, ...)',
'keep_temp_help': 'retain temporary frames after processing',
'skip_audio_help': 'omit audio from the target',
'face_analyser_order_help': 'specify the order used for the face analyser',
'face_analyser_age_help': 'specify the age used for the face analyser',
'face_analyser_gender_help': 'specify the gender used for the face analyser',
'face_detector_model_help': 'specify the model used for the face detector',
'face_detector_size_help': 'specify the size threshold used for the face detector',
'face_detector_score_help': 'specify the score threshold used for the face detector',
'face_selector_mode_help': 'specify the mode for the face selector',
'reference_face_position_help': 'specify the position of the reference face',
'reference_face_distance_help': 'specify the distance between the reference face and the target face',
'reference_frame_number_help': 'specify the number of the reference frame',
'face_mask_types_help': 'choose from the available face mask types (choices: {choices})',
'face_mask_blur_help': 'specify the blur amount for face mask',
'face_mask_padding_help': 'specify the face mask padding (top, right, bottom, left) in percent',
'face_mask_regions_help': 'choose from the available face mask regions (choices: {choices})',
'trim_frame_start_help': 'specify the start frame for extraction',
'trim_frame_end_help': 'specify the end frame for extraction',
'temp_frame_format_help': 'specify the image format used for frame extraction',
'temp_frame_quality_help': 'specify the image quality used for frame extraction',
'output_image_quality_help': 'specify the quality used for the output image',
'output_video_encoder_help': 'specify the encoder used for the output video',
'output_video_preset_help': 'specify the preset used for the output video',
'output_video_quality_help': 'specify the quality used for the output video',
'output_video_resolution_help': 'specify the resolution used for the output video',
'output_video_fps_help': 'specify the frames per second (fps) used for the output video',
'video_memory_strategy_help': 'specify strategy to handle the video memory',
'system_memory_limit_help': 'specify the amount (gb) of system memory to be used',
'execution_providers_help': 'choose from the available execution providers (choices: {choices}, ...)',
'execution_thread_count_help': 'specify the number of execution threads',
'execution_queue_count_help': 'specify the number of execution queries',
'skip_download_help': 'omit automate downloads and lookups',
'headless_help': 'run the program in headless mode',
'log_level_help': 'choose from the available log levels',
'creating_temp': 'Creating temporary resources',
'extracting_frames_fps': 'Extracting frames with {video_fps} FPS',
'analysing': 'Analysing',
'processing': 'Processing',
'downloading': 'Downloading',
'temp_frames_not_found': 'Temporary frames not found',
'compressing_image': 'Compressing image',
'compressing_image_failed': 'Compressing image failed',
'compressing_image_succeed': 'Compressing image succeed',
'compressing_image_skipped': 'Compressing image skipped',
'merging_video_fps': 'Merging video with {video_fps} FPS',
'merging_video_failed': 'Merging video failed',
'skipping_audio': 'Skipping audio',
'restoring_audio': 'Restoring audio',
'restoring_audio_succeed': 'Restoring audio succeed',
'restoring_audio_skipped': 'Restoring audio skipped',
'clearing_temp': 'Clearing temporary resources',
'processing_image_succeed': 'Processing to image succeed in {seconds} seconds',
@@ -66,78 +24,176 @@ WORDING =\
'processing_video_failed': 'Processing to video failed',
'model_download_not_done': 'Download of the model is not done',
'model_file_not_present': 'File of the model is not present',
'select_image_source': 'Select an image for source path',
'select_image_or_video_target': 'Select an image or video for target path',
'select_file_or_directory_output': 'Select an file or directory for output path',
'select_image_source': 'Select a image for source path',
'select_audio_source': 'Select a audio for source path',
'select_video_target': 'Select a video for target path',
'select_image_or_video_target': 'Select a image or video for target path',
'select_file_or_directory_output': 'Select a file or directory for output path',
'no_source_face_detected': 'No source face detected',
'frame_processor_not_loaded': 'Frame processor {frame_processor} could not be loaded',
'frame_processor_not_implemented': 'Frame processor {frame_processor} not implemented correctly',
'ui_layout_not_loaded': 'UI layout {ui_layout} could not be loaded',
'ui_layout_not_implemented': 'UI layout {ui_layout} not implemented correctly',
'stream_not_loaded': 'Stream {stream_mode} could not be loaded',
'donate_button_label': 'DONATE',
'start_button_label': 'START',
'stop_button_label': 'STOP',
'clear_button_label': 'CLEAR',
'benchmark_runs_checkbox_group_label': 'BENCHMARK RUNS',
'benchmark_results_dataframe_label': 'BENCHMARK RESULTS',
'benchmark_cycles_slider_label': 'BENCHMARK CYCLES',
'execution_providers_checkbox_group_label': 'EXECUTION PROVIDERS',
'execution_thread_count_slider_label': 'EXECUTION THREAD COUNT',
'execution_queue_count_slider_label': 'EXECUTION QUEUE COUNT',
'face_analyser_order_dropdown_label': 'FACE ANALYSER ORDER',
'face_analyser_age_dropdown_label': 'FACE ANALYSER AGE',
'face_analyser_gender_dropdown_label': 'FACE ANALYSER GENDER',
'face_detector_model_dropdown_label': 'FACE DETECTOR MODEL',
'face_detector_size_dropdown_label': 'FACE DETECTOR SIZE',
'face_detector_score_slider_label': 'FACE DETECTOR SCORE',
'face_selector_mode_dropdown_label': 'FACE SELECTOR MODE',
'reference_face_gallery_label': 'REFERENCE FACE',
'reference_face_distance_slider_label': 'REFERENCE FACE DISTANCE',
'face_mask_types_checkbox_group_label': 'FACE MASK TYPES',
'face_mask_blur_slider_label': 'FACE MASK BLUR',
'face_mask_padding_top_slider_label': 'FACE MASK PADDING TOP',
'face_mask_padding_bottom_slider_label': 'FACE MASK PADDING BOTTOM',
'face_mask_padding_left_slider_label': 'FACE MASK PADDING LEFT',
'face_mask_padding_right_slider_label': 'FACE MASK PADDING RIGHT',
'face_mask_region_checkbox_group_label': 'FACE MASK REGIONS',
'video_memory_strategy_dropdown_label': 'VIDEO MEMORY STRATEGY',
'system_memory_limit_slider_label': 'SYSTEM MEMORY LIMIT',
'output_image_or_video_label': 'OUTPUT',
'output_path_textbox_label': 'OUTPUT PATH',
'output_image_quality_slider_label': 'OUTPUT IMAGE QUALITY',
'output_video_encoder_dropdown_label': 'OUTPUT VIDEO ENCODER',
'output_video_preset_dropdown_label': 'OUTPUT VIDEO PRESET',
'output_video_quality_slider_label': 'OUTPUT VIDEO QUALITY',
'output_video_resolution_dropdown_label': 'OUTPUT VIDEO RESOLUTION',
'output_video_fps_slider_label': 'OUTPUT VIDEO FPS',
'preview_image_label': 'PREVIEW',
'preview_frame_slider_label': 'PREVIEW FRAME',
'frame_processors_checkbox_group_label': 'FRAME PROCESSORS',
'face_swapper_model_dropdown_label': 'FACE SWAPPER MODEL',
'face_enhancer_model_dropdown_label': 'FACE ENHANCER MODEL',
'face_enhancer_blend_slider_label': 'FACE ENHANCER BLEND',
'frame_enhancer_model_dropdown_label': 'FRAME ENHANCER MODEL',
'frame_enhancer_blend_slider_label': 'FRAME ENHANCER BLEND',
'face_debugger_items_checkbox_group_label': 'FACE DEBUGGER ITEMS',
'common_options_checkbox_group_label': 'OPTIONS',
'temp_frame_format_dropdown_label': 'TEMP FRAME FORMAT',
'temp_frame_quality_slider_label': 'TEMP FRAME QUALITY',
'trim_frame_start_slider_label': 'TRIM FRAME START',
'trim_frame_end_slider_label': 'TRIM FRAME END',
'source_file_label': 'SOURCE',
'target_file_label': 'TARGET',
'webcam_image_label': 'WEBCAM',
'webcam_mode_radio_label': 'WEBCAM MODE',
'webcam_resolution_dropdown': 'WEBCAM RESOLUTION',
'webcam_fps_slider': 'WEBCAM FPS',
'point': '.',
'comma': ',',
'colon': ':',
'question_mark': '?',
'exclamation_mark': '!'
'exclamation_mark': '!',
'help':
{
# installer
'install_dependency': 'select the variant of {dependency} to install',
'skip_venv': 'skip the virtual environment check',
# general
'source': 'choose single or multiple source images',
'target': 'choose single target image or video',
'output': 'specify the output file or directory',
# misc
'skip_download': 'omit automated downloads and remote lookups',
'headless': 'run the program without a user interface',
'log_level': 'adjust the message severity displayed in the terminal',
# execution
'execution_providers': 'accelerate the model inference using different providers (choices: {choices}, ...)',
'execution_thread_count': 'specify the amount of parallel threads while processing',
'execution_queue_count': 'specify the amount of frames each thread is processing',
# memory
'video_memory_strategy': 'balance fast frame processing and low vram usage',
'system_memory_limit': 'limit the available ram that can be used while processing',
# face analyser
'face_analyser_order': 'specify the order in which the face analyser detects faces.',
'face_analyser_age': 'filter the detected faces based on their age',
'face_analyser_gender': 'filter the detected faces based on their gender',
'face_detector_model': 'choose the model responsible for detecting the face',
'face_detector_size': 'specify the size of the frame provided to the face detector',
'face_detector_score': 'filter the detected faces based on the confidence score',
# face selector
'face_selector_mode': 'use reference based tracking with simple matching',
'reference_face_position': 'specify the position used to create the reference face',
'reference_face_distance': 'specify the desired similarity between the reference face and target face',
'reference_frame_number': 'specify the frame used to create the reference face',
# face mask
'face_mask_types': 'mix and match different face mask types (choices: {choices})',
'face_mask_blur': 'specify the degree of blur applied to the box mask',
'face_mask_padding': 'apply top, right, bottom and left padding to the box mask',
'face_mask_regions': 'choose the facial features used for the region mask (choices: {choices})',
# frame extraction
'trim_frame_start': 'specify the start frame of the target video',
'trim_frame_end': 'specify the end frame of the target video',
'temp_frame_format': 'specify the temporary resources format',
'temp_frame_quality': 'specify the temporary resources quality',
'keep_temp': 'keep the temporary resources after processing',
# output creation
'output_image_quality': 'specify the image quality which translates to the compression factor',
'output_video_encoder': 'specify the encoder used for the video compression',
'output_video_preset': 'balance fast video processing and video file size',
'output_video_quality': 'specify the video quality which translates to the compression factor',
'output_video_resolution': 'specify the video output resolution based on the target video',
'output_video_fps': 'specify the video output fps based on the target video',
'skip_audio': 'omit the audio from the target video',
# frame processors
'frame_processors': 'load a single or multiple frame processors. (choices: {choices}, ...)',
'face_debugger_items': 'specify the face debugger items (choices: {choices})',
'face_enhancer_model': 'choose the model responsible for enhancing the face',
'face_enhancer_blend': 'blend the enhanced into the previous face',
'face_swapper_model': 'choose the model responsible for swapping the face',
'frame_enhancer_model': 'choose the model responsible for enhancing the frame',
'frame_enhancer_blend': 'blend the enhanced into the previous frame',
'lip_syncer_model': 'choose the model responsible for syncing the lips',
# uis
'ui_layouts': 'launch a single or multiple UI layouts (choices: {choices}, ...)'
},
'uis':
{
# general
'start_button': 'START',
'stop_button': 'STOP',
'clear_button': 'CLEAR',
# about
'donate_button': 'DONATE',
# benchmark
'benchmark_results_dataframe': 'BENCHMARK RESULTS',
# benchmark options
'benchmark_runs_checkbox_group': 'BENCHMARK RUNS',
'benchmark_cycles_slider': 'BENCHMARK CYCLES',
# common options
'common_options_checkbox_group': 'OPTIONS',
# execution
'execution_providers_checkbox_group': 'EXECUTION PROVIDERS',
# execution queue count
'execution_queue_count_slider': 'EXECUTION QUEUE COUNT',
# execution thread count
'execution_thread_count_slider': 'EXECUTION THREAD COUNT',
# face analyser
'face_analyser_order_dropdown': 'FACE ANALYSER ORDER',
'face_analyser_age_dropdown': 'FACE ANALYSER AGE',
'face_analyser_gender_dropdown': 'FACE ANALYSER GENDER',
'face_detector_model_dropdown': 'FACE DETECTOR MODEL',
'face_detector_size_dropdown': 'FACE DETECTOR SIZE',
'face_detector_score_slider': 'FACE DETECTOR SCORE',
# face masker
'face_mask_types_checkbox_group': 'FACE MASK TYPES',
'face_mask_blur_slider': 'FACE MASK BLUR',
'face_mask_padding_top_slider': 'FACE MASK PADDING TOP',
'face_mask_padding_right_slider': 'FACE MASK PADDING RIGHT',
'face_mask_padding_bottom_slider': 'FACE MASK PADDING BOTTOM',
'face_mask_padding_left_slider': 'FACE MASK PADDING LEFT',
'face_mask_region_checkbox_group': 'FACE MASK REGIONS',
# face selector
'face_selector_mode_dropdown': 'FACE SELECTOR MODE',
'reference_face_gallery': 'REFERENCE FACE',
'reference_face_distance_slider': 'REFERENCE FACE DISTANCE',
# frame processors
'frame_processors_checkbox_group': 'FRAME PROCESSORS',
# frame processors options
'face_debugger_items_checkbox_group': 'FACE DEBUGGER ITEMS',
'face_enhancer_model_dropdown': 'FACE ENHANCER MODEL',
'face_enhancer_blend_slider': 'FACE ENHANCER BLEND',
'face_swapper_model_dropdown': 'FACE SWAPPER MODEL',
'frame_enhancer_model_dropdown': 'FRAME ENHANCER MODEL',
'frame_enhancer_blend_slider': 'FRAME ENHANCER BLEND',
'lip_syncer_model_dropdown': 'LIP SYNCER MODEL',
# memory
'video_memory_strategy_dropdown': 'VIDEO MEMORY STRATEGY',
'system_memory_limit_slider': 'SYSTEM MEMORY LIMIT',
# output
'output_image_or_video': 'OUTPUT',
# output options
'output_path_textbox': 'OUTPUT PATH',
'output_image_quality_slider': 'OUTPUT IMAGE QUALITY',
'output_video_encoder_dropdown': 'OUTPUT VIDEO ENCODER',
'output_video_preset_dropdown': 'OUTPUT VIDEO PRESET',
'output_video_quality_slider': 'OUTPUT VIDEO QUALITY',
'output_video_resolution_dropdown': 'OUTPUT VIDEO RESOLUTION',
'output_video_fps_slider': 'OUTPUT VIDEO FPS',
# preview
'preview_image': 'PREVIEW',
'preview_frame_slider': 'PREVIEW FRAME',
# source
'source_file': 'SOURCE',
# target
'target_file': 'TARGET',
# temp frame
'temp_frame_format_dropdown': 'TEMP FRAME FORMAT',
'temp_frame_quality_slider': 'TEMP FRAME QUALITY',
# trim frame
'trim_frame_start_slider': 'TRIM FRAME START',
'trim_frame_end_slider': 'TRIM FRAME END',
# webcam
'webcam_image': 'WEBCAM',
# webcam options
'webcam_mode_radio': 'WEBCAM MODE',
'webcam_resolution_dropdown': 'WEBCAM RESOLUTION',
'webcam_fps_slider': 'WEBCAM FPS'
}
}
def get(key : str) -> str:
    return WORDING[key]
def get(key : str) -> Optional[str]:
    if '.' in key:
        section, name = key.split('.')
        if section in WORDING and name in WORDING[section]:
            return WORDING[section][name]
    if key in WORDING:
        return WORDING[key]
    return None
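
The new `get()` resolves dotted keys such as `help.skip_audio` against the nested `help` and `uis` sections and returns `None` for anything it cannot find. A minimal sketch of that lookup follows. It assumes a trimmed-down, hypothetical `WORDING` dictionary that holds only a few entries copied from the diff, and it uses `split('.', 1)` where the diff uses a plain `split('.')`, so a key containing more than one dot does not fail on unpacking. That variation is an assumption of this sketch, not part of the commit.

```python
from typing import Any, Dict, Optional

# Hypothetical, trimmed-down stand-in for the WORDING dictionary in the diff.
WORDING: Dict[str, Any] = {
    'processing': 'Processing',
    'help': {
        'skip_audio': 'omit the audio from the target video',
        'ui_layouts': 'launch a single or multiple UI layouts (choices: {choices}, ...)',
    },
    'uis': {
        'start_button': 'START',
    },
}


def get(key: str) -> Optional[str]:
    # Dotted keys such as 'help.skip_audio' address an entry in a nested section.
    if '.' in key:
        # maxsplit=1 is an assumption of this sketch; the diff calls split('.').
        section, name = key.split('.', 1)
        if section in WORDING and name in WORDING[section]:
            return WORDING[section][name]
    # Plain keys are looked up at the top level.
    if key in WORDING:
        return WORDING[key]
    # Unknown keys yield None instead of raising KeyError.
    return None


# Usage: fetch a help template and fill its {choices} placeholder.
template = get('help.ui_layouts')
if template is not None:
    print(template.format(choices='default, benchmark'))
```

Because the return type is now `Optional[str]`, callers that previously relied on a `KeyError` for unknown keys need to handle `None` themselves, as the usage lines above do before calling `format()`.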