[Python] Calling yt-dlp from Python (EMBEDDING YT-DLP)

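What is yt-dlp?

As of 2023, YouTube holds a near-monopoly position in the streaming world. Gameplay recordings, official music releases, orchestra performances: there is hardly anything that doesn't get uploaded. As a result, there are also countless ways to download YouTube videos.

In my view, one of the most reliable ways to download YouTube videos is a program called yt-dlp. There are also download websites, but most of them are painfully slow, and quite a few even spread ransomware or other malware.

With yt-dlp, on the other hand, you can download YouTube videos without ads and at the full speed your connection allows.

Some of you may be wondering what kind of program yt-dlp is, so here is a brief introduction.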

https://github.com/ytdl-org/youtube-dl

There used to be a program written in Python called youtube-dl that could download videos from YouTube and other sites. It made downloading YouTube videos easy, but unfortunately its releases have all but stopped since 2021.

https://github.com/yt-dlp/yt-dlp

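So some kind soul(?) forked the youtube-dl project, copying it wholesale, and started a new project to keep YouTube downloads working.

That project is yt-dlp. It is still actively updated and continues to support YouTube downloads, so it is the better choice when you want to grab a YouTube video. It also supports many other sites besides YouTube.

There's no reason not to use it, right?

Calling yt-dlp from Python

As the title suggests, today's main topic is not how to use yt-dlp itself but how to call it from your own Python code.

yt-dlp provides executables for Linux, macOS, Windows, and other platforms. Just grab the one that matches your operating system.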

import subprocess


# Run a process and return its output
def run_process(command: list[str]) -> str:
    # command e.g. ['ls', '-al']

    # Run the command and store the result in output.
    # stderr=subprocess.STDOUT merges error messages into the captured output.
    # Passing encoding makes the output a str; the default on Korean Windows
    # is cp949, so set utf-8 to decode without problems.
    try:
        output: str = subprocess.check_output(
            command, stderr=subprocess.STDOUT, encoding="utf-8"
        )
    except (subprocess.CalledProcessError, OSError) as e:
        # Non-zero exit code, or the executable could not be started at all
        raise RuntimeError(f"Failed to run command: {command}") from e

    return output


def run_yt_dlp(command: list[str]) -> str:
    return run_process(["yt-dlp", *command])


result = run_yt_dlp(["-h"])
print(result)

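In Python, the os or subprocess modules let you launch a process and capture its output, so you can use yt-dlp by running it as a process and reading the result, as in the code above.

However, this is not a particularly good approach.

Spawning a process is heavyweight, and there is no need to do it here.

yt-dlp is itself written in Python, so it can be called directly from Python code.

In another language, running a process would be pretty much the only way to use yt-dlp, but in Python you can also call its code directly.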
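To make the two routes concrete, here is a minimal sketch that checks which one is available on the current machine. The function name pick_yt_dlp_backend and its return values are my own invention for illustration; only importlib.util.find_spec and shutil.which are standard library calls.

```python
import importlib.util
import shutil


def pick_yt_dlp_backend() -> str:
    """Return 'package', 'executable', or 'none' depending on what is available."""
    # Prefer the importable package: no extra process has to be spawned.
    if importlib.util.find_spec("yt_dlp") is not None:
        return "package"
    # Otherwise fall back to a yt-dlp executable found on PATH.
    if shutil.which("yt-dlp") is not None:
        return "executable"
    return "none"


print(pick_yt_dlp_backend())
```

If this prints package, the embedding approach described below applies; executable means only the subprocess route is currently usable.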

https://github.com/yt-dlp/yt-dlp?tab=readme-ov-file#embedding-yt-dlp

If you look at the README on yt-dlp's GitHub page, the EMBEDDING YT-DLP section shows how to call yt-dlp's code directly as a Python package.

from yt_dlp import YoutubeDL

urls = ['https://www.youtube.com/watch?v=BaW_jenozKc']
with YoutubeDL() as ydl:
    ydl.download(urls)

Usage looks like the code above.

Install the package with pip install yt-dlp, then run that code.

Downloads go through a class called YoutubeDL. Note that its download method takes a list of URL strings, not a single string.
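Since it is easy to pass a bare string by accident, a small wrapper can help. This is only a sketch: normalize_urls and download_all are hypothetical helper names of mine, not part of yt-dlp.

```python
from __future__ import annotations

from collections.abc import Iterable


def normalize_urls(urls: str | Iterable[str]) -> list[str]:
    """Return a list of URLs whether one string or several were given."""
    if isinstance(urls, str):
        return [urls]
    return list(urls)


def download_all(urls: str | Iterable[str]) -> None:
    # Imported lazily so this module still loads where yt-dlp is not installed.
    from yt_dlp import YoutubeDL

    with YoutubeDL() as ydl:
        ydl.download(normalize_urls(urls))
```

With this, download_all('https://www.youtube.com/watch?v=BaW_jenozKc') and download_all([...]) both end up handing a list to download.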

The package is well documented overall, but it has few type hints, which makes it a bit confusing to use. That's a shame.

import json
import yt_dlp

url = 'https://www.youtube.com/watch?v=BaW_jenozKc'

# ℹ️ See help(yt_dlp.YoutubeDL) for a list of available options and public functions
ydl_opts = {}
with yt_dlp.YoutubeDL(ydl_opts) as ydl:
    info = ydl.extract_info(url, download=False)

    # ℹ️ ydl.sanitize_info makes the info json-serializable
    json_info = json.dumps(ydl.sanitize_info(info))
    print(json_info)

The example above lets you retrieve nearly all the information about a YouTube video as JSON data.
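The full info dictionary is large, so in practice you usually want only a few fields. Here is a rough sketch. summarize_info is a hypothetical helper, and the field names are ones I have seen in yt-dlp info dictionaries; not every site fills all of them, hence the .get() calls.

```python
from typing import Any


def summarize_info(info: dict[str, Any]) -> dict[str, Any]:
    """Keep only a few commonly useful fields from an info dict."""
    wanted = ("id", "title", "uploader", "duration", "webpage_url")
    # .get() returns None for fields a given extractor did not provide.
    return {key: info.get(key) for key in wanted}


# Stand-in for the dict returned by ydl.extract_info(url, download=False)
sample = {"id": "BaW_jenozKc", "title": "sample title", "formats": []}
print(summarize_info(sample))
```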

The GitHub documentation also walks through a few more examples.

The real problem, however, started after that.

??? : How do I express the arguments I pass to yt-dlp in code?

yt-dlp --limit-rate 3000K -N 10 --fragment-retries 1000 --retry-sleep fragment:linear=1::2 --force-overwrites "https://www.youtube.com/watch?v=UnPyGbP0WhE&t=403s"

How would you port a long yt-dlp command like the one above to code?

How do the trailing arguments have to be converted to fit the yt-dlp package...

Things like --limit-rate 3000K, or -N 10.

# ℹ️ See help(yt_dlp.YoutubeDL) for a list of available options and public functions
ydl_opts = {}
with yt_dlp.YoutubeDL(ydl_opts) as ydl:
    info = ydl.extract_info(url, download=False)

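One thing I could at least infer was that the yt-dlp download options go into the dictionary (ydl_opts) passed when constructing the YoutubeDL class.

Unfortunately, the arguments I had been passing to yt-dlp on the command line and the keys that go into ydl_opts are not the same.

The option names are different.

For example, the -o option, which sets the output file location on the command line,

is called outtmpl in ydl_opts.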
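As a tiny illustration of that rename, the sketch below builds the dictionary form of -o. options_for_output is a hypothetical helper name; the {"default": ...} shape is the one used in my own download code further down.

```python
def options_for_output(template: str) -> dict:
    """Build the ydl_opts equivalent of `yt-dlp -o TEMPLATE`."""
    # The CLI flag is -o, but the dictionary key is "outtmpl".
    return {"outtmpl": {"default": template}}


print(options_for_output("%(title)s.%(ext)s"))
```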

That meant going to the class definition, reading the docstring, and matching each yt-dlp CLI argument to its YoutubeDL counterpart one by one. I was about to give up and fall back to the subprocess approach when...

 If you are already familiar with the CLI, you can use devscripts/cli_to_api.py to translate any CLI switches to YoutubeDL params.

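Reading the official GitHub page a bit more carefully, it turns out there is a script that converts the arguments you would use on the yt-dlp CLI into the matching parameters for the YoutubeDL class!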

# https://github.com/yt-dlp/yt-dlp/blob/master/devscripts/cli_to_api.py

# Allow direct execution (only matters when running from a yt-dlp source checkout)
import os
import sys

sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

import yt_dlp
import yt_dlp.options


create_parser = yt_dlp.options.create_parser


def parse_patched_options(opts):
    patched_parser = create_parser()
    patched_parser.defaults.update({
        'ignoreerrors': False,
        'retries': 0,
        'fragment_retries': 0,
        'extract_flat': False,
        'concat_playlist': 'never',
    })
    yt_dlp.options.create_parser = lambda: patched_parser
    try:
        return yt_dlp.parse_options(opts)
    finally:
        yt_dlp.options.create_parser = create_parser


default_opts = parse_patched_options([]).ydl_opts


def cli_to_api(opts, cli_defaults=False):
    opts = (yt_dlp.parse_options if cli_defaults else parse_patched_options)(
        opts).ydl_opts

    diff = {k: v for k, v in opts.items() if default_opts[k] != v}
    if 'postprocessors' in diff:
        diff['postprocessors'] = [pp for pp in diff['postprocessors']
                                  if pp not in default_opts['postprocessors']]
    return diff


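# Pass the arguments exactly as you would on the command line, and they are
# converted to the parameters used by the YoutubeDL class.
# Check how each argument is translated here, then use the result in your code.
if __name__ == '__main__':
    from pprint import pprint

    print('\nThe arguments passed translate to:\n')
    pprint(cli_to_api(sys.argv[1:]))
    print('\nCombining these with the CLI defaults gives:\n')
    pprint(cli_to_api(sys.argv[1:], True))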

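The code above is the original script; I only touched it up lightly and added a few comments.

With this script (the converter), you can turn the arguments you used to pass to yt-dlp into option parameters for the YoutubeDL class. I saved it as yt_dlp_cli_options_converter.py.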
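If you would rather feed the converter a command line you already have lying around, the standard shlex module can split it into an argument list. This is a sketch only; command_to_argv is a hypothetical helper, and handing the result to cli_to_api assumes you can import that function from the converter script.

```python
import shlex


def command_to_argv(command: str) -> list[str]:
    """Split a pasted shell command into arguments, dropping a leading 'yt-dlp'."""
    argv = shlex.split(command)
    if argv and argv[0] == "yt-dlp":
        argv = argv[1:]
    return argv


argv = command_to_argv('yt-dlp --limit-rate 3000K -N 10 "https://www.youtube.com/watch?v=UnPyGbP0WhE&t=403s"')
print(argv)
# The list could then be passed on, e.g. cli_to_api(argv)
```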

python yt_dlp_cli_options_converter.py

>>>
The arguments passed translate to:

{}

Combining these with the CLI defaults gives:

{'extract_flat': 'discard_in_playlist',        
 'fragment_retries': 10,
 'ignoreerrors': 'only_download',
 'postprocessors': [{'key': 'FFmpegConcat',
                     'only_multi_video': True,
                     'when': 'playlist'}],
 'retries': 10}

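Running the script with no arguments prints the output above.

Since the converter was run without any arguments here, this means that

yt-dlp

when you run yt-dlp from the CLI like this, with no options at all,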

{'extract_flat': 'discard_in_playlist',        
 'fragment_retries': 10,
 'ignoreerrors': 'only_download',
 'postprocessors': [{'key': 'FFmpegConcat',
                     'only_multi_video': True,
                     'when': 'playlist'}],
 'retries': 10}

these are the values that ydl_opts gets by default.
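As far as I can tell from the converter, these defaults are filled in by the CLI, so a hand-written ydl_opts does not get them automatically. If you want CLI-like behavior, one option is to layer your own options on top of them. The sketch below copies the values from the output above; with_cli_defaults is a hypothetical helper of mine.

```python
import copy

# Values copied from the converter output shown above
CLI_DEFAULTS = {
    "extract_flat": "discard_in_playlist",
    "fragment_retries": 10,
    "ignoreerrors": "only_download",
    "postprocessors": [
        {"key": "FFmpegConcat", "only_multi_video": True, "when": "playlist"}
    ],
    "retries": 10,
}


def with_cli_defaults(own_opts: dict) -> dict:
    """Return own_opts layered on top of a copy of the CLI defaults."""
    merged = copy.deepcopy(CLI_DEFAULTS)
    # Keys in own_opts win; a "postprocessors" list replaces the default one.
    merged.update(own_opts)
    return merged


print(with_cli_defaults({"fragment_retries": 1000}))
```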

yt-dlp --limit-rate 3000K -N 10 --fragment-retries 1000 --retry-sleep fragment:linear=1::2 --force-overwrites "https://www.youtube.com/watch?v=UnPyGbP0WhE&t=403s"

So, to convert the CLI command above into ydl_opts,

python yt_dlp_cli_options_converter.py --limit-rate 3000K -N 10 --fragment-retries 1000 --retry-sleep fragment:linear=1::2 --force-overwrites "https://www.youtube.com/watch?v=UnPyGbP0WhE&t=403s"

The arguments passed translate to:

{'concurrent_fragment_downloads': 10,
 'continuedl': False,
 'fragment_retries': 1000,
 'overwrites': True,
 'ratelimit': 3072000,
 'retry_sleep_functions': {'fragment': <function validate_options.<locals>.parse_sleep_func.<locals>.<lambda> at 0x0000024A52432A20>}}

Combining these with the CLI defaults gives:

{'concurrent_fragment_downloads': 10,
 'continuedl': False,
 'extract_flat': 'discard_in_playlist',
 'fragment_retries': 1000,
 'ignoreerrors': 'only_download',
 'overwrites': True,
 'postprocessors': [{'key': 'FFmpegConcat',
                     'only_multi_video': True,
                     'when': 'playlist'}],
 'ratelimit': 3072000,
 'retries': 10,
 'retry_sleep_functions': {'fragment': <function validate_options.<locals>.parse_sleep_func.<locals>.<lambda> at 0x0000024A52453060>}}

you just pass the same arguments straight to the converter, as shown above.

The defaults aren't needed, so

leaving out the download URL, the options given to yt-dlp, --limit-rate 3000K -N 10 --fragment-retries 1000 --retry-sleep fragment:linear=1::2 --force-overwrites, become:

{'concurrent_fragment_downloads': 10,
 'continuedl': False,
 'fragment_retries': 1000,
 'overwrites': True,
 'ratelimit': 3072000,
 'retry_sleep_functions': {'fragment': <function validate_options.<locals>.parse_sleep_func.<locals>.<lambda> at 0x0000024A52432A20>}}

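That is the dictionary they convert to.

Put these values into the ydl_opts from earlier and they apply exactly as they did on the CLI.

(Note that the complicated expression after fragment is there because that option takes a function, such as a lambda. What you see is just how Python prints a function object, so it cannot be pasted as-is; you have to write the function yourself.)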
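Written out by hand, the converted options might look like the sketch below. The retry function is my own reading of linear=1::2 (start at 1 second, no upper limit, add 2 seconds per retry) based on the --retry-sleep help text, so treat it as an assumption rather than the exact function yt-dlp builds internally.

```python
def linear_retry_sleep(n: int) -> float:
    """Assumed equivalent of `--retry-sleep fragment:linear=1::2`."""
    # Start at 1 second and add 2 seconds for each further retry.
    return 1.0 + 2.0 * n


ydl_opts = {
    "concurrent_fragment_downloads": 10,  # -N 10
    "continuedl": False,  # also emitted by the converter
    "fragment_retries": 1000,  # --fragment-retries 1000
    "overwrites": True,  # --force-overwrites
    "ratelimit": 3000 * 1024,  # --limit-rate 3000K = 3072000 bytes per second
    "retry_sleep_functions": {"fragment": linear_retry_sleep},
}
```

This dictionary can then be passed as yt_dlp.YoutubeDL(ydl_opts), with the URL going to download or extract_info separately.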

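The explanation got a little complicated, but...

To put it simply:

1. People who already use yt-dlp run it in a console as: yt-dlp OPTIONS URL

2. From Python there is no need to run yt-dlp as a process; you can import the yt-dlp package and call it directly from code

3. The options you gave to yt-dlp on the CLI have to be converted into a form the Python yt-dlp package understands

4. Use the converter to translate them, then call the code.

Easy, right?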

import yt_dlp

# Placeholder values for illustration; in my actual code these are private
# attributes of a downloader class.
VIDEO_URL = "https://www.youtube.com/watch?v=UnPyGbP0WhE&t=403s"
VIDEO_EXT = "mp4"
VIDEO_FULL_PATH = "%(title)s.%(ext)s"  # save as title.extension


def download_video_high_quality(call_back) -> None:
    # yt-dlp --limit-rate 3000K -N 10 --fragment-retries 1000 --retry-sleep fragment:linear=1::2 --force-overwrites "https://www.youtube.com/watch?v=UnPyGbP0WhE&t=403s"
    with yt_dlp.YoutubeDL(
        {
            # Best-quality mp4 video plus best-quality m4a audio, with the video capped at FHD (1080p).
            "format": f"bestvideo[height<=1080][ext={VIDEO_EXT}]+bestaudio[ext=m4a]/best[ext={VIDEO_EXT}]/best",
            "merge_output_format": VIDEO_EXT,
            "outtmpl": {"default": VIDEO_FULL_PATH},
            "throttledratelimit": 102400,
            "fragment_retries": 1000,
            # "overwrites": True,
            "concurrent_fragment_downloads": 3,  # download N fragments at the same time
            "retry_sleep_functions": {"fragment": lambda n: n + 1},  # on failure, retry with a delay that grows by 1 second each time
            "progress_hooks": [call_back],  # callback that reports download progress
        }
    ) as ydl:
        ydl.download([VIDEO_URL])

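Here, roughly, is the download code I actually use, as an example.

It fetches the best-quality mp4 video and the best-quality m4a audio. When downloading mp4, the video comes as many fragments rather than a single file, so the code downloads 3 fragments at a time. Because that many connections can make network requests fail, a failed request is retried after a delay that grows by 1 second each time.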
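The format string is the densest part of that code, so here is a small sketch that builds it from its pieces. build_format_selector is a hypothetical helper; the selector it returns has the same shape as the one in my code above.

```python
def build_format_selector(video_ext: str = "mp4", max_height: int = 1080) -> str:
    """Build a selector that tries progressively looser choices, separated by '/'."""
    return (
        # 1st choice: capped-height video in video_ext merged with m4a audio
        f"bestvideo[height<={max_height}][ext={video_ext}]+bestaudio[ext=m4a]"
        # 2nd choice: best single file in video_ext
        f"/best[ext={video_ext}]"
        # last resort: whatever is best overall
        "/best"
    )


print(build_format_selector())
```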

progress_hooks also takes a function, here called call_back:

def progress_call_back(data) -> None:
    print(data)


download_video_high_quality(progress_call_back)

If you pass a (callback) function in the call_back slot, the download status and various other data arrive through the data argument for you to use.
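Printing the raw dictionary gets noisy fast, so a hook usually picks out a few keys. The sketch below assumes the keys status, filename, downloaded_bytes, total_bytes and total_bytes_estimate, which is what I have seen in the hook data; format_progress is a hypothetical helper name.

```python
def format_progress(data: dict) -> str:
    """Turn a progress-hook dict into a short status line."""
    status = data.get("status")
    if status == "finished":
        return f"finished: {data.get('filename')}"
    if status == "downloading":
        done = data.get("downloaded_bytes") or 0
        # total_bytes can be missing; fall back to the estimate if there is one.
        total = data.get("total_bytes") or data.get("total_bytes_estimate")
        if total:
            return f"downloading: {done / total:.1%}"
        return f"downloading: {done} bytes"
    return f"status: {status}"


def progress_call_back(data: dict) -> None:
    print(format_progress(data))


progress_call_back({"status": "downloading", "downloaded_bytes": 50, "total_bytes": 200})
```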

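Anyone who has done web development in JavaScript will find this very familiar. Just keep in mind that this usage exists; the most important part is the converter script introduced above, so put it to good use and build your own downloader.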

License: Attribution, NonCommercial, NoDerivatives
