Why does my YouTube Transcript API work only in non-production but not in production?

In my non-production environment, I can use the YouTube Transcript API to fetch a transcript.

In my production environment, even after a lot of debugging and logging, I cannot. The logs and the view code are included below.

I know my code is fine, since it works in the non-production environment.

Judging by the logs, and given that my OpenAI API calls work, this cannot be a network issue.

Besides getting this fixed, I am very curious why it happens.

So far I have tried debugging/logging and checking the network settings.

Note: the error says that subtitles are disabled for this video, but I can confirm that is not the case. It looks like a generic error message that is raised for any video.
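The traceback below shows the error being raised from `_extract_captions_json`, which suggests the library decides based on the watch-page HTML it downloaded rather than on an explicit "captions disabled" flag. A minimal diagnostic sketch under that assumption: the marker strings below are guesses about what the page may contain, not a documented contract, and `classify_watch_page` and its labels are hypothetical names introduced only for illustration.

```python
def classify_watch_page(html: str) -> str:
    """Return a rough label for a downloaded YouTube watch-page HTML string.

    All markers are assumptions about the page content, used only to
    compare what production receives with what non-production receives.
    """
    if '"captions":' in html:
        # Caption metadata appears to be embedded in the page.
        return "captions-present"
    if 'class="g-recaptcha"' in html:
        # The page looks like a CAPTCHA challenge rather than a video page.
        return "captcha-challenge"
    if '"playabilityStatus":' not in html:
        # No player data at all, e.g. a consent or error page.
        return "no-player-data"
    # Player data is present but carries no caption metadata.
    return "captions-missing"


if __name__ == "__main__":
    sample = '{"playabilityStatus":{"status":"OK"},"captions":{"tracks":[]}}'
    print(classify_watch_page(sample))
```

Logging this label for the same video in both environments (for example on the text of a plain HTTP GET of the watch URL) might show whether production is served a different page, which a status code of 200 alone would not reveal.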

2024-08-20T07:41:29.989747260Z [ANONYMIZED_IP] - - [20/Aug/2024:07:41:29 +0000] "GET /generate/youtubeSummary/ HTTP/1.1" 200 30723 "https://[ANONYMIZED_DOMAIN]/dashboard/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:129.0) Gecko/20100101 Firefox/129.0"

2024-08-20T07:41:45.777775814Z Custom form options: {}
2024-08-20T07:41:45.778110014Z Form data debug: {'grade_level': '', 'video_url': 'https://www.youtube.com/watch?v=[ANONYMIZED_VIDEO_ID]', 'summary_length': None}
2024-08-20T07:41:45.778131714Z INFO 2024-08-20 07:41:45,777 views Generating summary for video URL: https://www.youtube.com/watch?v=[ANONYMIZED_VIDEO_ID]
2024-08-20T07:41:45.781101714Z INFO 2024-08-20 07:41:45,780 views YouTube IP address: [ANONYMIZED_IP]
2024-08-20T07:41:45.979793308Z INFO 2024-08-20 07:41:45,979 views YouTube connection status: 200
2024-08-20T07:41:45.980433708Z INFO 2024-08-20 07:41:45,980 views Attempting to connect to: www.youtube.com
2024-08-20T07:41:45.980820208Z INFO 2024-08-20 07:41:45,980 views Extracted video ID: [ANONYMIZED_VIDEO_ID]
2024-08-20T07:41:46.463787194Z INFO 2024-08-20 07:41:46,463 _universal Request URL: 'https://[ANONYMIZED_DOMAIN]/v2.1/track'
2024-08-20T07:41:46.463807494Z Request method: 'POST'
2024-08-20T07:41:46.463815194Z Request headers:
2024-08-20T07:41:46.463822194Z     'Content-Type': 'application/json'
2024-08-20T07:41:46.463830194Z     'Content-Length': '2373'
2024-08-20T07:41:46.463840094Z     'Accept': 'application/json'
2024-08-20T07:41:46.463847194Z     'x-ms-client-request-id': '[ANONYMIZED_REQUEST_ID]'
2024-08-20T07:41:46.463853694Z     'User-Agent': 'azsdk-python-azuremonitorclient/unknown Python/3.9.19 (Linux-5.15.158.2-1.cm2-x86_64-with-glibc2.28)'
2024-08-20T07:41:46.463863894Z A body is sent with the request
2024-08-20T07:41:46.485539393Z INFO 2024-08-20 07:41:46,485 _universal Response status: 200
2024-08-20T07:41:46.485558093Z Response headers:
2024-08-20T07:41:46.485565793Z     'Transfer-Encoding': 'chunked'
2024-08-20T07:41:46.485600593Z     'Content-Type': 'application/json; charset=utf-8'
2024-08-20T07:41:46.485609693Z     'Server': 'Microsoft-HTTPAPI/2.0'
2024-08-20T07:41:46.485616293Z     'Strict-Transport-Security': 'REDACTED'
2024-08-20T07:41:46.485622793Z     'X-Content-Type-Options': 'REDACTED'
2024-08-20T07:41:46.485629293Z     'Date': 'Tue, 20 Aug 2024 07:41:45 GMT'
2024-08-20T07:41:46.486316593Z INFO 2024-08-20 07:41:46,485 _base Transmission succeeded: Item received: 2. Items accepted: 2
2024-08-20T07:41:46.515040992Z ERROR 2024-08-20 07:41:46,513 views Error generating YouTube summary: 
2024-08-20T07:41:46.515060292Z Could not retrieve a transcript for the video https://www.youtube.com/watch?v=[ANONYMIZED_VIDEO_ID]! This is most likely caused by:
2024-08-20T07:41:46.515068192Z 
2024-08-20T07:41:46.515203692Z Subtitles are disabled for this video
2024-08-20T07:41:46.515214292Z 
2024-08-20T07:41:46.515348292Z If you are sure that the described cause is not responsible for this error and that a transcript should be retrievable, please create an issue at https://github.com/jdepoix/youtube-transcript-api/issues. Please add which version of youtube_transcript_api you are using and provide the information needed to replicate the error. Also make sure that there are no open issues which already describe your problem!
2024-08-20T07:41:46.515375792Z Traceback (most recent call last):
2024-08-20T07:41:46.515384092Z   File "/tmp/[ANONYMIZED_PATH]/theDashboard/views.py", line 2002, in generate_youtube_summary
2024-08-20T07:41:46.515390492Z     transcript_list = YouTubeTranscriptApi.list_transcripts(video_id)
2024-08-20T07:41:46.515396292Z   File "/tmp/[ANONYMIZED_PATH]/antenv/lib/python3.9/site-packages/youtube_transcript_api/_api.py", line 71, in list_transcripts
2024-08-20T07:41:46.515401992Z     return TranscriptListFetcher(http_client).fetch(video_id)
2024-08-20T07:41:46.515407392Z   File "/tmp/[ANONYMIZED_PATH]/antenv/lib/python3.9/site-packages/youtube_transcript_api/_transcripts.py", line 48, in fetch
2024-08-20T07:41:46.515413192Z     self._extract_captions_json(self._fetch_video_html(video_id), video_id),
2024-08-20T07:41:46.515418692Z   File "/tmp/[ANONYMIZED_PATH]/antenv/lib/python3.9/site-packages/youtube_transcript_api/_transcripts.py", line 62, in _extract_captions_json
2024-08-20T07:41:46.515424292Z     raise TranscriptsDisabled(video_id)
2024-08-20T07:41:46.515429592Z youtube_transcript_api._errors.TranscriptsDisabled: 
2024-08-20T07:41:46.515434992Z Could not retrieve a transcript for the video https://www.youtube.com/watch?v=[ANONYMIZED_VIDEO_ID]! This is most likely caused by:
2024-08-20T07:41:46.515440592Z 
2024-08-20T07:41:46.515446092Z Subtitles are disabled for this video
2024-08-20T07:41:46.515451692Z 
2024-08-20T07:41:46.515457192Z If you are sure that the described cause is not responsible for this error and that a transcript should be retrievable, please create an issue at https://github.com/jdepoix/youtube-transcript-api/issues. Please add which version of youtube_transcript_api you are using and provide the information needed to replicate the error. Also make sure that there are no open issues which already describe your problem!
2024-08-20T07:41:46.517604192Z [ANONYMIZED_IP] - - [20/Aug/2024:07:41:46 +0000] "POST /generate/youtubeSummary/ HTTP/1.1" 200 789 "https://[ANONYMIZED_DOMAIN]/generate/youtubeSummary/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:129.0) Gecko/20100101 Firefox/129.0"

2024-08-20T07:41:51.462730956Z INFO 2024-08-20 07:41:51,462 _universal Request URL: 'https://[ANONYMIZED_DOMAIN]/v2.1/track'
2024-08-20T07:41:51.462753956Z Request method: 'POST'
2024-08-20T07:41:51.462761756Z Request headers:
2024-08-20T07:41:51.462838355Z     'Content-Type': 'application/json'
2024-08-20T07:41:51.462846755Z     'Content-Length': '1124'
2024-08-20T07:41:51.462853355Z     'Accept': 'application/json'
2024-08-20T07:41:51.462870155Z     'x-ms-client-request-id': '[ANONYMIZED_REQUEST_ID]'
2024-08-20T07:41:51.462877055Z     'User-Agent': 'azsdk-python-azuremonitorclient/unknown Python/3.9.19 (Linux-5.15.158.2-1.cm2-x86_64-with-glibc2.28)'
2024-08-20T07:41:51.462883755Z A body is sent with the request
2024-08-20T07:41:51.467871320Z INFO 2024-08-20 07:41:51,467 _universal Request URL: 'https://[ANONYMIZED_DOMAIN]/v2.1/track'
2024-08-20T07:41:51.467899520Z Request method: 'POST'
2024-08-20T07:41:51.467909320Z Request headers:
2024-08-20T07:41:51.467952720Z     'Content-Type': 'application/json'
2024-08-20T07:41:51.467962219Z     'Content-Length': '2397'
2024-08-20T07:41:51.467968519Z     'Accept': 'application/json'
2024-08-20T07:41:51.467974719Z     'x-ms-client-request-id': '[ANONYMIZED_REQUEST_ID]'
2024-08-20T07:41:51.467981319Z     'User-Agent': 'azsdk-python-azuremonitorclient/unknown Python/3.9.19 (Linux-5.15.158.2-1.cm2-x86_64-with-glibc2.28)'
2024-08-20T07:41:51.467987919Z A body is sent with the request
2024-08-20T07:41:51.472131390Z INFO 2024-08-20 07:41:51,471 _universal Response status: 200
2024-08-20T07:41:51.472146690Z Response headers:
2024-08-20T07:41:51.472154290Z     'Transfer-Encoding': 'chunked'
2024-08-20T07:41:51.472160590Z     'Content-Type': 'application/json; charset=utf-8'
2024-08-20T07:41:51.472167190Z     'Server': 'Microsoft-HTTPAPI/2.0'
2024-08-20T07:41:51.472195890Z     'Strict-Transport-Security': 'REDACTED'
2024-08-20T07:41:51.472204590Z     'X-Content-Type-Options': 'REDACTED'
2024-08-20T07:41:51.472210890Z     'Date': 'Tue, 20 Aug 2024 07:41:50 GMT'
2024-08-20T07:41:51.472633987Z INFO 2024-08-20 07:41:51,472 _base Transmission succeeded: Item received: 2. Items accepted: 2
2024-08-20T07:41:51.479943736Z INFO 2024-08-20 07:41:51,479 _universal Response status: 200
2024-08-20T07:41:51.479965136Z Response headers:
2024-08-20T07:41:51.479973036Z     'Transfer-Encoding': 'chunked'
2024-08-20T07:41:51.479980236Z     'Content-Type': 'application/json; charset=utf-8'
2024-08-20T07:41:51.479987236Z     'Server': 'Microsoft-HTTPAPI/2.0'
2024-08-20T07:41:51.479995336Z     'Strict-Transport-Security': 'REDACTED'
2024-08-20T07:41:51.480004735Z     'X-Content-Type-Options': 'REDACTED'
2024-08-20T07:41:51.480012935Z     'Date': 'Tue, 20 Aug 2024 07:41:50 GMT'
2024-08-20T07:41:51.480649231Z INFO 2024-08-20 07:41:51,480 _base Transmission succeeded: Item received: 1. Items accepted: 1

2024-08-20T07:43:41  No new trace in the past 1 min(s).
2024-08-20T07:44:41  No new trace in the past 2 min(s).
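
# Imports this excerpt relies on. test_youtube_connectivity and process_text
# are defined elsewhere in views.py and are not shown here.
import logging
from urllib.parse import urlparse

from youtube_transcript_api import YouTubeTranscriptApi

logger = logging.getLogger(__name__)


def generate_youtube_summary(video_url, custom_form_options=None):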
    logger.info(f"Generating summary for video URL: {video_url}")

    connectivity_results = test_youtube_connectivity()
    if not (connectivity_results["dns_resolution"] and connectivity_results["connection_status"]):
        logger.error("YouTube connectivity check failed. Details: %s", connectivity_results)
        return "Unable to connect to YouTube. Please check your internet connection and try again."

    logger.info(f"Attempting to connect to: {urlparse(video_url).netloc}")

    # Extract the video ID, dropping any trailing query parameters.
    video_id = None
    if 'youtu.be/' in video_url:
        video_id = video_url.split('youtu.be/')[1].split('?')[0]
    elif 'youtube.com/watch?v=' in video_url:
        video_id = video_url.split('v=')[1].split('&')[0]
    elif 'youtube.com/embed/' in video_url:
        video_id = video_url.split('embed/')[1].split('?')[0]

    logger.info(f"Extracted video ID: {video_id}")

    if video_id:
        try:
            transcript_list = YouTubeTranscriptApi.list_transcripts(video_id)
            logger.info(f"Retrieved transcript list for video ID: {video_id}")

            transcript = transcript_list.find_transcript(['en'])
            logger.info("Found English transcript")

            transcript_data = transcript.fetch()
            logger.info("Fetched transcript data")

            transcript_text = ' '.join([entry['text'] for entry in transcript_data])
            logger.info(f"Extracted transcript text (first 100 chars): {transcript_text[:100]}...")

            summary_prompt = f"""
            <role>YouTube Video Summarizer</role> """
            logger.info("Sending summary prompt to process_text function")

            summary = process_text(summary_prompt)
            logger.info(f"Received summary from process_text (first 100 chars): {summary[:100]}...")

            return summary
        except Exception as e:
            logger.error(f"Error generating YouTube summary: {str(e)}", exc_info=True)
            # Match on the exception class name: str(e) holds only the message
            # text, which may not contain the class name.
            error_name = type(e).__name__
            if error_name == "TranscriptsDisabled":
                return "Unable to generate summary. Subtitles are disabled for this video."
            elif error_name == "NoTranscriptFound":
                return "No transcript found for this video. It may not have subtitles available."
            else:
                return f"Failed to generate video summary. Error: {str(e)}"
    else:
        logger.warning(f"Invalid YouTube video URL: {video_url}")
        return "Invalid YouTube video URL. Please provide a valid URL."
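The ID extraction in `generate_youtube_summary` relies on string splitting. As an alternative, here is a small sketch that parses the URL with the standard library instead. `extract_video_id` is a hypothetical helper, not part of the original code, and it assumes the URL includes a scheme such as `https://`.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_video_id(video_url: str) -> Optional[str]:
    """Return the video ID from a watch, short, or embed URL, else None.

    Assumes the URL carries a scheme (https://...); without one,
    urlparse does not populate the hostname and None is returned.
    """
    parsed = urlparse(video_url)
    host = (parsed.hostname or "").lower()

    if host == "youtu.be":
        # Short links keep the ID as the first path segment.
        candidate = parsed.path.lstrip("/").split("/")[0]
        return candidate or None

    if host == "youtube.com" or host.endswith(".youtube.com"):
        if parsed.path == "/watch":
            # The ID is the "v" query parameter; other parameters are ignored.
            values = parse_qs(parsed.query).get("v")
            return values[0] if values else None
        if parsed.path.startswith("/embed/"):
            candidate = parsed.path[len("/embed/"):].split("/")[0]
            return candidate or None

    return None


if __name__ == "__main__":
    print(extract_video_id("https://www.youtube.com/watch?v=abc123&t=42s"))
```

Parsing the query string this way keeps extra parameters such as `t=` or `si=` out of the ID, so a malformed ID can be ruled out as a cause of a transcript lookup failure.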
Ульян
Asked April 12, 2024
