Ferdowsi/pytube

Taylor Fox Dahlin committed on Nov 18, 2020

Commit cbc4903 (unverified) · 1 Parent(s): 7ebd828

Playlist update (#819)

* Removed deprecated functions
* Added playlist documentation.

Files changed (9)

README.md +3 -6
docs/api.rst +7 -0
docs/index.rst +1 -0
docs/user/playlist.rst +46 -0
pytube/cli.py +1 -1
pytube/contrib/playlist.py +20 -134
pytube/extract.py +21 -4
setup.py +1 -0
tests/contrib/test_playlist.py +5 -12

README.md CHANGED

@@ -149,17 +149,14 @@ Conversely, if you only want to see the DASH streams (also referred to as "adapt
  <Stream: itag="251" mime_type="audio/webm" abr="160kbps" acodec="opus" progressive="False" type="audio">]
 ```
-You can also download a complete Youtube playlist:
 ```python
 >>> from pytube import Playlist
 >>> pl = Playlist("https://www.youtube.com/watch?v=Edpy1szoG80&list=PL153hDY-y1E00uQtCVCVC8xJ25TYX8yPU")
->>> pl.download_all()
->>> # or if you want to download in a specific directory
->>> pl.download_all('/path/to/directory/')
 ```
-This will download the highest progressive stream available (generally 720p) from the given playlist. Later more options would be given for user's flexibility
-to choose video resolution.
 Pytube allows you to filter on every property available (see the documentation for the complete list), let's take a look at some of the most useful ones.

  <Stream: itag="251" mime_type="audio/webm" abr="160kbps" acodec="opus" progressive="False" type="audio">]
 ```
+You can also interact with YouTube playlists:
 ```python
 >>> from pytube import Playlist
 >>> pl = Playlist("https://www.youtube.com/watch?v=Edpy1szoG80&list=PL153hDY-y1E00uQtCVCVC8xJ25TYX8yPU")
+>>> for video in pl.videos:
+...     video.streams.first().download()
 ```
 Pytube allows you to filter on every property available (see the documentation for the complete list), let's take a look at some of the most useful ones.
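
The removed `download_all()` helper numbered the files and picked a resolution. With the loop shown in the new README, that behaviour can be rebuilt in user code. The following is a minimal sketch, assuming only the calls that appear in the removed implementation (`get_by_resolution`, `get_lowest_resolution`, and `download(..., filename_prefix=...)`). The names `number_prefixes` and `download_playlist` are hypothetical and not part of pytube.

```python
from typing import Iterator, List, Optional


def number_prefixes(count: int, reverse: bool = False) -> Iterator[str]:
    """Return zero-padded counters ("01", "02", ...) wide enough for `count`."""
    digits = len(str(count))
    numbers = range(count, 0, -1) if reverse else range(1, count + 1)
    return (str(n).zfill(digits) for n in numbers)


def download_playlist(
    playlist,
    output_path: Optional[str] = None,
    resolution: str = "720p",
) -> List[str]:
    """Download every video in `playlist`, numbering the files in order.

    `playlist` is assumed to look like a pytube Playlist: it needs a
    `video_urls` list and a `videos` iterable of YouTube-like objects.
    """
    prefixes = number_prefixes(len(playlist.video_urls))
    paths = []
    for prefix, video in zip(prefixes, playlist.videos):
        # Prefer the requested resolution, fall back to the lowest one.
        stream = (
            video.streams.get_by_resolution(resolution)
            or video.streams.get_lowest_resolution()
        )
        paths.append(stream.download(output_path, filename_prefix=f"{prefix} "))
    return paths
```

Calling `download_playlist(pl, "/path/to/directory/")` would then cover roughly what `pl.download_all('/path/to/directory/')` used to do, while leaving stream selection under the caller's control.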

docs/api.rst CHANGED

@@ -13,6 +13,13 @@ YouTube Object
    :members:
    :inherited-members:
 Stream Object
 -------------

    :members:
    :inherited-members:
+Playlist Object
+---------------
+.. autoclass:: pytube.contrib.playlist.Playlist
+   :members:
+   :inherited-members:
 Stream Object
 -------------

docs/index.rst CHANGED

@@ -63,6 +63,7 @@ This part of the documentation begins with some background information about the
    user/install
    user/quickstart
 The API Documentation / Guide
 -----------------------------

    user/install
    user/quickstart
+   user/playlist
 The API Documentation / Guide
 -----------------------------

docs/user/playlist.rst ADDED

	@@ -0,0 +1,46 @@

+.. _playlist:
+Using Playlists
+===============
+This guide will walk you through the basics of working with pytube Playlists.
+Creating a Playlist
+-------------------
+Using pytube to interact with playlists is very simple.
+Begin by importing the Playlist class::
+    >>> from pytube import Playlist
+Now let's create a playlist object. You can do this by initializing the object with a playlist URL::
+    >>> p = Playlist('https://www.youtube.com/playlist?list=PLS1QulWo1RIaJECMeUT4LFwJ-ghgoSH6n')
+Or you can create one from a video link in a playlist::
+    >>> p = Playlist('https://www.youtube.com/watch?v=41qgdwd3zAg&list=PLS1QulWo1RIaJECMeUT4LFwJ-ghgoSH6n')
+Now, we have a :class:`Playlist <pytube.Playlist>` object called ``p`` that we can do some work with.
+Interacting with a playlist
+---------------------------
+Fundamentally, a Playlist object is just a container for YouTube objects.
+If, for example, we wanted to download all of the videos in a playlist, we would do the following::
+    >>> print(f'Downloading: {p.title}')
+    Downloading: Python Tutorial for Beginners (For Absolute Beginners)
+    >>> for video in p.videos:
+    ...     video.streams.first().download()
+Or, if we're only interested in the URLs for the videos, we can look at those as well::
+    >>> for url in p.video_urls[:3]:
+    ...     print(url)
+    Python Tutorial for Beginners 1 - Getting Started and Installing Python (For Absolute Beginners)
+    Python Tutorial for Beginners 2 - Numbers and Math in Python
+    Python Tutorial for Beginners 3 - Variables and Inputs
+And that's basically all there is to it! The Playlist class is relatively straightforward.
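
The guide describes a Playlist as a container, and the class in this commit derives from `Sequence`. The sketch below shows what that base class provides once `__getitem__` and `__len__` are defined. `UrlContainer` is a hypothetical stand-in, the URLs are placeholders, and how pytube itself implements these two methods is an assumption not shown in this diff.

```python
from collections.abc import Sequence
from typing import List


class UrlContainer(Sequence):
    """Hypothetical stand-in for a playlist: a read-only sequence of URLs."""

    def __init__(self, urls: List[str]):
        self._urls = list(urls)

    def __getitem__(self, index):
        # Delegating to the list makes integer and slice indexing both work.
        return self._urls[index]

    def __len__(self) -> int:
        return len(self._urls)


# Placeholder URLs, used only to exercise the container.
urls = UrlContainer([
    "https://example.com/watch?v=1",
    "https://example.com/watch?v=2",
    "https://example.com/watch?v=3",
])

first_two = urls[:2]              # slicing, as in p.video_urls[:3]
newest_first = list(reversed(urls))  # reversed() comes from Sequence
```

Iteration, `in`, `index()`, `count()`, and `reversed()` all come from the `Sequence` mixin methods, so only the two methods above need to be written by hand.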

pytube/cli.py CHANGED

@@ -40,7 +40,7 @@ def main():
         print("Loading playlist...")
         playlist = Playlist(args.url)
         if not args.target:
-            args.target = safe_filename(playlist.title())
         for youtube_video in playlist.videos:
             try:
                 _perform_args_on_youtube(youtube_video, args)

         print("Loading playlist...")
         playlist = Playlist(args.url)
         if not args.target:
+            args.target = safe_filename(playlist.title)
         for youtube_video in playlist.videos:
             try:
                 _perform_args_on_youtube(youtube_video, args)
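
The hunk above wraps each video in a `try:` block, but the matching `except` clause falls outside the excerpt. The general pattern, processing every item and continuing past individual failures, can be sketched as follows. `apply_to_each` is a hypothetical helper, and catching the broad `Exception` is an assumption made for illustration; the real CLI may catch a narrower type.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def apply_to_each(
    items: Iterable[T], action: Callable[[T], None]
) -> List[Tuple[T, Exception]]:
    """Run `action` on every item and collect failures instead of aborting."""
    failures: List[Tuple[T, Exception]] = []
    for item in items:
        try:
            action(item)
        except Exception as exc:  # assumption: broad catch, for illustration only
            failures.append((item, exc))
    return failures
```

Returning the failures lets the caller report which videos were skipped once the whole playlist has been processed.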

pytube/contrib/playlist.py CHANGED

@@ -11,63 +11,32 @@ from typing import Iterable
 from typing import List
 from typing import Optional
 from typing import Union
-from urllib.parse import parse_qs
 from pytube import request
 from pytube import YouTube
 from pytube.helpers import cache
-from pytube.helpers import deprecated
 from pytube.helpers import install_proxy
 from pytube.helpers import uniqueify
 logger = logging.getLogger(__name__)
 class Playlist(Sequence):
-    """Load a YouTube playlist with URL or ID"""
     def __init__(self, url: str, proxies: Optional[Dict[str, str]] = None):
         if proxies:
             install_proxy(proxies)
-        try:
-            self.playlist_id: str = parse_qs(url.split("?")[1])["list"][0]
-        except IndexError:  # assume that url is just the id
-            self.playlist_id = url
         self.playlist_url = (
             f"https://www.youtube.com/playlist?list={self.playlist_id}"
         )
         self.html = request.get(self.playlist_url)
-        # Needs testing with non-English
-        self.last_update: Optional[date] = None
-        date_match = re.search(
-            r"Last updated on (\w{3}) (\d{1,2}), (\d{4})", self.html
-        )
-        if date_match:
-            month, day, year = date_match.groups()
-            self.last_update = datetime.strptime(
-                f"{month} {day:0>2} {year}", "%b %d %Y"
-            ).date()
-        self._js_regex = re.compile(r"window\[\"ytInitialData\"] = ([^\n]+)")
-        self._video_regex = re.compile(r"href=\"(/watch\?v=[\w-]*)")
-    @deprecated(
-        "This function will be removed in the future, please use .video_urls"
-    )
-    def parse_links(self) -> List[str]:  # pragma: no cover
-        """ Deprecated function for returning list of URLs
-        :return: List[str]
-        """
-        return self.video_urls
-    def _extract_json(self, html: str) -> str:
-        return self._js_regex.search(html).group(1)[0:-1]
     def _paginate(
         self, until_watch_id: Optional[str] = None
     ) -> Iterable[List[str]]:
@@ -82,9 +51,7 @@ class Playlist(Sequence):
         """
         req = self.html
         videos_urls, continuation = self._extract_videos(
-            # extract the json located inside the window["ytInitialData"] js
-            # variable of the playlist html page
-            self._extract_json(req)
         )
         if until_watch_id:
             try:
@@ -263,96 +230,20 @@ class Playlist(Sequence):
     def __repr__(self) -> str:
         return f"{self.video_urls}"
-    @deprecated(
-        "This call is unnecessary, you can directly access .video_urls or "
-        ".videos"
-    )
-    def populate_video_urls(self) -> List[str]:  # pragma: no cover
-        """Complete links of all the videos in playlist
-        :rtype: List[str]
-        :returns: List of video URLs
-        """
-        return self.video_urls
-    @deprecated("This function will be removed in the future.")
-    def _path_num_prefix_generator(self, reverse=False):  # pragma: no cover
-        """Generate number prefixes for the items in the playlist.
-        If the number of digits required to name a file,is less than is
-        required to name the last file,it prepends 0s.
-        So if you have a playlist of 100 videos it will number them like:
-        001, 002, 003 ect, up to 100.
-        It also adds a space after the number.
-        :return: prefix string generator : generator
-        """
-        digits = len(str(len(self.video_urls)))
-        if reverse:
-            start, stop, step = (len(self.video_urls), 0, -1)
-        else:
-            start, stop, step = (1, len(self.video_urls) + 1, 1)
-        return (str(i).zfill(digits) for i in range(start, stop, step))
-    @deprecated(
-        "This function will be removed in the future. Please iterate through "
-        ".videos"
-    )
-    def download_all(
-        self,
-        download_path: Optional[str] = None,
-        prefix_number: bool = True,
-        reverse_numbering: bool = False,
-        resolution: str = "720p",
-    ) -> None:  # pragma: no cover
-        """Download all the videos in the the playlist.
-        :param download_path:
-            (optional) Output path for the playlist If one is not
-            specified, defaults to the current working directory.
-            This is passed along to the Stream objects.
-        :type download_path: str or None
-        :param prefix_number:
-            (optional) Automatically numbers playlists using the
-            _path_num_prefix_generator function.
-        :type prefix_number: bool
-        :param reverse_numbering:
-            (optional) Lets you number playlists in reverse, since some
-            playlists are ordered newest -> oldest.
-        :type reverse_numbering: bool
-        :param resolution:
-            Video resolution i.e. "720p", "480p", "360p", "240p", "144p"
-        :type resolution: str
-        :rtype: List[str]
-        :returns:
-            List of filepaths for downloaded videos.
-        """
-        logger.debug("total videos found: %d", len(self.video_urls))
-        logger.debug("starting download")
-        prefix_gen = self._path_num_prefix_generator(reverse_numbering)
-        downloaded_filepaths = []
-        for link in self.video_urls:
-            youtube = YouTube(link)
-            dl_stream = (
-                youtube.streams.get_by_resolution(resolution=resolution)
-                or youtube.streams.get_lowest_resolution()
-            )
-            assert dl_stream is not None
-            logger.debug("download path: %s", download_path)
-            if prefix_number:
-                prefix = next(prefix_gen)
-                logger.debug("file prefix is: %s", prefix)
-                dl_path = dl_stream.download(download_path, filename_prefix=prefix)
-            else:
-                dl_path = dl_stream.download(download_path)
-            downloaded_filepaths.append(dl_path)
-            logger.debug("download complete")
-        return downloaded_filepaths
     @cache
     def title(self) -> Optional[str]:
         """Extract playlist title
@@ -360,13 +251,8 @@ class Playlist(Sequence):
         :return: playlist title (name)
         :rtype: Optional[str]
         """
-        pattern = re.compile("<title>(.+?)</title>")
-        match = pattern.search(self.html)
-        if match is None:
-            return None
-        return match.group(1).replace("- YouTube", "").strip()
     @staticmethod
     def _video_url(watch_path: str):

 from typing import List
 from typing import Optional
 from typing import Union
+from pytube import extract
 from pytube import request
 from pytube import YouTube
 from pytube.helpers import cache
 from pytube.helpers import install_proxy
+from pytube.helpers import regex_search
 from pytube.helpers import uniqueify
 logger = logging.getLogger(__name__)
 class Playlist(Sequence):
+    """Load a YouTube playlist with URL"""
     def __init__(self, url: str, proxies: Optional[Dict[str, str]] = None):
         if proxies:
             install_proxy(proxies)
+        self.playlist_id = extract.playlist_id(url)
         self.playlist_url = (
             f"https://www.youtube.com/playlist?list={self.playlist_id}"
         )
         self.html = request.get(self.playlist_url)
     def _paginate(
         self, until_watch_id: Optional[str] = None
     ) -> Iterable[List[str]]:
         """
         req = self.html
         videos_urls, continuation = self._extract_videos(
+            extract.initial_data(self.html)
         )
         if until_watch_id:
             try:
     def __repr__(self) -> str:
         return f"{self.video_urls}"
+    @property
+    @cache
+    def last_updated(self) -> Optional[date]:
+        date_match = re.search(
+            r"Last updated on (\w{3}) (\d{1,2}), (\d{4})", self.html
+        )
+        if date_match:
+            month, day, year = date_match.groups()
+            return datetime.strptime(
+                f"{month} {day:0>2} {year}", "%b %d %Y"
+            ).date()
+        return None
+    @property
     @cache
     def title(self) -> Optional[str]:
         """Extract playlist title
         :return: playlist title (name)
         :rtype: Optional[str]
         """
+        pattern = r"<title>(.+?)</title>"
+        return regex_search(pattern, self.html, 1).replace("- YouTube", "").strip()
     @staticmethod
     def _video_url(watch_path: str):
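
The constructor now delegates ID parsing to `extract.playlist_id`, and the class docstring drops "or ID". A standalone sketch of that parsing, using only the standard library, is given below. `playlist_id_from_url` is a hypothetical name, and the explicit `ValueError` is an addition for illustration; the indexing in this commit would instead surface a `KeyError` when no `list` parameter is present.

```python
from urllib.parse import parse_qs, urlparse


def playlist_id_from_url(url: str) -> str:
    """Return the value of the `list` query parameter of a YouTube URL."""
    query = parse_qs(urlparse(url).query)
    try:
        return query["list"][0]
    except KeyError:
        # A bare playlist ID has no query string, so it ends up here.
        raise ValueError(f"no 'list' query parameter in {url!r}") from None
```

Both URL shapes from the playlist guide resolve to the same ID, while a bare ID such as `PLS1QulWo1RIaJECMeUT4LFwJ-ghgoSH6n` is rejected, which is consistent with the removal of `test_init_with_watch_id` in this commit.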

pytube/extract.py CHANGED

@@ -2,6 +2,7 @@
 """This module contains all non-cipher related data extraction logic."""
 import json
 import logging
 import re
 from collections import OrderedDict
 from datetime import datetime
@@ -115,6 +116,24 @@ def video_id(url: str) -> str:
     return regex_search(r"(?:v=|\/)([0-9A-Za-z_-]{11}).*", url, group=1)
 def video_info_url(video_id: str, watch_url: str) -> str:
     """Construct the video_info url.
@@ -412,13 +431,11 @@ def initial_data(watch_html: str) -> str:
     @param watch_html: Html of the watch page
     @return:
     """
-    initial_data_pattern = r"window\[['\"]ytInitialData['\"]]\s*=\s*([^\n]+)"
     try:
-        match = regex_search(initial_data_pattern, watch_html, 1)
     except RegexMatchError:
         return "{}"
-    else:
-        return match[:-1]
 def metadata(initial_data) -> Optional[YouTubeMetadata]:

 """This module contains all non-cipher related data extraction logic."""
 import json
 import logging
+import urllib.parse
 import re
 from collections import OrderedDict
 from datetime import datetime
     return regex_search(r"(?:v=|\/)([0-9A-Za-z_-]{11}).*", url, group=1)
+def playlist_id(url: str) -> str:
+    """Extract the ``playlist_id`` from a YouTube url.
+    This function supports the following patterns:
+    - :samp:`https://youtube.com/playlist?list={playlist_id}`
+    - :samp:`https://youtube.com/watch?v={video_id}&list={playlist_id}`
+    :param str url:
+        A YouTube url containing a playlist id.
+    :rtype: str
+    :returns:
+        YouTube playlist id.
+    """
+    parsed = urllib.parse.urlparse(url)
+    return parse_qs(parsed.query)['list'][0]
 def video_info_url(video_id: str, watch_url: str) -> str:
     """Construct the video_info url.
     @param watch_html: Html of the watch page
     @return:
     """
+    initial_data_pattern = r"window\[['\"]ytInitialData['\"]]\s*=\s*([^\n]+);"
     try:
+        return regex_search(initial_data_pattern, watch_html, 1)
     except RegexMatchError:
         return "{}"
 def metadata(initial_data) -> Optional[YouTubeMetadata]:
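
The `initial_data` change moves the trailing semicolon into the pattern instead of slicing it off the match with `[:-1]`. The sketch below applies the new pattern to a made-up page fragment. `initial_data_json` and `sample_html` are hypothetical, and `re.search` stands in for pytube's `regex_search` helper.

```python
import json
import re

# Same pattern as in the commit: the capture group stops before the final ';'.
INITIAL_DATA_PATTERN = re.compile(
    r"window\[['\"]ytInitialData['\"]]\s*=\s*([^\n]+);"
)


def initial_data_json(html: str) -> str:
    """Return the ytInitialData JSON text, or "{}" when it is absent."""
    match = INITIAL_DATA_PATTERN.search(html)
    return match.group(1) if match else "{}"


# Made-up fragment; real pages are far larger.
sample_html = 'window["ytInitialData"] = {"contents": {"title": "a;b"}};\n'
data = json.loads(initial_data_json(sample_html))
```

Because `[^\n]+` is greedy, the group extends to the last semicolon on the line, so semicolons inside the JSON itself do not truncate the match.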

setup.py CHANGED

@@ -40,6 +40,7 @@ setup(
         "Programming Language :: Python :: 3.6",
         "Programming Language :: Python :: 3.7",
         "Programming Language :: Python :: 3.8",
         "Programming Language :: Python",
         "Topic :: Internet",
         "Topic :: Multimedia :: Video",

         "Programming Language :: Python :: 3.6",
         "Programming Language :: Python :: 3.7",
         "Programming Language :: Python :: 3.8",
+        "Programming Language :: Python :: 3.9",
         "Programming Language :: Python",
         "Topic :: Internet",
         "Topic :: Multimedia :: Video",

tests/contrib/test_playlist.py CHANGED

@@ -17,7 +17,7 @@ def test_title(request_get):
         "-ghgoSH6n"
     )
     pl = Playlist(url)
-    pl_title = pl.title()
     assert (
         pl_title
         == "(149) Python Tutorial for Beginners (For Absolute Beginners)"
@@ -50,21 +50,14 @@ def test_init_with_watch_url(request_get):
 @mock.patch("pytube.contrib.playlist.request.get")
-def test_last_update(request_get, playlist_html):
     expected = datetime.date(2020, 3, 11)
     request_get.return_value = playlist_html
-    playlist = Playlist("url")
-    assert playlist.last_update == expected
-@mock.patch("pytube.contrib.playlist.request.get")
-def test_init_with_watch_id(request_get):
-    request_get.return_value = ""
-    playlist = Playlist("PLS1QulWo1RIaJECMeUT4LFwJ-ghgoSH6n")
-    assert (
-        playlist.playlist_url == "https://www.youtube.com/playlist?list"
         "=PLS1QulWo1RIaJECMeUT4LFwJ-ghgoSH6n"
     )
 @mock.patch("pytube.contrib.playlist.request.get")

         "-ghgoSH6n"
     )
     pl = Playlist(url)
+    pl_title = pl.title
     assert (
         pl_title
         == "(149) Python Tutorial for Beginners (For Absolute Beginners)"
 @mock.patch("pytube.contrib.playlist.request.get")
+def test_last_updated(request_get, playlist_html):
     expected = datetime.date(2020, 3, 11)
     request_get.return_value = playlist_html
+    playlist = Playlist(
+        "https://www.youtube.com/playlist?list"
         "=PLS1QulWo1RIaJECMeUT4LFwJ-ghgoSH6n"
     )
+    assert playlist.last_updated == expected
 @mock.patch("pytube.contrib.playlist.request.get")
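
`test_last_updated` expects `datetime.date(2020, 3, 11)` from the new `last_updated` property. The parsing that property performs can be exercised without any network access, as in this sketch. `parse_last_updated` is a hypothetical free function that mirrors the property body, and the HTML fragments in the usage note are made up.

```python
import re
from datetime import date, datetime
from typing import Optional


def parse_last_updated(html: str) -> Optional[date]:
    """Extract a "Last updated on Mon D, YYYY" date from playlist HTML."""
    match = re.search(r"Last updated on (\w{3}) (\d{1,2}), (\d{4})", html)
    if match is None:
        return None
    month, day, year = match.groups()
    # Zero-pad the day so the string has a fixed shape before parsing.
    return datetime.strptime(f"{month} {day:0>2} {year}", "%b %d %Y").date()
```

For example, `parse_last_updated("<span>Last updated on Mar 11, 2020</span>")` yields the date the test expects. Note that `%b` matches locale-dependent month abbreviations, so this only handles English pages under an English or C locale, a limitation the removed "Needs testing with non-English" comment already hinted at.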