Taylor Fox Dahlin committed on
Commit cbc4903 · unverified · 1 Parent(s): 7ebd828

Playlist update (#819)

* Removed deprecated functions
* Added playlist documentation.
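
With `download_all()` removed, callers now iterate over `.videos` themselves. The sketch below shows one way to rebuild that behavior. It is modeled on the removed method (resolution fallback plus zero-padded filename prefixes) and uses only the stream calls visible in the removed code. `download_playlist` is a hypothetical helper name, not part of pytube.

```python
from typing import Any, List, Optional


def download_playlist(
    playlist: Any,
    download_path: Optional[str] = None,
    resolution: str = "720p",
) -> List[str]:
    """Download every video in ``playlist`` and return the file paths.

    ``playlist`` is assumed to expose ``.videos`` (an iterable of
    YouTube-like objects), as pytube's ``Playlist`` does.
    """
    videos = list(playlist.videos)
    digits = len(str(len(videos)))  # e.g. 100 videos -> 3-digit prefixes
    paths = []
    for index, video in enumerate(videos, start=1):
        # Prefer the requested resolution, else fall back to the lowest one.
        stream = (
            video.streams.get_by_resolution(resolution)
            or video.streams.get_lowest_resolution()
        )
        if stream is None:
            continue  # nothing downloadable for this video
        prefix = str(index).zfill(digits)
        paths.append(stream.download(download_path, filename_prefix=prefix))
    return paths
```

Under these assumptions, `download_playlist(Playlist(url), "/path/to/directory/")` would stand in for the old `pl.download_all('/path/to/directory/')` call.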

README.md CHANGED
@@ -149,17 +149,14 @@ Conversely, if you only want to see the DASH streams (also referred to as "adapt
 <Stream: itag="251" mime_type="audio/webm" abr="160kbps" acodec="opus" progressive="False" type="audio">]
 ```
 
-You can also download a complete Youtube playlist:
+You can also interact with Youtube playlists:
 
 ```python
 >>> from pytube import Playlist
 >>> pl = Playlist("https://www.youtube.com/watch?v=Edpy1szoG80&list=PL153hDY-y1E00uQtCVCVC8xJ25TYX8yPU")
->>> pl.download_all()
->>> # or if you want to download in a specific directory
->>> pl.download_all('/path/to/directory/')
+>>> for video in pl.videos:
+...     video.streams.first().download()
 ```
-This will download the highest progressive stream available (generally 720p) from the given playlist. Later more options would be given for user's flexibility
-to choose video resolution.
 
 Pytube allows you to filter on every property available (see the documentation for the complete list), let's take a look at some of the most useful ones.
 
docs/api.rst CHANGED
@@ -13,6 +13,13 @@ YouTube Object
   :members:
   :inherited-members:
 
+Playlist Object
+---------------
+
+.. autoclass:: pytube.contrib.playlist.Playlist
+  :members:
+  :inherited-members:
+
 Stream Object
 -------------
 
docs/index.rst CHANGED
@@ -63,6 +63,7 @@ This part of the documentation begins with some background information about the
 
    user/install
    user/quickstart
+   user/playlist
 
 The API Documentation / Guide
 -----------------------------
docs/user/playlist.rst ADDED
@@ -0,0 +1,46 @@
+.. _install:
+
+Using Playlists
+===============
+
+This guide will walk you through the basics of working with pytube Playlists.
+
+Creating a Playlist
+-------------------
+
+Using pytube to interact with playlists is very simple.
+Begin by importing the Playlist class::
+
+    >>> from pytube import Playlist
+
+Now let's create a playlist object. You can do this by initializing the object with a playlist URL::
+
+    >>> p = Playlist('https://www.youtube.com/playlist?list=PLS1QulWo1RIaJECMeUT4LFwJ-ghgoSH6n')
+
+Or you can create one from a video link in a playlist::
+
+    >>> p = Playlist('https://www.youtube.com/watch?v=41qgdwd3zAg&list=PLS1QulWo1RIaJECMeUT4LFwJ-ghgoSH6n')
+
+Now, we have a :class:`Playlist <pytube.Playlist>` object called ``p`` that we can do some work with.
+
+Interacting with a playlist
+---------------------------
+
+Fundamentally, a Playlist object is just a container for YouTube objects.
+
+If, for example, we wanted to download all of the videos in a playlist, we would do the following::
+
+    >>> print(f'Downloading: {p.title}')
+    Downloading: Python Tutorial for Beginners (For Absolute Beginners)
+    >>> for video in p.videos:
+    ...     video.streams.first().download()
+
+Or, if we're only interested in the URLs for the videos, we can look at those as well::
+
+    >>> for url in p.video_urls[:3]:
+    ...     print(url)
+    Python Tutorial for Beginners 1 - Getting Started and Installing Python (For Absolute Beginners)
+    Python Tutorial for Beginners 2 - Numbers and Math in Python
+    Python Tutorial for Beginners 3 - Variables and Inputs
+
+And that's basically all there is to it! The Playlist class is relatively straightforward.
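
The guide describes a Playlist as a container, and the class in this commit subclasses `Sequence`. The sketch below shows what that base class provides once `__getitem__` and `__len__` are defined. `UrlList` is a hypothetical stand-in, not pytube code, and it assumes that indexing and length are delegated to a plain list of video URLs.

```python
from collections.abc import Sequence
from typing import List


class UrlList(Sequence):
    """Hypothetical stand-in for a playlist backed by a list of URLs."""

    def __init__(self, video_urls: List[str]):
        self.video_urls = video_urls

    def __getitem__(self, i):
        # Integers return one URL; slices return a list of URLs.
        return self.video_urls[i]

    def __len__(self) -> int:
        return len(self.video_urls)


# Placeholder URLs for illustration only.
urls = UrlList(["https://example.com/watch?v=1", "https://example.com/watch?v=2"])
```

Defining just those two methods is enough for `Sequence` to supply iteration, `in` checks, `index()`, `count()`, and `reversed()`.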
pytube/cli.py CHANGED
@@ -40,7 +40,7 @@ def main():
         print("Loading playlist...")
         playlist = Playlist(args.url)
         if not args.target:
-            args.target = safe_filename(playlist.title())
+            args.target = safe_filename(playlist.title)
         for youtube_video in playlist.videos:
             try:
                 _perform_args_on_youtube(youtube_video, args)
pytube/contrib/playlist.py CHANGED
@@ -11,63 +11,32 @@ from typing import Iterable
 from typing import List
 from typing import Optional
 from typing import Union
-from urllib.parse import parse_qs
 
+from pytube import extract
 from pytube import request
 from pytube import YouTube
 from pytube.helpers import cache
-from pytube.helpers import deprecated
 from pytube.helpers import install_proxy
+from pytube.helpers import regex_search
 from pytube.helpers import uniqueify
 
 logger = logging.getLogger(__name__)
 
 
 class Playlist(Sequence):
-    """Load a YouTube playlist with URL or ID"""
+    """Load a YouTube playlist with URL"""
 
     def __init__(self, url: str, proxies: Optional[Dict[str, str]] = None):
         if proxies:
             install_proxy(proxies)
 
-        try:
-            self.playlist_id: str = parse_qs(url.split("?")[1])["list"][0]
-        except IndexError:  # assume that url is just the id
-            self.playlist_id = url
+        self.playlist_id = extract.playlist_id(url)
 
         self.playlist_url = (
             f"https://www.youtube.com/playlist?list={self.playlist_id}"
         )
         self.html = request.get(self.playlist_url)
 
-        # Needs testing with non-English
-        self.last_update: Optional[date] = None
-        date_match = re.search(
-            r"Last updated on (\w{3}) (\d{1,2}), (\d{4})", self.html
-        )
-        if date_match:
-            month, day, year = date_match.groups()
-            self.last_update = datetime.strptime(
-                f"{month} {day:0>2} {year}", "%b %d %Y"
-            ).date()
-
-        self._js_regex = re.compile(r"window\[\"ytInitialData\"] = ([^\n]+)")
-
-        self._video_regex = re.compile(r"href=\"(/watch\?v=[\w-]*)")
-
-    @deprecated(
-        "This function will be removed in the future, please use .video_urls"
-    )
-    def parse_links(self) -> List[str]:  # pragma: no cover
-        """ Deprecated function for returning list of URLs
-
-        :return: List[str]
-        """
-        return self.video_urls
-
-    def _extract_json(self, html: str) -> str:
-        return self._js_regex.search(html).group(1)[0:-1]
-
     def _paginate(
         self, until_watch_id: Optional[str] = None
     ) -> Iterable[List[str]]:
@@ -82,9 +51,7 @@ class Playlist(Sequence):
         """
         req = self.html
         videos_urls, continuation = self._extract_videos(
-            # extract the json located inside the window["ytInitialData"] js
-            # variable of the playlist html page
-            self._extract_json(req)
+            extract.initial_data(self.html)
         )
         if until_watch_id:
             try:
@@ -263,96 +230,20 @@ class Playlist(Sequence):
     def __repr__(self) -> str:
         return f"{self.video_urls}"
 
-    @deprecated(
-        "This call is unnecessary, you can directly access .video_urls or "
-        ".videos"
-    )
-    def populate_video_urls(self) -> List[str]:  # pragma: no cover
-        """Complete links of all the videos in playlist
-
-        :rtype: List[str]
-        :returns: List of video URLs
-        """
-        return self.video_urls
-
-    @deprecated("This function will be removed in the future.")
-    def _path_num_prefix_generator(self, reverse=False):  # pragma: no cover
-        """Generate number prefixes for the items in the playlist.
-
-        If the number of digits required to name a file,is less than is
-        required to name the last file,it prepends 0s.
-        So if you have a playlist of 100 videos it will number them like:
-        001, 002, 003 ect, up to 100.
-        It also adds a space after the number.
-        :return: prefix string generator : generator
-        """
-        digits = len(str(len(self.video_urls)))
-        if reverse:
-            start, stop, step = (len(self.video_urls), 0, -1)
-        else:
-            start, stop, step = (1, len(self.video_urls) + 1, 1)
-        return (str(i).zfill(digits) for i in range(start, stop, step))
-
-    @deprecated(
-        "This function will be removed in the future. Please iterate through "
-        ".videos"
-    )
-    def download_all(
-        self,
-        download_path: Optional[str] = None,
-        prefix_number: bool = True,
-        reverse_numbering: bool = False,
-        resolution: str = "720p",
-    ) -> None:  # pragma: no cover
-        """Download all the videos in the the playlist.
-
-        :param download_path:
-            (optional) Output path for the playlist If one is not
-            specified, defaults to the current working directory.
-            This is passed along to the Stream objects.
-        :type download_path: str or None
-        :param prefix_number:
-            (optional) Automatically numbers playlists using the
-            _path_num_prefix_generator function.
-        :type prefix_number: bool
-        :param reverse_numbering:
-            (optional) Lets you number playlists in reverse, since some
-            playlists are ordered newest -> oldest.
-        :type reverse_numbering: bool
-        :param resolution:
-            Video resolution i.e. "720p", "480p", "360p", "240p", "144p"
-        :type resolution: str
-        :rtype: List[str]
-        :returns:
-            List of filepaths for downloaded videos.
-        """
-        logger.debug("total videos found: %d", len(self.video_urls))
-        logger.debug("starting download")
-
-        prefix_gen = self._path_num_prefix_generator(reverse_numbering)
-
-        downloaded_filepaths = []
-
-        for link in self.video_urls:
-            youtube = YouTube(link)
-            dl_stream = (
-                youtube.streams.get_by_resolution(resolution=resolution)
-                or youtube.streams.get_lowest_resolution()
-            )
-            assert dl_stream is not None
-
-            logger.debug("download path: %s", download_path)
-            if prefix_number:
-                prefix = next(prefix_gen)
-                logger.debug("file prefix is: %s", prefix)
-                dl_path = dl_stream.download(download_path, filename_prefix=prefix)
-            else:
-                dl_path = dl_stream.download(download_path)
-            downloaded_filepaths.append(dl_path)
-            logger.debug("download complete")
-
-        return downloaded_filepaths
+    @property
+    @cache
+    def last_updated(self) -> Optional[date]:
+        date_match = re.search(
+            r"Last updated on (\w{3}) (\d{1,2}), (\d{4})", self.html
+        )
+        if date_match:
+            month, day, year = date_match.groups()
+            return datetime.strptime(
+                f"{month} {day:0>2} {year}", "%b %d %Y"
+            ).date()
+        return None
 
+    @property
     @cache
     def title(self) -> Optional[str]:
         """Extract playlist title
@@ -360,13 +251,8 @@ class Playlist(Sequence):
         :return: playlist title (name)
         :rtype: Optional[str]
         """
-        pattern = re.compile("<title>(.+?)</title>")
-        match = pattern.search(self.html)
-
-        if match is None:
-            return None
-
-        return match.group(1).replace("- YouTube", "").strip()
+        pattern = r"<title>(.+?)</title>"
+        return regex_search(pattern, self.html, 1).replace("- YouTube", "").strip()
 
     @staticmethod
     def _video_url(watch_path: str):
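
The new `last_updated` is computed lazily through a cached property instead of being set in `__init__`. The sketch below is a standalone version of that pattern, reusing the regex and date format from the diff. `PlaylistPage` is a hypothetical class, and `functools.lru_cache` is used on the assumption that `pytube.helpers.cache` behaves the same way.

```python
import re
from datetime import date, datetime
from functools import lru_cache
from typing import Optional


class PlaylistPage:
    """Hypothetical holder for playlist HTML, not pytube's Playlist."""

    def __init__(self, html: str):
        self.html = html

    @property
    @lru_cache()  # assumed stand-in for pytube.helpers.cache
    def last_updated(self) -> Optional[date]:
        # Matches English pages only, e.g. "Last updated on Mar 11, 2020".
        date_match = re.search(
            r"Last updated on (\w{3}) (\d{1,2}), (\d{4})", self.html
        )
        if date_match:
            month, day, year = date_match.groups()
            # Zero-pad the day so "%d" always sees two digits ("5" -> "05").
            return datetime.strptime(
                f"{month} {day:0>2} {year}", "%b %d %Y"
            ).date()
        return None
```

Because `@property` is outermost, callers read `page.last_updated` without parentheses, matching the `pl.title()` to `pl.title` change elsewhere in this commit. Note that `%b` depends on the process locale, so the parse is only reliable under an English locale.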
pytube/extract.py CHANGED
@@ -2,6 +2,7 @@
 """This module contains all non-cipher related data extraction logic."""
 import json
 import logging
+import urllib.parse
 import re
 from collections import OrderedDict
 from datetime import datetime
@@ -115,6 +116,24 @@ def video_id(url: str) -> str:
     return regex_search(r"(?:v=|\/)([0-9A-Za-z_-]{11}).*", url, group=1)
 
 
+def playlist_id(url: str) -> str:
+    """Extract the ``playlist_id`` from a YouTube url.
+
+    This function supports the following patterns:
+
+    - :samp:`https://youtube.com/playlist?list={playlist_id}`
+    - :samp:`https://youtube.com/watch?v={video_id}&list={playlist_id}`
+
+    :param str url:
+        A YouTube url containing a playlist id.
+    :rtype: str
+    :returns:
+        YouTube playlist id.
+    """
+    parsed = urllib.parse.urlparse(url)
+    return parse_qs(parsed.query)['list'][0]
+
+
 def video_info_url(video_id: str, watch_url: str) -> str:
     """Construct the video_info url.
 
@@ -412,13 +431,11 @@ def initial_data(watch_html: str) -> str:
     @param watch_html: Html of the watch page
     @return:
     """
-    initial_data_pattern = r"window\[['\"]ytInitialData['\"]]\s*=\s*([^\n]+)"
+    initial_data_pattern = r"window\[['\"]ytInitialData['\"]]\s*=\s*([^\n]+);"
     try:
-        match = regex_search(initial_data_pattern, watch_html, 1)
+        return regex_search(initial_data_pattern, watch_html, 1)
     except RegexMatchError:
         return "{}"
-    else:
-        return match[:-1]
 
 
 def metadata(initial_data) -> Optional[YouTubeMetadata]:
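
The new `playlist_id` helper relies on standard query-string parsing. The sketch below is a self-contained version that uses only `urllib.parse`. It mirrors the function in the diff and is meant as an illustration of the parsing step, not as the library's exact module layout.

```python
from urllib.parse import parse_qs, urlparse


def playlist_id(url: str) -> str:
    """Return the value of the ``list`` query parameter of ``url``."""
    parsed = urlparse(url)
    # parse_qs maps each parameter name to a list of values; take the first.
    return parse_qs(parsed.query)["list"][0]
```

A bare playlist ID has no query string, so this version raises `KeyError` for it. That is consistent with this commit dropping the "or ID" wording from the class docstring and removing `test_init_with_watch_id`.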
setup.py CHANGED
@@ -40,6 +40,7 @@ setup(
         "Programming Language :: Python :: 3.6",
         "Programming Language :: Python :: 3.7",
         "Programming Language :: Python :: 3.8",
+        "Programming Language :: Python :: 3.9",
         "Programming Language :: Python",
         "Topic :: Internet",
         "Topic :: Multimedia :: Video",
tests/contrib/test_playlist.py CHANGED
@@ -17,7 +17,7 @@ def test_title(request_get):
         "-ghgoSH6n"
     )
     pl = Playlist(url)
-    pl_title = pl.title()
+    pl_title = pl.title
     assert (
         pl_title
         == "(149) Python Tutorial for Beginners (For Absolute Beginners)"
@@ -50,21 +50,14 @@ def test_init_with_watch_url(request_get):
 
 
 @mock.patch("pytube.contrib.playlist.request.get")
-def test_last_update(request_get, playlist_html):
+def test_last_updated(request_get, playlist_html):
     expected = datetime.date(2020, 3, 11)
     request_get.return_value = playlist_html
-    playlist = Playlist("url")
-    assert playlist.last_update == expected
-
-
-@mock.patch("pytube.contrib.playlist.request.get")
-def test_init_with_watch_id(request_get):
-    request_get.return_value = ""
-    playlist = Playlist("PLS1QulWo1RIaJECMeUT4LFwJ-ghgoSH6n")
-    assert (
-        playlist.playlist_url == "https://www.youtube.com/playlist?list"
+    playlist = Playlist(
+        "https://www.youtube.com/playlist?list"
         "=PLS1QulWo1RIaJECMeUT4LFwJ-ghgoSH6n"
     )
+    assert playlist.last_updated == expected
 
 
 @mock.patch("pytube.contrib.playlist.request.get")