How to Scrape Organic Video Results from Brave Search with Python


A guide to scraping the title, link, displayed link, video thumbnail, and video duration from Brave Search Organic Video results.

What is Brave Search

To avoid duplicating content: I already wrote about what Brave Search is in my first Brave blog post.

Intro

This blog post is a continuation of the Brave Search web scraping series. Here you’ll see how to scrape Organic Video Results from Brave Search using Python with the beautifulsoup, requests, and lxml libraries.

Note: The HTML layout might change in the future, so some of the CSS selectors might stop working. Let me know if something isn't working.

Prerequisites

pip install requests
pip install lxml 
pip install beautifulsoup4

Make sure you have basic knowledge of the libraries mentioned above, since this blog post is not exactly a tutorial for beginners. I’ll try my best to show in code that it’s not that difficult.

Also, make sure you have a basic understanding of CSS selectors, because the select() and select_one() beautifulsoup methods accept CSS selectors. See the CSS selectors reference.
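To make the difference between the two methods concrete, here is a minimal sketch that runs against a small inline HTML string instead of a live page. The markup, the #demo-carousel id and the example.com links are made up for illustration and are not Brave Search's real HTML. It uses Python's built-in html.parser so it runs without lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, made up for illustration; not Brave Search's real HTML.
html = """
<div id="demo-carousel">
  <a class="card" href="https://example.com/1"><span class="title"> First </span></a>
  <a class="card" href="https://example.com/2"><span class="title"> Second </span></a>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')

# select() returns a list of every element matching the CSS selector
cards = soup.select('#demo-carousel .card')

# select_one() returns only the first match...
first_title = soup.select_one('#demo-carousel .title')

# ...or None when nothing matches, so calling .text on it would raise AttributeError
missing = soup.select_one('.duration')

links = [card['href'] for card in cards]
titles = [card.select_one('.title').text.strip() for card in cards]

print(links)
print(titles)
print(missing)
```

Because select_one() can return None, any field that may be absent from a result needs a check or a try/except around it.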

Imports

from bs4 import BeautifulSoup
import requests, lxml, json

What will be scraped

0_8IitatRPCIBYwap8.png

All 6 video results will be scraped, not just the 3 visible ones (the rest appear when you click the right arrow button). In this case, 6 is all there are.

Process

Continuing to wander through Dune, let’s scrape Organic Video results about Dune.

The code is identical to scraping Brave Search News results, except that we need to add video duration data and remove website source data from the output.

As in the previous post, we need to find a container with needed data:

0_W2hGMNHi4DSWDkQI.png

This translates to the following (only the id value changes, from #news-carousel to #video-carousel):

for video_result in soup.select('#video-carousel .card'):
    # further code..

After picking a container, we need to grab other elements, such as title, link, displayed link, video thumbnail and video duration with appropriate CSS selectors.

0_f6WCCRsNIWRc0d9Q.gif
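The thumbnail is the least obvious of these fields: in the full code it is read from the element's inline style attribute rather than from its text or a src attribute. Below is a minimal sketch of an alternative way to pull the URL out with a regular expression. The helper name extract_background_url and the sample style string are assumptions made up for illustration; the real style value on the page may look different.

```python
import re

# Matches the first url(...) in a CSS declaration, with or without quotes.
# Assumption: the URL itself contains no parentheses.
URL_PATTERN = re.compile(r"url\(\s*['\"]?(.*?)['\"]?\s*\)")


def extract_background_url(style):
    """Return the first url(...) value from an inline style string, or None."""
    if not style:
        return None
    match = URL_PATTERN.search(style)
    return match.group(1) if match else None


# Hypothetical style value, made up for illustration
sample_style = (
    "background-image: url('https://example.com/thumb.jpg'), "
    "url('https://example.com/fallback.png')"
)

print(extract_background_url(sample_style))
print(extract_background_url('color: red'))
```

Compared with chained split() and replace() calls, this does not depend on the exact spacing or quote style of the declaration, and it returns None instead of a mangled string when no url(...) is present.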

Code

from bs4 import BeautifulSoup
import requests, lxml, json

headers = {
  'User-agent':
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.102 Safari/537.36 Edge/18.19582"
}

params = {
  'q': 'dune 2021',
  'source': 'web'
}

def get_organic_video_results():

  html = requests.get('https://search.brave.com/search', headers=headers, params=params)
  soup = BeautifulSoup(html.text, 'lxml')

  data = []

  for video_result in soup.select('#video-carousel .card'):
    title = video_result.select_one('.title').text.strip()
    link = video_result['href']
    source = video_result.select_one('.anchor').text.strip()
    favicon = video_result.select_one('.favicon')['src']
    thumbnail = video_result.select_one('.img-bg')['style'].split(', ')[0].replace("background-image: url('", "").replace("')", "")
    try:
      video_duration = video_result.select_one('.duration').text.strip()
    except AttributeError:
      # select_one() returns None when a card has no duration element
      video_duration = None

    data.append({
      'title': title,
      'link': link,
      'source': source,
      'favicon': favicon,
      'thumbnail': thumbnail,
      'video_duration': video_duration
    })

  print(json.dumps(data, indent=2, ensure_ascii=False))


get_organic_video_results()

---------------
'''
[
# first result
 {
    "title": "Dune | Official Main Trailer - YouTube",
    "link": "https://www.youtube.com/watch?v=8g18jFHCLXk",
    "source": "youtube.com",
    "favicon": "https://imgr.search.brave.com/_l2jz03v6ptkaRq7BbdclpMEfo0AtVjCzta7SCwUTL0/fit/32/32/ce/1/aHR0cDovL2Zhdmlj/b25zLnNlYXJjaC5i/cmF2ZS5jb20vaWNv/bnMvOTkyZTZiMWU3/YzU3Nzc5YjExYzUy/N2VhZTIxOWNlYjM5/ZGVjN2MyZDY4Nzdh/ZDYzMTYxNmI5N2Rk/Y2Q3N2FkNy93d3cu/eW91dHViZS5jb20v",
    "thumbnail": "https://imgr.search.brave.com/-Ut-yfD45SCozeHmuatVUuDNJcTB3_JBS2pRhNylInw/fit/200/200/ce/1/aHR0cHM6Ly9pLnl0/aW1nLmNvbS92aS84/ZzE4akZIQ0xYay9t/YXhyZXNkZWZhdWx0/LmpwZw",
    "video_duration": "03:28"
  },
# last result
  {
    "title": "Dune (2021) Future Fashion Featurette - YouTube",
    "link": "https://www.youtube.com/watch?v=0SzLFIdpmbw",
    "source": "youtube.com",
    "favicon": "https://imgr.search.brave.com/_l2jz03v6ptkaRq7BbdclpMEfo0AtVjCzta7SCwUTL0/fit/32/32/ce/1/aHR0cDovL2Zhdmlj/b25zLnNlYXJjaC5i/cmF2ZS5jb20vaWNv/bnMvOTkyZTZiMWU3/YzU3Nzc5YjExYzUy/N2VhZTIxOWNlYjM5/ZGVjN2MyZDY4Nzdh/ZDYzMTYxNmI5N2Rk/Y2Q3N2FkNy93d3cu/eW91dHViZS5jb20v",
    "thumbnail": "https://imgr.search.brave.com/fA0LnkpZ-0eQi3PcH0oidTJKC0H-ULoYuAUsVcYpcaU/fit/200/200/ce/1/aHR0cHM6Ly9pLnl0/aW1nLmNvbS92aS8w/U3pMRklkcG1idy9t/YXhyZXNkZWZhdWx0/LmpwZw",
    "video_duration": "02:54"
  }
]
'''
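The duration values in the output are strings such as "03:28". If you want to sort or filter the results by length, it can help to convert them to seconds. Here is a minimal sketch; the helper name duration_to_seconds is hypothetical, and it assumes the string is made of colon-separated numbers (MM:SS or H:MM:SS).

```python
def duration_to_seconds(duration):
    """Convert 'MM:SS' or 'H:MM:SS' to a number of seconds.

    Returns None when the duration is missing (None or an empty string).
    """
    if not duration:
        return None

    seconds = 0
    # Each part to the right is worth 60 times less than the one before it
    for part in duration.split(':'):
        seconds = seconds * 60 + int(part)
    return seconds


print(duration_to_seconds('03:28'))  # 208
print(duration_to_seconds('02:54'))  # 174
print(duration_to_seconds(None))     # None
```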

Links

Code in the online IDE
SelectorGadget

Conclusion

If you have any questions or suggestions, or something isn’t working correctly, feel free to drop a comment in the comment section.

If you want to access this feature via SerpApi, upvote the Support Brave Search feature request, which is currently under review.

Yours,
Dimitry, and the rest of the SerpApi Team.


Dimitry Zub

Developer Advocate, SerpApi

@dimitryzub
Developer Advocate (Contract) at @serpapi. Web scraping, ETL, data using Python, Ruby.