How to Extract Full Opinion Text from Google Scholar Case Law with SerpApi
Nathan Skiles · 2026-06-11 · via SerpApi

SerpApi’s Google Scholar Case Law API returns structured case law data from Google Scholar, including case details and related metadata. For many workflows, that structured response is enough.

However, some use cases require the full opinion body. You may want to store the opinion text locally, make it searchable, review it in a cleaner format, or pass it into another internal workflow.

The full opinion body is available in the raw HTML response. In this tutorial, we’ll extract that opinion body from the HTML, convert it to Markdown, and save it locally using both JavaScript and Python.

If you’re new to the Google Scholar Case Law API, my colleague’s post on working with the structured response is a helpful starting point, though it isn’t required for this tutorial:

How to scrape Google Case Law API for Legal Research, Analytics, AI, and more

Learn how to retrieve structured case law data from Google Scholar using SerpApi’s Google Scholar Case Law API, including case details, citations, court information, docket numbers, and decision timelines.

SerpApi · Michael Moura

Why use SerpApi?

Google Scholar Case Law pages can be scraped manually, but maintaining that workflow reliably can become difficult. At scale, you need to manage proxy infrastructure, CAPTCHA handling, retries, parsing changes, and monitoring for page structure updates.

SerpApi handles the search engine scraping layer and returns results through an API. For Google Scholar Case Law, that means you can retrieve structured case law data from the JSON response, while still having access to the underlying page HTML when you need the full opinion body.

Why the opinion body is in the raw HTML

Google Scholar Case Law pages include the full legal opinion, but opinion length varies dramatically. Some opinions are relatively short, while others are long enough to inflate a response considerably.

For that reason, the full opinion body is not returned directly in every JSON response. Including it by default would increase response size and overhead for users who only need metadata like the case title, court, decision date, citations, or related case information.

The full opinion is still available through the raw HTML. In the HTML, the opinion body is contained in the #gs_opinion element:

<div id="gs_opinion">

The sample HTML in the companion repo shows the case opinion inside #gs_opinion, including headings, body text, links, page references, blockquotes, and footnotes.
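To make the page structure concrete, here is a minimal sketch that pulls the text out of a `#gs_opinion` container using only Python's standard library. It is not the approach used later in this tutorial (which relies on Cheerio and Beautiful Soup), and the `OpinionExtractor` class, `opinion_text` helper, and `SAMPLE_HTML` string are hypothetical names and data made up for illustration. It also assumes reasonably well-formed markup, so treat it as a way to see the idea rather than a production parser.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not change the depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class OpinionExtractor(HTMLParser):
    """Collects the text found inside the element with id="gs_opinion"."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.depth = 0    # nesting depth inside #gs_opinion (0 means outside)
        self.chunks = []  # text fragments collected so far

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag not in VOID_TAGS:
                self.depth += 1
        elif dict(attrs).get("id") == "gs_opinion":
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def opinion_text(html: str) -> str:
    """Return the whitespace-normalized text inside #gs_opinion, or ""."""
    parser = OpinionExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


# Made-up sample page: only the middle <div> is the opinion body.
SAMPLE_HTML = """
<html><body>
  <div id="gs_hdr">Search header</div>
  <div id="gs_opinion">
    <h3>Example v. Example</h3>
    <p>First paragraph.<br>Second line.</p>
    <blockquote>Quoted text.</blockquote>
  </div>
  <div id="gs_ftr">Footer</div>
</body></html>
"""

print(opinion_text(SAMPLE_HTML))
```

Everything outside the container (the header and footer in this sample) is ignored, which is the same effect the `#gs_opinion` selector gives us in the examples that follow.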

What we’re building

In these examples, we’ll request the Google Scholar Case Law page directly as HTML. If your workflow also needs structured metadata, you can request the JSON response first and then retrieve the raw HTML file linked in that response. The extraction logic is the same once you have the HTML.
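If you go the JSON-first route, the raw HTML link has to be read out of the response before you can download the page. The sketch below assumes the link is exposed as `search_metadata.raw_html_file`, as it is in SerpApi JSON responses generally. The `raw_html_url` helper and the trimmed `sample_response` dictionary are hypothetical and only there for illustration.

```python
def raw_html_url(response: dict) -> str:
    """Return the raw HTML link from a SerpApi JSON response dictionary.

    Assumes the link lives at search_metadata.raw_html_file.
    """
    url = response.get("search_metadata", {}).get("raw_html_file")
    if not url:
        raise KeyError("search_metadata.raw_html_file is missing from the response")
    return url


# Hypothetical, trimmed-down response used only for illustration.
sample_response = {
    "search_metadata": {
        "id": "example-search-id",
        "status": "Success",
        "raw_html_file": "https://serpapi.com/searches/example/example-search-id.html",
    }
}

print(raw_html_url(sample_response))
```

Once that URL is downloaded, the resulting HTML can go through the same `#gs_opinion` extraction described in the rest of this tutorial.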

The overall workflow is the same in both examples. We’ll walk through it in JavaScript first, then show the equivalent Python version:

  1. Request the Google Scholar Case Law page from SerpApi as raw HTML.
  2. Parse the HTML.
  3. Select the content within the #gs_opinion element.
  4. Convert the opinion HTML to Markdown.
  5. Save the Markdown file locally.

Requirements

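To follow along, you’ll need a SerpApi API key, plus Node.js with npm for the JavaScript example or Python with pip for the Python example.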

If this is your first time using SerpApi, you can sign up for a free account and use the included 250 monthly searches to test the examples in this tutorial.

JavaScript example

Let’s start with the JavaScript version. This example requests the Google Scholar Case Law page as HTML, extracts the opinion body, converts it to Markdown, and saves the result locally.

The full JavaScript example is available in the /javascript directory of the companion repository.

scraping_case_law_body/javascript at main · NateSkiles/scraping_case_law_body


Install dependencies

From the javascript directory, install the required packages:

npm install

The dependencies are already listed in the example project’s package.json. This example uses:

  • serpapi to fetch the raw HTML response from SerpApi.
  • cheerio to parse the HTML and select the opinion body with familiar CSS-style selectors.
  • turndown to convert the opinion HTML into Markdown.
  • dotenv to load your SerpApi API key from a local .env file.

Fetch the raw HTML

We import getHtml from the serpapi package and define the Google Scholar Case Law case_id we want to retrieve:

const { getHtml } = require("serpapi");
require("dotenv").config();

const caseId = "9174924986185145879";

Now we can request the page HTML. The parameters are fairly straightforward:

  • engine - Set to google_scholar_case_law, the API we are requesting data from.
  • api_key - Your SerpApi API key, loaded from the environment variable.
  • case_id - The Google Scholar Case Law case ID for the opinion we want to extract.

getHtml(
  {
    api_key: process.env.SERPAPI_KEY,
    engine: "google_scholar_case_law",
    case_id: caseId,
  },
  (html) => {
    // We'll parse the HTML in the next step.
  }
);

This returns the Google Scholar Case Law page as HTML, which we can then parse to extract the opinion body.

Parse the opinion body

Once the HTML is returned, load it with Cheerio:

const cheerio = require("cheerio"); // belongs with the other requires at the top of the file
const $ = cheerio.load(html);

Cheerio lets us query the HTML with CSS-style selectors. Since the opinion body is contained in the #gs_opinion element, we can select that element and get its inner HTML:

const opinionHtml = $("#gs_opinion").html();

It’s also worth handling the case where the opinion body is not found:

if (!opinionHtml) {
  console.error("Could not find case opinion in the search results.");
  return;
}

At this point, opinionHtml contains the HTML for the opinion body, including paragraphs, headings, links, blockquotes, page references, and footnotes.

Convert the opinion to Markdown

Next, create a new Turndown service and pass the opinion HTML to turndown():

const TurndownService = require("turndown");

const turndownService = new TurndownService({
  headingStyle: "atx",
  codeBlockStyle: "fenced",
  strongDelimiter: "**",
  emDelimiter: "*",
  linkStyle: "inlined",
});

const markdown = turndownService.turndown(opinionHtml);

Turndown is a good fit here because we are not trying to manually scrape each paragraph, heading, link, or blockquote. At this point, we already have the opinion body as HTML. We just want to preserve the readable structure in a more portable text format.

Markdown works well for that because it keeps the output readable while still preserving useful formatting like headings, links, paragraphs, blockquotes, bold text, and italics.

Save the Markdown file

Finally, create an output filename and write the Markdown to the shared output directory:

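const fs = require("fs");
const path = require("path");

// outputDir is the shared output directory defined in the full example.
const isoDate = new Date().toISOString().split("T")[0];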
const outputPath = path.join(outputDir, `js_${caseId}_${isoDate}.md`);

fs.mkdirSync(outputDir, { recursive: true });
fs.writeFileSync(outputPath, markdown, "utf8");

console.log(`Saved case opinion to ${outputPath}`);

You can certainly tune Turndown further based on your use case and formatting needs. Here's some example output from the JavaScript example using Turndown:

**937 S.W.2d 444 (1996)**

### CONTINENTAL COFFEE PRODUCTS CO. and Allen D. Duff, Petitioners,  
v.  
Juanita CAZAREZ, Respondent.

[No. 95-0827.](/scholar?scidkt=15876565480693424878&as_sdt=2&hl=en)

**Supreme Court of Texas.**

Argued February 14, 1996.

Decided December 13, 1996.

Rehearing Overruled February 21, 1997.

[445](#p445)[\*445](#p445) A. Martin Wickliff, Jr., Barbara L. Johnson, Paul E. Hash, Houston, for Respondent.
...
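Notice that the docket link in the output above is root-relative (`/scholar?...`), so it will not resolve once the Markdown leaves the Google Scholar page. If you want clickable links in the saved file, one option is a small post-processing step like the sketch below. It is shown in Python, but the same regular expression idea carries over to JavaScript. The `absolutize_links` helper is hypothetical, and `https://scholar.google.com` as the base origin is an assumption on my part.

```python
import re

# Assumed origin for root-relative links found in the converted Markdown.
SCHOLAR_BASE = "https://scholar.google.com"


def absolutize_links(markdown: str, base: str = SCHOLAR_BASE) -> str:
    """Prefix root-relative Markdown link targets such as ](/scholar?...) with base.

    In-page anchors like (#p445) and protocol-relative links (//host/...) are left alone.
    """
    return re.sub(r"\]\(/(?!/)", f"]({base}/", markdown)


sample = "[No. 95-0827.](/scholar?scidkt=15876565480693424878&as_sdt=2&hl=en)"
print(absolutize_links(sample))
```

Page-reference anchors such as `[445](#p445)` are intentionally untouched, since they point within the opinion itself.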

Python example

The Python version follows the same overall workflow: request the page as HTML, parse the returned HTML, select the #gs_opinion element, convert it to Markdown, and save the result locally.

The full Python example is available in the /python directory of the companion repository.

scraping_case_law_body/python at main · NateSkiles/scraping_case_law_body


Install dependencies

From the python directory, install the required packages:

pip install -r requirements.txt

This example uses:

  • serpapi to fetch the raw HTML response from SerpApi.
  • beautifulsoup4 to parse the HTML and select the opinion body.
  • markdownify to convert the opinion HTML into Markdown.
  • python-dotenv to load your SerpApi API key from a local .env file.

Fetch the raw HTML

First, load the API key from your .env file and make sure it exists:

import os

from dotenv import load_dotenv

load_dotenv()

api_key = os.getenv("SERPAPI_KEY")

if not api_key:
    raise RuntimeError("SERPAPI_KEY is required.")

Then request the Google Scholar Case Law page as HTML:

import serpapi

# CASE_ID holds the Google Scholar Case Law case ID as a string.
html = serpapi.search(
    api_key=api_key,
    engine="google_scholar_case_law",
    case_id=CASE_ID,
    output="html",
)

The output="html" parameter tells SerpApi to return the Google Scholar Case Law page as HTML instead of JSON.
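For readers who want to see what this request looks like without the client library, here is a rough sketch that builds the equivalent URL with the standard library. It assumes SerpApi's usual `https://serpapi.com/search` endpoint, and the `build_html_request_url` helper is a hypothetical name. Nothing is actually sent over the network here.

```python
from urllib.parse import urlencode

# Assumed SerpApi search endpoint.
SEARCH_ENDPOINT = "https://serpapi.com/search"


def build_html_request_url(case_id: str, api_key: str) -> str:
    """Build a request URL that asks for the case page as HTML instead of JSON."""
    params = {
        "engine": "google_scholar_case_law",
        "case_id": case_id,
        "output": "html",  # switches the response from JSON to raw HTML
        "api_key": api_key,
    }
    return f"{SEARCH_ENDPOINT}?{urlencode(params)}"


print(build_html_request_url("9174924986185145879", "YOUR_API_KEY"))
```

The parameters are the same ones passed to `serpapi.search` above. The client library simply spares you from assembling and sending the request yourself.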

Parse the opinion body

Next, parse the HTML with Beautiful Soup and select the #gs_opinion element:

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, "html.parser")
opinion = soup.select_one("#gs_opinion")

As in the JavaScript example, it’s worth handling the case where the opinion body is not found:

if not opinion:
    raise ValueError("Could not find case opinion in the search results.")

Convert and save as Markdown

Once we have the opinion body, we can convert it to Markdown with markdownify:

from markdownify import markdownify

markdown = markdownify(
    str(opinion),
    heading_style="ATX",
    bullets="-",
).strip()

Then create the output directory, generate a filename using the case ID and current date, and write the Markdown file:

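from datetime import date
from pathlib import Path

# OUTPUT_DIR is a pathlib.Path pointing at the shared output directory.
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)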

iso_date = date.today().isoformat()
output_path = OUTPUT_DIR / f"py_{CASE_ID}_{iso_date}.md"

output_path.write_text(markdown, encoding="utf-8")

print(f"Saved case opinion to {output_path}")

The Python output follows the same structure as the JavaScript example, with Markdown headings, links, page references, and opinion text preserved.
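The filename convention used in both examples (a language prefix, the case ID, and the current date) can be wrapped in a small helper if you plan to reuse it. This is only a sketch: `build_output_path` is a hypothetical function, and the `output` directory name is a placeholder.

```python
from datetime import date
from pathlib import Path
from typing import Optional


def build_output_path(output_dir: Path, case_id: str, prefix: str = "py",
                      day: Optional[date] = None) -> Path:
    """Return <output_dir>/<prefix>_<case_id>_<YYYY-MM-DD>.md."""
    day = day or date.today()
    return output_dir / f"{prefix}_{case_id}_{day.isoformat()}.md"


path = build_output_path(Path("output"), "9174924986185145879", day=date(2026, 6, 11))
print(path.as_posix())  # output/py_9174924986185145879_2026-06-11.md
```

Because the name only includes the date, running the script twice on the same day for the same case overwrites the earlier file. If you need to keep each run, consider adding a time component to the name.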

Caveats and edge cases

This workflow is intentionally simple, but there are a few things to keep in mind.

First, the script should fail clearly if the #gs_opinion element is not found. If Google Scholar changes the page structure, or if a specific result does not include an opinion body, you do not want the script to silently save an empty file.
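One way to guard against that in Python is to check the converted text right before writing, in addition to the selector check shown earlier. The sketch below uses a hypothetical `save_opinion` helper and a temporary directory so it can run anywhere.

```python
import tempfile
from pathlib import Path


def save_opinion(markdown: str, output_path: Path) -> Path:
    """Write the Markdown, refusing to create an empty or whitespace-only file."""
    if not markdown or not markdown.strip():
        raise ValueError("Opinion Markdown is empty; nothing was written.")
    output_path.parent.mkdir(parents=True, exist_ok=True)
    output_path.write_text(markdown, encoding="utf-8")
    return output_path


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "output" / "example.md"
    save_opinion("### Example v. Example\n\nOpinion text.", target)
    print(target.exists())

    try:
        save_opinion("   \n", Path(tmp) / "output" / "empty.md")
    except ValueError as exc:
        print(f"Skipped: {exc}")
```

Raising an exception here keeps a batch job from quietly accumulating blank files when a page comes back without an opinion body.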

Second, HTML-to-Markdown libraries may handle links, anchors, spacing, and nested elements differently. The output should be reviewed before using it in production workflows, especially if you need to preserve legal citations or page references exactly.
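A lightweight way to spot conversion problems is to compare how many page-reference anchors exist before and after conversion. The sketch below assumes page references appear in the HTML as links with `href="#p445"`-style targets, which is inferred from the `[445](#p445)` links in the sample Markdown output rather than confirmed against the page source. The `page_anchor_counts` helper and both sample strings are hypothetical.

```python
import re


def page_anchor_counts(opinion_html: str, markdown: str) -> tuple[int, int]:
    """Count page-reference anchors in the source HTML and the converted Markdown."""
    html_count = len(re.findall(r'href="#p\d+"', opinion_html))
    md_count = len(re.findall(r"\]\(#p\d+\)", markdown))
    return html_count, md_count


# Made-up fragments that mirror the shape of the sample output.
sample_html = '<a href="#p445">445</a><a href="#p445">*445</a> Counsel text.'
sample_md = r"[445](#p445)[\*445](#p445) Counsel text."

html_count, md_count = page_anchor_counts(sample_html, sample_md)
print(html_count, md_count)
if html_count != md_count:
    print("Page references changed during conversion; review the output.")
```

A mismatch does not tell you what went wrong, but it is a cheap signal that a file deserves a manual look before it is used downstream.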

Finally, Google Scholar opinions may include page markers, footnotes, blockquotes, citation links, and other formatting from the original opinion page. Depending on your use case, you may want to preserve those elements, clean them from the final Markdown, or customize the conversion rules further.
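As an example of the "clean them from the final Markdown" option, the sketch below strips inline page markers of the form seen in the sample output, such as `[445](#p445)[\*445](#p445)`. The pattern is based only on that sample, so other opinions may need adjustments, and `strip_page_markers` is a hypothetical helper.

```python
import re

# Matches links such as [445](#p445) or [\*445](#p445), plus trailing spaces or tabs.
PAGE_MARKER = re.compile(r"\[(?:\\\*)?\d+\]\(#p\d+\)[ \t]*")


def strip_page_markers(markdown: str) -> str:
    """Remove inline page-reference links from converted opinion Markdown."""
    return PAGE_MARKER.sub("", markdown)


# Shortened from the sample output shown earlier.
sample = r"[445](#p445)[\*445](#p445) A. Martin Wickliff, Jr., Houston, for Respondent."
print(strip_page_markers(sample))
```

If you need pinpoint citations later, keep an untouched copy of the Markdown alongside the cleaned one, since the page markers cannot be recovered once they are removed.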

View the full code on GitHub

The full JavaScript and Python examples are available in the companion repository:

GitHub - NateSkiles/scraping_case_law_body


Each folder includes its own setup instructions and writes generated Markdown files to the repo’s root-level output directory.

Conclusion

SerpApi’s Google Scholar Case Law API gives you structured case law data, while the raw HTML provides access to the full opinion body when needed. By selecting the #gs_opinion element, you can extract the complete opinion and convert it into Markdown for storage, analysis, search indexing, or internal research workflows.

You can use the companion repository to run the JavaScript or Python example locally, then adapt the parsing and Markdown conversion steps for your own case law workflows.
