Scraperr – A Self-Hosted Web Scraper


A powerful self-hosted web scraping solution

Scraperr enables you to extract data from websites with precision using XPath selectors. This self-hosted application provides a clean interface to manage scraping jobs, view results, and export data.

📚 Check out the docs for a comprehensive quickstart guide and detailed information.

XPath-Based Extraction: Precisely target page elements
Queue Management: Submit and manage multiple scraping jobs
Domain Spidering: Option to scrape all pages within the same domain
Custom Headers: Add JSON headers to your scraping requests
Media Downloads: Automatically download images, videos, and other media
Results Visualization: View scraped data in a structured table format
Data Export: Export your results in various formats
Notification Channels: Send completion notifications through various channels
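
To make the XPath-based extraction feature concrete, here is a minimal sketch of what an XPath selector picks out of a page. It is not Scraperr's own code. It uses Python's standard-library ElementTree, which supports only a subset of XPath and needs well-formed markup. The sample page, the extract helper, and the selectors are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page (illustration only).
PAGE = """<html><body>
<div class="product"><h2>Widget</h2><span class="price">9.99</span></div>
<div class="product"><h2>Gadget</h2><span class="price">19.50</span></div>
</body></html>"""


def extract(markup, xpath):
    """Return the stripped text of every element matching the XPath expression."""
    root = ET.fromstring(markup)
    return [(el.text or "").strip() for el in root.findall(xpath)]


# One selector per field, similar in spirit to naming elements in a scraping job.
names = extract(PAGE, ".//div[@class='product']/h2")
prices = extract(PAGE, ".//span[@class='price']")

for name, price in zip(names, prices):
    print(name, price)
```

Real pages are rarely well-formed XML, so a production scraper would normally rely on a forgiving HTML parser with full XPath support rather than ElementTree.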

⚖️ Legal and Ethical Guidelines

When using Scraperr, please remember to:

Respect robots.txt: Always check a website's robots.txt file to see which paths it allows crawlers to access
Terms of Service: Adhere to each website's Terms of Service regarding data extraction
Rate Limiting: Implement reasonable delays between requests to avoid overloading servers
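
The robots.txt and rate-limiting guidelines above can be checked before submitting a job. The sketch below uses Python's standard urllib.robotparser on a made-up robots.txt. The user agent name, the URLs, and the helper functions are assumptions for illustration and are not part of Scraperr.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real one lives at https://<site>/robots.txt.
ROBOTS_TXT = """User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def load_rules(robots_txt):
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed_urls(parser, urls, user_agent="example-bot"):
    """Keep only the URLs that the rules allow this user agent to fetch."""
    return [u for u in urls if parser.can_fetch(user_agent, u)]


def polite_delay(parser, user_agent="example-bot", default=1.0):
    """Use the site's Crawl-delay if it declares one, else a default pause in seconds."""
    delay = parser.crawl_delay(user_agent)
    return float(delay) if delay is not None else default


rules = load_rules(ROBOTS_TXT)
candidates = ["https://example.com/products", "https://example.com/private/data"]
to_scrape = allowed_urls(rules, candidates)
pause = polite_delay(rules)
print(to_scrape, pause)
```

A caller would then wait for the returned number of seconds (for example with time.sleep) between consecutive requests to the same site. Passing robots.txt does not replace reading the site's Terms of Service.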

Disclaimer: Scraperr is intended for use only on websites that explicitly permit scraping. The creator accepts no responsibility for misuse of this tool.

This project is licensed under the MIT License. See the LICENSE file for details.

Development made easier with the webapp template.

To get started, simply run make build up-dev.
