Quickly extract information related to a certain webpage, including the text content, by using a minimalist app that exports the data to JSON or CSV
WebScraper offers you the possibility to quickly extract content from an online source with minimal effort. You have full control over the data that will be exported to the CSV or JSON files.
Quickly scan any website by using multiple threads
Within the WebScraper main window you must specify the URL address of the webpage you want to scan, and the number of threads that are to be used to complete the procedure. You get to adjust the latter parameter with the help of a simple slider bar.
To avoid any unnecessary scanning, you can choose to crawl only a single page, and then start the process with a simple mouse click. In the Live View window, you get to see the status message returned by each link, which might prove useful when dealing with debugging tasks.
Extract various types of information and export the data to CSV or JSON
In the WebScraper Output panel, you get to choose the type of information you want the utility to extract from a web page: the URL, the title, the description, content associated with a different class or ID, the headings, the page content in various formats (plain text, HTML or Markdown) and the last modified date.
You also get to choose the output file format (CSV or JSON), decide to consolidate white spaces, and set an alert if the file exceeds a certain size. If you are opting for the CSV format, you get to pick when to use quotes around columns, what to adopt instead of quotes, or the line separator type.
Last but not least, WebScraper also allows you to change the user-agent, to set a limit for the number of links and the clicks from home, can ignore query strings, and may treat subdomains of root domain as internal pages.
Effortlessly crawl information from online sources without too much user interaction
WebScraper offers you the possibility to quickly scan websites and output their content, together with other additional medatada, to CSV of JSON files. The tool is great whenever you want to have offline access top the data without having to store the entire page.
- Quick and easy to scan a site
- Plenty of extraction options including various metadata, content like (such as text, html or markdown) HTML elements with certain classes / identifiers, regular expression
- Easily export: Select the columns you want
- csv or JSON output as
- New option to generate a single text file (designed to archive text content, rebates or plain text)
- lots of options / settings
- setting up multiple tracking and limits on the size of the output file
Compatibility OS X 10.8 or later 64-bit
Web Site: http://peacockmedia.co.uk/
What’s New in WebScraper 4.4.0
- Adds ‘crawl above starting directory’ control (below blacklist / whitelist table on Scan tab). This is useful in cases where you want to start at a deep url, but to collect data from linked pages which aren’t necessarily within the starting directory. You will then probably want to limit your scan using ‘crawl maximum links from home’ or blacklisting / whitelisting.