Descriptions for WebScraper 4.1.0
Developer: Shiela Dixon
Mac Platform: Intel
OS Version: OS X 10.8 or later
Processor type(s) & speed: 64-bit processor
Includes: Pre-K’ed (TNT)
Web Site: http://peacockmedia.co.uk/
Quickly extract information from a webpage, including its text content, with a minimalist app that exports the data to JSON or CSV
WebScraper lets you quickly extract content from an online source with minimal effort. You have full control over which data is exported to the CSV or JSON file.
Quickly scan any website by using multiple threads
In the WebScraper main window, you specify the URL of the webpage you want to scan and the number of threads to use for the scan. The thread count is set with a simple slider.
To avoid unnecessary scanning, you can choose to crawl only a single page, then start the process with a single click. The Live View window shows the status returned for each link, which can be useful for debugging.
Extract various types of information and export the data to CSV or JSON
In the WebScraper Output panel, you choose which information the utility extracts from each web page: the URL, the title, the description, the content of elements with a specific class or ID, the headings, the page content in various formats (plain text, HTML or Markdown) and the last modified date.
You can also choose the output file format (CSV or JSON), consolidate whitespace, and set an alert if the file exceeds a certain size. For CSV output, you can choose when to put quotes around columns, what to use in place of quotes, and which line separator to use.
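To illustrate what the whitespace-consolidation and CSV quoting options mean, here is a minimal Python sketch. It is not WebScraper's own code; the function names `consolidate` and `export_rows` and their parameters are hypothetical, and it assumes each scraped page is a dict of string values.

```python
import csv
import io
import json
import re


def consolidate(text: str) -> str:
    """Collapse runs of whitespace (spaces, tabs, newlines) into a single space."""
    return re.sub(r"\s+", " ", text).strip()


def export_rows(rows, fmt="csv", consolidate_ws=True, quote_all=False, line_sep="\n"):
    """Serialise a list of dicts (one per scraped page, hypothetical shape) to CSV or JSON text."""
    if consolidate_ws:
        rows = [{k: consolidate(v) for k, v in row.items()} for row in rows]
    if fmt == "json":
        return json.dumps(rows, indent=2)
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf,
        fieldnames=list(rows[0].keys()) if rows else [],
        # QUOTE_ALL wraps every column in quotes; QUOTE_MINIMAL only quotes when needed.
        quoting=csv.QUOTE_ALL if quote_all else csv.QUOTE_MINIMAL,
        lineterminator=line_sep,  # the line separator choice
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

For example, `export_rows([{"url": "https://example.com/", "title": "  Hello\n  World "}])` yields a header line `url,title` followed by `https://example.com/,Hello World`, with the title's stray whitespace collapsed.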
Finally, WebScraper lets you change the user agent, limit the number of links and the number of clicks from the home page, ignore query strings, and treat subdomains of the root domain as internal pages.
Effortlessly crawl information from online sources without too much user interaction
WebScraper lets you quickly scan websites and output their content, together with additional metadata, to CSV or JSON files. The tool is useful whenever you want offline access to the data without having to store the entire page.
What’s new in WebScraper 4.1.0
April 19th, 2018
- Adds the ability to download images to a folder during the scan (see Complex setup > Output file columns > Also download images to folder). Images can optionally be downloaded only if they match a pattern, either a partial URL or a regex match (leave the box blank to download all images discovered).
- Adds an option to filter the output file, i.e. only include data from certain pages (e.g. information pages or product pages) in the output file. This is done by matching the URL of the page being scraped, either by partial URL (e.g. /product/) or a regex match.
- Fixes an issue with saving projects. (Note that saving a project does not save data, only settings and configuration. Save data separately using Export from the Results screen or File > Export.)
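Both new options in 4.1.0 decide whether a URL qualifies by partial-URL or regex matching. The Python sketch below shows that idea only; it is not WebScraper's implementation. The names `url_matches` and `filter_pages` are hypothetical, and treating a blank pattern as "match everything" follows the image-download note above and is assumed for the output filter as well.

```python
import re


def url_matches(url: str, pattern: str, use_regex: bool = False) -> bool:
    """Return True if url matches pattern (hypothetical helper).

    A blank pattern matches every URL, mirroring "leave box blank to download all".
    """
    if not pattern:
        return True
    if use_regex:
        # Regex mode: the pattern may match anywhere in the URL.
        return re.search(pattern, url) is not None
    # Partial-URL mode: plain substring test, e.g. "/product/".
    return pattern in url


def filter_pages(pages, pattern, use_regex=False):
    """Keep only scraped pages (dicts with a "url" key) whose URL matches the pattern."""
    return [p for p in pages if url_matches(p["url"], pattern, use_regex)]
```

With the partial URL `/product/`, a page at `https://shop.example/product/42` is kept and `https://shop.example/about` is dropped. A regex such as `/product/\d+$` narrows the match further.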