Use Cases - Data Science

The World’s News is your Big Data Playground

Big Data that’s ready for analysis: skip the scraping tools!

For data scientists and researchers, the world’s news is more than a source of information about current affairs: it’s a living, ever-evolving data source that’s ripe for mining.

However, few have the resources and expertise required to build a large-scale crawling, parsing and indexing operation, so most are forced to rely on piecemeal solutions, scraping tools or expensive subscription services to get the data they need.

The Online News Archive changes this. Now anyone can access a comprehensive database of crawled online news in machine-readable format. With terabytes of data crawled from millions of news articles, getting the data you need for your next data science research project or product has never been easier.

Instantly available training datasets for AI, machine learning and NLP

Data has been called the oil of the 21st century, and nowhere is this truer than with learning algorithms. AI and machine learning systems need large amounts of relevant training data to get better at what they do, but amassing, cleaning and structuring that data can be expensive and distract from your core focus.

The solution? Use the Online News Archive to build custom datasets extracted from thousands of websites and get all the natural-language content you need, hassle-free. Use granular filters to limit the scope of your dataset by keywords, entities, publication date or other factors, so you get only the data you need to improve your algorithms.
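To make "scoping by keyword, entity and publication date" concrete, here is a minimal sketch that applies those three kinds of filters to article records already held in memory. The record fields (`text`, `entities`, `published`) and the sample articles are assumptions for illustration only, not the archive's documented schema; in practice the archive applies such filters server-side.

```python
from datetime import date

def filter_articles(articles, keywords, entity, start, end):
    """Keep articles that mention any keyword, list the entity, and were published in [start, end].

    Field names "text", "entities" and "published" are hypothetical.
    """
    kept = []
    for article in articles:
        text = article["text"].lower()
        published = date.fromisoformat(article["published"])
        if (any(k.lower() in text for k in keywords)
                and entity in article["entities"]
                and start <= published <= end):
            kept.append(article)
    return kept

# Made-up sample records, used only to exercise the filter.
articles = [
    {"text": "Central bank raises interest rates", "entities": ["European Central Bank"], "published": "2023-07-27"},
    {"text": "New smartphone launched", "entities": ["Apple"], "published": "2023-09-12"},
]

subset = filter_articles(articles, ["interest rates"], "European Central Bank",
                         date(2023, 1, 1), date(2023, 12, 31))
print(len(subset))
```

Only the first record passes all three filters, so the resulting subset is a narrower, more relevant training set.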

Structured data, simple REST API

Getting news data from the Online News Archive is simple and requires no proprietary scripting or coding. Use Boolean logic to define your query, set the time frame and voilà: you’ve created a structured, clean and production-ready dataset in JSON or XML.
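As an illustration of a Boolean query plus a time frame expressed as a REST request, the sketch below only builds the request URL; it does not send it. The base URL and the parameter names (`q`, `from`, `to`, `format`) are hypothetical placeholders, not the archive's actual API, so check your account documentation for the real ones.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; replace with the URL from your account documentation.
BASE_URL = "https://api.example.com/v1/search"

def build_query_url(query, start, end, fmt="json"):
    """Combine a Boolean query, a date range and an output format into one GET URL.

    Parameter names are assumptions made for this sketch.
    """
    params = {"q": query, "from": start, "to": end, "format": fmt}
    # urlencode percent-escapes quotes/parentheses and turns spaces into '+'.
    return BASE_URL + "?" + urlencode(params)

url = build_query_url('("electric vehicles" OR EV) AND battery NOT recall',
                      "2023-01-01", "2023-06-30")
print(url)
```

Keeping the query as a plain string means the same Boolean expression can be reused and versioned alongside the dataset it produced.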

Whether you’re using Python, R, C++ or any other programming language, you can easily integrate the data you receive from the Online News Archive into your existing code, giving you more time and resources to focus on building the next big thing.
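Because the results arrive as JSON, integrating them is mostly a matter of using your language’s standard parser. Here is a minimal Python sketch; the response shape (a top-level `articles` list with `title` and `published` fields) is an assumed example, not the archive’s documented format.

```python
import json

# Assumed example payload; the real response schema may differ.
sample_response = (
    '{"articles": ['
    '{"title": "EV sales rise", "published": "2023-03-02", "source": "example.com"},'
    '{"title": "Battery plant opens", "published": "2023-05-18", "source": "example.org"}'
    ']}'
)

def parse_articles(raw):
    """Turn a JSON response string into (title, published) tuples."""
    payload = json.loads(raw)
    # A missing "articles" key yields an empty list instead of an error.
    return [(a["title"], a["published"]) for a in payload.get("articles", [])]

rows = parse_articles(sample_response)
for title, published in rows:
    print(published, title)
```

From here the tuples can be loaded into whatever structure your pipeline already uses, such as a dataframe or a training-set file.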

Ready to get started?

The Online News Archive lets you build your first dataset in seconds. Getting started is as simple as creating a free account.

Request a Demo

Have one of our data consultants demonstrate how to work with crawled online news

Get Started

Create your free account and access the news archive to easily extract historical news data