Where is the "food" for Big Data?

Chema Sept. 29, 2018

big data dataset ckan open data transparency

There are certainly many frameworks and algorithms today for processing large amounts of data, that is, Big Data.

But if you are starting, you will soon meet the harsh reality: Where do I get the data from?

No data, no big data

If you are new to the amazing world of big data, you have probably read a great post or tutorial on Medium, or explored an interesting GitHub repository.

Nowadays it's very easy to start learning about Big Data. 

One reason is that we often say Big Data when we really mean Business Intelligence. Sometimes we are even referring to SQL statements that are merely more elaborate than a simple "select". Marketing rules!

This has a direct implication: you can perform the processing in a Docker container on your laptop.

So, when do we really talk about Big Data? Well, there is no widely adopted definition. For us the answer is: when your BI software or your server cannot process the information in a reasonable time. But that is not today's topic.

Whether it is BI or Big Data, the problem is the same: where to get data to practice with.

Open data portals

Thanks to open data initiatives and transparency laws, many governments and companies are releasing a multitude of datasets as open data.

We have analyzed the major sources of open data on the Internet and compiled the following top-ten list:

  1. OpenStreetMap
    Our favourite dataset. A 964 GB XML file of geolocated data from all over the planet (41.4 GB as compressed PBF): POIs, businesses, shops, roads, public transport. You can download the whole globe or just your region of interest. Download page

  2. Data.gov
    More than 300,000 datasets await you here. Its data is collected by the US government on a multitude of different topics: health, transport, demography, tourism, etc.

  3. FIWARE datasets
    There are not many datasets, around 3,000, but if you are interested in IoT data from the main European and Spanish cities, you can find gold here.
     
  4. US Census Bureau
    Data from the United States census. They also offer free software for building queries and managing data.
     
  5. EU Open Data Portal
    With more than 12,000 datasets, it is the European version of "data.gov". The truth, though, is that the datasets are not curated.
     
  6. Data.gov.uk
    Open datasets from the UK Government, including the British National Bibliography.
     
  7. Open Data Network
    Developed by Socrata, it includes a search engine that answers questions directly. But the information is mainly from the United States.
     
  8. Amazon Web Services public datasets
    There are lots of interesting datasets, like multi-band images taken by satellites.
     
  9. Google Public data explorer
    It includes a search bar and the datasets are location aware.
     
  10. The CIA World Factbook
    Compiles datasets on different topics from more than 250 countries. Very interesting for exploring world information.
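
Several of these portals are built on CKAN, so their catalogues can be searched from a script instead of a browser. The following is only a minimal sketch, assuming the portal exposes the standard CKAN Action API (package_search) at the base URL shown. That URL and the helper names build_search_url, summarize and search are our own assumptions for illustration, so check the documentation of the portal you pick before relying on them.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the portal runs CKAN and serves the Action API at this path.
CKAN_BASE = "https://catalog.data.gov/api/3/action"


def build_search_url(query, rows=5, base=CKAN_BASE):
    """Return the package_search URL for a free-text query."""
    return f"{base}/package_search?{urlencode({'q': query, 'rows': rows})}"


def summarize(payload):
    """Extract (title, [resource URLs]) pairs from a package_search response."""
    if not payload.get("success"):
        raise ValueError("CKAN reported a failed request")
    return [
        (
            pkg.get("title", pkg.get("name", "")),
            [res["url"] for res in pkg.get("resources", []) if res.get("url")],
        )
        for pkg in payload["result"]["results"]
    ]


def search(query, rows=5):
    """Query the portal over HTTP (needs network access)."""
    with urlopen(build_search_url(query, rows), timeout=30) as resp:
        return summarize(json.load(resp))
```

Calling search("public transport") would return a short list of dataset titles together with the download URLs of their files, which is usually enough to decide which dataset to fetch first.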

Now, there is no excuse!

With the data sources presented here, you have no excuse not to enter the fascinating world of "Big Data".
