Where is the "food" for Big Data?
Chema Sept. 29, 2018
big data dataset ckan open data transparency
Today there are certainly many frameworks and algorithms for processing large amounts of data, that is, Big Data.
But if you are just starting out, you will soon face a harsh reality: where do I get the data from?
No data, no Big Data
If you are new to the amazing world of Big Data, you have probably read a great post or tutorial on Medium, or explored an interesting GitHub repository.
Nowadays it's very easy to start learning about Big Data.
First, because we often say "Big Data" when we are really talking about Business Intelligence (BI). Sometimes we even use the term for SQL queries that are only a bit more elaborate than a simple SELECT. Marketing rules!
This has a direct implication: you can run the processing in a Docker container on your laptop.
So, when are we really talking about Big Data? Well, there is no widely accepted definition. For us, the answer is: when your BI software or your server can no longer process the information in a reasonable time. But that is a topic for another day.
Whether it is BI or Big Data, the problem is the same: where do you get data to practice with?
Open data portals
Thanks to open data initiatives and transparency laws, many governments and companies are releasing a multitude of datasets as open data.
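Many of these portals are built on CKAN, an open source data catalog platform, and expose its JSON Action API, so searching them can be scripted instead of browsed by hand. Below is a minimal sketch using only the Python standard library. It assumes the portal serves CKAN's standard `package_search` action under `/api/3/action/`; the base URL `https://catalog.data.gov` (Data.gov's catalog) is an illustrative assumption to check against the portal's own API documentation, and the helper names are ours, chosen for illustration.

```python
import json
import urllib.parse
import urllib.request
from typing import List, Tuple

# Assumption: the portal runs CKAN and exposes the standard Action API
# under /api/3/action/. This base URL is only an example.
CKAN_BASE = "https://catalog.data.gov"


def package_search_url(base: str, query: str, rows: int = 10) -> str:
    """Build a CKAN package_search URL for a free-text query."""
    params = urllib.parse.urlencode({"q": query, "rows": rows})
    return f"{base.rstrip('/')}/api/3/action/package_search?{params}"


def extract_resources(response: dict) -> List[Tuple[str, str, str]]:
    """Return (dataset title, resource format, resource URL) triples
    from an already-decoded package_search response."""
    if not response.get("success"):
        raise ValueError("CKAN reported an unsuccessful request")
    triples = []
    for dataset in response["result"]["results"]:
        for resource in dataset.get("resources", []):
            triples.append(
                (
                    dataset.get("title", ""),
                    resource.get("format", ""),
                    resource.get("url", ""),
                )
            )
    return triples


def search(base: str, query: str, rows: int = 10) -> List[Tuple[str, str, str]]:
    """Query a live CKAN portal (requires network access)."""
    url = package_search_url(base, query, rows)
    with urllib.request.urlopen(url, timeout=30) as resp:
        return extract_resources(json.load(resp))
```

For example, `search(CKAN_BASE, "air quality", rows=5)` would return the resources of up to five matching datasets as (title, format, URL) triples, which you could then filter, say, to keep only CSV files. Parsing is kept separate from the HTTP call so it can be checked against a saved response without network access.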
We have analyzed the major sources of open data on the Internet and compiled the following top-ten list:
- OpenStreetMap
Our favourite dataset: a 964 GB XML file of geolocated data covering the whole planet (41.4 GB compressed as PBF), including POIs, businesses, shops, roads and public transport. You can download the whole globe or just your region of interest from the download page.
- Data.gov
More than 300,000 datasets await you here. The data is collected by the US government on a multitude of topics: health, transport, demography, tourism, etc.
- FIWARE datasets
There are not many datasets (around 3,000), but if you are interested in IoT data from the main European and Spanish cities, this is a gold mine.
- US Census Bureau
Data from the United States census. The Bureau also offers free software for building queries and managing the data.
- EU Open Data Portal
With more than 12,000 datasets, it is the European counterpart of Data.gov. However, the datasets are not curated.
- Data.gov.uk
Open datasets from the UK Government, including the British National Bibliography.
- Open Data Network
Developed by Socrata, it includes a search engine that answers questions directly. However, the information is mainly about the United States.
- Amazon Web Services public datasets
There are lots of interesting datasets, such as multi-band satellite imagery.
- Google Public Data Explorer
It includes a search bar, and the datasets are location-aware.
- The CIA World Factbook
It compiles data on many different topics for more than 250 countries. Very useful for working with worldwide information.
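Back to OpenStreetMap: a 964 GB XML file does not fit in a laptop's memory, so it has to be processed as a stream rather than loaded at once. The following is a minimal sketch using only the Python standard library that counts nodes by their `amenity` tag. It assumes the standard OSM XML layout (`<node>` elements with nested `<tag k="…" v="…"/>` children) and handles only the XML format, not PBF; the function name and the tiny inline sample are ours, for illustration.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter
from typing import BinaryIO, Union


def count_amenities(source: Union[str, BinaryIO]) -> Counter:
    """Stream an OSM XML document and count nodes by their 'amenity' tag.

    iterparse reads the input incrementally; clearing the root after each
    top-level element drops already-processed elements, so memory stays
    roughly constant regardless of file size.
    """
    counts = Counter()
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # first event: start of the <osm> root
    for event, elem in context:
        if event != "end":
            continue
        if elem.tag == "node":
            for tag in elem.iter("tag"):
                if tag.get("k") == "amenity":
                    counts[tag.get("v")] += 1
        if elem.tag in ("node", "way", "relation"):
            root.clear()  # discard processed children from the in-memory tree
    return counts


# Tiny hand-written sample in OSM XML layout (illustrative data only).
SAMPLE_OSM = b"""<osm version="0.6">
  <node id="1" lat="40.41" lon="-3.70"><tag k="amenity" v="cafe"/></node>
  <node id="2" lat="40.42" lon="-3.71"><tag k="amenity" v="cafe"/><tag k="name" v="Example"/></node>
  <node id="3" lat="40.43" lon="-3.72"><tag k="amenity" v="bank"/></node>
  <node id="4" lat="40.44" lon="-3.73"/>
  <way id="10"><nd ref="1"/><nd ref="2"/><tag k="amenity" v="parking"/></way>
</osm>
"""

if __name__ == "__main__":
    print(count_amenities(io.BytesIO(SAMPLE_OSM)))
```

On a real regional extract you would pass a file path instead, or `bz2.open("region.osm.bz2", "rb")` for a bz2-compressed XML download (the file name is hypothetical). The same iterparse-and-clear pattern extends to ways and relations; PBF files need a dedicated parser instead.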
Now, there is no excuse!
With the data sources presented here, you have no excuse not to enter the fascinating world of "Big Data".