Big Data - Harisfazillah Jamel - Startup and Developer 4th Meetup 5th November 2016
- 1. Big Data Harisfazillah Jamel Startup and Developer 4th Meetup 5th November 2016
- 2. Why Big Data? Big Data is not only for big players. Big Data is also for us: startups and developers. Data is raw gold. Information about us is the end product. Data defines us. Examples: web server logs, web page analytics and comments about our products.
- 3. What Is Big Data? Big data is a term for data sets that are so large or complex that traditional data processing applications are inadequate to deal with them. Challenges include analysis, capture, data curation, search, sharing, storage, transfer, visualization, querying, updating and information privacy. (Wikipedia) Let’s redefine big data for us.
- 4. What Is Big Data? Volume, Variety, Velocity, Veracity ● Volume: very big data ● Variety: multiple sources ● Velocity: data streaming in ● Veracity: accuracy of the data
- 5. Redefine Big Data For Startups. Four important terms: ● Data Sets ● Data Processing ● Analytics ● Visualization. Big Data is big. We need to focus.
- 6. What Should We Call Our Big Data? ● Small Data ● Startup Data ● No Data. We need to visualize our data from day 0. It’s a must.
- 7. Why Big Data? Big data analytics examines large amounts of data to uncover hidden patterns, correlations and other insights. (SAS) We need to know our own insights. Visualize our future.
- 8. Data Sets. We don’t have any data (no data), or we lack data. If we want data, we have to go and find it: either our own data, or an existing place to start, such as www.data.gov.my
- 9. Data Set: Our Own Data? ● Web server logs ○ IP addresses of visitors (IP2Country) ● Web access analysis ○ Most visited pages ● Comments from our users ○ Good, bad, like, dislike
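As a concrete starting point for the web server log, here is a minimal Python sketch, assuming the server writes the common Apache/Nginx "combined" log format. The regular expression, the `top_pages` function and the sample lines are illustrative assumptions, not part of any tool named in the slides. It counts the most visited pages and collects distinct visitor IPs, which are the input for an IP2Country lookup.

```python
import re
from collections import Counter

# Illustrative pattern for the "combined" log format (assumes this is how the
# web server is configured): ip, identity, user, [time], "request", status,
# size, "referer", "user agent".
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def top_pages(lines, n=3):
    """Return the n most visited paths and the set of distinct visitor IPs."""
    pages = Counter()
    visitors = set()
    for line in lines:
        m = COMBINED.match(line)
        if not m:
            continue  # skip lines that are not in combined format
        pages[m.group("path")] += 1
        visitors.add(m.group("ip"))
    return pages.most_common(n), visitors

# Hypothetical sample lines using documentation-only IP addresses.
sample = [
    '203.0.113.5 - - [05/Nov/2016:10:00:01 +0800] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '203.0.113.5 - - [05/Nov/2016:10:00:09 +0800] "GET /about HTTP/1.1" 200 256 "-" "Mozilla/5.0"',
    '198.51.100.7 - - [05/Nov/2016:10:01:30 +0800] "GET / HTTP/1.1" 200 512 "-" "curl/7.50"',
]
top, ips = top_pages(sample)
print(top)       # [('/', 2), ('/about', 1)]
print(len(ips))  # 2
```

In a real deployment this parsing is what Logstash's grok filter does for you; the sketch only shows what the raw log already tells us.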
- 10. Issues With The Data? Lack of usable information. We need to collect data on our own. This is a business opportunity for startups.
- 11. What Needs To Be Collected?
- 12. Good Bad Like Dislike. What we want to know from big data, and from any data we analyse, is this: GOOD, BAD, LIKE, DISLIKE. Sentiment analysis.
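To show what sorting comments into good and bad can look like, here is a deliberately simple lexicon-based sketch in Python. The word lists and the `label` function are hypothetical; a real sentiment analysis would use a proper sentiment lexicon or a trained model, and would handle negation ("not good"), which this sketch does not.

```python
import re

# Hypothetical, tiny word lists for illustration only.
POSITIVE = {"good", "great", "like", "love", "fast", "helpful"}
NEGATIVE = {"bad", "slow", "dislike", "hate", "broken", "poor"}

def label(comment):
    """Label a comment GOOD, BAD or NEUTRAL by counting lexicon hits."""
    words = re.findall(r"[a-z']+", comment.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "GOOD"
    if score < 0:
        return "BAD"
    return "NEUTRAL"

comments = [
    "I like the new app, it is fast",
    "Checkout is slow and broken",
    "Where is the FAQ?",
]
for c in comments:
    print(label(c), "-", c)
```

Even a crude score like this, counted per day or per product page, is enough to start a GOOD/BAD chart.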
- 13. When Who Where What Why How ● When: @timestamp is important for data analysis. ● Who: anonymity is important, but we need to know the user's gender and age. ● Where: anonymity is important, but we still need the IP address to know the visitor's country, state or county. ● What: the operating system and the browser version. ● Why: the keywords that led them to us. ● How: how they found out about us.
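The six questions map fairly directly onto fields of a single web request. The Python sketch below builds one record per request, assuming a combined-format log (timestamp, IP, user agent, referer). The country table is hypothetical and stands in for a real IP2Country or GeoIP database, and it assumes search engines pass the search terms in a `q` query parameter. Who (gender, age) is not in a web log at all and has to come from the users themselves.

```python
import ipaddress
from datetime import datetime
from urllib.parse import urlparse, parse_qs

# Hypothetical IP-range-to-country table. In practice this comes from an
# IP2Country or GeoIP database; these are documentation-only ranges (RFC 5737).
COUNTRY_RANGES = {
    ipaddress.ip_network("203.0.113.0/24"): "MY",
    ipaddress.ip_network("198.51.100.0/24"): "SG",
}

def country_of(ip):
    """Where: map a visitor IP to a country code, or 'unknown'."""
    addr = ipaddress.ip_address(ip)
    for net, country in COUNTRY_RANGES.items():
        if addr in net:
            return country
    return "unknown"

def search_keywords(referer):
    """Why: search terms, assuming the engine puts them in a 'q' parameter."""
    values = parse_qs(urlparse(referer).query).get("q", [])
    return values[0].split() if values else []

def to_event(ip, log_time, agent, referer):
    """Build one record answering When/Where/What/Why/How.

    Who (gender, age) is not in a web log; it has to come from users themselves.
    """
    return {
        # When: ISO 8601, the kind of value Logstash stores in @timestamp
        "@timestamp": datetime.strptime(log_time, "%d/%b/%Y:%H:%M:%S %z").isoformat(),
        "where": country_of(ip),
        "what": agent,  # raw user agent: operating system and browser version
        "why": search_keywords(referer),
        "how": urlparse(referer).netloc or "direct",  # referring site, if any
    }

event = to_event("203.0.113.5", "05/Nov/2016:10:00:01 +0800",
                 "Mozilla/5.0 (X11; Linux x86_64)",
                 "https://www.google.com/search?q=startup+big+data")
print(event)
```

Keeping the timestamp in ISO 8601 with its time zone offset avoids ambiguity when events from several servers are charted together.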
- 14. How To Visualize Our Data. I’m a fan of ELK: Elasticsearch, Logstash & Kibana. ELK is one of the Big Data tools.
- 15. Index The Data With ES. Use Elasticsearch to index our data. One misconception: ES is not for storage. Don’t use ES to store our data. Data needs to be archived elsewhere.
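A common way to index many records at once is the Elasticsearch `_bulk` endpoint, which takes newline-delimited JSON: an action line, then the document, for each record, with a trailing newline at the end. The Python sketch below only builds that body; the index name and events are hypothetical. It assumes a recent Elasticsearch version where mapping types are gone; on the 5.x line linked in the slides, each action line would also need a `_type`.

```python
import json

def bulk_body(index, docs):
    """Build an Elasticsearch _bulk request body (newline-delimited JSON).

    Each document gets an "index" action line followed by its source line,
    and the whole body must end with a newline.
    """
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(doc))
    return "\n".join(lines) + "\n"

# Hypothetical events; the originals stay archived elsewhere (e.g. the raw logs).
events = [
    {"@timestamp": "2016-11-05T10:00:01+08:00", "path": "/", "country": "MY"},
    {"@timestamp": "2016-11-05T10:00:09+08:00", "path": "/about", "country": "MY"},
]
body = bulk_body("weblogs-2016.11.05", events)
print(body)
```

The body would then be POSTed to `/_bulk` on the cluster (for example a hypothetical local node at `http://localhost:9200/_bulk`) with `Content-Type: application/x-ndjson`. Because ES is only the index, the same events can always be re-sent from the archive if the index is lost.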
- 16. ES Search API. The results come back in JSON. Developers love JSON (maybe). https://www.elastic.co/guide/en/elasticsearch/reference/5.0/_exploring_your_data.html
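Both the request and the response of the search API are JSON, so a few lines of Python are enough to work with them. This sketch builds a query DSL body and pulls the original documents out of the `hits.hits[]._source` part of a response. The field names `path` and `@timestamp`, and the sample response, are hypothetical and assume documents were indexed with those fields (and `@timestamp` mapped as a date so it can be sorted on).

```python
import json

def search_body(path, size=10):
    """Query DSL: documents whose 'path' field matches, newest first."""
    return {
        "query": {"match": {"path": path}},
        "sort": [{"@timestamp": {"order": "desc"}}],
        "size": size,
    }

def sources(response_text):
    """Pull the original documents (_source) out of a search response."""
    response = json.loads(response_text)
    return [hit["_source"] for hit in response["hits"]["hits"]]

# A trimmed, hand-written response in the shape Elasticsearch returns.
sample_response = json.dumps({
    "took": 3,
    "hits": {"hits": [
        {"_index": "weblogs-2016.11.05", "_id": "1",
         "_source": {"path": "/about", "country": "MY"}},
    ]},
})
print(json.dumps(search_body("/about")))
print(sources(sample_response))  # [{'path': '/about', 'country': 'MY'}]
```

The body would be sent to an index's `_search` endpoint; Kibana issues the same kind of JSON queries behind its charts.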
- 17. Kibana We can use Kibana to view our data in ES.
- 18. DKAN. We can store data with DKAN. DKAN follows CKAN. It is an open source open data platform with a full suite of cataloging, publishing and visualization features that allows organizations to easily share data with the public. http://www.nucivic.com/dkan/ Take advantage of the DKAN Datastore API.
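Since DKAN follows CKAN, a datastore query can be sketched against CKAN's `datastore_search` action, which takes a `resource_id` plus optional `q` and `limit` and replies with `{"success": ..., "result": {"records": [...]}}`. This is an assumption: the endpoint path below is CKAN's, and a DKAN site may expose its compatible endpoint at a different path, so check the documentation of the site you query. The base URL and resource id are hypothetical.

```python
import json
from urllib.parse import urlencode

def datastore_search_url(base_url, resource_id, q=None, limit=5):
    """Build a CKAN-style datastore_search URL (path is CKAN's; DKAN may differ)."""
    params = {"resource_id": resource_id, "limit": limit}
    if q:
        params["q"] = q  # full-text search across the resource's fields
    return f"{base_url}/api/3/action/datastore_search?{urlencode(params)}"

def records(response_text):
    """Return the rows from a CKAN-style {"success": ..., "result": {...}} reply."""
    response = json.loads(response_text)
    if not response.get("success"):
        raise ValueError("datastore query failed")
    return response["result"]["records"]

# Hypothetical site and resource id.
print(datastore_search_url("https://data.example.org", "abc123", q="kuala lumpur"))
```

Fetching the URL (with `urllib.request` or any HTTP client) and passing the reply text to `records` gives plain Python dictionaries, ready to be indexed or charted.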
- 19. GeoSpatial Is Important. Our data needs spatial information (GPS coordinates). We can use GeoServer to run our own map server. http://geoserver.org/
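To make records map-ready, GPS coordinates are often exported as GeoJSON, a standard interchange format for point data. The Python sketch below turns rows with hypothetical `lat`/`lon` keys into a GeoJSON FeatureCollection; note that GeoJSON (RFC 7946) orders coordinates as longitude first, then latitude. Loading the result into a spatial store that GeoServer publishes (for example PostGIS) is a separate step not shown here, and the sample point is hypothetical.

```python
import json

def to_geojson(rows):
    """Turn rows with 'lat'/'lon' keys into a GeoJSON FeatureCollection.

    GeoJSON (RFC 7946) orders coordinates as [longitude, latitude].
    """
    features = []
    for row in rows:
        props = {k: v for k, v in row.items() if k not in ("lat", "lon")}
        features.append({
            "type": "Feature",
            "geometry": {"type": "Point", "coordinates": [row["lon"], row["lat"]]},
            "properties": props,
        })
    return {"type": "FeatureCollection", "features": features}

# Hypothetical sample point somewhere in Kuala Lumpur.
visits = [{"name": "office", "lat": 3.14, "lon": 101.69, "visitors": 42}]
print(json.dumps(to_geojson(visits), indent=2))
```

Swapping latitude and longitude is the most common mistake here; a point that lands in the wrong ocean usually means the order is reversed.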
- 20. The End. Q & A. 019-6085482 http://linuxmalaysia.harisfazillah.info/