Twitter Data Analysis
Goal: Collect data of who follows whom from Twitter app and performed grouping the users by the number of users they follow. For each group, calculated the number of users belonging to that group. Basics: Map reduce : A MapReduce program is composed of a map procedure, which performs filtering and sorting, and a reduce method, which performs a summary operation. It is a programming model and a n associated implementation for processing and generating big data sets with a parallel, distributed algorithm on a cluster. A MapReduce framework (or system) is usually composed of three operations (or steps): Map: each worker node applies the map function to the local data, and writes the output to a temporary storage. A master node ensures that only one copy of the redundant input data is processed. Shuffle: worker nodes redistribute data based on the output keys (produced by the map function), such that all data belonging to...