Review, from a statistical perspective, of the cyberinfrastructure ecosystem, including distributed computing, multi-node systems, and distributed file systems. Structured and unstructured data sources, including social media data and image data. Setting up large data structures for analysis. Algorithms and techniques for computing statistics and fitting statistical models on distributed data. Software to be used includes Hadoop, MapReduce, SAS, and SAS Data Loader for Hadoop.