Review, from a statistical perspective, of the cyberinfrastructure ecosystem, including distributed computing and multi-node, distributed file ecosystems such as Amazon Web Services. Structured and unstructured data sources, including social media data and image data. Setting up large data structures for analysis. Algorithms and techniques for computing statistics and fitting statistical models on distributed data. Software used includes Hadoop, MapReduce, SAS, and SAS Data Loader for Hadoop.