Mohit Ved

C-DAC’s Big Data Software Suite (C-BDSS): An Extensible Open Platform for Big Data Science

The world today is awash in data, which is arriving at an unprecedented scale from all classes of applications. Decisions were once derived from guesswork or laboriously developed models; now they can be grounded in data. Such Big Data analysis affects organizations in practically every industry, including financial services, manufacturing, mobile services, the life sciences, and the physical sciences. Because of the sheer volume of data involved, traditional data analysis tools are of limited use. This creates the need for specialized computational platforms for efficient and meaningful Big Data processing. A data analytics platform that brings Big Data tools together in one place would simplify Big Data analysis and, at the same time, let data scientists focus on the actual Data Science. In this paper, we describe such an extensible Big Data Science platform, C-DAC’s Big Data Software Suite (C-BDSS), which offers a ready-to-use Big Data analytics environment for performing effective Data Science.

C-BDSS combines the highly scalable storage of Hadoop’s distributed file system with tools that move data into and out of Hadoop and transform it, supporting both batch and in-memory processing. This makes it suitable both for statistical analysis of historical data and for real-time analysis of streaming data. At C-DAC, C-BDSS has been used to demonstrate solutions to real-world problems such as predicting web visitors, detecting network intrusions, and building recommendation systems. It has also been used to develop a process that automates the data flow between the packaged tools.

Moreover, C-BDSS gives analysts and data scientists the flexibility to integrate other Big Data technologies to meet the requirements of specific Data Science problems. With such a ready-made platform, applications can be developed more quickly.