
LATEST PROJECTS

Project 1 | Analysing factors responsible for points scored in a soccer game                                                    
This project predicts team performance using predictive modelling techniques: multiple regression, neural networks and decision trees. The Premier League data set at https://datahub.io/dataset/uk-premier-league-match-by-match-2011-2012 was chosen because the top two teams finished the season with the same number of points and were separated only by goal difference, which made predicting the winning team a real challenge. The scope was narrowed to measuring individual team performance, so the data had to be consolidated to relate to individual teams rather than individuals. The data was of mixed type (continuous and categorical); dummy variables were created as needed, and the data was reduced because of multicollinearity issues. The cluster of independent variables with the lowest (1-R^2) value was then considered for modelling. Neural network, multiple regression and decision tree models were built and compared using average squared error as the selection criterion. The neural network had the lowest average squared error and was therefore considered the best model.
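The model comparison step can be illustrated with a minimal Python sketch. It assumes each model has already produced predictions for the same teams; the function names, the point values and the predictions below are hypothetical illustrations, not the project's data or results, and the original work may have used a different tool.

```python
def average_squared_error(actual, predicted):
    """Mean of the squared differences between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def select_best_model(actual, predictions_by_model):
    """Return (best_name, scores), where best_name has the lowest average squared error."""
    scores = {
        name: average_squared_error(actual, preds)
        for name, preds in predictions_by_model.items()
    }
    best_name = min(scores, key=scores.get)
    return best_name, scores


# Hypothetical points and hypothetical model predictions (illustrative only).
actual_points = [3, 1, 0, 3]
predictions = {
    "multiple_regression": [2.0, 1.5, 1.0, 2.5],
    "decision_tree": [3.0, 0.0, 1.0, 1.0],
    "neural_network": [2.5, 1.0, 0.5, 3.0],
}

best_model, ase_scores = select_best_model(actual_points, predictions)
print(best_model, ase_scores)
```

Whichever candidate has the smallest average squared error on the same evaluation data is selected, which mirrors the selection criterion described above.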
Project 2 | Creation and analysis of Cubes and Data Mining for Health Care data                                          
The data consisted of health-related records from the past 7 years, gathered from the Center for Health Systems Innovation (CHSI) at Oklahoma State University, with a focus on innovating both clinical models and business models. The project concentrates on building models that predict interesting patterns and significant measures to support managerial decisions that improve profitability. Using this analysis, the team aims to provide detailed information about CHSI patients that gives a glimpse of health across the locality. Microsoft Visual Studio and SQL Server Management Studio were used to create the dimensions and the cube, run MDX (Multidimensional Expressions) queries and create the data mining models. Each MDX query was developed around a different MDX function and showcases data filtered from the database. The functions used were TOPCOUNT, NON EMPTY, DIMENSION, CROSSJOIN, ITEM, FILTER, HIERARCHIZE, IIF, SUM, MAX, MIN and AXIS. Two models were created: one emphasizing the factors affecting the cost of medicine, and another the factors affecting the infusion time of the medicine.

Report

Project 3 | Data visualization of Olympic Games using Tableau
Data set: http://www.tableausoftware.com/public/sites/default/files/OlympicAthletes_0.xlsx (more than 8,500 rows, covering the years 2000-2012). The purpose of the project was to create a story and publish it on Tableau Public. The published story is at https://public.tableau.com/profile/publish/Olympics_Through_Years/OlympicsMedalAnalysis#!/publish-confirm and is aimed at sponsors and advertisers. Since their main purpose is to promote their products, sponsors and advertisers need to decide which sport, country or athlete to promote those products through.
more projects on next page...