I need help with a Python question. All explanations and answers will be used to help me learn.

Data Analysis Project


Generate a summary report from one of the data sources listed below.


Total 100 Points:

  • Jupyter Notebook 50 Points – Notebook showing your work and generated tables and graphs.
  • Summary report 50 Points – Document presenting a summary of your data analysis process.


Using one of the public data sources below, create a summary report that presents an observation about the data or answers a specific question about it. Submit a Word document or PDF summarizing your observations, along with the Jupyter notebook output (PDF or print to PDF) showing your work.

Data Sets and Questions

  1. Instacart Market Basket Analysis (What will I buy next? 3 Million Instacart Orders, Open Sourced) What are the top- and bottom-selling items?
  2. Amazon Reviews for Sentiment Analysis (Let’s get sentimental. A few million rows of Amazon customer review text and star ratings) What are the best- and worst-rated products?
  3. Indian Premier League | Kaggle (Love Cricket? This is the dataset for you. This dataset has IPL data from all seasons and all matches. Can you predict the winner for the next season?)
  4. Walmart Recruiting – Store Sales Forecasting (Hmm… you think you can forecast? Data from 45 stores in the US, also bakes in the seasonality and key events so be prepared for ups and downs.) Compute average temperatures and other weather statistics for a city.
  5. Trending YouTube Video Statistics and Comments (How about good old Exploratory Data Analysis (EDA) and insights generation? Can you identify the attributes that make a video popular? 200 trending videos from US and UK)
  6. Credit Card Fraud Detection (Let’s play fraudster. Data from a European credit card. Can you cope with the low incidence rate of 0.17%?)
  7. Climate Change: Earth Surface Temperature Data (Is Global Warming for real? Global temperature data from the year 1750 onwards. What will be a good way to statistically segment this data?)
  8. https://www.kaggle.com/hhs/healt… (Healthcare analytics is booming. Data from US Department of Health on individual and small businesses. What drives the plan rate? Who makes the most money?)
  9. Used cars database | Kaggle (370,000 used car listings scraped from German eBay. Let’s keep it simple: build a linear regression model)
  10. Human Resources Analytics | Kaggle (Why do employees leave? Note that this is simulated data and not very large in size)
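
As a starting point for a question such as the first one (top- and bottom-selling items), the following is a minimal sketch using pandas. It runs on a tiny made-up table rather than the real data, and the column names (order_id, product_id, product_name) and the helper top_and_bottom_items are assumptions for illustration; check them against the files you actually download.

```python
import pandas as pd


def top_and_bottom_items(order_lines, products, n=3):
    """Return the n most and n least frequently ordered products.

    Column names are hypothetical; adjust them to match your data set.
    Products that never appear in order_lines are not included.
    """
    counts = (
        order_lines.groupby("product_id")
        .size()                      # one row per product: number of order lines
        .rename("times_ordered")
        .reset_index()
        .merge(products, on="product_id", how="left")  # attach readable names
        .sort_values(["times_ordered", "product_name"], ascending=[False, True])
        .reset_index(drop=True)
    )
    return counts.head(n), counts.tail(n)


# Made-up sample data standing in for the real order and product tables.
order_lines = pd.DataFrame(
    {
        "order_id": [1, 1, 2, 2, 3, 3, 3, 4],
        "product_id": [10, 20, 10, 30, 10, 20, 40, 10],
    }
)
products = pd.DataFrame(
    {
        "product_id": [10, 20, 30, 40],
        "product_name": ["Banana", "Milk", "Bread", "Eggs"],
    }
)

top, bottom = top_and_bottom_items(order_lines, products, n=2)
print("Top sellers:")
print(top)
print("Bottom sellers:")
print(bottom)
```

In a notebook, the same function could be applied to tables loaded with pd.read_csv, and the resulting counts could feed a bar chart for the summary report.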
