Sunday, August 20, 2017

DEAN 690_Week 5

Assignment: Week 5
Dataset
On Kaggle, the H-1B Visa Petitions 2011-2016 dataset includes the case status, employer name, job title, whether the position is full-time, year, worksite, and other fields.
Ideas:
1. Search for job titles related to data analytics.
2. Explore big companies.
3. Compare different years.
Questions
9. What is selection bias?
Selection bias is the bias introduced when individuals, groups, or data are selected for analysis without proper randomization. As a result, the sample obtained is not representative of the population that is supposed to be analyzed.
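A small simulation can make the definition concrete. The sketch below draws a made-up population of normally distributed values, then compares a proper random sample with a sample that only includes individuals above a cutoff of 55 (a hypothetical non-random selection rule). The biased sample's mean lands far from the population mean, while the random sample's does not.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Made-up population of 10,000 values (mean 50, standard deviation 10).
population = [random.gauss(50, 10) for _ in range(10_000)]

# Proper randomization: every individual has the same chance of selection.
random_sample = random.sample(population, 500)

# Non-random selection: only individuals above 55 are ever observed.
biased_sample = [x for x in population if x > 55][:500]

pop_mean = statistics.mean(population)
print(f"population mean:    {pop_mean:.2f}")
print(f"random sample mean: {statistics.mean(random_sample):.2f}")
print(f"biased sample mean: {statistics.mean(biased_sample):.2f}")
```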
13. What is logistic regression?
In statistics, logistic regression, also called logit regression or the logit model, is a type of regression model in which the dependent variable is categorical, most commonly binary (for example, pass/fail).
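The core of the model is the logistic (sigmoid) function, which maps a linear score to a probability between 0 and 1. The sketch below fits a one-feature model by plain batch gradient descent on the average negative log-likelihood; the data, learning rate and epoch count are made-up values chosen for illustration, not a recommended setup.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + e^-z)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit P(y=1 | x) = sigmoid(w*x + b) by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # derivative of the log-loss w.r.t. the score
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


# Made-up data: hours of study (x) and pass/fail outcome (y).
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0]
ys = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1]

w, b = fit_logistic(xs, ys)


def predict(x):
    """Estimated probability that y = 1 for input x."""
    return sigmoid(w * x + b)


print(f"w={w:.3f}, b={b:.3f}, P(pass | 4 hours)={predict(4.0):.3f}")
```

Thresholding the predicted probability at 0.5 turns it into a class label; the decision boundary sits where w*x + b = 0, that is, at x = -b/w.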
42. What is a local optimum?
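A local optimum of an optimization problem is a solution that is optimal (either maximal or minimal) within a neighboring set of candidate solutions, but not necessarily over all possible solutions (the global optimum).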
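A greedy local search shows the idea. The function f(x) = x^4 - 3x^2 + x (an arbitrary example) has two valleys; the sketch below steps downhill in increments of 0.01 and stops when neither neighbor is lower. Depending on the starting point, it stops at the local minimum near x = 1.13 or at the global minimum near x = -1.30.

```python
def f(x):
    # Two minima: a local one near x = 1.13 and the global one near x = -1.30.
    return x**4 - 3 * x**2 + x


def local_search(func, x, step=0.01, max_iter=10_000):
    """Greedy descent: move to a neighbor only if it lowers func."""
    for _ in range(max_iter):
        best = min((x - step, x + step), key=func)
        if func(best) >= func(x):
            return x  # no neighbor is better: x is (approximately) a local minimum
        x = best
    return x


from_right = local_search(f, 2.0)   # gets stuck in the local minimum
from_left = local_search(f, -2.0)   # reaches the global minimum
print(round(from_right, 2), round(f(from_right), 2))
print(round(from_left, 2), round(f(from_left), 2))
```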
23. What is root cause analysis?
Root cause analysis is a problem-solving method used to identify the root causes of faults or problems.
Refer to "P. Wilson, L. Dell and G. Anderson, Root cause analysis, 1st ed. Milwaukee, Wis.: ASQC Quality Press, 1993."
29. During analysis, how do you treat missing values?
First step: Understand the data, then look for patterns in the missing values and the reasons they are missing.
Second step: Check whether the missing values can be estimated or should be discarded.
Final step: Decide whether to fill in or delete the missing values.
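The three steps can be sketched in Python on a toy table where `None` marks a missing value. The column names and rows are made up, and the choices made here (drop rows with no job title, fill a missing wage with the mean of the remaining wages) are just one possible decision; median filling or model-based imputation are common alternatives.

```python
import statistics

# Made-up records; None marks a missing value. Column names are hypothetical.
rows = [
    {"JOB_TITLE": "DATA ANALYST", "PREVAILING_WAGE": 65000.0},
    {"JOB_TITLE": "DATA SCIENTIST", "PREVAILING_WAGE": None},
    {"JOB_TITLE": None, "PREVAILING_WAGE": 80000.0},
    {"JOB_TITLE": "BUSINESS ANALYST", "PREVAILING_WAGE": 70000.0},
]


def missing_counts(records):
    """Step 1: count missing values per column to see where the gaps are."""
    counts = {}
    for row in records:
        for col, val in row.items():
            counts[col] = counts.get(col, 0) + (1 if val is None else 0)
    return counts


def drop_missing(records, col):
    """Delete rows whose value in `col` is missing."""
    return [r for r in records if r[col] is not None]


def fill_with_mean(records, col):
    """Fill missing values in `col` with the mean of the observed values."""
    mean = statistics.mean(r[col] for r in records if r[col] is not None)
    return [{**r, col: mean if r[col] is None else r[col]} for r in records]


print(missing_counts(rows))
# Steps 2 and 3: a job title cannot be estimated, so drop it; a wage can be filled.
cleaned = fill_with_mean(drop_missing(rows, "JOB_TITLE"), "PREVAILING_WAGE")
print(cleaned)
```

Dropping first and filling second means the mean is computed only from the rows that are kept.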
News
5 Career Paths in Big Data and Data Science, Explained
This article explains four concepts related to Big Data and Data Science.
Project Idea
Process the dataset
Explore the specific word “IT”
No relation found for the specific word “IT”
Dig deeper into natural language processing concepts
Use a new Python package, NLTK (Python NLTK Cookbook, link: http://streamhacker.com/)
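When exploring the word “IT”, a plain substring search also matches words such as SECURITY or UNIT. The sketch below compares substring matching with matching whole tokens. It uses a simple regular-expression tokenizer from the standard library as a stand-in for NLTK (whose `nltk.word_tokenize` could be swapped in), and the job titles are made up.

```python
import re

# Made-up, upper-case job titles in the style of the H-1B data.
titles = [
    "IT MANAGER",
    "SENIOR IT ANALYST",
    "SECURITY ENGINEER",
    "UNIT DIRECTOR",
    "DATA ANALYST",
]


def tokenize(text):
    """Split text into upper-case alphanumeric tokens (simple stand-in for NLTK)."""
    return re.findall(r"[A-Z0-9]+", text.upper())


# Substring matching also counts SECURITY and UNIT, which contain "IT".
substring_hits = [t for t in titles if "IT" in t.upper()]

# Token matching only counts titles where IT is a word on its own.
token_hits = [t for t in titles if "IT" in tokenize(t)]

print(len(substring_hits), len(token_hits))
```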

New Dataset

http://data-vgin.opendata.arcgis.com/datasets?q=datathon2016&sort_by=relevance
