Assignment: Week 5
Dataset
On Kaggle, there is a dataset called H-1B Visa Petitions 2011-2016, which includes case status, employer name, job title, full-time position, year, worksite, and so on.
Ideas:
1. Search for job titles related to data analytics.
2. Explore big companies.
3. Compare different years.
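The three ideas above can be sketched in Python with the standard library. This is a minimal sketch under assumptions: the column names `JOB_TITLE`, `EMPLOYER_NAME`, and `YEAR` are assumed to match the Kaggle CSV header, the keyword list is a hypothetical choice, and the sample rows are made up for illustration, not taken from the dataset.

```python
from collections import Counter

# Hypothetical keyword list for "data analytics" job titles; adjust as needed.
KEYWORDS = ("DATA ANALYST", "DATA SCIENTIST", "DATA ANALYTICS", "BUSINESS ANALYST")


def is_analytics_title(title):
    """Idea 1: True if the job title contains one of the keywords (case-insensitive)."""
    return bool(title) and any(k in title.upper() for k in KEYWORDS)


def summarize(rows):
    """Filter analytics jobs, then count them per employer (idea 2) and per year (idea 3)."""
    analytics = [r for r in rows if is_analytics_title(r["JOB_TITLE"])]
    by_employer = Counter(r["EMPLOYER_NAME"] for r in analytics)
    by_year = Counter(r["YEAR"] for r in analytics)
    return analytics, by_employer, by_year


# Made-up sample rows; column names are assumed to follow the Kaggle CSV header.
rows = [
    {"EMPLOYER_NAME": "COMPANY A", "JOB_TITLE": "DATA ANALYST", "YEAR": 2015},
    {"EMPLOYER_NAME": "COMPANY A", "JOB_TITLE": "SENIOR DATA SCIENTIST", "YEAR": 2016},
    {"EMPLOYER_NAME": "COMPANY B", "JOB_TITLE": "SOFTWARE ENGINEER", "YEAR": 2016},
    {"EMPLOYER_NAME": "COMPANY B", "JOB_TITLE": "Data Analyst", "YEAR": 2016},
]

analytics, by_employer, by_year = summarize(rows)
print(len(analytics))               # 3
print(by_employer.most_common(1))   # [('COMPANY A', 2)]
print(sorted(by_year.items()))      # [(2015, 1), (2016, 2)]
```

For the real file, `csv.DictReader` yields the same kind of dicts, although every value (including `YEAR`) is then a string.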
Questions
9. What is selection bias?
Selection bias is the distortion introduced by the way individuals, groups, or data are selected for analysis. It arises when the selection process fails to achieve proper randomization, so the sample obtained does not represent the population that is meant to be analyzed.
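A small simulation makes the definition concrete. It is a sketch with made-up numbers: the population (values 1 to 1000), the sample size, and the value-dependent inclusion rule are all assumptions chosen to show how a non-random selection process shifts an estimate away from the population value.

```python
import random
import statistics


def simulate_selection_bias(seed=0):
    """Compare a proper random sample with a non-random (self-selected) sample."""
    rng = random.Random(seed)
    population = list(range(1, 1001))  # hypothetical population values 1..1000
    true_mean = statistics.mean(population)

    # Proper random sampling: every member has the same chance of selection.
    random_sample = rng.sample(population, 100)

    # Flawed selection: the chance of being included grows with the value itself,
    # e.g. larger employers being more likely to respond to a survey.
    biased_sample = [x for x in population if rng.random() < x / 1000]

    return true_mean, statistics.mean(random_sample), statistics.mean(biased_sample)


true_mean, random_mean, biased_mean = simulate_selection_bias()
print(f"population mean:    {true_mean:.1f}")    # 500.5
print(f"random sample mean: {random_mean:.1f}")  # close to the population mean
print(f"biased sample mean: {biased_mean:.1f}")  # noticeably higher (roughly 667 on average)
```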
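13. What is logistic regression?
In statistics, logistic regression (also called logit regression or the logit model) is a specific type of regression model in which the dependent variable is categorical, most commonly binary. It models the log-odds of the outcome as a linear function of the predictors.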
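With one predictor x, the model reads log(p / (1 - p)) = b + w*x, so p = 1 / (1 + exp(-(b + w*x))). The sketch below fits w and b by batch gradient descent on the average log-loss. The toy data, learning rate, and epoch count are assumptions for illustration; in practice a library implementation would normally be used.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit log(p / (1 - p)) = b + w*x by batch gradient descent on the mean log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # derivative of the log-loss w.r.t. the logit
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


# Toy data (made up): binary outcome that tends to be 1 for larger x.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 1, 0, 1, 1, 1]
w, b = fit_logistic(xs, ys)
for x in (1, 8):
    print(x, round(sigmoid(w * x + b), 3))  # predicted P(y = 1 | x)
```

Because this toy data is symmetric around x = 4.5, the fitted decision boundary (where p = 0.5, i.e. x = -b/w) should end up close to 4.5.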
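42. What is a local optimum?
A local optimum is a solution to an optimization problem that is optimal (either maximal or minimal) within a particular neighborhood of candidate solutions, but not necessarily among all possible solutions.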
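Greedy hill climbing shows why local optima matter: it stops at the first point that beats its neighbors, which is not necessarily the best point overall. In this sketch the objective values are made up, and the "neighborhood" of an index is assumed to be its two adjacent indices.

```python
def hill_climb(values, start):
    """Move to the better adjacent index until no neighbor improves; return that index."""
    i = start
    while True:
        neighbors = [j for j in (i - 1, i + 1) if 0 <= j < len(values)]
        best = max(neighbors, key=lambda j: values[j])
        if values[best] <= values[i]:
            return i  # no neighbor is better: i is a local optimum (maximum)
        i = best


values = [1, 3, 5, 4, 2, 6, 9, 7]  # made-up objective values
global_best = values.index(max(values))

print(hill_climb(values, start=0), global_best)  # 2 6 -> stuck at the local maximum 5
print(hill_climb(values, start=4), global_best)  # 6 6 -> reaches the global maximum 9
```

The result depends on the starting point, which is why methods such as random restarts are often used to escape local optima.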
23. What is root cause analysis?
Root cause analysis is a class of problem-solving methods used to identify the root causes of faults or problems.
Reference: P. Wilson, L. Dell, and G. Anderson, Root Cause Analysis, 1st ed. Milwaukee, Wis.: ASQC Quality Press, 1993.
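One simple way to practice the idea, assuming the causes of a problem have already been written down as "X is caused by Y" links (for example, by repeatedly asking "why?"), is to walk those links back from the observed problem until reaching causes that have no further recorded cause. This is a hypothetical illustration, not the procedure from the cited book; the `caused_by` mapping and its entries are made up.

```python
def find_root_causes(caused_by, problem):
    """Return the causes reachable from `problem` that have no recorded cause themselves."""
    roots, seen, stack = set(), set(), [problem]
    while stack:
        node = stack.pop()
        if node in seen:
            continue  # guard against cycles and shared causes
        seen.add(node)
        causes = caused_by.get(node, [])
        if not causes and node != problem:
            roots.add(node)  # nothing further explains this node: treat it as a root cause
        stack.extend(causes)
    return sorted(roots)


# Made-up cause map: each key lists the immediate causes of that fault.
caused_by = {
    "dashboard shows wrong totals": ["missing rows in table", "stale cache"],
    "missing rows in table": ["nightly ETL job failed"],
    "nightly ETL job failed": ["disk full on ETL server"],
    "stale cache": ["cache TTL set too long"],
}

print(find_root_causes(caused_by, "dashboard shows wrong totals"))
# ['cache TTL set too long', 'disk full on ETL server']
```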
29. During analysis, how do you treat missing values?
First step: Understand the data, then look for patterns in, and reasons for, the missing values.
Second step: Check whether the missing values can be estimated or whether they should be discarded.
Final step: Decide whether to fill in (impute) or delete the missing values.
Link: http://handbook.cochrane.org/chapter_16/16_1_2_general_principles_for_dealing_with_missing_data.htm
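The three steps can be sketched in plain Python. Everything here is an assumption for illustration: missing values are represented as None, the 50% threshold for dropping a column is arbitrary, mean imputation is only one of many ways to fill values (and can bias results when values are not missing at random), and the rows and column names are made up.

```python
import statistics


def missing_report(rows):
    """Step 1: fraction of missing (None) values per column."""
    columns = sorted({c for r in rows for c in r})
    return {c: sum(r.get(c) is None for r in rows) / len(rows) for c in columns}


def handle_missing(rows, max_missing=0.5):
    """Steps 2-3: drop mostly-empty columns, mean-impute numeric ones, drop remaining gaps."""
    report = missing_report(rows)
    keep = [c for c, frac in report.items() if frac <= max_missing]
    cleaned = [{c: r.get(c) for c in keep} for r in rows]
    for c in keep:
        observed = [r[c] for r in cleaned if r[c] is not None]
        if observed and all(isinstance(v, (int, float)) for v in observed):
            fill = statistics.mean(observed)  # simple mean imputation
            for r in cleaned:
                if r[c] is None:
                    r[c] = fill
    # Non-numeric gaps cannot be mean-imputed here, so those rows are deleted.
    cleaned = [r for r in cleaned if all(v is not None for v in r.values())]
    return report, cleaned


rows = [  # made-up records
    {"title": "DATA ANALYST", "wage": 60000.0, "worksite": None},
    {"title": "DATA SCIENTIST", "wage": None, "worksite": None},
    {"title": None, "wage": 80000.0, "worksite": "BOSTON, MASSACHUSETTS"},
    {"title": "DATA ENGINEER", "wage": 70000.0, "worksite": None},
]

report, cleaned = handle_missing(rows)
print(report)   # {'title': 0.25, 'wage': 0.25, 'worksite': 0.75}
print(cleaned)  # 3 rows; 'worksite' dropped, missing wage filled with 70000.0
```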
News
5 Career Paths in Big Data and Data Science, Explained
This article explains four concepts related to Big Data and Data Science.
Project Idea
Processing the dataset
Explore the specific word “IT”
Found no relation involving the specific word “IT”
Dig deeper into natural language processing concepts
Use a new Python package, NLTK (Python NLTK Cookbook, Link: http://streamhacker.com/)
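When exploring the word “IT” in job titles, a plain substring search also matches words such as “SECURITY” or “UNIT”, which inflates the count. Matching whole tokens avoids that. This sketch assumes a simple regular-expression tokenizer from the standard library; NLTK's `word_tokenize` is an alternative (it needs NLTK's tokenizer data downloaded first). The titles are made up.

```python
import re
from collections import Counter


def tokens(title):
    """Split a job title into upper-case word tokens (letters and digits only)."""
    return re.findall(r"[A-Z0-9]+", title.upper())


titles = [  # made-up job titles
    "IT PROJECT MANAGER",
    "SECURITY ANALYST",
    "UNIT MANAGER",
    "SENIOR IT CONSULTANT",
    "DATA ANALYST",
]

substring_hits = [t for t in titles if "IT" in t.upper()]
token_hits = [t for t in titles if "IT" in tokens(t)]
print(len(substring_hits), len(token_hits))  # 4 2

# Words that appear together with the token "IT" in the matching titles
co_words = Counter(w for t in token_hits for w in tokens(t) if w != "IT")
print(co_words.most_common())
```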
New Dataset
http://data-vgin.opendata.arcgis.com/datasets?q=datathon2016&sort_by=relevance