Sunday, August 20, 2017

DEAN 690_Week 6

Assignment: Week 6
Question
41. What is power analysis?
Power analysis comes from experimental design. It determines the sample size required to detect an effect of a given size with a given degree of confidence. Conversely, when the sample size is limited, it determines the smallest effect size that can be detected.
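As a rough illustration of the sample-size side of power analysis, here is a minimal sketch using only the Python standard library. It assumes a two-sided comparison of two group means and the normal approximation; the function name sample_size_per_group and the default values for alpha and power are my own choices, not from any course material.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided, two-sample test.

    Normal approximation: n = 2 * ((z_(1 - alpha/2) + z_power) / d) ** 2,
    where d is the standardized effect size (Cohen's d).
    """
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the significance level
    z_power = z.inv_cdf(power)          # quantile matching the desired power
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Hypothetical effect sizes: a medium effect and a small effect.
n_medium = sample_size_per_group(0.5)
n_small = sample_size_per_group(0.2)
print(n_medium, n_small)
```

The smaller the effect to be detected, the larger the sample each group needs, which is the trade-off described above.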

Data fusion?
About data fusion: this article describes data fusion as the process of integrating multiple data and knowledge representing the same real-world object into a consistent, accurate, and useful representation. Could anyone give suggestions for this criterion in KUAN-HUNG_LIN_Assignment- Week 4 “data fusion”?

Reference
[1] M. Haghighat, M. Abdel-Mottaleb and W. Alhalabi, "Discriminant Correlation Analysis: Real-Time Feature Level Fusion for Multimodal Biometric Recognition", IEEE Transactions on Information Forensics and Security, vol. 11, no. 9, pp. 1984-1996, 2016.
News
"What makes a good data visualization"
For anyone who has to do data visualization, this article could be a reference for revising visualizations in order to provide better graphs for presentations.
http://www.kdnuggets.com/2017/03/what-makes-good-data-visualization.html
Project Idea
Python 3 Text Processing with NLTK 3 Cookbook
http://streamhacker.com/

This book can be read online for free through the GMU library.

DEAN 690_Week 5

Assignment: Week 5
Dataset
On Kaggle, there is a dataset, H-1B Visa Petitions 2011-2016, which includes case status, employer name, job title, full-time position, year, worksite, and so on.
Ideas:
1. Search for job titles related to data analytics.
2. Explore big companies.
3. Compare different years.
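The three ideas above could be sketched as follows. This is only an outline under assumptions: the rows are made up for illustration (they are not from the Kaggle file), the field names EMPLOYER_NAME, JOB_TITLE, and YEAR are assumed, and the keyword list is my own guess at what counts as a data analytics job.

```python
from collections import Counter

# Made-up rows for illustration only; the field names are assumptions.
rows = [
    {"EMPLOYER_NAME": "COMPANY A", "JOB_TITLE": "DATA ANALYST", "YEAR": 2015},
    {"EMPLOYER_NAME": "COMPANY A", "JOB_TITLE": "SOFTWARE ENGINEER", "YEAR": 2015},
    {"EMPLOYER_NAME": "COMPANY B", "JOB_TITLE": "SENIOR DATA SCIENTIST", "YEAR": 2016},
    {"EMPLOYER_NAME": "COMPANY B", "JOB_TITLE": "DATA ANALYST", "YEAR": 2016},
]

# Hypothetical keywords that mark a job title as data-analytics related.
KEYWORDS = ("DATA ANALYST", "DATA SCIENTIST", "DATA ANALYTICS")


def is_data_job(title):
    """Return True if the job title contains any of the keywords."""
    title = title.upper()
    return any(keyword in title for keyword in KEYWORDS)


# Idea 1: keep only the job titles related to data analytics.
data_jobs = [row for row in rows if is_data_job(row["JOB_TITLE"])]

# Idea 2: count those petitions per employer to find the big companies.
by_employer = Counter(row["EMPLOYER_NAME"] for row in data_jobs)

# Idea 3: count those petitions per year to compare different years.
by_year = Counter(row["YEAR"] for row in data_jobs)

print(by_employer.most_common())
print(sorted(by_year.items()))
```

With the real dataset, the rows would come from reading the CSV file (for example with csv.DictReader) instead of the hard-coded list.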
Question
9. What is selection bias?
Selection bias arises from the way individuals, groups, or data are selected for analysis. It occurs when the selection is made without proper randomization, so the sample being analyzed does not represent the population it is supposed to describe.
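A small simulation can make the effect visible. This is a minimal sketch with synthetic data: the population is generated at random, and the biased rule (only keeping values above 55) is a hypothetical example of selecting without randomization.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the run is repeatable

# Synthetic population: 100,000 values centered around 50.
population = [random.gauss(50, 10) for _ in range(100_000)]

# Proper random sample: every member has the same chance of being picked.
random_sample = random.sample(population, 1000)

# Biased selection: only members above 55 are ever picked.
biased_sample = [x for x in population if x > 55][:1000]

pop_mean = mean(population)
random_mean = mean(random_sample)
biased_mean = mean(biased_sample)

print(round(pop_mean, 2), round(random_mean, 2), round(biased_mean, 2))
```

The random sample's mean stays close to the population mean, while the biased sample's mean is pushed well above it.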
13. What is logistic regression?
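Statistics: Logistic regression, also called logit regression or the logit model, is a specific type of regression model. It models the probability of a binary outcome by passing a linear combination of the predictors through the logistic function.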
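To show the idea, here is a minimal sketch that fits a one-variable logistic regression by plain gradient descent. The data (hours studied versus pass or fail) is made up, and the names fit_logistic, lr, and steps are my own; in practice a library would normally be used for this.

```python
from math import exp


def sigmoid(z):
    """Logistic function: maps any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + exp(-z))


def fit_logistic(xs, ys, lr=0.1, steps=5000):
    """Fit p(y=1 | x) = sigmoid(w * x + b) by gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Prediction errors for the current parameters.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Made-up data: hours studied and whether the student passed (1) or not (0).
hours = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
passed = [0, 0, 0, 1, 0, 1, 1, 1]

w, b = fit_logistic(hours, passed)
p_low = sigmoid(w * 1.0 + b)   # estimated pass probability after 1 hour
p_high = sigmoid(w * 4.0 + b)  # estimated pass probability after 4 hours
print(round(p_low, 3), round(p_high, 3))
```

The output of the model is a probability, so a threshold such as 0.5 is needed to turn it into a yes or no prediction.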
42. What is a local optimum?
A local optimum is a solution to an optimization problem that is the best within a particular neighborhood of values, though not necessarily the best overall (the global optimum).
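The following sketch shows how a simple search can get stuck in a local optimum. The function f(x) = x^4 - 3x^2 + x is my own example, chosen because it has two minima; the step size and number of steps are arbitrary assumptions.

```python
def f(x):
    """Example function with one local minimum and one global minimum."""
    return x**4 - 3 * x**2 + x


def df(x):
    """Derivative of f."""
    return 4 * x**3 - 6 * x + 1


def gradient_descent(x, lr=0.01, steps=2000):
    """Repeatedly step downhill from the starting point x."""
    for _ in range(steps):
        x -= lr * df(x)
    return x


# Same method, two different starting points.
x_local = gradient_descent(2.0)    # ends in the shallower minimum on the right
x_global = gradient_descent(-2.0)  # ends in the deeper minimum on the left

print(round(x_local, 3), round(f(x_local), 3))
print(round(x_global, 3), round(f(x_global), 3))
```

Both end points are minima within their own neighborhood, but only the one reached from the left has the lowest value of f overall.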
23. What is root cause analysis?
Root cause analysis is a problem-solving method used to identify the root causes of faults or problems.
Refer to "P. Wilson, L. Dell and G. Anderson, Root cause analysis, 1st ed. Milwaukee, Wis.: ASQC Quality Press, 1993."
29. During analysis, how do you treat missing values?
First step: Understand the data, then look for patterns in the missing values and the reasons behind them.
Second step: Check whether the missing values can be estimated or should be discarded.
Final step: Decide whether to fill in or delete the missing values.
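The steps above can be sketched in a few lines. The records are made up for illustration, None marks a missing value, and filling with the column mean is just one possible estimate, not the only option.

```python
from statistics import mean

# Made-up records for illustration; None marks a missing value.
records = [
    {"age": 25, "wage": 60000},
    {"age": None, "wage": 72000},
    {"age": 31, "wage": None},
    {"age": 40, "wage": 85000},
]

# First step: find out where the values are missing and how many there are.
missing = {col: sum(r[col] is None for r in records) for col in records[0]}

# Option A for the final step: delete every row that has a missing value.
complete = [r for r in records if None not in r.values()]


def fill_with_mean(rows, col):
    """Option B: replace missing values in one column with that column's mean."""
    col_mean = mean(r[col] for r in rows if r[col] is not None)
    return [{**r, col: col_mean if r[col] is None else r[col]} for r in rows]


filled = fill_with_mean(fill_with_mean(records, "age"), "wage")

print(missing)
print(len(complete), len(filled))
```

Deleting keeps only fully observed rows but shrinks the dataset; filling keeps every row but adds estimated values, so the choice depends on why the values are missing.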
News
5 Career Paths in Big Data and Data Science, Explained
This article explains four concepts in Big Data and Data Science.
Project Idea
Process the dataset
Explore the specific word “IT”
No relation found for the specific word “IT”
Dig into more natural language processing concepts
Use a new Python package (Python NLTK Cookbook, link: http://streamhacker.com/)

New Dataset

http://data-vgin.opendata.arcgis.com/datasets?q=datathon2016&sort_by=relevance

DEAN 690_Week 4

Dataset
This website provides 476 million Twitter tweets. My first idea is to set up categories to separate the tweets and then look for relations among them. I am still thinking about ideas for this dataset. If anyone has ideas, please let me know.
Question
13. What is logistic regression?
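Statistics: Logistic regression, also called logit regression or the logit model, is a specific type of regression model. It models the probability of a binary outcome by passing a linear combination of the predictors through the logistic function.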
14. Compare R and Python
Personal opinion: the two tools focus on different aspects.
R: statistics and graphics
Python: a general-purpose, high-level programming language
42. What is a local optimum?
A local optimum is a solution to an optimization problem that is the best within a particular neighborhood of values, though not necessarily the best overall (the global optimum).
News
Topic: Moving from R to Python: The Libraries You Need to Know
Useful packages: R -> Python
Project Idea
•Explore the dataset.
•Link diverse IT jobs to a catalogue.
•Create relationships between IT jobs.
NLP learning Blog
http://ling-blogs.bu.edu/static/lx390f16/page5/


Natural Language Processing with Python


http://www.nltk.org/book/
PYTHON NLTK COOKBOOK

http://streamhacker.com/

Python program to display calendar

# Python program to display calendar of given month of the year
# importing calendar module for calendar operations
import calendar
# set t...