Sunday, August 20, 2017

DEAN 690_Week 8

Assignment: Week 8
Visualizing Time-Series Change (Python)
The author uses Python to create time-series line charts that show the actual values in units, the change in absolute units, the percent change, and so on.
The article “Visualizing Change: An Innovation in Time-Series Analysis” was published by SAS Institute. [1]
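The quantities named above (absolute change and percent change) can be derived from a series of yearly values as in the minimal sketch below. The numbers are hypothetical and are not taken from the article or the dataset; the function names are also only illustrative.

```python
# Hypothetical yearly counts, invented only for illustration.
years = [2012, 2013, 2014, 2015, 2016]
units = [200, 220, 209, 250, 275]


def absolute_change(values):
    """Difference between each value and the one before it."""
    return [curr - prev for prev, curr in zip(values, values[1:])]


def percent_change(values):
    """Change relative to the previous value, in percent."""
    return [100.0 * (curr - prev) / prev for prev, curr in zip(values, values[1:])]


abs_change = absolute_change(units)
pct_change = percent_change(units)

# The first year has no previous value, so the changes start at the second year.
for year, a, p in zip(years[1:], abs_change, pct_change):
    print(f"{year}: {a:+d} units, {p:+.1f}%")
```

Each of the three lists (actual values, absolute change, percent change) could then be drawn as its own line chart against the year, for example with a plotting library such as matplotlib.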
Reference:
[1] 2017. [Online]. Available: https://www.perceptualedge.com/articles/visual_business_intelligence/visualizing_change.pdf. [Accessed: 20- Mar- 2017].
Dataset
The dataset is described as follows: “Temporary Foreign Worker Program and International Mobility Program work permit holders and work permit holders for humanitarian and compassionate purposes by year in which the permit came into effect.”
------------------------------------------------------------------------------------------------
Dataset: Temporary Foreign Workers in Canada

Questions
25. What makes a dataset gold standard?
According to a StackExchange discussion, a gold-standard dataset is one that serves as the most accurate available test, benchmark, or diagnostic reference.
Link:
StackExchange
Wikipedia
28. What is an API? What are APIs used for?
API stands for application programming interface. APIs include toolkits, frameworks, libraries, and software development kits. An API gives a program a defined way to call existing code, which makes it easier for programmers to build software on top of functionality that already exists. [1]
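As a small illustration of a library API, the sketch below calls two documented functions from Python's standard `calendar` module. The caller only needs the function names, arguments, and return values, not the implementation behind them. The date used is simply the date of this post.

```python
import calendar

# calendar.weekday returns 0 for Monday through 6 for Sunday.
day_index = calendar.weekday(2017, 8, 20)
day_name = calendar.day_name[day_index]

# calendar.monthrange returns (weekday of the 1st, number of days in the month).
first_weekday, days_in_month = calendar.monthrange(2017, 8)

print(day_name, days_in_month)
```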
Reference:
[1] Myers, Brad A., and Jeffrey Stylos. "Improving API Usability". Communications of the ACM 59.6 (2016): 62-69. Web.
44. What are the benefits of Regularization?
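In machine learning, the main benefit of regularization is that it reduces overfitting: it penalizes model complexity so that the model generalizes better to new data. The reference treats regularization as one of five main technical tasks, together with measuring learning performance, overfitting, cross-validation, and feature selection. [1]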
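One common form of regularization is an L2 (ridge) penalty. The sketch below is a minimal illustration for a one-feature linear model without an intercept; the data, the `fit_slope` helper, and the penalty value are assumptions chosen for the example, not something taken from the reference.

```python
def fit_slope(xs, ys, penalty=0.0):
    """Least-squares slope for y ~ w * x with an L2 (ridge) penalty on w.

    Minimizes sum((y - w*x)**2) + penalty * w**2, whose closed-form
    solution is w = sum(x*y) / (sum(x*x) + penalty).
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + penalty)


# Hypothetical data, invented only for illustration.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

w_plain = fit_slope(xs, ys)                 # no regularization
w_ridge = fit_slope(xs, ys, penalty=10.0)   # penalized fit

print(w_plain, w_ridge)
```

The penalized slope is pulled toward zero. Shrinking the weights in this way is how regularization trades a little training accuracy for a simpler model that is less likely to overfit.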
Reference:
[1] A. Prieditis and S. Russell, Machine learning, 1st ed. Burlington: Morgan Kaufmann/Elsevier Science, 2014.
News
This news article discusses graphs that visualize Starbucks data. The graphs show why Starbucks is shrinking while McDonald’s is growing.
Topic: The Backlash Against Starbucks Is Real, And It Isn't Going Away

Project Idea
This Kaggle dataset covers tech jobs in H-1B visa petitions from 2011 to 2016. My project topic is IT jobs in Virginia. Someone has already written code to explore the tech jobs in Philadelphia, Pennsylvania. This week I will explore and compare the differences between the two areas.

DEAN 690_Week 7


Assignment: Week 7
Dataset
Topic: 2017 #Oscars Tweets
This dataset contains 29,000+ tweets about the 2017 Academy Awards.
Link: https://www.kaggle.com/madhurinani/oscars-2017-tweets
The GMU library provides a free online version of the following book.

Natural language processing for social media

Atefeh Farzindar and Diana Inkpen, 2015


Questions
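31. Explain survivorship bias.
Survivorship bias is a specific type of selection bias. It is the logical error of focusing on the people or things that made it past a selection process while overlooking those that did not, usually because the failures are less visible. Decisions based only on the survivors tend to be too optimistic, so the same past failures get repeated.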
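A small numeric illustration, under stated assumptions: the returns below are hypothetical, and the rule that funds losing more than 20% were closed is invented for the example. Averaging only the funds that still exist gives a much rosier picture than averaging all of them.

```python
# Hypothetical five-year returns (%) for ten funds, invented for illustration.
all_returns = [35, 12, -40, 8, -25, 50, -60, 5, -30, 20]

# Assumption for this sketch: funds that lost more than 20% were closed,
# so only the remaining funds are visible to someone looking today.
survivors = [r for r in all_returns if r > -20]

mean_all = sum(all_returns) / len(all_returns)
mean_survivors = sum(survivors) / len(survivors)

print(f"all funds: {mean_all:.1f}%  survivors only: {mean_survivors:.1f}%")
```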
Reference Link:
Wikipedia
https://en.wikipedia.org/wiki/Survivorship_bias
RationalWiki
http://rationalwiki.org/wiki/Survivorship_bias
24. Give an explanation of collaborative filtering.
Collaborative filtering is one kind of recommendation technique. It predicts what a user will be interested in from the preferences of many other users, on the assumption that people who agreed in the past will tend to agree again. For example, when a user searches for a specific paper and the site also shows related articles that were read by other users who viewed the same paper, that is one kind of collaborative filtering.
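A minimal sketch of user-based collaborative filtering follows. The users, items, and ratings are hypothetical, and cosine similarity over co-rated items is only one of several similarity measures that could be used here; this is not the method of the cited paper.

```python
from math import sqrt

# Hypothetical ratings (1-5); names and items are invented for illustration.
ratings = {
    "ann": {"paper_a": 5, "paper_b": 4, "paper_c": 1},
    "bob": {"paper_a": 4, "paper_b": 5, "paper_d": 5},
    "cat": {"paper_a": 1, "paper_c": 5, "paper_d": 2},
}


def cosine_similarity(u, v):
    """Cosine similarity computed over the items both users rated."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = sqrt(sum(u[i] ** 2 for i in shared))
    norm_v = sqrt(sum(v[i] ** 2 for i in shared))
    return dot / (norm_u * norm_v)


def predict(user, item):
    """Similarity-weighted average of other users' ratings for the item."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        sim = cosine_similarity(ratings[user], their)
        num += sim * their[item]
        den += sim
    return num / den if den else None


# ann has not rated paper_d; estimate her rating from bob and cat.
predicted = predict("ann", "paper_d")
print(round(predicted, 2))
```

Because ann's ratings resemble bob's far more than cat's, the estimate lands closer to bob's rating of paper_d than to cat's.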
Reference
[1] J. Bobadilla, A. Hernando, F. Ortega and A. Gutiérrez, "Collaborative filtering based on significances", Information Sciences, vol. 185, no. 1, pp. 1-17, 2012.


News
KDnuggets posted an article explaining anomaly detection.
Wikipedia also explains the concept at the following link.
https://en.wikipedia.org/wiki/Anomaly_detection
Topic: Introduction to Anomaly Detection
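One of the simplest ways to flag anomalies is a z-score threshold, sketched below. This is only an illustration of the general idea and is not necessarily the approach described in the KDnuggets article; the readings, the `zscore_anomalies` helper, and the threshold of 2.0 are assumptions.

```python
from statistics import mean, stdev


def zscore_anomalies(values, threshold=2.0):
    """Return values lying more than `threshold` sample standard deviations from the mean."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []
    return [x for x in values if abs(x - mu) / sigma > threshold]


# Hypothetical readings, invented only for illustration.
readings = [10, 11, 9, 10, 12, 10, 11, 9, 10, 50]
outliers = zscore_anomalies(readings)
print(outliers)
```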

Project Idea
Natural Language Processing for Social Media
This book covers an introduction to social media analysis, linguistic pre-processing of social media texts, semantic analysis of social media texts, applications of social media text analysis, and data collection, annotation, and evaluation. For anyone working on this topic, it might be useful for brainstorming datasets.

Python program to display calendar

# Python program to display calendar of given month of the year
# importing calendar module for calendar operations
import calendar
# set t...