KUAN-HUNG LIN (Gary) M.S. Data Analytics in GMU: August 2017

DEAN 690_Week 11

Assignment: Week 11

Share

Expert Interview – Data Visualization & Infographics

https://dadepaperblog.wordpress.com/2017/04/07/expert-interview-data-visualization-infographics/

Dataset

Topic: Virginia Crash Data (Department of Motor Vehicles)

Description: Data that is collected, stored and analyzed by this Division is used for problem identification and resolution by local, state and federal entities across the Commonwealth.

Link: https://www.dmv.virginia.gov/safety/#crash_data/index.asp

This website provides simple tables: crashes by crash type, speed-related crashes, alcohol-related crashes, and Virginia motor vehicle statistics.

Questions

31. What is Collaborative filtering?

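Collaborative filtering is one kind of recommendation technique: it suggests items to a user based on the preferences of other users with similar tastes. For example, when we search keywords on Google, the “Related articles” suggestions are one kind of collaborative filtering.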
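A minimal sketch of user-based collaborative filtering may make the idea concrete. The ratings below are made up for illustration, and the function names (`cosine_similarity`, `predict`) are hypothetical, not taken from any library. The sketch assumes that two users are "similar" when their ratings on the items they both rated point in the same direction, which is only one of several possible similarity measures.

```python
import math

# Hypothetical ratings, invented for illustration: user -> {item: rating from 1 to 5}
RATINGS = {
    "alice": {"article_a": 5, "article_b": 4, "article_c": 1},
    "bob": {"article_a": 4, "article_b": 5, "article_d": 5},
    "carol": {"article_a": 1, "article_c": 5, "article_d": 2},
}


def cosine_similarity(r1, r2):
    """Cosine similarity computed over the items both users have rated."""
    shared = set(r1) & set(r2)
    if not shared:
        return 0.0
    dot = sum(r1[item] * r2[item] for item in shared)
    norm1 = math.sqrt(sum(r1[item] ** 2 for item in shared))
    norm2 = math.sqrt(sum(r2[item] ** 2 for item in shared))
    return dot / (norm1 * norm2)


def predict(ratings, user, item):
    """Predict a rating as the similarity-weighted average of other users' ratings."""
    weighted_sum = 0.0
    weight_total = 0.0
    for other, their_ratings in ratings.items():
        if other == user or item not in their_ratings:
            continue
        similarity = cosine_similarity(ratings[user], their_ratings)
        weighted_sum += similarity * their_ratings[item]
        weight_total += similarity
    return weighted_sum / weight_total if weight_total else None


# Alice has not rated article_d; Bob (similar to Alice) liked it, Carol (less similar) did not.
print(predict(RATINGS, "alice", "article_d"))
```

Because Alice's ratings resemble Bob's much more than Carol's, the predicted rating lands closer to Bob's 5 than to Carol's 2.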

Reference:

[1] Ekstrand, Michael D, John T Riedl, and Joseph A Konstan. Collaborative Filtering Recommender Systems. 1st ed. Hanover, Mass.: Now Publishers, 2011. Print.

DEAN 690_Week 10

Assignment: Week 10

Share

Book: Text Mining with R

Link: http://tidytextmining.com/

Dataset

This dataset includes the factors Race/Ethnicity, Age, Education, Income, and Gender. The website also uses SAS and provides several tables, including Adults with Diabetes Currently Taking Insulin, How Often Check Blood Sugar, How Often Check Feet, and other aspects of the data.

Topic: Virginia Diabetes and Prediabetes dataset from 2011 to 2014

Disease description: Diabetes is a chronic disease in which sugar levels in the bloodstream are above normal. It occurs when a person cannot produce (type 1) or properly use (type 2) insulin. Insulin is a hormone that moves sugar out of the bloodstream into the cells to be used for energy. Prediabetes is a condition in which blood sugar is high, but not high enough to be type 2 diabetes.

Link: http://www.vdh.virginia.gov/diabetes/data/

Questions

23. What is latent semantic indexing? What is it used for? What are the specific limitations of the method?

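Latent semantic indexing is a natural language processing technique that uses a mathematical method (singular value decomposition of a term-document matrix) to analyze the relationships between terms and concepts. It is used in information retrieval to match documents and queries that share a meaning but not the exact same words. Its limitations include latent dimensions that are hard to interpret and a decomposition that becomes expensive on large collections.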
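The following is a small sketch of the idea, not a reference implementation. It builds a term-document matrix from a made-up corpus and keeps the two strongest latent dimensions. To stay dependency-free it obtains them by power iteration on the document-by-document matrix A^T A rather than by calling a library SVD routine; in practice a library SVD would normally be used. All function names here are hypothetical.

```python
import math

# Hypothetical toy corpus, invented for illustration.
DOCS = [
    "car engine wheel",
    "automobile engine wheel",
    "fruit apple banana",
    "fruit banana orange",
    "fruit apple orange",
]


def term_document_matrix(texts):
    """Rows are terms, columns are documents, entries are raw counts."""
    vocab = sorted({word for text in texts for word in text.split()})
    return [[text.split().count(term) for text in texts] for term in vocab]


def gram_matrix(a):
    """Return A^T A, the document-by-document matrix of dot products."""
    n_docs = len(a[0])
    return [[sum(row[i] * row[j] for row in a) for j in range(n_docs)]
            for i in range(n_docs)]


def top_eigenpairs(matrix, k, iterations=200):
    """Top-k eigenpairs of a symmetric matrix via power iteration and deflation."""
    n = len(matrix)
    c = [row[:] for row in matrix]  # work on a copy
    pairs = []
    for _component in range(k):
        v = [float(i + 1) for i in range(n)]  # arbitrary asymmetric start vector
        for _step in range(iterations):
            w = [sum(c[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        cv = [sum(c[i][j] * v[j] for j in range(n)) for i in range(n)]
        eigenvalue = sum(v[i] * cv[i] for i in range(n))  # Rayleigh quotient
        pairs.append((eigenvalue, v))
        for i in range(n):  # deflate: remove the component just found
            for j in range(n):
                c[i][j] -= eigenvalue * v[i] * v[j]
    return pairs


def lsi_document_vectors(texts, k=2):
    """Coordinates of each document in the k-dimensional latent space."""
    pairs = top_eigenpairs(gram_matrix(term_document_matrix(texts)), k)
    # Singular values are the square roots of the eigenvalues of A^T A.
    return [[math.sqrt(max(value, 0.0)) * vector[d] for value, vector in pairs]
            for d in range(len(texts))]


def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


vectors = lsi_document_vectors(DOCS, k=2)
print(cosine(vectors[0], vectors[1]))  # "car" document vs. "automobile" document
print(cosine(vectors[0], vectors[2]))  # vehicle document vs. fruit document
```

In the raw word counts, the "car" and "automobile" documents share only two of their three words. In the reduced space they end up pointing in nearly the same direction, while vehicle and fruit documents stay nearly orthogonal, which is the kind of term-concept relationship the method is meant to surface.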

Link: https://www.searchenginejournal.com/what-is-latent-semantic-indexing-seo-defined/21642/

News

This news item announces the release of WordStat for Stata for Mac. It also introduces the software and its functions.

Topic: WordStat for Stata Now Available for Use on Mac Computers

Link: http://www.kdnuggets.com/2017/04/wordstat-stata-mac-apple.html

Project Idea

Purpose:

Data visualization (Create more graphs)

Semantic Analysis

Learning:

Continue working through the tutorial: Machine Learning in Natural Language Processing using R

Class 4: https://ufal.mff.cuni.cz/~hladka/2013/docs/day-4.posted.pdf

DEAN 690_Week 9

Assignment: Week 9

Share

An explanation of data visualization by storytelling.

EXCLUSIVE - Storytelling through data visualization – From museum displays to autonomous system interfaces

http://www.opengovasia.com/articles/7460-exclusive---storytelling-through-data-visualisation-from-museum-displays-to-autonomous-system-interfaces

Details are summarized below:

Challenges:

1st: data size

2nd: the interface of open dataset websites

3rd: telling stories

Computing resources:

Expensive facilities: Computed Tomography scanner.

Ideas for public-oriented use:

A science center, which is similar to a research center, provides astronomical data for storytelling.

Evolving interactive communication of science:

1st: Micro-level

Taking the human cell as an example, the author mentions exploring data and the molecular structure inside a cell.

2nd: Time

Using static data at a given time resolution and then engaging audiences to visualize it. The author said, “If you are going to visualise things, like blood flow, dynamically over time, then you have to replace say 20 GB of data on your scan for each time-step in the animation.”

Dataset

This dataset contains radioactivity measurements in fish from 2011 to 2016 in Canada. It should be possible to create different graphs that compare the values across years.

Topic: Health Canada Analyses of the Radioactive Content of Fish Samples from Canada’s West Coast

http://open.canada.ca/data/en/dataset/d1b39de7-e525-4cfa-a605-af38bf174555

Questions

41. What is NLP? How is it related to Machine Learning?

Natural language processing (NLP) uses machines to interpret human language: people build a system that understands what a person wants to say. Machine learning computes the patterns in a problem from examples and builds a system that solves it automatically. Therefore, many NLP tasks are treated as machine learning problems, and the two fields overlap heavily.
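One way to see the connection is a text classifier that learns word patterns from labeled sentences instead of following hand-written rules. The sketch below is a small naive Bayes classifier with add-one smoothing. The training sentences and labels are invented for illustration, and the function names (`train`, `classify`) are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical labeled examples, invented for illustration.
EXAMPLES = [
    ("great movie loved it", "pos"),
    ("wonderful great acting", "pos"),
    ("terrible movie hated it", "neg"),
    ("awful boring acting", "neg"),
]


def train(examples):
    """Count words per label; these counts are the 'learned' model."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, label_counts, vocab


def classify(text, model):
    """Return the label with the highest log-probability for the text."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                score += math.log(
                    (word_counts[label][word] + 1) / (total_words + len(vocab))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(EXAMPLES)
print(classify("great wonderful movie", model))
print(classify("boring terrible acting", model))
```

The language-handling part (splitting text into words) is the NLP side, and the counting and probability estimation is the machine learning side.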

Reference:

[1] M. Jordan and T. Mitchell, "Machine learning: Trends, perspectives, and prospects", Science, vol. 349, no. 6245, pp. 255-260, 2015.

News

This news article raises the question of how data science can support policy making. It mentions that Google and Facebook work on predictive analytics to support decision making, and it uses Amazon and Apple as examples of companies applying machine learning to serve customers. These ideas and technologies could give public policy new methods for making better decisions.

Topic: Collecting extensive data on you can improve policy making

https://www.thestar.com/opinion/commentary/2017/03/27/collecting-extensive-data-on-you-can-improve-policy-making.html

A news article about the Capital Bikeshare dataset and the Capital Bikeshare GPS Study.

Link: http://technical.ly/dc/2016/06/22/check-cool-capital-bikeshare-visualizations/

Project Idea

According to the link, data visualization includes 7 steps: Acquire, Parse, Filter, Mine, Represent, Refine, and Interact.

Acquire means obtaining the dataset.

Parse means giving the data a structure and separating it into categories.

Filter means cleaning the data and keeping only the part of interest.

Mine means applying statistics or data mining to find patterns and understand the meaning of the dataset.

Represent means creating simple graphs, such as line graphs, bar charts, and so on.

Refine means improving those basic graphs so they are clearer.

Interact means adding ways for the viewer to manipulate the data or control what is shown, which can also lead to ideas for new graphs in the future.
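The seven steps above can be sketched as a chain of small functions. This is only an illustration under stated assumptions: the CSV sample is made up, the column names (`year`, `category`, `value`) are hypothetical, and a text bar chart stands in for a real plotting library.

```python
import csv
import io
import statistics

# Hypothetical sample data, invented for illustration only.
RAW = """year,category,value
2015,a,2
2015,b,4
2016,a,
2016,b,5
"""


def acquire():
    """Acquire: obtain the dataset (here, an in-memory stand-in for a download)."""
    return RAW


def parse(text):
    """Parse: give the text a structure (one dict per row, keyed by column)."""
    return list(csv.DictReader(io.StringIO(text)))


def filter_rows(rows):
    """Filter: drop rows with a missing value and convert types."""
    return [
        {"year": int(row["year"]), "value": float(row["value"])}
        for row in rows
        if row["value"].strip()
    ]


def mine(rows):
    """Mine: compute a simple statistic, the mean value per year."""
    by_year = {}
    for row in rows:
        by_year.setdefault(row["year"], []).append(row["value"])
    return {year: statistics.mean(values) for year, values in by_year.items()}


def represent(summary):
    """Represent: draw a basic text bar chart, one line per year."""
    return [f"{year} | {'#' * round(value)} {value}" for year, value in summary.items()]


def refine(lines, title):
    """Refine: make the basic chart clearer by adding a title and sorting."""
    return [title, "-" * len(title)] + sorted(lines)


def interact(summary, years):
    """Interact: let the viewer choose which years are shown."""
    chosen = {year: value for year, value in summary.items() if year in years}
    return refine(represent(chosen), "Mean value by year")


summary = mine(filter_rows(parse(acquire())))
print("\n".join(interact(summary, {2015, 2016})))
```

Keeping each step as its own function mirrors the process in the article: any single stage, such as the filter rule or the chart style, can be changed without touching the others.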

Topic: The Data Visualization Process

Link: https://www.dashingd3js.com/the-data-visualization-process


Sunday, August 20, 2017
