Sunday, August 20, 2017

DEAN 690_Week 11

Assignment: Week 11

Share
Expert Interview – Data Visualization & Infographics

Dataset
Topic: Virginia Crash Data (Department of Motor Vehicles)
Description: Data that is collected, stored and analyzed by this Division is used for problem identification and resolution by local, state and federal entities across the Commonwealth.
The website provides simple summary tables: crashes by crash type, speed-related crashes, alcohol-related crashes, and Virginia motor vehicle statistics.
Questions
31. What is Collaborative filtering?
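Collaborative filtering is a recommendation technique that predicts what a user will like from the preferences of other users with similar tastes. For example, when we search keywords on Google, the “Related articles” suggestions are one kind of collaborative filtering.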
Reference:

[1] Ekstrand, Michael D, John T Riedl, and Joseph A Konstan. Collaborative Filtering Recommender Systems. 1st ed. Hanover, Mass.: Now Publishers, 2011. Print.
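
As a rough illustration of the idea, here is a minimal sketch of user-based collaborative filtering. The ratings, user names, and article names are hypothetical, and cosine similarity with a similarity-weighted average is only one of several possible scoring schemes.

```python
import math

# Hypothetical ratings (user -> {item: rating from 1 to 5}); not real data.
ratings = {
    "ann": {"article_a": 5, "article_b": 4, "article_c": 1},
    "bob": {"article_a": 5, "article_b": 5, "article_d": 5},
    "cat": {"article_a": 1, "article_c": 5, "article_e": 2},
}


def cosine_similarity(a, b):
    """Cosine similarity between two users' rating dictionaries."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[item] * b[item] for item in shared)
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b)


def recommend(user, ratings, top_n=3):
    """Rank items the user has not rated by similarity-weighted average rating."""
    scores, weights = {}, {}
    for other, their_ratings in ratings.items():
        if other == user:
            continue
        similarity = cosine_similarity(ratings[user], their_ratings)
        if similarity <= 0:
            continue
        for item, rating in their_ratings.items():
            if item in ratings[user]:
                continue  # skip items the user already rated
            scores[item] = scores.get(item, 0.0) + similarity * rating
            weights[item] = weights.get(item, 0.0) + similarity
    predicted = {item: scores[item] / weights[item] for item in scores}
    return sorted(predicted, key=predicted.get, reverse=True)[:top_n]


recommendations = recommend("ann", ratings)
print(recommendations)
```

In this toy data, "ann" rates articles much like "bob" does, so the article that only "bob" rated highly ranks first for her.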

DEAN 690_Week 10

Assignment: Week 10

Share
Book: Text Mining with R
Dataset
This dataset includes the factors Race/Ethnicity, Age, Education, Income, and Gender. The website also uses SAS and provides several tables, including Adults with Diabetes Currently Taking Insulin, How Often Check Blood Sugar, How Often Check Feet, and other aspects of the data.
Topic: Virginia Diabetes and Prediabetes dataset from 2011 to 2014
Disease description: Diabetes is a chronic disease in which sugar levels in the bloodstream are above normal. It occurs when a person cannot produce (type 1) or properly use (type 2) insulin. Insulin is a hormone that moves sugar out of the bloodstream into the cells to be used for energy. Prediabetes is a condition in which blood sugar is high, but not high enough to be type 2 diabetes.
Questions
23. What is latent semantic indexing? What is it used for? What are the specific limitations of the method?
Latent semantic indexing is a natural language processing technique that uses a mathematical method, typically singular value decomposition, to analyze the relationships between terms and concepts in a collection of documents.
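
Latent semantic indexing is commonly computed with a truncated singular value decomposition (SVD) of a term-document matrix. The sketch below is a simplified illustration, not a production method. The term-document counts are hypothetical, and a plain power iteration with deflation stands in for a library SVD routine so that the example needs only the standard library.

```python
import math

# Hypothetical term-document counts (rows = terms, columns = documents d0..d3).
terms = ["insulin", "sugar", "glucose", "crash", "speed"]
A = [
    [1, 0, 0, 0],  # insulin
    [1, 1, 0, 0],  # sugar
    [0, 1, 0, 0],  # glucose
    [0, 0, 1, 1],  # crash
    [0, 0, 1, 2],  # speed
]


def matvec(M, v):
    """Multiply matrix M by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def top_eigenpair(M, iterations=200):
    """Power iteration for the leading eigenpair of a symmetric PSD matrix."""
    v = [1.0] * len(M)
    for _ in range(iterations):
        w = matvec(M, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0, v
        v = [x / norm for x in w]
    eigenvalue = sum(x * y for x, y in zip(v, matvec(M, v)))
    return eigenvalue, v


def lsi_document_coords(A, k):
    """Coordinates of each document in a k-dimensional latent space."""
    n_docs = len(A[0])
    # Gram matrix G = A^T A; its eigenvectors are the right singular vectors of A.
    G = [[sum(row[i] * row[j] for row in A) for j in range(n_docs)]
         for i in range(n_docs)]
    coords = [[] for _ in range(n_docs)]
    for _ in range(k):
        eigenvalue, v = top_eigenpair(G)
        sigma = math.sqrt(max(eigenvalue, 0.0))  # singular value of A
        for j in range(n_docs):
            coords[j].append(sigma * v[j])
        # Deflate: remove the component just found so the next one can emerge.
        G = [[G[i][j] - eigenvalue * v[i] * v[j] for j in range(n_docs)]
             for i in range(n_docs)]
    return coords


def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))


coords = lsi_document_coords(A, k=2)
print(cosine(coords[0], coords[1]), cosine(coords[0], coords[2]))
```

Documents d0 and d1 share only the term "sugar", yet they land close together in the two-dimensional latent space, while d0 and the crash-related d2 stay apart. That grouping of documents by concept rather than by exact term overlap is what the method is used for.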
News
This news item announces the release of WordStat for Stata for Mac. It also introduces the software and its functions.
Topic: WordStat for Stata Now Available for Use on Mac Computers


Project Idea
Purpose:
Data visualization (Create more graphs)
Semantic Analysis
Learning:
Continue working through the tutorial: Machine Learning in Natural Language Processing using R

Class 4: https://ufal.mff.cuni.cz/~hladka/2013/docs/day-4.posted.pdf

DEAN 690_Week 9

Assignment: Week 9

Share
An explanation of data visualization by storytelling.
EXCLUSIVE - Storytelling through data visualization – From museum displays to autonomous system interfaces
The details are summarized below:
Challenges:
1st: data size
2nd: interfaces of open dataset websites
3rd: telling stories
Computing resources:
Expensive facilities, such as a computed tomography (CT) scanner.
Ideas for public-oriented use:
A science center, similar to a research center, provides astronomical data for storytelling.
The evolving interactive communication of science:
1st: Micro-level
            Taking the human cell as an example, the author mentions exploring data on the molecular structure inside a cell.
2nd: Time
            Static data is captured at successive time steps and then animated to engage audiences. The author said “If you are going to visualise things, like blood flow, dynamically over time, then you have to replace say 20 GB of data on your scan for each time-step in the animation.”




Dataset
This dataset contains measurements of radioactivity in fish sampled in Canada from 2011 to 2016. It should support different graphs that compare values across years.
Topic: Health Canada Analyses of the Radioactive Content of Fish Samples from Canada’s West Coast

Questions
41. What is NLP? How is it related to Machine Learning?
Natural language processing (NLP) uses machines to interpret human language: people build systems that try to understand what a person wants to say. Machine learning computes the patterns in a problem from examples and builds a system that solves it automatically. NLP is therefore closely tied to the machine learning field, which supplies many of its methods.
Reference:
[1] M. Jordan and T. Mitchell, "Machine learning: Trends, perspectives, and prospects", Science, vol. 349, no. 6245, pp. 255-260, 2015.
News
This news article raises the question of how data science can support policy making. It mentions that Google and Facebook work on predictive analytics and decision support, and it cites Amazon and Apple as examples of using machine learning to serve customers. These ideas and technologies could give public policy new methods for making better decisions.
Topic: Collecting extensive data on you can improve policy making
A news item also mentioned the Capital Bikeshare dataset and the Capital Bikeshare GPS Study.



Project Idea

According to the linked article, the data visualization process includes 7 steps: Acquire, Parse, Filter, Mine, Represent, Refine, and Interact.
Acquire means obtaining the dataset.
Parse means structuring the data and separating it into categories.
Filter means cleaning the data.
Mine means applying statistics or data mining to find patterns and understand the meaning of the dataset.
Represent means creating simple graphs, such as line graphs and bar charts.
Refine means reworking the visualization, including the simple graphs.
Interact means brainstorming new graphs for the future.
Topic: The Data Visualization Process

Link: https://www.dashingd3js.com/the-data-visualization-process
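
The first five steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the inline CSV text, its column names, and the function names are hypothetical stand-ins, and a text bar chart takes the place of a real plotting library. Refine and Interact are left out because they depend on the chosen visualization tool.

```python
import csv
import io
from collections import defaultdict

# Acquire: hypothetical inline CSV standing in for a downloaded dataset.
RAW = """year,value
2011,4
2011,6
2012,
2012,3
2013,not available
2013,8
"""


def parse(text):
    """Parse: turn raw text into structured rows (one dict per record)."""
    return list(csv.DictReader(io.StringIO(text)))


def filter_rows(rows):
    """Filter: keep only rows whose year and value are valid numbers."""
    clean = []
    for row in rows:
        try:
            clean.append((int(row["year"]), float(row["value"])))
        except (TypeError, ValueError):
            continue  # drop missing or malformed measurements
    return clean


def mine(records):
    """Mine: compute a simple statistic, the mean value per year."""
    by_year = defaultdict(list)
    for year, value in records:
        by_year[year].append(value)
    return {year: sum(values) / len(values)
            for year, values in sorted(by_year.items())}


def represent(means):
    """Represent: draw a simple text bar chart, one bar per year."""
    return "\n".join(f"{year} {'#' * round(mean)} {mean:.1f}"
                     for year, mean in means.items())


means = mine(filter_rows(parse(RAW)))
print(represent(means))
```

Keeping each step as its own function mirrors the process: any stage can be swapped out, for example replacing the text chart with a plotting library, without touching the others.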

Python program to display calendar

# Python program to display calendar of given month of the year
# importing calendar module for calendar operations
import calendar
# set t...