December 2015 Items of Interest

December Meetups:

Tidewater Analytics: Tuesday, 08 December

7:00 pm at 757 Creative Space, 259 Granby St. Suite 250, downtown Norfolk.

The feature presentation will be by Cathy Green, an independent Data Architect and Business Intelligence consultant. She will give an overview of the analytic features native to Excel 2013.

Following Cathy’s presentation, there will be a brief discussion related to the 2016 Kaggle Machine Learning group’s activities.

757 R User’s Group: Tuesday, 15 December

6:30 pm at 757 Creative Space, 259 Granby St. Suite 250, downtown Norfolk.

Keith Brown, a risk analyst for USAA, will discuss Hadley Wickham’s package ggplot2 for graphics.

Office Hours: No meeting this month.

Tidewater Big Data Enthusiasts: No meeting this month.

MOOCs and other educational venues:

TED Artificial Intelligence Playlist

A very interesting collection of six TED talks focused on artificial intelligence. All under 20 minutes long, and all accessible to lay people.

Books:

Recommended R Readings

This is a fantastic list of R books, ranging from “R for Dummies” on up to the most complex and sophisticated use cases. Well worth keeping on hand.

Miscellaneous:

The Hardest Parts of Data Science

This blog post is really spot on. Fitting models has become almost automated in some situations, but even when not, modern software really makes it pretty easy.

What’s really hard — and important — is defining the problem to begin with, and then measuring the solution.

Yanir Seroussi provides some excellent observations and thoughts on the matter in this blog post.

The Identity of Statistics in Data Science

So what is Data Science?

I avoid the term “Data Science” as much as I can. I think it’s vague and a bit pretentious. But a fellow named Tommy Jones, writing in AMSTATNEWS (the membership magazine of the American Statistical Association), came up with a definition that I think is pretty good. It’s in an article called “The Identity of Statistics in Data Science,” and the context is figuring out where statistics and data science intersect. (This has been a huge topic of discussion and debate in the statistics community, the computer science community, and the nebulous data science community.)

He basically considers data science “Supply Chain Management” for data products. It begins with a real-world problem and ends with a report. And in between there might be data wrangling, model development and validation, database considerations, visualizations, and so on. It’s a good article, and worth a read if you are thinking about including the term “data science” in your working vocabulary.

Analytics Blog of the Month:

I cannot say enough good things about David Robinson’s blog, Variance Explained. This guy takes difficult but important issues and explains them in ways that, although they may take some work, really clarify the issues. His posting is a little sporadic, but the blog is well worth following, because when he does post, it’s almost always a gem. A few of my favorites include:

K-means Clustering is not a free lunch

Understanding the beta distribution

Cleaning and visualizing genomic data

If you’re going to follow one data analytics blog, this one would be my recommendation.
