July 2015 Items of Interest

Meetups:

757 Python User’s Group: Tuesday, 07 July.

The 757 Python User’s Group will meet at the Hampton Public Library, 4207 Victoria Blvd., Hampton.

The topic this month is Errors and Exceptions in Python 3. Rick Jones will discuss the following related topics: types of errors and traceback messages, exceptions vs assertions, general exception structure (try, except, else, finally), raising exceptions and user-defined exceptions, exception anti-patterns, and logging exceptions.
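For anyone who wants a preview of the try/except/else/finally structure before the meeting, here is a minimal sketch. It is not taken from Rick's presentation; the `NegativeValueError` class and the `safe_sqrt` function are hypothetical names made up for this illustration.

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


class NegativeValueError(ValueError):
    """Hypothetical user-defined exception, invented for this sketch."""


def safe_sqrt(text):
    """Parse text as a number and return its square root, or None on bad input."""
    try:
        value = float(text)  # raises ValueError if text is not numeric
        if value < 0:
            raise NegativeValueError(f"negative input: {value}")
    except NegativeValueError:
        # The more specific handler must come before its parent class.
        # logger.exception logs the message together with the traceback.
        logger.exception("rejected negative input")
        return None
    except ValueError:
        logger.exception("could not parse %r", text)
        return None
    else:
        # Runs only when the try block raised nothing.
        return value ** 0.5
    finally:
        # Runs on every path, including the early returns above.
        logger.info("finished handling %r", text)
```

Catching the narrow exception types, rather than a bare `except:`, is one way to avoid the anti-pattern of silently swallowing unrelated errors.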

The Python code and the presentation slides will be distributed at the Meetup, and posted on the Tidewater Analytics GitHub site afterwards.

Tidewater Analytics: Tuesday, 14 July:

Tidewater Analytics will meet at 757 Creative Space, 259 Granby St. Suite 250, downtown Norfolk.

Chris Ovide, a local entrepreneur, will present his current project that involves using binary logistic regression to determine a person’s preferences in facial attraction. The main presentation will be Polyglot Persistence. Dr. Chuck Cartledge, of Old Dominion University, will discuss the variety of relational and NoSQL data stores that are currently in use, and compare and contrast them from the CRUD (create-read-update-delete) perspective.
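If the CRUD acronym is new to you, the four operations can be sketched in a few lines of Python. This is only an illustration, not material from the talk: it uses the standard-library `sqlite3` module as the relational store and a plain `dict` as a rough stand-in for a key-value NoSQL store, and the `members` table and its contents are made up.

```python
import sqlite3

# Relational side: an in-memory SQLite database (hypothetical "members" table).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE members (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")

# Create
conn.execute("INSERT INTO members (name, city) VALUES (?, ?)", ("Ada", "Norfolk"))
# Read
row = conn.execute(
    "SELECT name, city FROM members WHERE name = ?", ("Ada",)
).fetchone()
# Update
conn.execute("UPDATE members SET city = ? WHERE name = ?", ("Hampton", "Ada"))
updated = conn.execute(
    "SELECT city FROM members WHERE name = ?", ("Ada",)
).fetchone()
# Delete
conn.execute("DELETE FROM members WHERE name = ?", ("Ada",))
remaining = conn.execute("SELECT COUNT(*) FROM members").fetchone()[0]
conn.close()

# Key-value side: a dict standing in for a NoSQL store (an assumption, not a real client).
store = {}
store["member:ada"] = {"name": "Ada", "city": "Norfolk"}  # Create
record = store["member:ada"]                              # Read
store["member:ada"]["city"] = "Hampton"                   # Update
del store["member:ada"]                                   # Delete
```

The relational version declares a schema up front and queries by any column, while the key-value version only looks records up by key.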

757 R User’s Group: Tuesday, 21 July:

The 757 R User’s Group will meet at 757 Creative Space, 259 Granby St. Suite 250, downtown Norfolk.

Nipun Rahman will give a presentation on the sqldf package, which allows manipulation of data frames using SQL commands.

The main meeting will be preceded by a 30 – 45 minute beginner’s tutorial session — starting at 5:45 pm — for those who are new to R.

Other Meetup News:

Tidewater Analytics Big Data Enthusiasts: Dr. Chuck Cartledge, who will be giving the Polyglot Persistence presentation at the July Tidewater Analytics meeting, has created a new local Meetup that will focus on Big Data. From the Meetup site:

We are a group of people in the Tidewater area who are interested in exploring, sharing, and understanding Big Data. We have a mixture of people with interests and expertise in various aspects of big data, what it is, how it works, how it affects each of us, and assist people establishing personal networks of friends and colleagues with similar interests.

A collection of things that I hope the group will explore, includes:

1.  What is “Big Data,” how does it affect me, and how am I supplying “Big Data??”  Probably include a simple demonstration of using Pig to process Medicare data, and then displaying the results via R.

2.  Where can I get my hands on some Big Data??  Medicare payments, pharmaceutical payments, census data, ZCTAs, etc.   Some sort of demo on getting data from these places, how the data has to be munged, what kinds of problems exist with all data sets.

3.  What kinds of tools are available for processing Big Data??  The world doesn’t end at Hadoop or Cassandra.  There are other tools/applications that might be a better fit.

4.  How do I visualize all this data??  Getting Big data is fun.  Analysing can be a challenge.  When it is all over, how can the data be made real with some sort of visualization techniques.

5.  What are the challenges with real-time Big Data??  Firstly, what does real-time mean??  Secondly, what kinds of tools are available to handle masses of real-time data.

6.  How does the “Internet of Things” affect what we call Big Data??  As more and more things (cars, phones, refrigerators, wearable devices) are wired, and more and more data is being collected, how does that affect what we do with Big Data??

As we talk and share ideas, other topics will come up and we will follow them to see where they go.

Come ready to share ideas, experiences, and interest in all things Big Data.

Link: Tidewater Big Data Enthusiasts

MOOCs and other online educational venues:

Coursera:

Genomic Data Science Certificate. Dr. Jeff Leek, of Johns Hopkins University and one of the principals behind the Data Science Specialization track, is offering a 7-part (plus Capstone) certificate program in Genomic Data Science. In view of the growing interest and prevalence of BioTech — along with Dr. Leek’s proven track record with Coursera — this is probably a pretty decent investment.

Data Science Certificate. Speaking of Jeff Leek, the ongoing 9-part (plus Capstone) Data Science Specialization begins again on 6 July. A number of Tidewater Analytics members have taken various courses in this, and the reviews have been uniformly good.

Duke University:

Data, Statistical Inference, and Modeling. Dr. Mine Çetinkaya-Rundel, Assistant Professor of the Practice in the Department of Statistical Science at Duke University, is offering an online, non-credit certificate course that “explores methods of acquiring and validating data, analyzing and modeling data, and interpreting results correctly without relying on statistical jargon. A step-by-step technique shows how to use R statistical programming language for practical applications. Examples and projects focus on real-world phenomena and have common usage in a wide variety of professions. An instructor provides webinars (a.k.a., virtual office hours) to guide your learning through complex problems and projects.”

Online Events and Topics:

Intro to SparkR. 1:00 pm, Wednesday, 15 July

SparkR is an R package that combines the power of R and Spark. It provides a lightweight front end to Apache Spark by exposing the Spark API, allowing users to run interactive Spark jobs from the R shell. The free webinar will feature Shivaram Venkataraman, co-author of SparkR. Here is the registration link.

Neural Networks for Newbies

A good video for anyone who is interested in neural networks and just getting started with them.

Books and such:

Introduction to Statistics (with Python). This looks like a really good free book. If I were not so committed to R right now in my statistics work, I’d be all over this.

Machine Learning in Python: Essential Techniques for Predictive Analysis. A relatively new book with good reviews.

Probabilistic Programming & Bayesian Methods for Hackers. This seems to be a free, downloadable book with lots of good information on programming Bayesian methods. It is self-described as “An intro to Bayesian methods and probabilistic programming from a computation/understanding-first, mathematics-second point of view.”

Miscellaneous:

Master R Developer Workshop: Although not until September, this is one that you cannot start planning for too early. Hadley Wickham will be conducting this workshop in Washington DC on 14 – 15 September, 2015. The price is pretty hefty at $1,500 plus some sort of $55 registration fee. But if you’re serious about being an R developer, it probably doesn’t get any better than this.

Open Source Data Science Master’s: This is a quirky little website that’s worth taking a look at if you are reading this blog.

R 3.2.1 Released: Many new features and bug fixes. Check it out: R 3.2.1
