Basic Data Exploration in R

When you’re cleaning up data, you usually end up using the same 5-8 functions over and over, and a few more only once or twice. Here are the ones I find myself using again and again.

Here is a quick overview:

names() – returns the column names of a dataset

str() – gives the overview of a dataset

data.table package – includes functions for creating new columns, among other things

%in% operator – checks if a value is in a vector

Below are some examples.

> names(rock)                       # returns the column names
[1] "area" "peri" "shape" "perm"

> str(rock)                         # gives the format of the dataframe
'data.frame': 48 obs. of 4 variables:
$ area : int 4990 7002 7558 7352 7943 7979 9333 8209 8393 6425 ...
$ peri : num 2792 3893 3931 3869 3949 ...
$ shape: num 0.0903 0.1486 0.1833 0.1171 0.1224 ...
$ perm : num 6.3 6.3 6.3 6.3 17.1 17.1 17.1 17.1 119 119 ...
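
Here is a small sketch of how these two can be used inside a cleaning script rather than just at the console. names() returns an ordinary character vector, so it can be compared against a list of columns you expect. The `expected` vector is my own illustration (it simply repeats the four names printed above), and sapply(rock, class) is shown as a compact companion to str(), not a replacement for it.

```r
# rock ships with base R (datasets package), so nothing needs installing
cols <- names(rock)

# hypothetical check: are the columns we expect actually present?
expected <- c("area", "peri", "shape", "perm")
missing_cols <- setdiff(expected, cols)
length(missing_cols) == 0      # TRUE when nothing is missing

# the class of every column, as a named character vector
col_classes <- sapply(rock, class)
col_classes[["area"]]          # "integer", matching the 'int' shown by str()
```

If a column were missing, missing_cols would hold its name, which is handy to print in an error message.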

# import the data.table package (don't forget these 3 steps!)
> install.packages("data.table")             # 1. install (only needed once)
> library(data.table)                        # 2. load the package

> dtRock <- data.table(rock)                 # 3. convert the data frame to a data.table

> dtRock[1:5]                    # returns the first 5 rows
area peri shape perm
1: 4990 2791.90 0.0903296 6.3
2: 7002 3892.60 0.1486220 6.3
3: 7558 3930.66 0.1833120 6.3
4: 7352 3869.32 0.1170630 6.3
5: 7943 3948.54 0.1224170 17.1

# and my favorite way to create a new column
> dtRock[, areaMP := area / 1000]    # area is measured in pixels, so areaMP
                                     # is in thousands of pixels (despite the "MP" in its name)
> dtRock[1, ]                        # indicates the first row, all columns
area peri shape perm areaMP
1: 4990 2791.9 0.0903296 6.3 4.99

> dtRock[, 'areaMP']                 # returns the entire 'areaMP' column
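
A couple of details about the lines above are easy to miss, so here is a minimal sketch, assuming the data.table package is installed. It works on a separate copy called `dt` and uses a hypothetical column name `areaK`, both of which are mine and only there for illustration. It shows that := adds the column in place, and that a quoted column name gives back a data.table while an unquoted name or $ gives back a plain vector.

```r
library(data.table)

dt <- data.table(rock)         # a fresh copy built from the rock data frame

# := adds the column in place; no assignment with <- is needed
dt[, areaK := area / 1000]     # areaK (hypothetical name): thousands of pixels

# quoted name: the result is still a data.table, with one column
one_col <- dt[, "areaK"]
is.data.table(one_col)

# unquoted name or $: the result is a plain numeric vector
as_vector <- dt[, areaK]
is.numeric(as_vector)
is.numeric(dt$areaK)
```

Which form you want depends on what comes next: keep the data.table if you are going to chain more data.table operations, and take the vector if you are passing the values to something like mean().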

# The %in% operator is one of the most useful functions in R, I think.
> a <- c(1, 2, 3, 4)          # an example vector
> 4 %in% a                    # it's asking, is the value 4 in the vector a?
[1] TRUE
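
The %in% operator also works with several values on the left-hand side, which is where it really earns its keep when cleaning data. The sketch below uses only base R and the built-in rock dataset; the vectors `a` and `wanted` are example values I picked for illustration.

```r
a <- c(1, 2, 3, 4)             # example vector

# %in% is vectorised over its left-hand side: one answer per value
hits <- c(4, 7) %in% a         # TRUE FALSE

# combined with names(): is there a column called "perm"?
has_perm <- "perm" %in% names(rock)

# as a row filter: keep only rows whose perm is one of the wanted values
wanted <- c(6.3, 17.1)         # hypothetical values of interest
rock_subset <- rock[rock$perm %in% wanted, ]
unique(rock_subset$perm)       # only values from 'wanted' remain
```

The row filter is the pattern I would reach for instead of chaining several == comparisons with |.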

There are many other functions and packages, such as the ‘dplyr’ package by the amazing Hadley Wickham, but I am just showing the ones I use most frequently.

3 comments

  1. Thanks for sharing, your post goes really straight to the point.

    Those functions are actually among the pearls of R.

    Now that I am spending more time on Python, I am really missing functions like the great str()!

    Thanks again,

    Andrea
