Descriptive Statistics with R

In this post I am going to show how to compute descriptive statistics in R. I will present the mean, standard deviation, maximum value, minimum value and, finally, the range.

The first thing to do is read the data. In this case, I will read the Sacramento crime January 2006 file, which contains 7,584 crime records in CSV format, from the link below and store it in a variable called crime:

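# stringsAsFactors = TRUE restores the pre-R 4.0.0 default, so summary() counts values in text columns
crime <- read.csv('http://samplecsvs.s3.amazonaws.com/SacramentocrimeJanuary2006.csv', header = TRUE, stringsAsFactors = TRUE)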
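Since R 4.0.0, read.csv() defaults to stringsAsFactors = FALSE, so text columns such as address or beat are read as character vectors. summary() then reports only their length, class and mode instead of per-value counts. The sketch below reads a few made-up rows from inline text (the values are hypothetical, not taken from the Sacramento file) to show the difference:

```r
# Hypothetical rows in the same shape as the crime file (not real data)
csv_text <- "district,beat,grid
1,2B,102
3,3C,567
3,2B,899"

# Default since R 4.0.0: text columns stay character vectors
as_chr <- read.csv(text = csv_text, header = TRUE)

# stringsAsFactors = TRUE converts text columns to factors,
# which is what makes summary() print value counts
as_fct <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = TRUE)

class(as_chr$beat)    # "character"
class(as_fct$beat)    # "factor"
summary(as_fct$beat)  # counts per level: 2B = 2, 3C = 1
```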

To see all the variable (column) names in the crime data, I can use the names() function:

> names(crime)
[1] "cdatetime"     "address"       "district"      "beat"
[5] "grid"          "crimedescr"    "ucr_ncic_code" "latitude"
[9] "longitude"
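Besides names(), a few other base R functions are handy for a first look at a data frame: nrow() and ncol() give its dimensions, and str() lists each column with its type. A minimal sketch on a small hypothetical data frame (not the crime data):

```r
# Small hypothetical data frame standing in for the crime data
df <- data.frame(
  district = c(1, 3, 3, 6),
  beat     = c("1A", "3C", "3C", "6B"),
  grid     = c(102, 567, 899, 1661)
)

names(df)  # "district" "beat" "grid"
nrow(df)   # 4 rows (records)
ncol(df)   # 3 columns (variables)
str(df)    # one line per column, showing its type and first values
```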

To get a single column, use data['columnname'] or data$columnname. For example, to get the grid column of the crime variable, write crime['grid'] (which returns a one-column data frame) or crime$grid (which returns the column as a vector).
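The difference between the two forms matters for the functions used below: mean() and friends expect a vector, so crime$grid (or the equivalent crime[['grid']]) is the form to pass them. A small sketch on a hypothetical stand-in data frame:

```r
# Hypothetical stand-in for the crime data frame
crime_demo <- data.frame(district = c(1, 3, 3, 6), grid = c(102, 567, 899, 1661))

col_df   <- crime_demo['grid']     # one-column data frame
col_vec  <- crime_demo$grid        # numeric vector
col_vec2 <- crime_demo[['grid']]   # same numeric vector as $

class(col_df)   # "data.frame"
class(col_vec)  # "numeric"

mean(col_vec)   # 807.25
# mean(col_df) would warn and return NA, because a data frame is not a numeric vector
```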

Now, on to the main topic. R has a few built-in functions for descriptive statistics:

  • mean(varname) for Mean
  • sd(varname) for Standard deviation
  • max(varname) for Maximum value
  • min(varname) for Minimum value
  • range(varname) for Range (returns the minimum and maximum together)
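To see what these functions compute, here is a sketch on a small hypothetical vector, with the mean and standard deviation also worked out by hand. Two details are worth noting: sd() uses the sample formula with n - 1 in the denominator, and range() returns the minimum and maximum as a pair rather than a single number. The range in the statistical sense (max minus min) is diff(range(x)).

```r
# Hypothetical sample values (not from the crime data)
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)

mean(x)          # 5, i.e. sum(x) / n = 40 / 8
sd(x)            # about 2.138, i.e. sqrt(32 / 7)
max(x)           # 9
min(x)           # 2
range(x)         # 2 9  (minimum and maximum)
diff(range(x))   # 7    (max - min)

# The same mean and sample standard deviation computed by hand
mean_manual <- sum(x) / n
sd_manual   <- sqrt(sum((x - mean_manual)^2) / (n - 1))
```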

Quite simple, right? Now let's apply them to the crime data:

> #showing mean
> mean(crime$district)
[1] 3.574631
> #showing standard deviation
> sd(crime$district)
[1] 1.642512
> #showing max value
> max(crime$district)
[1] 6
> #showing min value
> min(crime$district)
[1] 1
> #showing range data
> range(crime$district)
[1] 1 6
> # alternatively, attach the data frame first
> # so its columns can be used by name
> attach(crime)
> mean(district)
[1] 3.574631
> sd(district)
[1] 1.642512
> max(district)
[1] 6
> min(district)
[1] 1
> range(district)
[1] 1 6
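attach() adds the data frame to R's search path, where it stays until detach(crime) is called. with() is a common alternative that evaluates one expression using the columns directly, without changing the search path. Also, if a column contains missing values, these functions return NA unless na.rm = TRUE is passed. A sketch on a hypothetical data frame with one missing value:

```r
# Hypothetical data frame with one missing district value
crime_demo <- data.frame(district = c(1, 3, NA, 6))

# with() looks up district inside crime_demo, no attach() needed
with(crime_demo, mean(district))                  # NA, because of the missing value
with(crime_demo, mean(district, na.rm = TRUE))    # 3.333333, i.e. (1 + 3 + 6) / 3
with(crime_demo, range(district, na.rm = TRUE))   # 1 6
```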

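But how do we show descriptive statistics for every column? Do we have to type them one by one? No: the summary() function summarizes all columns at once. For numeric columns it reports the minimum, first quartile, median, mean, third quartile and maximum; for factor columns it counts the most frequent values.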

> summary(crime)
        cdatetime                   address        district
 1/1/06 0:01 :  24   3555 AUBURN BLVD   :  47   Min.   :1.000
 1/1/06 0:00 :  20   1689 ARDEN WAY     :  31   1st Qu.:2.000
 1/1/06 8:00 :  17   5770 FREEPORT BLVD :  29   Median :3.000
 1/5/06 20:00:  14   2750 SUTTERVILLE RD:  26   Mean   :3.575
 1/18/06 8:00:  12   2250 68TH AVE      :  23   3rd Qu.:5.000
 1/23/06 0:00:  11   1695 ARDEN WAY     :  21   Max.   :6.000
 (Other)     :7486   (Other)            :7407
         beat           grid                                crimedescr
 2B        : 521   Min.   : 102.0   10851(A)VC TAKE VEH W/O OWNER: 653
 3C        : 491   1st Qu.: 567.0   TOWED/STORED VEH-14602.6     : 463
 2C        : 485   Median : 899.0   459 PC  BURGLARY VEHICLE     : 462
 6A        : 464   Mean   : 916.3   TOWED/STORED VEHICLE         : 434
 6C        : 457   3rd Qu.:1264.0   459 PC  BURGLARY RESIDENCE   : 356
 2A        : 450   Max.   :1661.0   MISSING PERSON               : 268
 (Other)   :4716                    (Other)                      :4948
 ucr_ncic_code     latitude       longitude
 Min.   : 909   Min.   :38.44   Min.   :-121.6
 1st Qu.:2309   1st Qu.:38.52   1st Qu.:-121.5
 Median :3532   Median :38.56   Median :-121.5
 Mean   :4275   Mean   :38.56   Mean   :-121.5
 3rd Qu.:7000   3rd Qu.:38.61   3rd Qu.:-121.4
 Max.   :8102   Max.   :38.68   Max.   :-121.4
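Standard deviation is not among the statistics summary() prints. One way to fill that gap is to apply sd() to every numeric column. Below is a sketch on a small hypothetical data frame; with the real data, the same call would be sapply(Filter(is.numeric, crime), sd, na.rm = TRUE).

```r
# Hypothetical data frame mixing numeric and text columns
df <- data.frame(
  district = c(1, 3, 3, 6),
  beat     = c("1A", "3C", "3C", "6B"),
  grid     = c(102, 567, 899, 1661)
)

# Keep only the numeric columns, then apply sd() to each one
numeric_cols <- Filter(is.numeric, df)
col_sds <- sapply(numeric_cols, sd, na.rm = TRUE)
col_sds  # named vector: one standard deviation per numeric column
```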

That's all: a few ways to calculate descriptive statistics in R. I find them very useful, and I hope this helps R beginners.
