In this post I am going to show a descriptive statistics example in R. I would like to describe the data using the mean, standard deviation, maximum value, minimum value, and finally the range.
The first thing to do is read the data. In this case, I will read the Sacramento crime January 2006 file, which contains 7,584 crime records in CSV format, from the link below and store it in a variable named crime:
crime <- read.csv('http://samplecsvs.s3.amazonaws.com/SacramentocrimeJanuary2006.csv',header=TRUE)
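If the link is unreachable, the same read.csv call can be tried on a small inline sample. The sketch below is only an illustration: the three rows and the name crime_sample are made up, and only the column names are taken from the real file.

```r
# Hypothetical three-row sample using a few column names from the crime file
sample_csv <- "cdatetime,district,grid
1/1/06 0:00,3,1151
1/1/06 0:00,5,1512
1/1/06 0:01,2,212"

# text= makes read.csv parse a string instead of a file or URL
crime_sample <- read.csv(text = sample_csv, header = TRUE)

nrow(crime_sample)   # number of records read
str(crime_sample)    # column names and the type R chose for each
```

With header=TRUE the first line is used for the column names, so only the three data rows are counted as records.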
To see all the variable names (column names) in the crime data, I can use the names function:
> names(crime)
[1] "cdatetime" "address" "district" "beat"
[5] "grid" "crimedescr" "ucr_ncic_code" "latitude"
[9] "longitude"
To show a single column, use data['columnname'] or data$columnname. For example, to get the grid column from the crime variable, write crime['grid'] or crime$grid.
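The two forms are not quite interchangeable, which matters once the column is passed to a function such as mean. Here is a minimal sketch on a made-up data frame (df and its values are assumptions, not the crime data):

```r
# Toy data frame standing in for the crime data
df <- data.frame(grid = c(100, 200, 300), district = c(1, 2, 3))

by_name   <- df["grid"]   # still a data frame, with one column
by_dollar <- df$grid      # a plain numeric vector

class(by_name)     # "data.frame"
class(by_dollar)   # "numeric"

# Functions like mean() expect the vector form
mean(by_dollar)
```

This is why the examples in this post use the $ form when calculating statistics.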
Now to the main topic. R has several functions for descriptive statistics:
mean(varname) for the mean
sd(varname) for the standard deviation
max(varname) for the maximum value
min(varname) for the minimum value
range(varname) for the range
Quite simple, right? Now I am going to apply them to the crime data:
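Before using the real data, it may help to try these functions on a tiny vector where the results are easy to check by hand. The vectors x and y below are made up for illustration.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)

mean(x)          # arithmetic mean
sd(x)            # sample standard deviation (divides by n - 1)
max(x)           # largest value
min(x)           # smallest value
range(x)         # returns two numbers: the minimum and the maximum
diff(range(x))   # maximum minus minimum, if a single number is wanted

# A missing value makes the result NA unless it is removed first
y <- c(x, NA)
mean(y)
mean(y, na.rm = TRUE)
```

Note that range gives the two endpoints rather than their difference, which is why the output on the crime data has two values.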
> #showing mean
> mean(crime$district)
[1] 3.574631
> #showing standard deviation
> sd(crime$district)
[1] 1.642512
> #showing max value
> max(crime$district)
[1] 6
> #showing min value
> min(crime$district)
[1] 1
> #showing range data
> range(crime$district)
[1] 1 6
> #alternatively, attach the data frame first
> #so its columns can be used by name
> attach(crime)
> mean(district)
[1] 3.574631
> sd(district)
[1] 1.642512
> max(district)
[1] 6
> min(district)
[1] 1
> range(district)
[1] 1 6
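An attached data frame stays on the search path until it is detached, so it can be worth calling detach when finished, or using with, which exposes the columns for a single expression only. A small sketch on a made-up data frame (df, m1 and m2 are illustrative names):

```r
# Toy data frame standing in for the crime data
df <- data.frame(district = c(1, 3, 6, 2))

# attach / detach: columns are visible by name in between
attach(df)
m1 <- mean(district)
detach(df)

# with(): same result without changing the search path
m2 <- with(df, mean(district))

m1
m2
```

Both approaches give the same numbers as the df$district form.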
But how do we show descriptive statistics for every column? Should we type them one by one? Actually, there is a summary() function that summarizes the descriptive statistics of all columns at once.
> summary(crime)
     cdatetime               address              district
 1/1/06 0:01 :  24   3555 AUBURN BLVD   :  47   Min.   :1.000
 1/1/06 0:00 :  20   1689 ARDEN WAY     :  31   1st Qu.:2.000
 1/1/06 8:00 :  17   5770 FREEPORT BLVD :  29   Median :3.000
 1/5/06 20:00:  14   2750 SUTTERVILLE RD:  26   Mean   :3.575
 1/18/06 8:00:  12   2250 68TH AVE      :  23   3rd Qu.:5.000
 1/23/06 0:00:  11   1695 ARDEN WAY     :  21   Max.   :6.000
 (Other)     :7486   (Other)            :7407
     beat            grid                    crimedescr
 2B     : 521   Min.   : 102.0   10851(A)VC TAKE VEH W/O OWNER: 653
 3C     : 491   1st Qu.: 567.0   TOWED/STORED VEH-14602.6     : 463
 2C     : 485   Median : 899.0   459 PC BURGLARY VEHICLE      : 462
 6A     : 464   Mean   : 916.3   TOWED/STORED VEHICLE         : 434
 6C     : 457   3rd Qu.:1264.0   459 PC BURGLARY RESIDENCE    : 356
 2A     : 450   Max.   :1661.0   MISSING PERSON               : 268
 (Other):4716                    (Other)                      :4948
 ucr_ncic_code    latitude        longitude
 Min.   : 909   Min.   :38.44   Min.   :-121.6
 1st Qu.:2309   1st Qu.:38.52   1st Qu.:-121.5
 Median :3532   Median :38.56   Median :-121.5
 Mean   :4275   Mean   :38.56   Mean   :-121.5
 3rd Qu.:7000   3rd Qu.:38.61   3rd Qu.:-121.4
 Max.   :8102   Max.   :38.68   Max.   :-121.4
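Two things are worth knowing about this output. The per-value counts for the text columns suggest they were read as factors; since R 4.0.0 read.csv keeps text as character by default, so passing stringsAsFactors = TRUE is one way to get similar counts. Also, summary does not report the standard deviation. The sketch below, on a made-up data frame (df, numeric_cols and stats are illustrative names), shows one possible way to handle both.

```r
# Toy data frame; stringsAsFactors = TRUE turns text columns into factors,
# so summary() counts how often each value appears
df <- data.frame(
  beat     = c("2B", "3C", "2B", "6A"),
  district = c(2, 3, 2, 6),
  grid     = c(567, 899, 570, 1661),
  stringsAsFactors = TRUE
)
summary(df)

# Pick out the numeric columns, then compute the same statistics for each
numeric_cols <- sapply(df, is.numeric)
stats <- sapply(df[numeric_cols], function(col) {
  c(mean = mean(col), sd = sd(col), min = min(col), max = max(col))
})
stats   # one column per numeric variable, one row per statistic
```

The result is a small matrix, so a single value can be looked up with, for example, stats["sd", "grid"].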
That's all: a few ways to calculate descriptive statistics in R. In my opinion they are very useful, and hopefully this post helps people who are just starting out with R.