I collect important resources here, organized by theme.

1 Getting help

Learning and using an open-source programming language is as much about learning to search the web as it is about programming itself. A good, productive programmer is above all good at forming precise search queries for Google.

1.1 “Official” documentation

The official documentation is the most precise source, but it is fairly hard to follow at first. It is worth returning to whenever your knowledge grows.
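
R ships its documentation with every installation, so it can also be searched from the console. The following is a minimal sketch that uses only base R functions (`apropos()`, `help.search()`, `help()`); the search terms are arbitrary examples chosen for illustration.

```r
# Find objects on the search path whose names start with "read.csv"
found <- apropos("^read\\.csv")
print(found)

# Full-text search of the installed help pages, restricted to the stats package
hits <- help.search("regression", package = "stats")
print(head(unique(hits$matches$Topic)))

# Look up the help page of a single function; printing the returned object
# (or typing ?read.csv at the console) opens the page
help_page <- help("read.csv")
```

When a search term is too broad, narrowing it with the `package` argument, as above, usually gives a more manageable list.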

One good way to get to know the structure of the R project is to go through the thematic “Task Views” at https://cran.r-project.org/web/views/, such as CRAN Task View: Statistics for the Social Sciences.

By browsing The R Journal you stay up to date on new, serious packages that have not only made it onto CRAN but whose accompanying papers have also passed peer review.

1.2 Stack Overflow

SO is a busy question-and-answer site focused on programming, with a lot of discussion related to R. See:

2 R basics

2.1 Read/explore

3 Importing data

3.1 Read

3.2 Watch

3.3 Packages for importing data

These packages help you import data into R and save data.

  • feather - a fast, lightweight file format used by both R and Python
  • readr - reads tabular data
  • readxl - reads Microsoft Excel spreadsheets
  • openxlsx - reads and writes Microsoft Excel (.xlsx) files
  • googlesheets - reads Google spreadsheets
  • haven - reads SAS, SPSS, and Stata files
  • httr - reads data from web APIs
  • rvest - scrapes data from web pages
  • xml2 - reads HTML and XML data
  • webreadr - reads common web log formats
  • DBI - a universal interface to database management systems (DBMS)
  • PivotalR - reads data from and interfaces with Postgres, Greenplum, and HAWQ
  • dplyr - contains an interface to common databases
  • data.table - fread() for fast table reading
  • git2r - tools to access git repositories
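
As a rough illustration of the import step, the sketch below writes a small CSV file to a temporary location and reads it back with `readr::read_csv()`. It assumes the readr package is installed; the file contents and column names are invented for the example.

```r
library(readr)

# Create a small CSV file to read (contents are made up for this example)
path <- tempfile(fileext = ".csv")
writeLines(c("name,year,count",
             "Aino,2016,120",
             "Eino,2016,98"), path)

# Declare the column types explicitly instead of relying on guessing
babies <- read_csv(
  path,
  col_types = cols(
    name  = col_character(),
    year  = col_integer(),
    count = col_integer()
  )
)

print(babies)
```

Stating `col_types` up front makes the import fail loudly if the file layout changes, which tends to be easier to debug than silently mis-typed columns.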

3.4 Data packages

These packages contain data sets to use as training data or toy examples.

  • babynames - Names given to US babies 1880-2014
  • neiss - sample of all accidents reported to US emergency rooms 2009-2014
  • yrbss - Youth Risk Behaviour Surveillance System data from 1991 to 2013
  • nycflights13 - all out-bound flights from NYC in 2013
  • hflights - flights departing Houston in 2011
  • USAboundaries - Historical and Contemporary Boundaries of the United States of America
  • rworldmap - country border data
  • usdanutrients - USDA nutrient database
  • fueleconomy - EPA fuel economy data
  • nasaweather - geographic and atmospheric measures on a very coarse 24 by 24 grid covering Central America
  • mexico-mortality - deaths in Mexico
  • data-movies and ggplot2movies - data from the Internet Movie Database (IMDB)
  • pop-flows - Population flows around the USA in 2008
  • data-housing-crisis - Clean data related to the 2008 US housing crisis
  • gun-sales - Statistical analysis of monthly background checks of gun purchases, from The New York Times
  • stationaRy - hourly meteorological data from one of thousands of global stations
  • gapminder - Excerpt from the Gapminder data
  • janeaustenr - Jane Austen’s Complete Novels
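
Data packages are used like any other package: once attached, their data sets are available as ordinary objects. The sketch below assumes the gapminder package is installed and only inspects the data; the choice of the most recent year as a subset is just an example.

```r
library(gapminder)

# List the data sets that the package ships
print(data(package = "gapminder")$results[, "Item"])

# Inspect the structure of the main data frame
str(gapminder)

# Keep only the rows for the most recent year in the data
latest <- gapminder[gapminder$year == max(gapminder$year), ]
print(head(latest))
```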

4 Cleaning data

4.2 Packages

These packages help you wrangle your data into a form that is easy to analyze in R.

  • tidyr - tools for tidying layout of tabular data
  • dplyr - tools for joining multiple tables into a tidy data set
  • purrr - tools for applying R functions to data structures, very useful when tidying
  • broom - tools for tidying statistical models into data frames
  • zoo - data structures for time series data
  • PivotalR - R wrappers for in-database SQL operations (e.g. join, group by)
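
To make "tidying the layout" concrete, the sketch below reshapes a small wide table (one column per year) into a long one (one row per country and year) with `tidyr::pivot_longer()`. It assumes tidyr 1.0.0 or later is installed; the table and its values are invented for the example.

```r
library(tidyr)

# A wide table: one row per country, one column per year (made-up values)
wide <- data.frame(
  country = c("Finland", "Sweden"),
  `2016`  = c(5.5, 9.9),
  `2017`  = c(5.5, 10.1),
  check.names = FALSE  # keep "2016" and "2017" as column names
)

# Gather the year columns into two columns: year and population
long <- pivot_longer(
  wide,
  cols      = -country,
  names_to  = "year",
  values_to = "population"
)

print(long)
```

In the long form each variable has its own column, which is the layout most of the other packages on this page expect.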

5 Transforming data

5.1 Packages

These packages help you transform your data into new types of data.

  • dplyr - a grammar of data transformation
  • magrittr - a concise syntax for calling sequences of functions
  • tibble - efficient display structure for tabular data
  • stringr - tools for working with strings and regular expressions
  • lubridate - tools for working with dates and times
  • xts - tools for time series based data
  • data.table - fast data manipulation
  • vtreat - tools for pre-processing variables for predictive modeling
  • stringi - fast string processing facilities.
  • Matrix - LAPACK methods for dense and sparse matrix operations
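
A minimal sketch of what a "grammar of data transformation" looks like in practice, combining dplyr verbs with the `%>%` pipe from magrittr (which dplyr re-exports). It assumes dplyr is installed; the small flights table is invented for the example.

```r
library(dplyr)

# A toy table of flight delays in minutes (made-up values)
flights <- data.frame(
  carrier = c("AY", "AY", "SK", "SK"),
  delay   = c(10, 20, 5, 15)
)

# Read the pipeline top to bottom: group, then summarise each group
delay_summary <- flights %>%
  group_by(carrier) %>%
  summarise(mean_delay = mean(delay), n = n())

print(delay_summary)
```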

6 Visualizing data

6.1 Packages

These packages help you visualize your data.

  • ggplot2 with extensions - a versatile system for making plots
    • ggthemes - plot style themes
    • ggmap - maps with Google Maps, Open Street Maps, etc.
    • ggiraph - interactive ggplots
    • ggstance - horizontal versions of common plots
    • GGally - scatterplot matrices
    • ggalt - additional coordinate systems, geoms, etc.
    • ggforce - additional geoms, etc.
    • ggrepel - prevent plot labels from overlapping
    • ggraph - graphs, networks, trees and more
    • ggpmisc - photo-biology related extensions
    • geomnet - network visualization
    • ggExtra - marginal histograms for a plot
    • gganimate - animations
    • plotROC - interactive ROC plots
    • ggspectra - tools for plotting light spectra
    • ggnetwork - geoms to plot networks
    • ggtech - style themes for plots
    • ggradar - radar charts
    • ggTimeSeries - time series visualizations
    • ggtree - tree visualizations
    • ggseas - seasonal adjustment tools
  • lattice - Trellis graphics
  • rgl - interactive 3D plots
  • ggvis - versatile system for interactive graphs
  • htmlwidgets - framework for creating JavaScript widgets with R
  • rCharts - many interactive JavaScript visualizations
  • coefplot - visualizes model statistics
  • quantmod - candlestick financial charts
  • colorspace - HSL based color palettes
  • viridis - Matplotlib viridis color palette for R
  • munsell - Munsell color palettes for R.
  • RColorBrewer - color palettes for plots. No manual or website.
  • dichromat - color-blind friendly palettes. No manual or website.
  • igraph - Network Analysis and Visualization
  • latticeExtra - Extensions for lattice graphics
  • sp - tools for spatial data
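
As a short illustration of the layered approach that ggplot2 and its extensions share, the sketch below builds a scatterplot from the `mtcars` data set that comes with R. It assumes ggplot2 is installed; the choice of variables and theme is only an example.

```r
library(ggplot2)

# Map data columns to aesthetics, then add layers, labels and a theme
p <- ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point(size = 2) +
  labs(x = "Weight (1000 lbs)",
       y = "Miles per gallon",
       colour = "Cylinders") +
  theme_minimal()

# Printing the object draws the plot on the active graphics device
print(p)
```

Because the plot is an ordinary R object, extension packages can add to it with the same `+` syntax.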

7 Modeling data

7.1 Read

7.3 Packages

These packages help you build models and make inferences. Often the same packages will focus on both topics.

  • car - functions from An R Companion to Applied Regression
  • Hmisc - miscellaneous functions for data analysis
  • multcomp - Simultaneous Inference in General Parametric Models
  • pbkrtest - parametric bootstrap test for linear mixed effects models
  • mvtnorm - Multivariate Normal and t Distributions
  • MatrixModels - Modelling with Sparse And Dense Matrices
  • SparseM - linear algebra for sparse matrices
  • lme4 - Linear Mixed-Effects Models using Eigen C++ library
  • broom - tools for tidying statistical models into data frames
  • caret - tools for Classification And REgression Training
  • glmnet - generalized linear models via penalized maximum likelihood
  • mosaic - Tools for teaching mathematics, statistics, computation and modeling
  • gbm - gradient boosted regression models
  • xgboost - Extreme Gradient Boosting
  • randomForest - Random Forests for Classification and Regression
  • ranger - a fast implementation of Random Forests
  • h2o - parallel distributed machine learning algorithms
  • ROCR - plots to visualize classifier performance
  • pROC - Tools for visualizing, smoothing and comparing ROC curves
  • PivotalR - R wrappers for MADlib’s parallel distributed machine learning algorithms
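
The sketch below shows one simple modeling workflow: fit a linear regression with base R's `lm()` and turn the result into a data frame with `broom::tidy()`. It assumes broom is installed; the model (`mpg ~ wt` on the built-in `mtcars` data) is chosen only for illustration.

```r
library(broom)

# Fit a simple linear regression of fuel economy on car weight
fit <- lm(mpg ~ wt, data = mtcars)

# tidy() returns one row per model term, with the estimate,
# standard error, test statistic and p-value as columns
coefs <- tidy(fit)
print(coefs)
```

Having the coefficients in a data frame makes it straightforward to filter, plot or report them with the packages listed in the other sections.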

8 Reporting

8.1 Packages

These packages help you communicate the results of data science to your audiences.

  • rmarkdown - easy-to-use format for reproducible reports and dynamic documents in R
  • knitr - embed R code within pdf and html reports
  • flexdashboard - easy-to-create dashboards based on rmarkdown
  • bookdown - books and long documents built on R Markdown
  • rticles - ready to use R Markdown templates
  • tufte - Tufte handout R Markdown template
  • DT - Interactive data tables
  • pixiedust - Customized tables
  • xtable - Customized tables
  • highr - Syntax Highlighting for R Source Code
  • formatR - tidy_source() to format R source code
  • yaml - Methods to convert R data to YAML and back
  • pander - renders R objects into Pandoc markdown.
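
To give a feel for how R code is embedded in a report, the sketch below knits a tiny in-memory R Markdown source with `knitr::knit()` and prints the resulting Markdown. It assumes knitr is installed; the document text is invented, and a real report would normally live in an .Rmd file and be rendered with rmarkdown.

```r
library(knitr)

# Build the three-backtick chunk delimiter programmatically so that this
# example can itself be shown inside a fenced block
fence <- strrep("`", 3)

# A tiny R Markdown source: one sentence of prose and one R chunk
source_lines <- c(
  "The answer is computed below.",
  "",
  paste0(fence, "{r}"),
  "1 + 1",
  fence
)

# knit() runs the chunk and returns the resulting Markdown as one string
report_md <- knit(text = source_lines, quiet = TRUE)
cat(report_md)
```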

2017-2018 Markus Kainu.

Creative Commons license
This work is licensed under a Creative Commons Attribution 4.0 International license.