Review

Installing and loading packages

Installing new packages gives you access to new functions in R. Many packages also ship with small datasets for practising the use of their functions. Most packages enable some computational operation, but a growing number provide interfaces from R to, for example, databases (eurostat) or web technologies (leaflet).

# Install from CRAN, the central package repository
install.packages("eurostat")
# Load the package
library(eurostat)
# Install the development version from GitHub with the devtools package
devtools::install_github("ropengov/eurostat")
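
A script that is run repeatedly does not need to reinstall a package every time. As a minimal sketch, the helper below (the name `install_if_missing` is made up for this example and is not part of any package) installs a package from CRAN only when `requireNamespace()` cannot find it:

```r
# Hypothetical helper, not part of any package: install only when missing
install_if_missing <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # TRUE if the package can now be loaded
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# "stats" ships with R, so nothing is downloaded here
ok <- install_if_missing("stats")
ok
```

The same call with "eurostat" would trigger a download only on a machine where the package is not yet installed.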

The new functions in installed packages can be used either A) by loading the package and then calling the function, or B) by naming the package and the function together in the same call (package::function). In the materials for this course I try always to use approach B, so that it is clearer to students when a function comes from an “external” package and when it comes from base R.

# approach A
library(eurostat)
d <- get_eurostat(id = "tgs00026")
## Reading cache file /tmp/RtmpIe5H92/eurostat/tgs00026_date_code_TF.rds
## Table  tgs00026  read from cache file:  /tmp/RtmpIe5H92/eurostat/tgs00026_date_code_TF.rds
# approach B
d <- eurostat::get_eurostat(id = "tgs00026")
## Reading cache file /tmp/RtmpIe5H92/eurostat/tgs00026_date_code_TF.rds
## Table  tgs00026  read from cache file:  /tmp/RtmpIe5H92/eurostat/tgs00026_date_code_TF.rds
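
The same two approaches work for the packages that ship with R, which makes them easy to try without installing anything. The sketch below uses `median()` from the stats package (attached by default) as a stand-in for a function from an external package:

```r
x <- c(3, 1, 2)

# approach A: the package is attached, so the bare function name is found
a <- median(x)

# approach B: package and function named together
b <- stats::median(x)

identical(a, b)

# the namespace a function comes from can also be checked explicitly
environmentName(environment(median))
```

Approach B keeps working even if another attached package defines a function with the same name, which is one practical reason to prefer it in teaching material.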

Packages must be reinstalled whenever a new R version is installed (this applies to x.y releases such as 3.6, not to x.y.z patch releases such as 3.5.3).
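
To see which x.y series the running R belongs to, the version components can be read from `R.version`. This is only a sketch; the remark about library paths describes the default setup on many installations and may not hold for yours:

```r
# R.version$major is "x", R.version$minor is "y.z"
major <- R.version$major
minor <- sub("\\..*$", "", R.version$minor)   # keep only "y"
xy <- paste(major, minor, sep = ".")
xy

# package libraries in use; a user library path often contains the x.y series,
# which is why a new x.y release starts with an empty library
.libPaths()
```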

R basics

To understand computations in R, two slogans are helpful:

  • Everything that exists is an object.
  • Everything that happens is a function call.
John Chambers
Creator of the S programming language, and core member of the R programming language project.

In other words, two things are worth remembering when trying to understand how R works:

  1. Everything that exists is an object
  2. Everything that happens is a function call
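
Both slogans can be checked directly in the console. In this small sketch even an arithmetic operator turns out to be a function that can be called by name, and a function is an object that can be stored under a new name (`average` is just a name chosen for the example):

```r
# operators are functions: these two lines do the same thing
1 + 2
`+`(1, 2)

# functions are objects: they can be stored under another name...
average <- mean
average(c(1, 2, 3))

# ...and inspected like any other object
class(average)
is.function(`+`)
```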

Creating objects

Everything that exists is an object

objekti <- objektin_arvo
objekti = objektin_arvo
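
At the top level `<-` and `=` both create an object, but inside a function call `=` names an argument instead. A short sketch of the difference, with object names chosen only for the example:

```r
a <- 5   # assignment with <-
b = 5    # at the top level, = assigns as well
identical(a, b)

# inside a call, = only names the argument `x`; no object is created by it
m1 <- mean(x = c(1, 2, 3))

# inside a call, <- creates the object `y` as a side effect
m2 <- mean(y <- c(1, 2, 3))
y
```

Using `<-` for assignment and `=` for arguments keeps the two roles visually separate.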

100 * (0.05 + 0.05)
# [1] 10
sqrt(10+6)
# [1] 4

# numeric vector of length 1 with the value 100
sata <- 100
sata
# [1] 100
is(sata)
# [1] "numeric" "vector"

# character vector of length 1 with the value "sata"
sata_tekstina <- "sata"
sata_tekstina
# [1] "sata"
is(sata_tekstina)
# [1] "character" "vector" "data.frameRowLabels" "SuperClassMethod"
# Create two vectors of 7 elements each, one numeric and one character
nimi <- c("Juhani","Tuomas","Aapo","Simeoni","Timo","Lauri","Eero")
ika  <- c(25,23,23,22,20,20,17)
typeof(nimi)
# "character"
typeof(ika)
# "double"
length(ika)
# [1] 7
# logical vectors
looginen_vektori <- c(TRUE,FALSE,T,T,F,FALSE)
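
A vector holds values of a single type, so mixing types inside `c()` coerces everything to the most general one, and logical vectors are handy for picking elements. The sketch repeats the two vectors from above so that it runs on its own; `over_22` is a name chosen for the example:

```r
nimi <- c("Juhani","Tuomas","Aapo","Simeoni","Timo","Lauri","Eero")
ika  <- c(25,23,23,22,20,20,17)

# mixing types coerces to a common type
typeof(c(1, TRUE))        # logical becomes double
typeof(c(1, "a", TRUE))   # everything becomes character

# a comparison returns a logical vector of the same length
over_22 <- ika > 22
over_22

# TRUE counts as 1, so sum() counts the matches
sum(over_22)

# a logical vector can be used to pick elements from another vector
nimi[over_22]
```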

Calling/using functions

Everything that happens is a function call

vektori <- c(1,2,3,4,5,6,7,8)
ka <- mean(vektori)
ka
# [1] 4.5
is(mean)
# [1] "function" "OptionalFunction" "PossibleMethod"
sum(vektori)
# [1] 36
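
Function calls take arguments, which can be matched by position or by name, and new functions are created with `function()`. A minimal sketch; `value_range` is a made-up function for this example:

```r
vektori <- c(1, 2, 3, 4, 5, 6, 7, 8, NA)

# a missing value makes the mean missing...
mean(vektori)
# ...unless the named argument na.rm says otherwise
mean(vektori, na.rm = TRUE)

# a user-defined function is an object created with function()
value_range <- function(x, na.rm = TRUE) {
  max(x, na.rm = na.rm) - min(x, na.rm = na.rm)
}
value_range(vektori)
```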

Code-writing conventions

Code is written primarily for the computer, but it must also be understandable to people. There are different ways of writing commands in R, and in this course I try to make heavy use of so-called chaining (piping, i.e. a pipeline).

In R this chaining is provided by the magrittr package. The pipe operator %>% can be read as "then". The keyboard shortcut in RStudio is Ctrl + Shift + M.
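
In its simplest form the pipe passes the left-hand side as the first argument of the function on the right, so `x %>% f(y)` is another way of writing `f(x, y)`. A small sketch, assuming the magrittr package is installed:

```r
library(magrittr)

vektori <- c(1, 4, 9, 16)

# nested call, read from the inside out
nested <- round(mean(sqrt(vektori)), 1)

# the same steps chained: take vektori, then sqrt, then mean, then round
piped <- vektori %>%
  sqrt() %>%
  mean() %>%
  round(1)

identical(nested, piped)
```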

Below are different ways of writing and structuring the code for an analysis process. The example works with dplyr::starwars.

library(dplyr)
d <- dplyr::starwars
 
# 1) One operation per line
d_subset <- select(d, name, height, mass, hair_color, skin_color, eye_color, homeworld, species)
d_non_humans <- filter(d_subset, species != "Human")
# filter the non-humans (not d_subset) so that the result matches approaches 2 and 3
d_non_humans_punajasinisilmat <- filter(d_non_humans, eye_color %in% c("red","blue"))
d_non_humans_punajasinisilmat_by_height <- arrange(d_non_humans_punajasinisilmat, desc(height))
tulos1 <- slice(d_non_humans_punajasinisilmat_by_height, 1:5)

# + the intermediate steps of the analysis remain available
# - a lot of naming, which is tedious

# 2) Operations nested on a single line

tulos2 <- slice(arrange(filter(filter(select(d, name, height, mass, hair_color, skin_color, eye_color, homeworld, species),species != "Human"), eye_color %in% c("red","blue")), desc(height)), 1:5)
 
# - has to be read from the inside out (right to left)
# + no need to name intermediate results

# 3) Operations chained together
tulos3 <- d %>%  
  select(name, height, mass, hair_color, skin_color, eye_color, homeworld, species) %>% 
  filter(species != "Human") %>% 
  filter(eye_color %in% c("red","blue")) %>% 
  arrange(desc(height)) %>% 
  slice(1:5)

# + no need to name intermediate results
# + reads from left to right
# - no access to the intermediate steps

Functions for working with the file system

File system operations with the [fs](http://fs.r-lib.org/) package

library(magrittr)
# each task is shown twice: first in base R, then with fs (run one or the other)
# create the directory "aineisto" in the current working directory
dir.create(path = "./aineisto")
fs::dir_create(path = "./aineisto")
# create a new text file "teksti.txt" in the current working directory and open it for editing
file.create("./teksti.txt")
file.edit("./teksti.txt")
fs::file_create("./teksti.txt") %>% file.edit()
# list the files ending in ".R" in the working directory and its subdirectories
list.files(path = "./", all.files = TRUE, full.names = TRUE, recursive = TRUE, pattern = "\\.R$")
fs::dir_ls(path = "./",  all = TRUE, type = "file",  recursive = TRUE, glob = "*.R")
# list all subdirectories of the working directory, and their subdirectories
list.dirs(path = "./", recursive = TRUE, full.names = TRUE)
fs::dir_ls(path = "./",  all = TRUE, type = "directory",  recursive = TRUE)
# copy the file session1_perusteet.R into the directory aineisto
file.copy(from = "./session1_perusteet.R", to = "./aineisto")
fs::file_copy(path = "./session1_perusteet.R", new_path = "./aineisto/session1_perusteet.R")
# delete all files ending in ".R" from the directory aineisto
# (full.names = TRUE so that file.remove() gets paths inside aineisto)
file.remove(list.files(path = "./aineisto", pattern = "\\.R$", full.names = TRUE))
fs::file_delete(path = fs::dir_ls(path = "./aineisto",  all = TRUE, type = "file",  recursive = TRUE, glob = "*.R"))
# handy extra variables provided by the fs package
fs::dir_info(path = "./", all = TRUE, recursive = TRUE, type = "file") %>% 
  dplyr::arrange(dplyr::desc(size))
# download files from the web to disk
download.file(url = "http://siteresources.worldbank.org/INTRES/Resources/469232-1107449512766/allginis_2013.xls", 
              # on Windows, remember to set mode = "wb" for binary files
              mode = "wb",
              destfile = "./aineisto/allginis_2013.xls")
# download and unpack compressed zip files
download.file(url = "http://fenixservices.fao.org/faostat/static/bulkdownloads/Food_Security_Data_E_All_Data_(Normalized).zip",
              destfile = "./Food_Security_Data_E_All_Data_(Normalized).zip")
unzip("./Food_Security_Data_E_All_Data_(Normalized).zip", exdir = "./aineisto")
file.remove("./Food_Security_Data_E_All_Data_(Normalized).zip")
d <- read.csv("./aineisto/Food_Security_Data_E_All_Data_(Normalized).csv", stringsAsFactors = FALSE)
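
The file operations above change the working directory, so it can be safer to practise them in a scratch directory first. The following sketch uses only base R and a temporary directory from `tempfile()`; the directory and file names are made up for the example:

```r
# scratch directory that does not touch the working directory
tmp <- tempfile("fs_demo_")
dir.create(tmp)

# create a subdirectory and two files
dir.create(file.path(tmp, "aineisto"))
file.create(file.path(tmp, "teksti.txt"))
file.create(file.path(tmp, "skripti.R"))

# copy the script into the subdirectory
file.copy(from = file.path(tmp, "skripti.R"), to = file.path(tmp, "aineisto"))

# list .R files recursively; full.names = TRUE keeps the paths usable
r_files <- list.files(tmp, pattern = "\\.R$", recursive = TRUE, full.names = TRUE)
basename(r_files)

# delete only the copy inside "aineisto"
copies <- list.files(file.path(tmp, "aineisto"), pattern = "\\.R$", full.names = TRUE)
file.remove(copies)
remaining <- list.files(tmp, recursive = TRUE)
remaining

# clean up the scratch directory
unlink(tmp, recursive = TRUE)
```

Note the escaped dot in `"\\.R$"`: an unescaped `.` in a regular expression matches any character, so `".R$"` would also match a file name such as `dataR`.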

Reading data from disk and writing data to disk (files!)

fs::dir_create("./aineisto_tmp/")
download.file("http://www.qogdata.pol.gu.se/data/qog_oecd_cs_jan19.csv", "./aineisto_tmp/qog_oecd_cs_jan19.csv", mode = "wb")
download.file("http://www.lisdatacenter.org/wp-content/uploads/it04ip.sav", "./aineisto_tmp/it04ip.sav", mode = "wb")
download.file("http://www.lisdatacenter.org/wp-content/uploads/it04ip.sas7bdat", "./aineisto_tmp/it04ip.sas7bdat", mode = "wb")
download.file("http://www.lisdatacenter.org/wp-content/uploads/it04ip.dta", "./aineisto_tmp/it04ip.dta", mode = "wb")

csv   <- readr::read_csv("./aineisto_tmp/qog_oecd_cs_jan19.csv")
spss  <- haven::read_sav("./aineisto_tmp/it04ip.sav")
sas   <- haven::read_sas("./aineisto_tmp/it04ip.sas7bdat")
stata <- haven::read_dta("./aineisto_tmp/it04ip.dta")

haven::write_dta(data = stata, path = "./aineisto_tmp/stata.dta")
haven::write_sas(data = stata, path = "./aineisto_tmp/stata.sas7bdat")
haven::write_sav(data = stata, path = "./aineisto_tmp/stata.sav")

fs::dir_delete(path = "./aineisto_tmp") 
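
The round trip of writing a data frame to disk and reading it back can be tried without downloading anything. This sketch uses base R's `write.csv()` and `read.csv()` with a temporary file instead of the readr and haven functions above; the data and object names are made up for the example:

```r
brothers <- data.frame(
  nimi = c("Juhani", "Tuomas", "Aapo"),
  ika  = c(25, 23, 23),
  stringsAsFactors = FALSE
)

# write to a temporary csv file, without the row names column
path <- tempfile(fileext = ".csv")
write.csv(brothers, file = path, row.names = FALSE)

# read the file back into a new data.frame
brothers_back <- read.csv(path, stringsAsFactors = FALSE)
str(brothers_back)

# remove the temporary file
file.remove(path)
```

Plain text formats such as csv do not store column types, so it is worth checking with `str()` that the columns come back in the type you expect.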

Reading data from and writing data to a database

library(dplyr)
library(nycflights13)
con <- DBI::dbConnect(RSQLite::SQLite(), dbname = ":memory:")
copy_to(con, nycflights13::flights, "flights",
  temporary = FALSE, 
  indexes = list(
    c("year", "month", "day"), 
    "carrier", 
    "tailnum",
    "dest"
  )
)

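# Fetch the flights table as a reference (a "view") into the database. The data is NOT brought into R's memory!
flights_db <- tbl(con, "flights")
# A comparable query in SQL; unlike tbl(), DBI::dbGetQuery() returns the rows as a data.frame in R's memory
# flights_db <- DBI::dbGetQuery(con, "SELECT * FROM flights")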

# select variables in the view
flights_db %>% select(year:day, dep_delay, arr_delay)
## # Source:   lazy query [?? x 5]
## # Database: sqlite 3.22.0 []
##     year month   day dep_delay arr_delay
##    <int> <int> <int>     <dbl>     <dbl>
##  1  2013     1     1         2        11
##  2  2013     1     1         4        20
##  3  2013     1     1         2        33
##  4  2013     1     1        -1       -18
##  5  2013     1     1        -6       -25
##  6  2013     1     1        -4        12
##  7  2013     1     1        -5        19
##  8  2013     1     1        -3       -14
##  9  2013     1     1        -3        -8
## 10  2013     1     1        -2         8
## # … with more rows
# filter rows in the view
flights_db %>% filter(dep_delay > 240)
## # Source:   lazy query [?? x 19]
## # Database: sqlite 3.22.0 []
##     year month   day dep_time sched_dep_time dep_delay arr_time
##    <int> <int> <int>    <int>          <int>     <dbl>    <int>
##  1  2013     1     1      848           1835       853     1001
##  2  2013     1     1     1815           1325       290     2120
##  3  2013     1     1     1842           1422       260     1958
##  4  2013     1     1     2115           1700       255     2330
##  5  2013     1     1     2205           1720       285       46
##  6  2013     1     1     2343           1724       379      314
##  7  2013     1     2     1332            904       268     1616
##  8  2013     1     2     1412            838       334     1710
##  9  2013     1     2     1607           1030       337     2003
## 10  2013     1     2     2131           1512       379     2340
## # … with more rows, and 12 more variables: sched_arr_time <int>,
## #   arr_delay <dbl>, carrier <chr>, flight <int>, tailnum <chr>,
## #   origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>,
## #   minute <dbl>, time_hour <dbl>
# compute grouped summaries in the view
flights_db %>% 
  group_by(dest) %>%
  summarise(delay = mean(dep_time))
## # Source:   lazy query [?? x 2]
## # Database: sqlite 3.22.0 []
##    dest  delay
##    <chr> <dbl>
##  1 ABQ   2006.
##  2 ACK   1033.
##  3 ALB   1627.
##  4 ANC   1635.
##  5 ATL   1293.
##  6 AUS   1521.
##  7 AVL   1175.
##  8 BDL   1490.
##  9 BGR   1690.
## 10 BHM   1944.
## # … with more rows
# Create a new view with grouped summaries and filters
tailnum_delay_db <- flights_db %>% 
  group_by(tailnum) %>%
  summarise(
    delay = mean(arr_delay),
    n = n()
  ) %>% 
  arrange(desc(delay)) %>%
  filter(n > 100)

# Show the SQL code for the processing above
tailnum_delay_db %>% show_query()
## <SQL>
## SELECT *
## FROM (SELECT *
## FROM (SELECT `tailnum`, AVG(`arr_delay`) AS `delay`, COUNT() AS `n`
## FROM `flights`
## GROUP BY `tailnum`)
## ORDER BY `delay` DESC)
## WHERE (`n` > 100.0)
# Fetch the data into R's memory as a tibble object
tailnum_delay <- tailnum_delay_db %>% collect()
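
Data can also be written to and queried from the database with DBI functions directly, and the connection should be closed when it is no longer needed. A minimal sketch, assuming the DBI and RSQLite packages are installed; the table `veljekset` and its contents are made up for the example:

```r
# in-memory SQLite database that disappears when the connection is closed
con_demo <- DBI::dbConnect(RSQLite::SQLite(), dbname = ":memory:")

veljekset <- data.frame(
  nimi = c("Juhani", "Tuomas", "Aapo"),
  ika  = c(25, 23, 23),
  stringsAsFactors = FALSE
)

# write the data.frame to the database as a table
DBI::dbWriteTable(con_demo, "veljekset", veljekset)
DBI::dbListTables(con_demo)

# run a query; the result is returned to R as a data.frame
res <- DBI::dbGetQuery(
  con_demo,
  "SELECT nimi, ika FROM veljekset WHERE ika < 25 ORDER BY nimi"
)
res

# close the connection
DBI::dbDisconnect(con_demo)
```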

Using open data

pxweb, i.e. Statistics Finland, the Finnish Tax Administration, etc.

install.packages("pxweb")
dat <- pxweb::pxweb_interactive()

Using data published on the CKAN platform (avoindata.fi, including the Population Register Centre and, in the future, also Kela)

# install.packages("ckanr")
library(ckanr)
ckanr_setup(url = "https://www.avoindata.fi/data/fi/")
x <- package_search(q = "Väestörekisterikeskus", fq = "title:nimi")
resources <- x$results[[1]]$resources

download.file(resources[[1]]$url, "file.xlsx", mode = "wb")
readxl::excel_sheets("file.xlsx")
## [1] "Miehet kaikki" "Miehet ens"    "Miehet muut"   "Naiset kaikki"
## [5] "Naiset ens"    "Naiset muut"   "Saate"
dat <- readxl::read_xlsx("file.xlsx", sheet = "Miehet kaikki") # data
head(dat)
## # A tibble: 6 x 2
##   Etunimi  Lukumäärä
##   <chr>        <dbl>
## 1 Juhani      288179
## 2 Olavi       144642
## 3 Antero      139907
## 4 Tapani      136887
## 5 Johannes    135708
## 6 Tapio       116507

International organisations

Eurostat

library(eurostat)
res <- search_eurostat("Gross domestic")
dat <- eurostat::get_eurostat("nama_10r_2gdp")
## Reading cache file /tmp/RtmpIe5H92/eurostat/nama_10r_2gdp_date_code_TF.rds
## Table  nama_10r_2gdp  read from cache file:  /tmp/RtmpIe5H92/eurostat/nama_10r_2gdp_date_code_TF.rds
head(dat)
## # A tibble: 6 x 4
##   unit    geo   time       values
##   <fct>   <fct> <date>      <dbl>
## 1 EUR_HAB AT    2017-01-01  42100
## 2 EUR_HAB AT1   2017-01-01  41700
## 3 EUR_HAB AT11  2017-01-01  30000
## 4 EUR_HAB AT12  2017-01-01  34400
## 5 EUR_HAB AT13  2017-01-01  50000
## 6 EUR_HAB AT2   2017-01-01  37500

World Bank

library(WDI)
WDIsearch('gdp')[1:10,]
##       indicator                       
##  [1,] "SH.XPD.TOTL.ZS"                
##  [2,] "SH.XPD.PUBL.ZS"                
##  [3,] "SH.XPD.PRIV.ZS"                
##  [4,] "SH.XPD.KHEX.GD.ZS"             
##  [5,] "SH.XPD.GHED.GD.ZS"             
##  [6,] "SH.XPD.CHEX.GD.ZS"             
##  [7,] "UIS.XGDP.23.FSGOV"             
##  [8,] "UIS.XGDP.1.FSGOV.FDINSTADM.FFD"
##  [9,] "UIS.XGDP.1.FSGOV"              
## [10,] "UIS.XGDP.0.FSGOV.FDINSTADM.FFD"
##       name                                                                
##  [1,] "Health expenditure, total (% of GDP)"                              
##  [2,] "Health expenditure, public (% of GDP)"                             
##  [3,] "Health expenditure, private (% of GDP)"                            
##  [4,] "Capital health expenditure (% of GDP)"                             
##  [5,] "Domestic general government health expenditure (% of GDP)"         
##  [6,] "Current health expenditure (% of GDP)"                             
##  [7,] "Government expenditure on secondary education as % of GDP (%)"     
##  [8,] "Government expenditure in primary institutions as % of GDP (%)"    
##  [9,] "Government expenditure on primary education as % of GDP (%)"       
## [10,] "Government expenditure in pre-primary institutions as % of GDP (%)"
dat <- WDI(indicator='NY.GDP.PCAP.KD', country=c('MX','CA','US'), start=1960, end=2012)
head(dat)
##   iso2c country NY.GDP.PCAP.KD year
## 1    CA  Canada       48724.25 2012
## 2    CA  Canada       48456.96 2011
## 3    CA  Canada       47447.48 2010
## 4    CA  Canada       46543.79 2009
## 5    CA  Canada       48510.57 2008
## 6    CA  Canada       48552.70 2007

OECD

library(OECD)
# fetch the list of available datasets first, then search it
dataset_list <- get_datasets()
search_dataset("unemployment", data = dataset_list)
search_dataset("gdp", data = dataset_list)
# See the instructions: https://github.com/expersso/OECD

ILO

Basics of data manipulation

From here on, the course examples always show two different ways of carrying out the same operation: 1) a base R solution (without add-on packages) and 2) a dplyr solution. The two solutions always follow one another, and the dplyr implementation is always marked explicitly as dplyr::funktio_x().

In this course we try to use only objects of class data.frame. Technically, a data.frame in R is a list of vectors. The vectors can be numeric, character or factors, but they must all be of the same length (see the Seven Brothers demo above).
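
That a data.frame is a list of equal-length vectors can be verified directly. The sketch builds a small data.frame from the Seven Brothers vectors used earlier; the object names `veljekset` and `err` are chosen for the example:

```r
nimi <- c("Juhani","Tuomas","Aapo","Simeoni","Timo","Lauri","Eero")
ika  <- c(25,23,23,22,20,20,17)
veljekset <- data.frame(nimi = nimi, ika = ika, stringsAsFactors = FALSE)

is.list(veljekset)         # a data.frame is a list...
length(veljekset)          # ...with one element per column
sapply(veljekset, class)   # each column is a vector with its own type

# a single column is an ordinary vector
veljekset$ika
mean(veljekset[["ika"]])

# vectors whose lengths do not fit together cannot be combined
err <- try(data.frame(nimi = nimi, ika = ika[1:3]), silent = TRUE)
inherits(err, "try-error")
```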

We use the starwars data from the dplyr package.

sw <- as.data.frame(dplyr::starwars) # first convert it to an ordinary data.frame
nrow(sw) # number of rows
## [1] 87
ncol(sw) # number of columns/variables
## [1] 13
dim(sw)  # both
## [1] 87 13
# First six rows
head(sw) # or
##             name height mass  hair_color  skin_color eye_color birth_year
## 1 Luke Skywalker    172   77       blond        fair      blue       19.0
## 2          C-3PO    167   75        <NA>        gold    yellow      112.0
## 3          R2-D2     96   32        <NA> white, blue       red       33.0
## 4    Darth Vader    202  136        none       white    yellow       41.9
## 5    Leia Organa    150   49       brown       light     brown       19.0
## 6      Owen Lars    178  120 brown, grey       light      blue       52.0
##   gender homeworld species
## 1   male  Tatooine   Human
## 2   <NA>  Tatooine   Droid
## 3   <NA>     Naboo   Droid
## 4   male  Tatooine   Human
## 5 female  Alderaan   Human
## 6   male  Tatooine   Human
##                                                                                                                                       films
## 1                                           Revenge of the Sith, Return of the Jedi, The Empire Strikes Back, A New Hope, The Force Awakens
## 2                    Attack of the Clones, The Phantom Menace, Revenge of the Sith, Return of the Jedi, The Empire Strikes Back, A New Hope
## 3 Attack of the Clones, The Phantom Menace, Revenge of the Sith, Return of the Jedi, The Empire Strikes Back, A New Hope, The Force Awakens
## 4                                                              Revenge of the Sith, Return of the Jedi, The Empire Strikes Back, A New Hope
## 5                                           Revenge of the Sith, Return of the Jedi, The Empire Strikes Back, A New Hope, The Force Awakens
## 6                                                                                     Attack of the Clones, Revenge of the Sith, A New Hope
##                             vehicles                starships
## 1 Snowspeeder, Imperial Speeder Bike X-wing, Imperial shuttle
## 2                                                            
## 3                                                            
## 4                                             TIE Advanced x1
## 5              Imperial Speeder Bike                         
## 6
sw[1:6,] # or
##             name height mass  hair_color  skin_color eye_color birth_year
## 1 Luke Skywalker    172   77       blond        fair      blue       19.0
## 2          C-3PO    167   75        <NA>        gold    yellow      112.0
## 3          R2-D2     96   32        <NA> white, blue       red       33.0
## 4    Darth Vader    202  136        none       white    yellow       41.9
## 5    Leia Organa    150   49       brown       light     brown       19.0
## 6      Owen Lars    178  120 brown, grey       light      blue       52.0
##   gender homeworld species
## 1   male  Tatooine   Human
## 2   <NA>  Tatooine   Droid
## 3   <NA>     Naboo   Droid
## 4   male  Tatooine   Human
## 5 female  Alderaan   Human
## 6   male  Tatooine   Human
##                                                                                                                                       films
## 1                                           Revenge of the Sith, Return of the Jedi, The Empire Strikes Back, A New Hope, The Force Awakens
## 2                    Attack of the Clones, The Phantom Menace, Revenge of the Sith, Return of the Jedi, The Empire Strikes Back, A New Hope
## 3 Attack of the Clones, The Phantom Menace, Revenge of the Sith, Return of the Jedi, The Empire Strikes Back, A New Hope, The Force Awakens
## 4                                                              Revenge of the Sith, Return of the Jedi, The Empire Strikes Back, A New Hope
## 5                                           Revenge of the Sith, Return of the Jedi, The Empire Strikes Back, A New Hope, The Force Awakens
## 6                                                                                     Attack of the Clones, Revenge of the Sith, A New Hope
##                             vehicles                starships
## 1 Snowspeeder, Imperial Speeder Bike X-wing, Imperial shuttle
## 2                                                            
## 3                                                            
## 4                                             TIE Advanced x1
## 5              Imperial Speeder Bike                         
## 6
# the dplyr way
dplyr::slice(sw, 1:6)
##             name height mass  hair_color  skin_color eye_color birth_year
## 1 Luke Skywalker    172   77       blond        fair      blue       19.0
## 2          C-3PO    167   75        <NA>        gold    yellow      112.0
## 3          R2-D2     96   32        <NA> white, blue       red       33.0
## 4    Darth Vader    202  136        none       white    yellow       41.9
## 5    Leia Organa    150   49       brown       light     brown       19.0
## 6      Owen Lars    178  120 brown, grey       light      blue       52.0
##   gender homeworld species
## 1   male  Tatooine   Human
## 2   <NA>  Tatooine   Droid
## 3   <NA>     Naboo   Droid
## 4   male  Tatooine   Human
## 5 female  Alderaan   Human
## 6   male  Tatooine   Human
##                                                                                                                                       films
## 1                                           Revenge of the Sith, Return of the Jedi, The Empire Strikes Back, A New Hope, The Force Awakens
## 2                    Attack of the Clones, The Phantom Menace, Revenge of the Sith, Return of the Jedi, The Empire Strikes Back, A New Hope
## 3 Attack of the Clones, The Phantom Menace, Revenge of the Sith, Return of the Jedi, The Empire Strikes Back, A New Hope, The Force Awakens
## 4                                                              Revenge of the Sith, Return of the Jedi, The Empire Strikes Back, A New Hope
## 5                                           Revenge of the Sith, Return of the Jedi, The Empire Strikes Back, A New Hope, The Force Awakens
## 6                                                                                     Attack of the Clones, Revenge of the Sith, A New Hope
##                             vehicles                starships
## 1 Snowspeeder, Imperial Speeder Bike X-wing, Imperial shuttle
## 2                                                            
## 3                                                            
## 4                                             TIE Advanced x1
## 5              Imperial Speeder Bike                         
## 6

Structures of vectors and data frames, and how to refer to them

# indexing a vector
v1 <- 10:20
v1[1:4]
## [1] 10 11 12 13
# data.frame: the value in row 1, column 2
sw[[1,2]]
## [1] 172
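
A few more ways of referring to elements, shown on the vector from above and on a small data.frame made up for the example (the same indexing works on the `sw` data):

```r
v1 <- 10:20
v1[c(1, 11)]     # first and last element
v1[-(1:9)]       # negative indices drop elements
v1[v1 > 18]      # logical index

df <- data.frame(
  name   = c("Luke Skywalker", "C-3PO", "R2-D2"),
  height = c(172, 167, 96),
  stringsAsFactors = FALSE
)
df[1, 2]                   # row 1, column 2
df[["height"]][1]          # same value via the column name
df$name[df$height < 170]   # names on the rows matching a condition
df[df$height < 170, ]      # whole rows matching a condition
```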

data.frame vs. tibble

R has been around for a long time, and alongside the ordinary data.frame, similar but somewhat more modern classes have been developed, such as data.table and tibble. data.table is the class for the corresponding structure in the package of the same name, and the methods written for it are fast. The class differs from data.frame enough that the basic methods do not always work. Partly for that reason, this course uses data in the tibble and data.frame classes side by side.

Compared with a data.frame, a tibble:

  • uses stringsAsFactors = FALSE by default
  • prints nicely
  • supports list-columns
# print the data as is
sw
##                     name height   mass    hair_color          skin_color
## 1         Luke Skywalker    172   77.0         blond                fair
## 2                  C-3PO    167   75.0          <NA>                gold
## 3                  R2-D2     96   32.0          <NA>         white, blue
## 4            Darth Vader    202  136.0          none               white
## 5            Leia Organa    150   49.0         brown               light
## 6              Owen Lars    178  120.0   brown, grey               light
## 7     Beru Whitesun lars    165   75.0         brown               light
## 8                  R5-D4     97   32.0          <NA>          white, red
## 9      Biggs Darklighter    183   84.0         black               light
## 10        Obi-Wan Kenobi    182   77.0 auburn, white                fair
## 11      Anakin Skywalker    188   84.0         blond                fair
## 12        Wilhuff Tarkin    180     NA  auburn, grey                fair
## 13             Chewbacca    228  112.0         brown             unknown
## 14              Han Solo    180   80.0         brown                fair
## 15                Greedo    173   74.0          <NA>               green
## 16 Jabba Desilijic Tiure    175 1358.0          <NA>    green-tan, brown
## 17        Wedge Antilles    170   77.0         brown                fair
## 18      Jek Tono Porkins    180  110.0         brown                fair
## 19                  Yoda     66   17.0         white               green
## 20             Palpatine    170   75.0          grey                pale
## 21             Boba Fett    183   78.2         black                fair
## 22                 IG-88    200  140.0          none               metal
## 23                 Bossk    190  113.0          none               green
## 24      Lando Calrissian    177   79.0         black                dark
## 25                 Lobot    175   79.0          none               light
## 26                Ackbar    180   83.0          none        brown mottle
## 27            Mon Mothma    150     NA        auburn                fair
## 28          Arvel Crynyd     NA     NA         brown                fair
## 29 Wicket Systri Warrick     88   20.0         brown               brown
## 30             Nien Nunb    160   68.0          none                grey
## 31          Qui-Gon Jinn    193   89.0         brown                fair
## 32           Nute Gunray    191   90.0          none       mottled green
## 33         Finis Valorum    170     NA         blond                fair
## 34         Jar Jar Binks    196   66.0          none              orange
## 35          Roos Tarpals    224   82.0          none                grey
## 36            Rugor Nass    206     NA          none               green
## 37              Ric Olié    183     NA         brown                fair
## 38                 Watto    137     NA         black          blue, grey
## 39               Sebulba    112   40.0          none           grey, red
## 40         Quarsh Panaka    183     NA         black                dark
## 41        Shmi Skywalker    163     NA         black                fair
## 42            Darth Maul    175   80.0          none                 red
## 43           Bib Fortuna    180     NA          none                pale
## 44           Ayla Secura    178   55.0          none                blue
## 45              Dud Bolt     94   45.0          none          blue, grey
## 46               Gasgano    122     NA          none         white, blue
## 47        Ben Quadinaros    163   65.0          none grey, green, yellow
## 48            Mace Windu    188   84.0          none                dark
## 49          Ki-Adi-Mundi    198   82.0         white                pale
## 50             Kit Fisto    196   87.0          none               green
## 51             Eeth Koth    171     NA         black               brown
## 52            Adi Gallia    184   50.0          none                dark
## 53           Saesee Tiin    188     NA          none                pale
## 54           Yarael Poof    264     NA          none               white
## 55              Plo Koon    188   80.0          none              orange
## 56            Mas Amedda    196     NA          none                blue
## 57          Gregar Typho    185   85.0         black                dark
## 58                 Cordé    157     NA         brown               light
## 59           Cliegg Lars    183     NA         brown                fair
## 60     Poggle the Lesser    183   80.0          none               green
## 61       Luminara Unduli    170   56.2         black              yellow
## 62         Barriss Offee    166   50.0         black              yellow
## 63                 Dormé    165     NA         brown               light
## 64                 Dooku    193   80.0         white                fair
## 65   Bail Prestor Organa    191     NA         black                 tan
## 66            Jango Fett    183   79.0         black                 tan
## 67            Zam Wesell    168   55.0        blonde fair, green, yellow
## 68       Dexter Jettster    198  102.0          none               brown
## 69               Lama Su    229   88.0          none                grey
## 70               Taun We    213     NA          none                grey
## 71            Jocasta Nu    167     NA         white                fair
## 72         Ratts Tyerell     79   15.0          none          grey, blue
## 73                R4-P17     96     NA          none         silver, red
## 74            Wat Tambor    193   48.0          none         green, grey
## 75              San Hill    191     NA          none                grey
## 76              Shaak Ti    178   57.0          none    red, blue, white
##    eye_color birth_year        gender      homeworld        species
## 1       blue       19.0          male       Tatooine          Human
## 2     yellow      112.0          <NA>       Tatooine          Droid
## 3        red       33.0          <NA>          Naboo          Droid
## 4     yellow       41.9          male       Tatooine          Human
## 5      brown       19.0        female       Alderaan          Human
## 6       blue       52.0          male       Tatooine          Human
## 7       blue       47.0        female       Tatooine          Human
## 8        red         NA          <NA>       Tatooine          Droid
## 9      brown       24.0          male       Tatooine          Human
## 10 blue-gray       57.0          male        Stewjon          Human
## 11      blue       41.9          male       Tatooine          Human
## 12      blue       64.0          male         Eriadu          Human
## 13      blue      200.0          male       Kashyyyk        Wookiee
## 14     brown       29.0          male       Corellia          Human
## 15     black       44.0          male          Rodia         Rodian
## 16    orange      600.0 hermaphrodite      Nal Hutta           Hutt
## 17     hazel       21.0          male       Corellia          Human
## 18      blue         NA          male     Bestine IV          Human
## 19     brown      896.0          male           <NA> Yoda's species
## 20    yellow       82.0          male          Naboo          Human
## 21     brown       31.5          male         Kamino          Human
## 22       red       15.0          none           <NA>          Droid
## 23       red       53.0          male      Trandosha     Trandoshan
## 24     brown       31.0          male        Socorro          Human
## 25      blue       37.0          male         Bespin          Human
## 26    orange       41.0          male       Mon Cala   Mon Calamari
## 27      blue       48.0        female      Chandrila          Human
## 28     brown         NA          male           <NA>          Human
## 29     brown        8.0          male          Endor           Ewok
## 30     black         NA          male        Sullust      Sullustan
## 31      blue       92.0          male           <NA>          Human
## 32       red         NA          male Cato Neimoidia      Neimodian
## 33      blue       91.0          male      Coruscant          Human
## 34    orange       52.0          male          Naboo         Gungan
## 35    orange         NA          male          Naboo         Gungan
## 36    orange         NA          male          Naboo         Gungan
## 37      blue         NA          male          Naboo           <NA>
## 38    yellow         NA          male       Toydaria      Toydarian
## 39    orange         NA          male      Malastare            Dug
## 40     brown       62.0          male          Naboo           <NA>
## 41     brown       72.0        female       Tatooine          Human
## 42    yellow       54.0          male       Dathomir         Zabrak
## 43      pink         NA          male         Ryloth        Twi'lek
## 44     hazel       48.0        female         Ryloth        Twi'lek
## 45    yellow         NA          male        Vulpter     Vulptereen
## 46     black         NA          male        Troiken          Xexto
## 47    orange         NA          male           Tund          Toong
## 48     brown       72.0          male     Haruun Kal          Human
## 49    yellow       92.0          male          Cerea         Cerean
## 50     black         NA          male    Glee Anselm       Nautolan
## 51     brown         NA          male       Iridonia         Zabrak
## 52      blue         NA        female      Coruscant     Tholothian
## 53    orange         NA          male        Iktotch       Iktotchi
## 54    yellow         NA          male        Quermia       Quermian
## 55     black       22.0          male          Dorin        Kel Dor
## 56      blue         NA          male       Champala       Chagrian
## 57     brown         NA          male          Naboo          Human
## 58     brown         NA        female          Naboo          Human
## 59      blue       82.0          male       Tatooine          Human
## 60    yellow         NA          male       Geonosis      Geonosian
## 61      blue       58.0        female         Mirial       Mirialan
## 62      blue       40.0        female         Mirial       Mirialan
## 63     brown         NA        female          Naboo          Human
## 64     brown      102.0          male        Serenno          Human
## 65     brown       67.0          male       Alderaan          Human
## 66     brown       66.0          male   Concord Dawn          Human
## 67    yellow         NA        female          Zolan       Clawdite
## 68    yellow         NA          male           Ojom       Besalisk
## 69     black         NA          male         Kamino       Kaminoan
## 70     black         NA        female         Kamino       Kaminoan
## 71      blue         NA        female      Coruscant          Human
## 72   unknown         NA          male    Aleen Minor         Aleena
## 73 red, blue         NA        female           <NA>           <NA>
## 74   unknown         NA          male          Skako        Skakoan
## 75      gold         NA          male     Muunilinst           Muun
## 76     black         NA        female          Shili        Togruta
##                                                                                                                                        films
## 1                                            Revenge of the Sith, Return of the Jedi, The Empire Strikes Back, A New Hope, The Force Awakens
## 2                     Attack of the Clones, The Phantom Menace, Revenge of the Sith, Return of the Jedi, The Empire Strikes Back, A New Hope
## 3  Attack of the Clones, The Phantom Menace, Revenge of the Sith, Return of the Jedi, The Empire Strikes Back, A New Hope, The Force Awakens
## 4                                                               Revenge of the Sith, Return of the Jedi, The Empire Strikes Back, A New Hope
## 5                                            Revenge of the Sith, Return of the Jedi, The Empire Strikes Back, A New Hope, The Force Awakens
## 6                                                                                      Attack of the Clones, Revenge of the Sith, A New Hope
## 7                                                                                      Attack of the Clones, Revenge of the Sith, A New Hope
## 8                                                                                                                                 A New Hope
## 9                                                                                                                                 A New Hope
## 10                    Attack of the Clones, The Phantom Menace, Revenge of the Sith, Return of the Jedi, The Empire Strikes Back, A New Hope
## 11                                                                             Attack of the Clones, The Phantom Menace, Revenge of the Sith
## 12                                                                                                           Revenge of the Sith, A New Hope
## 13                                           Revenge of the Sith, Return of the Jedi, The Empire Strikes Back, A New Hope, The Force Awakens
## 14                                                                Return of the Jedi, The Empire Strikes Back, A New Hope, The Force Awakens
## 15                                                                                                                                A New Hope
## 16                                                                                        The Phantom Menace, Return of the Jedi, A New Hope
## 17                                                                                   Return of the Jedi, The Empire Strikes Back, A New Hope
## 18                                                                                                                                A New Hope
## 19                                Attack of the Clones, The Phantom Menace, Revenge of the Sith, Return of the Jedi, The Empire Strikes Back
## 20                                Attack of the Clones, The Phantom Menace, Revenge of the Sith, Return of the Jedi, The Empire Strikes Back
## 21                                                                         Attack of the Clones, Return of the Jedi, The Empire Strikes Back
## 22                                                                                                                   The Empire Strikes Back
## 23                                                                                                                   The Empire Strikes Back
## 24                                                                                               Return of the Jedi, The Empire Strikes Back
## 25                                                                                                                   The Empire Strikes Back
## 26                                                                                                     Return of the Jedi, The Force Awakens
## 27                                                                                                                        Return of the Jedi
## 28                                                                                                                        Return of the Jedi
## 29                                                                                                                        Return of the Jedi
## 30                                                                                                                        Return of the Jedi
## 31                                                                                                                        The Phantom Menace
## 32                                                                             Attack of the Clones, The Phantom Menace, Revenge of the Sith
## 33                                                                                                                        The Phantom Menace
## 34                                                                                                  Attack of the Clones, The Phantom Menace
## 35                                                                                                                        The Phantom Menace
## 36                                                                                                                        The Phantom Menace
## 37                                                                                                                        The Phantom Menace
## 38                                                                                                  Attack of the Clones, The Phantom Menace
## 39                                                                                                                        The Phantom Menace
## 40                                                                                                                        The Phantom Menace
## 41                                                                                                  Attack of the Clones, The Phantom Menace
## 42                                                                                                                        The Phantom Menace
## 43                                                                                                                        Return of the Jedi
## 44                                                                             Attack of the Clones, The Phantom Menace, Revenge of the Sith
## 45                                                                                                                        The Phantom Menace
## 46                                                                                                                        The Phantom Menace
## 47                                                                                                                        The Phantom Menace
## 48                                                                             Attack of the Clones, The Phantom Menace, Revenge of the Sith
## 49                                                                             Attack of the Clones, The Phantom Menace, Revenge of the Sith
## 50                                                                             Attack of the Clones, The Phantom Menace, Revenge of the Sith
## 51                                                                                                   The Phantom Menace, Revenge of the Sith
## 52                                                                                                   The Phantom Menace, Revenge of the Sith
## 53                                                                                                   The Phantom Menace, Revenge of the Sith
## 54                                                                                                                        The Phantom Menace
## 55                                                                             Attack of the Clones, The Phantom Menace, Revenge of the Sith
## 56                                                                                                  Attack of the Clones, The Phantom Menace
## 57                                                                                                                      Attack of the Clones
## 58                                                                                                                      Attack of the Clones
## 59                                                                                                                      Attack of the Clones
## 60                                                                                                 Attack of the Clones, Revenge of the Sith
## 61                                                                                                 Attack of the Clones, Revenge of the Sith
## 62                                                                                                                      Attack of the Clones
## 63                                                                                                                      Attack of the Clones
## 64                                                                                                 Attack of the Clones, Revenge of the Sith
## 65                                                                                                 Attack of the Clones, Revenge of the Sith
## 66                                                                                                                      Attack of the Clones
## 67                                                                                                                      Attack of the Clones
## 68                                                                                                                      Attack of the Clones
## 69                                                                                                                      Attack of the Clones
## 70                                                                                                                      Attack of the Clones
## 71                                                                                                                      Attack of the Clones
## 72                                                                                                                        The Phantom Menace
## 73                                                                                                 Attack of the Clones, Revenge of the Sith
## 74                                                                                                                      Attack of the Clones
## 75                                                                                                                      Attack of the Clones
## 76                                                                                                 Attack of the Clones, Revenge of the Sith
##                                vehicles
## 1    Snowspeeder, Imperial Speeder Bike
## 2                                      
## 3                                      
## 4                                      
## 5                 Imperial Speeder Bike
## 6                                      
## 7                                      
## 8                                      
## 9                                      
## 10                      Tribubble bongo
## 11 Zephyr-G swoop bike, XJ-6 airspeeder
## 12                                     
## 13                                AT-ST
## 14                                     
## 15                                     
## 16                                     
## 17                          Snowspeeder
## 18                                     
## 19                                     
## 20                                     
## 21                                     
## 22                                     
## 23                                     
## 24                                     
## 25                                     
## 26                                     
## 27                                     
## 28                                     
## 29                                     
## 30                                     
## 31                      Tribubble bongo
## 32                                     
## 33                                     
## 34                                     
## 35                                     
## 36                                     
## 37                                     
## 38                                     
## 39                                     
## 40                                     
## 41                                     
## 42                         Sith speeder
## 43                                     
## 44                                     
## 45                                     
## 46                                     
## 47                                     
## 48                                     
## 49                                     
## 50                                     
## 51                                     
## 52                                     
## 53                                     
## 54                                     
## 55                                     
## 56                                     
## 57                                     
## 58                                     
## 59                                     
## 60                                     
## 61                                     
## 62                                     
## 63                                     
## 64                     Flitknot speeder
## 65                                     
## 66                                     
## 67           Koro-2 Exodrive airspeeder
## 68                                     
## 69                                     
## 70                                     
## 71                                     
## 72                                     
## 73                                     
## 74                                     
## 75                                     
## 76                                     
##                                                                                                   starships
## 1                                                                                  X-wing, Imperial shuttle
## 2                                                                                                          
## 3                                                                                                          
## 4                                                                                           TIE Advanced x1
## 5                                                                                                          
## 6                                                                                                          
## 7                                                                                                          
## 8                                                                                                          
## 9                                                                                                    X-wing
## 10 Jedi starfighter, Trade Federation cruiser, Naboo star skiff, Jedi Interceptor, Belbullab-22 starfighter
## 11                                                Trade Federation cruiser, Jedi Interceptor, Naboo fighter
## 12                                                                                                         
## 13                                                                      Millennium Falcon, Imperial shuttle
## 14                                                                      Millennium Falcon, Imperial shuttle
## 15                                                                                                         
## 16                                                                                                         
## 17                                                                                                   X-wing
## 18                                                                                                   X-wing
## 19                                                                                                         
## 20                                                                                                         
## 21                                                                                                  Slave 1
## 22                                                                                                         
## 23                                                                                                         
## 24                                                                                        Millennium Falcon
## 25                                                                                                         
## 26                                                                                                         
## 27                                                                                                         
## 28                                                                                                   A-wing
## 29                                                                                                         
## 30                                                                                        Millennium Falcon
## 31                                                                                                         
## 32                                                                                                         
## 33                                                                                                         
## 34                                                                                                         
## 35                                                                                                         
## 36                                                                                                         
## 37                                                                                     Naboo Royal Starship
## 38                                                                                                         
## 39                                                                                                         
## 40                                                                                                         
## 41                                                                                                         
## 42                                                                                                 Scimitar
## 43                                                                                                         
## 44                                                                                                         
## 45                                                                                                         
## 46                                                                                                         
## 47                                                                                                         
## 48                                                                                                         
## 49                                                                                                         
## 50                                                                                                         
## 51                                                                                                         
## 52                                                                                                         
## 53                                                                                                         
## 54                                                                                                         
## 55                                                                                         Jedi starfighter
## 56                                                                                                         
## 57                                                                                            Naboo fighter
## 58                                                                                                         
## 59                                                                                                         
## 60                                                                                                         
## 61                                                                                                         
## 62                                                                                                         
## 63                                                                                                         
## 64                                                                                                         
## 65                                                                                                         
## 66                                                                                                         
## 67                                                                                                         
## 68                                                                                                         
## 69                                                                                                         
## 70                                                                                                         
## 71                                                                                                         
## 72                                                                                                         
## 73                                                                                                         
## 74                                                                                                         
## 75                                                                                                         
## 76                                                                                                         
##  [ reached 'max' / getOption("max.print") -- omitted 11 rows ]
# convert the data to a tibble and print it
sw_tb <- tibble::as_tibble(sw)
sw_tb
## # A tibble: 87 x 13
##    name  height  mass hair_color skin_color eye_color birth_year gender
##    <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> 
##  1 Luke…    172    77 blond      fair       blue            19   male  
##  2 C-3PO    167    75 <NA>       gold       yellow         112   <NA>  
##  3 R2-D2     96    32 <NA>       white, bl… red             33   <NA>  
##  4 Dart…    202   136 none       white      yellow          41.9 male  
##  5 Leia…    150    49 brown      light      brown           19   female
##  6 Owen…    178   120 brown, gr… light      blue            52   male  
##  7 Beru…    165    75 brown      light      blue            47   female
##  8 R5-D4     97    32 <NA>       white, red red             NA   <NA>  
##  9 Bigg…    183    84 black      light      brown           24   male  
## 10 Obi-…    182    77 auburn, w… fair       blue-gray       57   male  
## # … with 77 more rows, and 5 more variables: homeworld <chr>,
## #   species <chr>, films <list>, vehicles <list>, starships <list>

Filtering data: selecting rows/cases

# first reload the data as sw (dplyr::starwars is already a tibble)
sw <- dplyr::starwars
# select all brown-haired characters
sw[sw$hair_color == "brown",] # base R: rows where hair_color is NA come back as all-NA rows
## # A tibble: 23 x 13
##    name  height  mass hair_color skin_color eye_color birth_year gender
##    <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> 
##  1 <NA>      NA    NA <NA>       <NA>       <NA>              NA <NA>  
##  2 <NA>      NA    NA <NA>       <NA>       <NA>              NA <NA>  
##  3 Leia…    150    49 brown      light      brown             19 female
##  4 Beru…    165    75 brown      light      blue              47 female
##  5 <NA>      NA    NA <NA>       <NA>       <NA>              NA <NA>  
##  6 Chew…    228   112 brown      unknown    blue             200 male  
##  7 Han …    180    80 brown      fair       brown             29 male  
##  8 <NA>      NA    NA <NA>       <NA>       <NA>              NA <NA>  
##  9 <NA>      NA    NA <NA>       <NA>       <NA>              NA <NA>  
## 10 Wedg…    170    77 brown      fair       hazel             21 male  
## # … with 13 more rows, and 5 more variables: homeworld <chr>,
## #   species <chr>, films <list>, vehicles <list>, starships <list>
dplyr::filter(sw, hair_color == "brown") # dplyr
## # A tibble: 18 x 13
##    name  height  mass hair_color skin_color eye_color birth_year gender
##    <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> 
##  1 Leia…    150    49 brown      light      brown             19 female
##  2 Beru…    165    75 brown      light      blue              47 female
##  3 Chew…    228   112 brown      unknown    blue             200 male  
##  4 Han …    180    80 brown      fair       brown             29 male  
##  5 Wedg…    170    77 brown      fair       hazel             21 male  
##  6 Jek …    180   110 brown      fair       blue              NA male  
##  7 Arve…     NA    NA brown      fair       brown             NA male  
##  8 Wick…     88    20 brown      brown      brown              8 male  
##  9 Qui-…    193    89 brown      fair       blue              92 male  
## 10 Ric …    183    NA brown      fair       blue              NA male  
## 11 Cordé    157    NA brown      light      brown             NA female
## 12 Clie…    183    NA brown      fair       blue              82 male  
## 13 Dormé    165    NA brown      light      brown             NA female
## 14 Tarf…    234   136 brown      brown      blue              NA male  
## 15 Raym…    188    79 brown      light      brown             NA male  
## 16 Rey       NA    NA brown      light      hazel             NA female
## 17 Poe …     NA    NA brown      light      brown             NA male  
## 18 Padm…    165    45 brown      light      brown             46 female
## # … with 5 more variables: homeworld <chr>, species <chr>, films <list>,
## #   vehicles <list>, starships <list>
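
The base R call above returns 23 rows while dplyr::filter() returns 18. Where hair_color is NA, the comparison evaluates to NA, and `[` turns every NA in a logical index into a row filled with NA, whereas filter() keeps only rows where the condition is TRUE. The following is a minimal sketch of this behaviour and of three base R ways to avoid it. The `toy` data frame and its values are made up for illustration and are not part of starwars.

```r
# Hypothetical toy data: one hair_color value is missing
toy <- data.frame(
  name = c("A", "B", "C", "D"),
  hair_color = c("brown", NA, "black", "brown"),
  stringsAsFactors = FALSE
)

# Plain logical subsetting: the NA comparison produces an all-NA row
with_na <- toy[toy$hair_color == "brown", ]
nrow(with_na)  # 3 rows: A, an all-NA row, D

# Option 1: which() returns only the positions where the test is TRUE
opt1 <- toy[which(toy$hair_color == "brown"), ]

# Option 2: exclude missing values explicitly
opt2 <- toy[!is.na(toy$hair_color) & toy$hair_color == "brown", ]

# Option 3: subset() treats NA in the condition as FALSE
opt3 <- subset(toy, hair_color == "brown")
```

All three options return the two brown-haired rows, which matches what dplyr::filter() does with the same condition.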
# select everyone under two metres tall with blue eyes
sw[sw$height < 200 & sw$eye_color == "blue",] # base R
## # A tibble: 17 x 13
##    name  height  mass hair_color skin_color eye_color birth_year gender
##    <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> 
##  1 Luke…    172  77   blond      fair       blue            19   male  
##  2 Owen…    178 120   brown, gr… light      blue            52   male  
##  3 Beru…    165  75   brown      light      blue            47   female
##  4 Anak…    188  84   blond      fair       blue            41.9 male  
##  5 Wilh…    180  NA   auburn, g… fair       blue            64   male  
##  6 Jek …    180 110   brown      fair       blue            NA   male  
##  7 Lobot    175  79   none       light      blue            37   male  
##  8 Mon …    150  NA   auburn     fair       blue            48   female
##  9 Qui-…    193  89   brown      fair       blue            92   male  
## 10 Fini…    170  NA   blond      fair       blue            91   male  
## 11 Ric …    183  NA   brown      fair       blue            NA   male  
## 12 Adi …    184  50   none       dark       blue            NA   female
## 13 Mas …    196  NA   none       blue       blue            NA   male  
## 14 Clie…    183  NA   brown      fair       blue            82   male  
## 15 Lumi…    170  56.2 black      yellow     blue            58   female
## 16 Barr…    166  50   black      yellow     blue            40   female
## 17 Joca…    167  NA   white      fair       blue            NA   female
## # … with 5 more variables: homeworld <chr>, species <chr>, films <list>,
## #   vehicles <list>, starships <list>
dplyr::filter(sw, height < 200, eye_color == "blue") # dplyr
## # A tibble: 17 x 13
##    name  height  mass hair_color skin_color eye_color birth_year gender
##    <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> 
##  1 Luke…    172  77   blond      fair       blue            19   male  
##  2 Owen…    178 120   brown, gr… light      blue            52   male  
##  3 Beru…    165  75   brown      light      blue            47   female
##  4 Anak…    188  84   blond      fair       blue            41.9 male  
##  5 Wilh…    180  NA   auburn, g… fair       blue            64   male  
##  6 Jek …    180 110   brown      fair       blue            NA   male  
##  7 Lobot    175  79   none       light      blue            37   male  
##  8 Mon …    150  NA   auburn     fair       blue            48   female
##  9 Qui-…    193  89   brown      fair       blue            92   male  
## 10 Fini…    170  NA   blond      fair       blue            91   male  
## 11 Ric …    183  NA   brown      fair       blue            NA   male  
## 12 Adi …    184  50   none       dark       blue            NA   female
## 13 Mas …    196  NA   none       blue       blue            NA   male  
## 14 Clie…    183  NA   brown      fair       blue            82   male  
## 15 Lumi…    170  56.2 black      yellow     blue            58   female
## 16 Barr…    166  50   black      yellow     blue            40   female
## 17 Joca…    167  NA   white      fair       blue            NA   female
## # … with 5 more variables: homeworld <chr>, species <chr>, films <list>,
## #   vehicles <list>, starships <list>
# select all characters whose name starts with M
sw[grepl("^M", sw$name),]
## # A tibble: 3 x 13
##   name  height  mass hair_color skin_color eye_color birth_year gender
##   <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> 
## 1 Mon …    150    NA auburn     fair       blue              48 female
## 2 Mace…    188    84 none       dark       brown             72 male  
## 3 Mas …    196    NA none       blue       blue              NA male  
## # … with 5 more variables: homeworld <chr>, species <chr>, films <list>,
## #   vehicles <list>, starships <list>
dplyr::filter(sw, grepl("^M", name)) # dplyr
## # A tibble: 3 x 13
##   name  height  mass hair_color skin_color eye_color birth_year gender
##   <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> 
## 1 Mon …    150    NA auburn     fair       blue              48 female
## 2 Mace…    188    84 none       dark       brown             72 male  
## 3 Mas …    196    NA none       blue       blue              NA male  
## # … with 5 more variables: homeworld <chr>, species <chr>, films <list>,
## #   vehicles <list>, starships <list>
# select rows 10 through 15
sw[10:15,]
## # A tibble: 6 x 13
##   name  height  mass hair_color skin_color eye_color birth_year gender
##   <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> 
## 1 Obi-…    182    77 auburn, w… fair       blue-gray       57   male  
## 2 Anak…    188    84 blond      fair       blue            41.9 male  
## 3 Wilh…    180    NA auburn, g… fair       blue            64   male  
## 4 Chew…    228   112 brown      unknown    blue           200   male  
## 5 Han …    180    80 brown      fair       brown           29   male  
## 6 Gree…    173    74 <NA>       green      black           44   male  
## # … with 5 more variables: homeworld <chr>, species <chr>, films <list>,
## #   vehicles <list>, starships <list>
dplyr::slice(sw, 10:15)
## # A tibble: 6 x 13
##   name  height  mass hair_color skin_color eye_color birth_year gender
##   <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr> 
## 1 Obi-…    182    77 auburn, w… fair       blue-gray       57   male  
## 2 Anak…    188    84 blond      fair       blue            41.9 male  
## 3 Wilh…    180    NA auburn, g… fair       blue            64   male  
## 4 Chew…    228   112 brown      unknown    blue           200   male  
## 5 Han …    180    80 brown      fair       brown           29   male  
## 6 Gree…    173    74 <NA>       green      black           44   male  
## # … with 5 more variables: homeworld <chr>, species <chr>, films <list>,
## #   vehicles <list>, starships <list>
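The row filters above combine conditions with `&` in base R and with a comma in dplyr::filter(). As a minimal sketch of how the different ways of combining conditions relate, the example below uses a small hypothetical data frame, toy, as a stand-in for sw (the four rows reuse values shown in the listings above). It assumes the dplyr package is installed.

```r
library(dplyr)

# Hypothetical stand-in for sw, small enough to check by eye
toy <- data.frame(
  name = c("Luke Skywalker", "Darth Vader", "Leia Organa", "Owen Lars"),
  height = c(172, 202, 150, 178),
  eye_color = c("blue", "yellow", "brown", "blue"),
  stringsAsFactors = FALSE
)

# A comma between conditions means AND, exactly like &
and_comma <- dplyr::filter(toy, height < 200, eye_color == "blue")
and_amp   <- dplyr::filter(toy, height < 200 & eye_color == "blue")

# | means OR: at least one of the conditions holds
either <- dplyr::filter(toy, height >= 200 | eye_color == "brown")

# %in% tests membership in a set of allowed values
in_set <- dplyr::filter(toy, eye_color %in% c("blue", "brown"))

print(and_comma)
print(either)
print(in_set)
```

The comma form tends to read better when there are many conditions, while `|` and `%in%` cover the cases a plain comma cannot express.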

Filtering data: selecting columns/variables

# select the variables "name", "height" and "mass"
sw[,c("name", "height","mass")]
## # A tibble: 87 x 3
##    name               height  mass
##    <chr>               <int> <dbl>
##  1 Luke Skywalker        172    77
##  2 C-3PO                 167    75
##  3 R2-D2                  96    32
##  4 Darth Vader           202   136
##  5 Leia Organa           150    49
##  6 Owen Lars             178   120
##  7 Beru Whitesun lars    165    75
##  8 R5-D4                  97    32
##  9 Biggs Darklighter     183    84
## 10 Obi-Wan Kenobi        182    77
## # … with 77 more rows
dplyr::select(sw, name, height, mass) # dplyr
## # A tibble: 87 x 3
##    name               height  mass
##    <chr>               <int> <dbl>
##  1 Luke Skywalker        172    77
##  2 C-3PO                 167    75
##  3 R2-D2                  96    32
##  4 Darth Vader           202   136
##  5 Leia Organa           150    49
##  6 Owen Lars             178   120
##  7 Beru Whitesun lars    165    75
##  8 R5-D4                  97    32
##  9 Biggs Darklighter     183    84
## 10 Obi-Wan Kenobi        182    77
## # … with 77 more rows
# select the first three variables
sw[,1:3]
## # A tibble: 87 x 3
##    name               height  mass
##    <chr>               <int> <dbl>
##  1 Luke Skywalker        172    77
##  2 C-3PO                 167    75
##  3 R2-D2                  96    32
##  4 Darth Vader           202   136
##  5 Leia Organa           150    49
##  6 Owen Lars             178   120
##  7 Beru Whitesun lars    165    75
##  8 R5-D4                  97    32
##  9 Biggs Darklighter     183    84
## 10 Obi-Wan Kenobi        182    77
## # … with 77 more rows
dplyr::select(sw, 1:3) # dplyr
## # A tibble: 87 x 3
##    name               height  mass
##    <chr>               <int> <dbl>
##  1 Luke Skywalker        172    77
##  2 C-3PO                 167    75
##  3 R2-D2                  96    32
##  4 Darth Vader           202   136
##  5 Leia Organa           150    49
##  6 Owen Lars             178   120
##  7 Beru Whitesun lars    165    75
##  8 R5-D4                  97    32
##  9 Biggs Darklighter     183    84
## 10 Obi-Wan Kenobi        182    77
## # … with 77 more rows
# select the first, fourth and sixth variable
sw[,c(1,4,6)]
## # A tibble: 87 x 3
##    name               hair_color    eye_color
##    <chr>              <chr>         <chr>    
##  1 Luke Skywalker     blond         blue     
##  2 C-3PO              <NA>          yellow   
##  3 R2-D2              <NA>          red      
##  4 Darth Vader        none          yellow   
##  5 Leia Organa        brown         brown    
##  6 Owen Lars          brown, grey   blue     
##  7 Beru Whitesun lars brown         blue     
##  8 R5-D4              <NA>          red      
##  9 Biggs Darklighter  black         brown    
## 10 Obi-Wan Kenobi     auburn, white blue-gray
## # … with 77 more rows
dplyr::select(sw, c(1,4,6)) # dplyr
## # A tibble: 87 x 3
##    name               hair_color    eye_color
##    <chr>              <chr>         <chr>    
##  1 Luke Skywalker     blond         blue     
##  2 C-3PO              <NA>          yellow   
##  3 R2-D2              <NA>          red      
##  4 Darth Vader        none          yellow   
##  5 Leia Organa        brown         brown    
##  6 Owen Lars          brown, grey   blue     
##  7 Beru Whitesun lars brown         blue     
##  8 R5-D4              <NA>          red      
##  9 Biggs Darklighter  black         brown    
## 10 Obi-Wan Kenobi     auburn, white blue-gray
## # … with 77 more rows
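Besides names and positions, dplyr::select() also accepts helper functions and negative selections. The sketch below is only an illustration on a hypothetical four-column data frame, toy, not on sw itself. It assumes dplyr is installed and uses dplyr::starts_with(), which the examples above do not use.

```r
library(dplyr)

# Hypothetical stand-in for sw with just four columns
toy <- data.frame(
  name = c("Luke Skywalker", "C-3PO"),
  height = c(172, 167),
  mass = c(77, 75),
  hair_color = c("blond", NA),
  stringsAsFactors = FALSE
)

# Selecting by name and by position gives the same result
by_name <- dplyr::select(toy, name, height)
by_pos  <- dplyr::select(toy, 1:2)

# Selecting with a helper: name plus every column starting with "h"
by_helper <- dplyr::select(toy, name, dplyr::starts_with("h"))

# A minus sign drops a column and keeps the rest
without_mass <- dplyr::select(toy, -mass)

print(names(by_helper))
print(names(without_mass))
```

Selecting by name is usually more robust than selecting by position, because positions change silently when columns are added or reordered.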

Filtering data: selecting both rows and columns

# select all characters under two metres tall with blue eyes AND the columns name, eye_color and mass
sw[sw$height < 200 & sw$eye_color == "blue",c("name", "eye_color","mass")]
## # A tibble: 17 x 3
##    name               eye_color  mass
##    <chr>              <chr>     <dbl>
##  1 Luke Skywalker     blue       77  
##  2 Owen Lars          blue      120  
##  3 Beru Whitesun lars blue       75  
##  4 Anakin Skywalker   blue       84  
##  5 Wilhuff Tarkin     blue       NA  
##  6 Jek Tono Porkins   blue      110  
##  7 Lobot              blue       79  
##  8 Mon Mothma         blue       NA  
##  9 Qui-Gon Jinn       blue       89  
## 10 Finis Valorum      blue       NA  
## 11 Ric Olié           blue       NA  
## 12 Adi Gallia         blue       50  
## 13 Mas Amedda         blue       NA  
## 14 Cliegg Lars        blue       NA  
## 15 Luminara Unduli    blue       56.2
## 16 Barriss Offee      blue       50  
## 17 Jocasta Nu         blue       NA
library(dplyr)
sw %>%
  dplyr::filter(height < 200, eye_color == "blue") %>%
  dplyr::select(name, eye_color, mass)
## # A tibble: 17 x 3
##    name               eye_color  mass
##    <chr>              <chr>     <dbl>
##  1 Luke Skywalker     blue       77  
##  2 Owen Lars          blue      120  
##  3 Beru Whitesun lars blue       75  
##  4 Anakin Skywalker   blue       84  
##  5 Wilhuff Tarkin     blue       NA  
##  6 Jek Tono Porkins   blue      110  
##  7 Lobot              blue       79  
##  8 Mon Mothma         blue       NA  
##  9 Qui-Gon Jinn       blue       89  
## 10 Finis Valorum      blue       NA  
## 11 Ric Olié           blue       NA  
## 12 Adi Gallia         blue       50  
## 13 Mas Amedda         blue       NA  
## 14 Cliegg Lars        blue       NA  
## 15 Luminara Unduli    blue       56.2
## 16 Barriss Offee      blue       50  
## 17 Jocasta Nu         blue       NA
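The pipe operator `%>%` used above passes the result on its left as the first argument of the function on its right. As a minimal sketch, the following compares a nested call with the piped version on a hypothetical data frame, toy (a stand-in for sw that reuses values from the listings above). It assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical stand-in for sw
toy <- data.frame(
  name = c("Luke Skywalker", "Darth Vader", "Leia Organa", "Owen Lars"),
  height = c(172, 202, 150, 178),
  mass = c(77, 136, 49, 120),
  eye_color = c("blue", "yellow", "brown", "blue"),
  stringsAsFactors = FALSE
)

# Nested form: has to be read from the inside out
nested <- dplyr::select(
  dplyr::filter(toy, height < 200, eye_color == "blue"),
  name, eye_color, mass
)

# Piped form: reads top to bottom, one step per line
piped <- toy %>%
  dplyr::filter(height < 200, eye_color == "blue") %>%
  dplyr::select(name, eye_color, mass)

# Both forms give the same result
print(identical(nested, piped))
print(piped)
```

The two forms do the same work; the pipe only changes how the steps are written down.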

Creating new variables by computing them from existing ones

# compute a new variable, body mass index, for each character
sw$bmi <- sw$mass / (sw$height/100)^2
# and list only those who are at least mildly overweight
# (note: rows where bmi is NA show up as NA rows in the result)
sw[sw$bmi >= 25, c("name", "bmi")]
## # A tibble: 53 x 2
##    name                    bmi
##    <chr>                 <dbl>
##  1 Luke Skywalker         26.0
##  2 C-3PO                  26.9
##  3 R2-D2                  34.7
##  4 Darth Vader            33.3
##  5 Owen Lars              37.9
##  6 Beru Whitesun lars     27.5
##  7 R5-D4                  34.0
##  8 Biggs Darklighter      25.1
##  9 <NA>                   NA  
## 10 Jabba Desilijic Tiure 443. 
## # … with 43 more rows
library(dplyr)
sw %>%
  dplyr::mutate(bmi = mass / (height/100)^2) %>%
  dplyr::filter(bmi >= 25) %>%
  dplyr::select(name, bmi) %>%
  dplyr::arrange(dplyr::desc(bmi))
## # A tibble: 25 x 2
##    name                    bmi
##    <chr>                 <dbl>
##  1 Jabba Desilijic Tiure 443. 
##  2 Dud Bolt               50.9
##  3 Yoda                   39.0
##  4 Owen Lars              37.9
##  5 IG-88                  35  
##  6 R2-D2                  34.7
##  7 Grievous               34.1
##  8 R5-D4                  34.0
##  9 Jek Tono Porkins       34.0
## 10 Darth Vader            33.3
## # … with 15 more rows
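The base R listing above returns 53 rows and the dplyr version 25. The difference comes from missing values: bracket indexing with a logical NA produces a row of NAs, whereas dplyr::filter() drops rows where the condition evaluates to NA. The sketch below shows this on a hypothetical four-row data frame, toy (values reused from the listings above). Wrapping the condition in which() is one common way to get the same behaviour in base R. It assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical stand-in for sw; one character has a missing mass
toy <- data.frame(
  name = c("Luke Skywalker", "Wilhuff Tarkin", "Leia Organa", "Darth Vader"),
  height = c(172, 180, 150, 202),
  mass = c(77, NA, 49, 136),
  stringsAsFactors = FALSE
)
toy$bmi <- toy$mass / (toy$height / 100)^2

# The condition is TRUE, NA, FALSE, TRUE
cond <- toy$bmi >= 25

# Base R: the NA in the condition yields an extra row full of NAs
base_rows <- toy[cond, c("name", "bmi")]

# Base R with which(): only the positions that are TRUE are kept
base_clean <- toy[which(cond), c("name", "bmi")]

# dplyr: rows where the condition is NA are dropped
dplyr_rows <- dplyr::select(dplyr::filter(toy, bmi >= 25), name, bmi)

print(base_rows)
print(base_clean)
print(dplyr_rows)
```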

Lecture 2 - Basics of data manipulation and visualisation

In this lecture we continue with data manipulation and begin the graphics exercises with the ggplot2 package. Alongside the Starwars data, we start using the datasets described in the data section of the site.

Reading/importing data into R

Reading data from disk is fairly straightforward, but do watch the beginning (0:00 - 14:00) of the accompanying webinar: Hadley Wickham - Getting your data into R. Download the slides here.
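As a minimal sketch of reading data from disk, the example below first writes a small hypothetical data frame to a temporary CSV file and then reads it back with base R's read.csv(). The file and its contents are made up for illustration; in practice the path would point to your own data file.

```r
# Hypothetical example data, written to a temporary CSV file
toy <- data.frame(
  name = c("Luke Skywalker", "Leia Organa"),
  height = c(172L, 150L),
  stringsAsFactors = FALSE
)
path <- tempfile(fileext = ".csv")
write.csv(toy, path, row.names = FALSE)

# Read the file back; stringsAsFactors = FALSE keeps text columns as character
d <- read.csv(path, stringsAsFactors = FALSE)

# Always check what you got: column names and column types
str(d)
```

Checking the result with str() right after reading helps catch columns that were read with an unexpected type.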

Data manipulation with the dplyr & tidyr packages

Garrett Grolemund's Data wrangling with R and RStudio is a good introduction to the basics of data manipulation and to the concept of tidy data. Slides!!
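One way to illustrate the tidy-data concept mentioned above is to reshape a "wide" table, where each measurement has its own column, into a "long" table with one row per observation. This is a sketch under assumptions: the data frame wide is hypothetical, and tidyr::pivot_longer() (available in tidyr 1.0.0 and later) is chosen here for illustration rather than taken from the course material.

```r
library(tidyr)

# Hypothetical wide table: one row per character, one column per measurement
wide <- data.frame(
  name = c("Luke Skywalker", "Leia Organa"),
  height = c(172, 150),
  mass = c(77, 49),
  stringsAsFactors = FALSE
)

# Long table: one row per character-measurement pair
long <- tidyr::pivot_longer(
  wide,
  cols = c(height, mass),
  names_to = "variable",
  values_to = "value"
)

print(long)
```

The long form has four rows here, two for each character, and is often the more convenient shape for grouping and plotting.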

Basics of visualisation with the ggplot2 package

Watch the accompanying short video on the basics of ggplot2. If you like learning from videos, watch a more thorough introduction to the ggplot2 package while you are at it.
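For orientation before the videos, here is a minimal sketch of a ggplot2 scatter plot. It uses a small hypothetical data frame, toy (values reused from the Starwars listings earlier), and assumes the ggplot2 package is installed. A plot is built from data, an aesthetic mapping created with aes(), and one or more geom layers.

```r
library(ggplot2)

# Hypothetical stand-in for the Starwars data
toy <- data.frame(
  name = c("Luke Skywalker", "Darth Vader", "Leia Organa", "Owen Lars"),
  height = c(172, 202, 150, 178),
  mass = c(77, 136, 49, 120),
  stringsAsFactors = FALSE
)

# Data + aesthetic mapping + point layer + axis labels
p <- ggplot(toy, aes(x = height, y = mass)) +
  geom_point() +
  labs(x = "Height (cm)", y = "Mass (kg)")

# Printing the object draws the plot
print(p)
```

Layers are added with `+`, so the same object p can later be extended with, for example, another geom or a title.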

Complete and submit the exercise!


2017-2019 Markus Kainu.

This work is licensed under a Creative Commons Attribution 4.0 International license.