Installing new packages gives you new functions to use in R. Many packages also ship small datasets for practising with those functions. Most packages make some computational operation possible, and a growing number provide interfaces from R to, for example, databases (eurostat) or web technologies (leaflet).
# Install from the central package repository, CRAN
install.packages("eurostat")
# Attach the package
library(eurostat)
# Install development versions from GitHub with the devtools package
devtools::install_github("ropengov/eurostat")
Functions from installed packages can be used either A) by attaching the package with library() and then calling the function, or B) by naming the package and the function together in one call. In these course materials I try to always use option B, so that it is clearer to students when a function comes from an "external" package and when it comes from base R.
# option A
library(eurostat)
d <- get_eurostat(id = "tgs00026")
## Reading cache file /tmp/RtmpIe5H92/eurostat/tgs00026_date_code_TF.rds
## Table tgs00026 read from cache file: /tmp/RtmpIe5H92/eurostat/tgs00026_date_code_TF.rds
# option B
d <- eurostat::get_eurostat(id = "tgs00026")
## Reading cache file /tmp/RtmpIe5H92/eurostat/tgs00026_date_code_TF.rds
## Table tgs00026 read from cache file: /tmp/RtmpIe5H92/eurostat/tgs00026_date_code_TF.rds
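The difference between the two calling styles can be tried without installing anything, using a function from the stats package that ships with R. This is only a minimal sketch; the vector x is made up for illustration.

```r
# A made-up numeric vector for illustration
x <- c(2, 4, 4, 10)

# Option B: package and function named together; no library() call is needed
m_b <- stats::median(x)

# Option A: stats is attached by default, so the bare name is found on the search path
m_a <- median(x)

# Both calls reach the same function and give the same result
identical(m_a, m_b)
```

Writing the package name explicitly also protects against name clashes, for example when two attached packages both export a function called filter().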
Packages must be reinstalled whenever a new R version arrives at the x.y level (for example 3.6), but not for x.y.z patch releases (for example 3.5.3).
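One way to see which x.y version is running, and whether a package still needs installing, is sketched below. The helper needs_install() is hypothetical and defined here only for illustration.

```r
# Current R version in x.y form, the level at which packages need reinstalling
minor_first <- strsplit(R.version$minor, ".", fixed = TRUE)[[1]][1]
r_xy <- paste(R.version$major, minor_first, sep = ".")
r_xy

# Hypothetical helper: TRUE when a package cannot be loaded in this R installation
needs_install <- function(pkg) !requireNamespace(pkg, quietly = TRUE)

needs_install("stats")   # FALSE, because stats always ships with R

# Install only when missing (commented out so the sketch has no side effects)
# if (needs_install("eurostat")) install.packages("eurostat")
```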
"To understand computations in R, two slogans are helpful:
- Everything that exists is an object.
- Everything that happens is a function call."
John Chambers, creator of the S programming language and core member of the R programming language project.
So, to understand how R works, it is important to remember two things:
Everything that exists is an object
objekti <- objektin_arvo
objekti = objektin_arvo
100 * (0.05 + 0.05)
# [1] 10
sqrt(10+6)
# [1] 4
# a numeric vector of length 1 with the value 100
sata <- 100
sata
# [1] 100
is(sata)
# [1] "numeric" "vector"
# a character vector of length 1 with the value "sata"
sata_tekstina <- "sata"
sata_tekstina
# [1] "sata"
is(sata_tekstina)
# [1] "character" "vector" "data.frameRowLabels" "SuperClassMethod"
# Create two vectors of 7 elements each, one numeric and one character
nimi <- c("Juhani","Tuomas","Aapo","Simeoni","Timo","Lauri","Eero")
ika <- c(25,23,23,22,20,20,17)
typeof(nimi)
# "character"
typeof(ika)
# "double"
length(ika)
# [1] 7
# logical vectors (T and F are shorthand for TRUE and FALSE)
looginen_vektori <- c(TRUE,FALSE,T,T,F,FALSE)
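Logical vectors are objects like any other, and they are coerced to 0 and 1 in arithmetic. A short sketch using the same values as above; the names n_true and share_true are introduced here only for illustration.

```r
looginen_vektori <- c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE)

typeof(looginen_vektori)                # "logical"

# TRUE counts as 1 and FALSE as 0 in arithmetic
n_true <- sum(looginen_vektori)         # number of TRUE values
share_true <- mean(looginen_vektori)    # share of TRUE values

# Mixing types in c() coerces every element to the most general type
mixed_type <- typeof(c(1, "a", TRUE))   # "character"
```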
Everything that happens is a function call
vektori <- c(1,2,3,4,5,6,7,8)
ka <- mean(vektori)
ka
# [1] 4.5
is(mean)
# [1] "function" "OptionalFunction" "PossibleMethod"
sum(vektori)
# [1] 36
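The second slogan also covers operators: arithmetic, indexing and even assignment are function calls under the hood. The sketch below illustrates this with base R only; the names summa, toinen, kolmas and nelio are made up for the example.

```r
vektori <- c(1, 2, 3, 4, 5, 6, 7, 8)

# Operators are functions that can be called by their backquoted name
summa  <- `+`(1, 2)          # same as 1 + 2
toinen <- `[`(vektori, 2)    # same as vektori[2]
`<-`(kolmas, vektori[3])     # same as kolmas <- vektori[3]

# Your own functions are objects too, created with function()
nelio <- function(x) x^2
neliot <- sapply(vektori[1:3], nelio)
```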
Code is written primarily for the computer, but it must also be understandable to humans. R offers several ways of writing commands, and in this course I try to make heavy use of chaining (piping, i.e. a pipeline).
In R, chaining is provided by the magrittr package. The pipe operator %>% can be read as "then". In RStudio, the keyboard shortcut is Ctrl + Shift + M.
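As a minimal sketch of what the pipe does: the value on the left-hand side becomes the first argument of the function on the right. The example assumes the magrittr package is installed; the vector x is made up.

```r
library(magrittr)

x <- c(1, 4, 9)

# Nested call, read from the inside out
nested <- sum(sqrt(x))

# The same computation as a chain: take x, then sqrt, then sum
piped <- x %>% sqrt() %>% sum()

identical(nested, piped)

# Extra arguments go after the piped value: this is round(c(1.234, 5.678), 1)
rounded <- c(1.234, 5.678) %>% round(1)
```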
Below are different ways of writing and structuring the code of an analysis process. The example uses dplyr::starwars.
library(dplyr)
d <- dplyr::starwars
# 1) One operation per line
d_subset <- select(d, name, height, mass, hair_color, skin_color, eye_color, homeworld, species)
d_non_humans <- filter(d_subset, species != "Human")
d_non_humans_punajasinisilmat <- filter(d_non_humans, eye_color %in% c("red","blue"))
d_non_humans_punajasinisilmat_by_height <- arrange(d_non_humans_punajasinisilmat, desc(height))
tulos1 <- slice(d_non_humans_punajasinisilmat_by_height, 1:5)
# + the intermediate steps of the analysis are available
# - a lot of naming, which is laborious
# 2) Operations nested on a single line
tulos2 <- slice(arrange(filter(filter(select(d, name, height, mass, hair_color, skin_color, eye_color, homeworld, species),species != "Human"), eye_color %in% c("red","blue")), desc(height)), 1:5)
# - has to be read from the inside out (right to left)
# + no need to name anything
# 3) Operations chained
tulos3 <- d %>%
  select(name, height, mass, hair_color, skin_color, eye_color, homeworld, species) %>%
  filter(species != "Human") %>%
  filter(eye_color %in% c("red","blue")) %>%
  arrange(desc(height)) %>%
  slice(1:5)
# + no need to name anything
# + reads from left to right
# - no access to the intermediate steps
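The missing access to intermediate steps in a chain can be softened with a small pass-through function. The sketch below assumes dplyr is installed; peek() is a hypothetical helper written for this example, not a dplyr function. It reports the row count and returns the data unchanged, so the chain continues.

```r
library(dplyr)

# Hypothetical helper: report the number of rows, then pass the data on untouched
peek <- function(data, label) {
  message(label, ": ", nrow(data), " rows")
  data
}

tulos <- dplyr::starwars %>%
  filter(species != "Human") %>%
  peek("non-humans") %>%
  filter(eye_color %in% c("red", "blue")) %>%
  peek("red or blue eyes") %>%
  arrange(desc(height)) %>%
  slice(1:5)
```

The helper does not change the result; it only makes the size of each intermediate step visible while the chain runs.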
File system operations with the [fs](http://fs.r-lib.org/) package
library(magrittr)
# create the folder "aineisto" in the current working directory
dir.create(path = "./aineisto")
fs::dir_create(path = "./aineisto")
# create a new text file "teksti.txt" in the current working directory and open it for editing
file.create("./teksti.txt")
file.edit("./teksti.txt")
fs::file_create("./teksti.txt") %>% file.edit()
# list the files with the extension ".R" in the working directory and its subdirectories
list.files(path = "./", all.files = TRUE, full.names = TRUE, recursive = TRUE, pattern = "\\.R$")
fs::dir_ls(path = "./", all = TRUE, type = "file", recursive = TRUE, glob = "*.R")
# list all subdirectories of the working directory, including nested ones
list.dirs(path = "./", recursive = TRUE, full.names = TRUE)
fs::dir_ls(path = "./", all = TRUE, type = "directory", recursive = TRUE)
# copy the file session1_perusteet.R into the folder aineisto
file.copy(from = "./session1_perusteet.R", to = "./aineisto")
fs::file_copy(path = "./session1_perusteet.R", new_path = "./aineisto/session1_perusteet.R")
# delete all files with the extension ".R" from the folder aineisto
file.remove(list.files(path = "./aineisto", pattern = "\\.R$", full.names = TRUE))
fs::file_delete(path = fs::dir_ls(path = "./aineisto", all = TRUE, type = "file", recursive = TRUE, glob = "*.R"))
# handy extra variables provided by the fs package
fs::dir_info(path = "./", all = TRUE, recursive = TRUE, type = "file") %>%
  dplyr::arrange(dplyr::desc(size))
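Because deleting files is irreversible, it can be safer to practise these operations in a temporary folder first. The following base R sketch does so; the folder name aineisto_demo and the file names are made up for illustration.

```r
# Work inside the session's temporary directory so nothing real is touched
tmp <- file.path(tempdir(), "aineisto_demo")
dir.create(tmp, showWarnings = FALSE)

# Create three empty files, two of them R scripts
file.create(file.path(tmp, c("a.R", "b.R", "notes.txt")))

# The dot is escaped so that only a literal ".R" ending matches;
# full.names = TRUE returns paths that file.remove() can find
r_files <- list.files(path = tmp, pattern = "\\.R$", full.names = TRUE)
basename(r_files)

# Delete the R scripts and check what is left
file.remove(r_files)
remaining <- list.files(tmp)
remaining

# Clean up the temporary folder
unlink(tmp, recursive = TRUE)
```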
# downloading files from the web to disk
download.file(url = "http://siteresources.worldbank.org/INTRES/Resources/469232-1107449512766/allginis_2013.xls",
              # on Windows, remember mode = "wb" for binary files
              mode = "wb",
              destfile = "./aineisto/allginis_2013.xls")
# downloading and extracting compressed zip files
download.file(url = "http://fenixservices.fao.org/faostat/static/bulkdownloads/Food_Security_Data_E_All_Data_(Normalized).zip",
              destfile = "./Food_Security_Data_E_All_Data_(Normalized).zip")
unzip("./Food_Security_Data_E_All_Data_(Normalized).zip", exdir = "./aineisto")
file.remove("./Food_Security_Data_E_All_Data_(Normalized).zip")
d <- read.csv("./aineisto/Food_Security_Data_E_All_Data_(Normalized).csv", stringsAsFactors = FALSE)
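When a script is run repeatedly, it is often unnecessary to download the same file every time. A minimal sketch of one way to avoid that follows; download_if_missing() is a hypothetical helper defined here, and the URL in the usage lines is a placeholder that is never contacted.

```r
# Hypothetical helper: download only when the target file does not exist yet.
# Returns TRUE if a download was attempted and FALSE if the file was already there.
download_if_missing <- function(url, destfile, ...) {
  if (file.exists(destfile)) {
    return(invisible(FALSE))
  }
  # Make sure the target folder exists before downloading
  dir.create(dirname(destfile), showWarnings = FALSE, recursive = TRUE)
  utils::download.file(url = url, destfile = destfile, mode = "wb", ...)
  invisible(TRUE)
}

# Usage: the file already exists, so nothing is downloaded
dest <- tempfile(fileext = ".csv")
file.create(dest)
downloaded <- download_if_missing("http://example.invalid/data.csv", dest)
```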
# create a temporary folder for the example files
fs::dir_create("./aineisto_tmp/")
# download example data in CSV, SPSS, SAS and Stata formats
download.file("http://www.qogdata.pol.gu.se/data/qog_oecd_cs_jan19.csv", "./aineisto_tmp/qog_oecd_cs_jan19.csv", mode = "wb")
download.file("http://www.lisdatacenter.org/wp-content/uploads/it04ip.sav", "./aineisto_tmp/it04ip.sav", mode = "wb")
download.file("http://www.lisdatacenter.org/wp-content/uploads/it04ip.sas7bdat", "./aineisto_tmp/it04ip.sas7bdat", mode = "wb")
download.file("http://www.lisdatacenter.org/wp-content/uploads/it04ip.dta", "./aineisto_tmp/it04ip.dta", mode = "wb")
# read each format into R
csv <- readr::read_csv("./aineisto_tmp/qog_oecd_cs_jan19.csv")
spss <- haven::read_sav("./aineisto_tmp/it04ip.sav")
sas <- haven::read_sas("./aineisto_tmp/it04ip.sas7bdat")
stata <- haven::read_dta("./aineisto_tmp/it04ip.dta")
# write the data back out in Stata, SAS and SPSS formats
haven::write_dta(data = stata, path = "./aineisto_tmp/stata.dta")
haven::write_sas(data = stata, path = "./aineisto_tmp/stata.sas7bdat")
haven::write_sav(data = stata, path = "./aineisto_tmp/stata.sav")
# delete the temporary folder and its contents
fs::dir_delete(path = "./aineisto_tmp")
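The read and write functions can also be tried without any downloads by writing a small data frame to a temporary file and reading it back. The sketch assumes the haven package is installed; the data frame veljekset is made up from the names used earlier.

```r
# A small made-up data frame
veljekset <- data.frame(
  nimi = c("Juhani", "Tuomas", "Aapo"),
  ika  = c(25, 23, 23),
  stringsAsFactors = FALSE
)

# Write to a temporary Stata file and read it back
tmp <- tempfile(fileext = ".dta")
haven::write_dta(data = veljekset, path = tmp)
takaisin <- haven::read_dta(tmp)

# The values survive the round trip (haven may add format attributes,
# so compare the plain values)
same_names <- all(as.character(takaisin$nimi) == veljekset$nimi)
same_ages  <- all(as.numeric(takaisin$ika) == veljekset$ika)

file.remove(tmp)
```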
library(dplyr)
library(nycflights13)
con <- DBI::dbConnect(RSQLite::SQLite(), dbname = ":memory:")
copy_to(con, nycflights13::flights, "flights",
  temporary = FALSE,
  indexes = list(
    c("year", "month", "day"),
    "carrier",
    "tailnum",
    "dest"
  )
)
# Fetch the flights table as a reference (a lazy view) in the database. It is NOT brought into R memory!
flights_db <- tbl(con, "flights")
# The same query in SQL; note that dbGetQuery() does fetch the result into R memory
# flights_db <- DBI::dbGetQuery(con, "SELECT * FROM flights")
# select variables in the view
flights_db %>% select(year:day, dep_delay, arr_delay)
## # Source: lazy query [?? x 5]
## # Database: sqlite 3.22.0 []
## year month day dep_delay arr_delay
## <int> <int> <int> <dbl> <dbl>
## 1 2013 1 1 2 11
## 2 2013 1 1 4 20
## 3 2013 1 1 2 33
## 4 2013 1 1 -1 -18
## 5 2013 1 1 -6 -25
## 6 2013 1 1 -4 12
## 7 2013 1 1 -5 19
## 8 2013 1 1 -3 -14
## 9 2013 1 1 -3 -8
## 10 2013 1 1 -2 8
## # … with more rows
# filter rows in the view
flights_db %>% filter(dep_delay > 240)
## # Source: lazy query [?? x 19]
## # Database: sqlite 3.22.0 []
## year month day dep_time sched_dep_time dep_delay arr_time
## <int> <int> <int> <int> <int> <dbl> <int>
## 1 2013 1 1 848 1835 853 1001
## 2 2013 1 1 1815 1325 290 2120
## 3 2013 1 1 1842 1422 260 1958
## 4 2013 1 1 2115 1700 255 2330
## 5 2013 1 1 2205 1720 285 46
## 6 2013 1 1 2343 1724 379 314
## 7 2013 1 2 1332 904 268 1616
## 8 2013 1 2 1412 838 334 1710
## 9 2013 1 2 1607 1030 337 2003
## 10 2013 1 2 2131 1512 379 2340
## # … with more rows, and 12 more variables: sched_arr_time <int>,
## # arr_delay <dbl>, carrier <chr>, flight <int>, tailnum <chr>,
## # origin <chr>, dest <chr>, air_time <dbl>, distance <dbl>, hour <dbl>,
## # minute <dbl>, time_hour <dbl>
# compute grouped summaries in the view
# (note: the column is named delay, but it holds the mean of dep_time)
flights_db %>%
  group_by(dest) %>%
  summarise(delay = mean(dep_time))
## # Source: lazy query [?? x 2]
## # Database: sqlite 3.22.0 []
## dest delay
## <chr> <dbl>
## 1 ABQ 2006.
## 2 ACK 1033.
## 3 ALB 1627.
## 4 ANC 1635.
## 5 ATL 1293.
## 6 AUS 1521.
## 7 AVL 1175.
## 8 BDL 1490.
## 9 BGR 1690.
## 10 BHM 1944.
## # … with more rows
# Create a new view with grouped summaries and filters
tailnum_delay_db <- flights_db %>%
  group_by(tailnum) %>%
  summarise(
    delay = mean(arr_delay),
    n = n()
  ) %>%
  arrange(desc(delay)) %>%
  filter(n > 100)
# Show the SQL code generated for the steps above
tailnum_delay_db %>% show_query()
## <SQL>
## SELECT *
## FROM (SELECT *
## FROM (SELECT `tailnum`, AVG(`arr_delay`) AS `delay`, COUNT() AS `n`
## FROM `flights`
## GROUP BY `tailnum`)
## ORDER BY `delay` DESC)
## WHERE (`n` > 100.0)
# Fetch the data into R memory as a tibble
tailnum_delay <- tailnum_delay_db %>% collect()
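The same idea can be tried on a tiny table with plain SQL through DBI. The sketch below assumes the DBI and RSQLite packages are installed; the table veljekset is made up, and the connection is closed at the end.

```r
# Open an in-memory SQLite database
con <- DBI::dbConnect(RSQLite::SQLite(), dbname = ":memory:")

# A small made-up table
veljekset <- data.frame(
  nimi = c("Juhani", "Tuomas", "Aapo", "Simeoni", "Timo", "Lauri", "Eero"),
  ika  = c(25, 23, 23, 22, 20, 20, 17),
  stringsAsFactors = FALSE
)
DBI::dbWriteTable(con, "veljekset", veljekset)

# The filtering and summarising happen in the database;
# only the one-row result is brought into R
res <- DBI::dbGetQuery(
  con,
  "SELECT COUNT(*) AS n, AVG(ika) AS ka FROM veljekset WHERE ika >= 20"
)
res

# Close the connection when done
DBI::dbDisconnect(con)
```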
pxweb, i.e. Statistics Finland (Tilastokeskus), the tax administration, etc.
install.packages("pxweb")
dat <- pxweb::pxweb_interactive()
Using data published on the CKAN platform (avoindata.fi, e.g. the Population Register Centre (Väestörekisterikeskus), in the future also Kela)
# install.packages("ckanr")
library(ckanr)
ckanr_setup(url = "https://www.avoindata.fi/data/fi/")
x <- package_search(q = "Väestörekisterikeskus", fq = "title:nimi")
resources <- x$results[[1]]$resources
download.file(resources[[1]]$url, "file.xlsx", mode = "wb")
readxl::excel_sheets("file.xlsx")
## [1] "Miehet kaikki" "Miehet ens" "Miehet muut" "Naiset kaikki"
## [5] "Naiset ens" "Naiset muut" "Saate"
dat <- readxl::read_xlsx("file.xlsx", sheet = "Miehet kaikki") # data
head(dat)
## # A tibble: 6 x 2
## Etunimi Lukumäärä
## <chr> <dbl>
## 1 Juhani 288179
## 2 Olavi 144642
## 3 Antero 139907
## 4 Tapani 136887
## 5 Johannes 135708
## 6 Tapio 116507
Eurostat
library(eurostat)
res <- search_eurostat("Gross domestic")
dat <- eurostat::get_eurostat("nama_10r_2gdp")
## Reading cache file /tmp/RtmpIe5H92/eurostat/nama_10r_2gdp_date_code_TF.rds
## Table nama_10r_2gdp read from cache file: /tmp/RtmpIe5H92/eurostat/nama_10r_2gdp_date_code_TF.rds
head(dat)
## # A tibble: 6 x 4
## unit geo time values
## <fct> <fct> <date> <dbl>
## 1 EUR_HAB AT 2017-01-01 42100
## 2 EUR_HAB AT1 2017-01-01 41700
## 3 EUR_HAB AT11 2017-01-01 30000
## 4 EUR_HAB AT12 2017-01-01 34400
## 5 EUR_HAB AT13 2017-01-01 50000
## 6 EUR_HAB AT2 2017-01-01 37500
World Bank
library(WDI)
WDIsearch('gdp')[1:10,]
## indicator
## [1,] "SH.XPD.TOTL.ZS"
## [2,] "SH.XPD.PUBL.ZS"
## [3,] "SH.XPD.PRIV.ZS"
## [4,] "SH.XPD.KHEX.GD.ZS"
## [5,] "SH.XPD.GHED.GD.ZS"
## [6,] "SH.XPD.CHEX.GD.ZS"
## [7,] "UIS.XGDP.23.FSGOV"
## [8,] "UIS.XGDP.1.FSGOV.FDINSTADM.FFD"
## [9,] "UIS.XGDP.1.FSGOV"
## [10,] "UIS.XGDP.0.FSGOV.FDINSTADM.FFD"
## name
## [1,] "Health expenditure, total (% of GDP)"
## [2,] "Health expenditure, public (% of GDP)"
## [3,] "Health expenditure, private (% of GDP)"
## [4,] "Capital health expenditure (% of GDP)"
## [5,] "Domestic general government health expenditure (% of GDP)"
## [6,] "Current health expenditure (% of GDP)"
## [7,] "Government expenditure on secondary education as % of GDP (%)"
## [8,] "Government expenditure in primary institutions as % of GDP (%)"
## [9,] "Government expenditure on primary education as % of GDP (%)"
## [10,] "Government expenditure in pre-primary institutions as % of GDP (%)"
dat <- WDI(indicator='NY.GDP.PCAP.KD', country=c('MX','CA','US'), start=1960, end=2012)
head(dat)
## iso2c country NY.GDP.PCAP.KD year
## 1 CA Canada 48724.25 2012
## 2 CA Canada 48456.96 2011
## 3 CA Canada 47447.48 2010
## 4 CA Canada 46543.79 2009
## 5 CA Canada 48510.57 2008
## 6 CA Canada 48552.70 2007
OECD
library(OECD)
dataset_list <- get_datasets()
search_dataset("unemployment", data = dataset_list)
search_dataset("gdp", data = dataset_list)
# See the instructions: https://github.com/expersso/OECD
ILO
From here on, the course examples always show two different ways of implementing the same operation: 1) a base R solution (without additional packages) and 2) a dplyr solution. The two solutions always appear one after the other, and the dplyr implementation is always marked explicitly as dplyr::funktio_x().
In this course we aim to use only objects of class data.frame. Technically, a data.frame in R is a list of vectors. The vectors can be numeric, character or factors, but they must all be of the same length (see the seven brothers demo above).
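That a data.frame is a list of equal-length vectors can be checked directly. A minimal base R sketch with the seven brothers vectors; the object names veljekset and res are chosen for this example.

```r
nimi <- c("Juhani", "Tuomas", "Aapo", "Simeoni", "Timo", "Lauri", "Eero")
ika  <- c(25, 23, 23, 22, 20, 20, 17)

veljekset <- data.frame(nimi = nimi, ika = ika, stringsAsFactors = FALSE)

is.list(veljekset)         # TRUE: a data.frame is a list
length(veljekset)          # number of list elements, i.e. columns
sapply(veljekset, class)   # class of each column vector

# Each column is the original vector
identical(veljekset[["ika"]], ika)

# Vectors of lengths 7 and 3 cannot be combined; try() captures the error
res <- try(data.frame(nimi = nimi, ika = ika[1:3]), silent = TRUE)
inherits(res, "try-error")
```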
We use the starwars data from the dplyr package.
sw <- as.data.frame(dplyr::starwars) # first convert to an ordinary data.frame
nrow(sw) # number of rows
## [1] 87
ncol(sw) # number of columns/variables
## [1] 13
dim(sw) # both
## [1] 87 13
# First six rows
head(sw) # or
## name height mass hair_color skin_color eye_color birth_year
## 1 Luke Skywalker 172 77 blond fair blue 19.0
## 2 C-3PO 167 75 <NA> gold yellow 112.0
## 3 R2-D2 96 32 <NA> white, blue red 33.0
## 4 Darth Vader 202 136 none white yellow 41.9
## 5 Leia Organa 150 49 brown light brown 19.0
## 6 Owen Lars 178 120 brown, grey light blue 52.0
## gender homeworld species
## 1 male Tatooine Human
## 2 <NA> Tatooine Droid
## 3 <NA> Naboo Droid
## 4 male Tatooine Human
## 5 female Alderaan Human
## 6 male Tatooine Human
## films
## 1 Revenge of the Sith, Return of the Jedi, The Empire Strikes Back, A New Hope, The Force Awakens
## 2 Attack of the Clones, The Phantom Menace, Revenge of the Sith, Return of the Jedi, The Empire Strikes Back, A New Hope
## 3 Attack of the Clones, The Phantom Menace, Revenge of the Sith, Return of the Jedi, The Empire Strikes Back, A New Hope, The Force Awakens
## 4 Revenge of the Sith, Return of the Jedi, The Empire Strikes Back, A New Hope
## 5 Revenge of the Sith, Return of the Jedi, The Empire Strikes Back, A New Hope, The Force Awakens
## 6 Attack of the Clones, Revenge of the Sith, A New Hope
## vehicles starships
## 1 Snowspeeder, Imperial Speeder Bike X-wing, Imperial shuttle
## 2
## 3
## 4 TIE Advanced x1
## 5 Imperial Speeder Bike
## 6
sw[1:6,] # or
## name height mass hair_color skin_color eye_color birth_year
## 1 Luke Skywalker 172 77 blond fair blue 19.0
## 2 C-3PO 167 75 <NA> gold yellow 112.0
## 3 R2-D2 96 32 <NA> white, blue red 33.0
## 4 Darth Vader 202 136 none white yellow 41.9
## 5 Leia Organa 150 49 brown light brown 19.0
## 6 Owen Lars 178 120 brown, grey light blue 52.0
## gender homeworld species
## 1 male Tatooine Human
## 2 <NA> Tatooine Droid
## 3 <NA> Naboo Droid
## 4 male Tatooine Human
## 5 female Alderaan Human
## 6 male Tatooine Human
## films
## 1 Revenge of the Sith, Return of the Jedi, The Empire Strikes Back, A New Hope, The Force Awakens
## 2 Attack of the Clones, The Phantom Menace, Revenge of the Sith, Return of the Jedi, The Empire Strikes Back, A New Hope
## 3 Attack of the Clones, The Phantom Menace, Revenge of the Sith, Return of the Jedi, The Empire Strikes Back, A New Hope, The Force Awakens
## 4 Revenge of the Sith, Return of the Jedi, The Empire Strikes Back, A New Hope
## 5 Revenge of the Sith, Return of the Jedi, The Empire Strikes Back, A New Hope, The Force Awakens
## 6 Attack of the Clones, Revenge of the Sith, A New Hope
## vehicles starships
## 1 Snowspeeder, Imperial Speeder Bike X-wing, Imperial shuttle
## 2
## 3
## 4 TIE Advanced x1
## 5 Imperial Speeder Bike
## 6
# the dplyr way
dplyr::slice(sw, 1:6)
## name height mass hair_color skin_color eye_color birth_year
## 1 Luke Skywalker 172 77 blond fair blue 19.0
## 2 C-3PO 167 75 <NA> gold yellow 112.0
## 3 R2-D2 96 32 <NA> white, blue red 33.0
## 4 Darth Vader 202 136 none white yellow 41.9
## 5 Leia Organa 150 49 brown light brown 19.0
## 6 Owen Lars 178 120 brown, grey light blue 52.0
## gender homeworld species
## 1 male Tatooine Human
## 2 <NA> Tatooine Droid
## 3 <NA> Naboo Droid
## 4 male Tatooine Human
## 5 female Alderaan Human
## 6 male Tatooine Human
## films
## 1 Revenge of the Sith, Return of the Jedi, The Empire Strikes Back, A New Hope, The Force Awakens
## 2 Attack of the Clones, The Phantom Menace, Revenge of the Sith, Return of the Jedi, The Empire Strikes Back, A New Hope
## 3 Attack of the Clones, The Phantom Menace, Revenge of the Sith, Return of the Jedi, The Empire Strikes Back, A New Hope, The Force Awakens
## 4 Revenge of the Sith, Return of the Jedi, The Empire Strikes Back, A New Hope
## 5 Revenge of the Sith, Return of the Jedi, The Empire Strikes Back, A New Hope, The Force Awakens
## 6 Attack of the Clones, Revenge of the Sith, A New Hope
## vehicles starships
## 1 Snowspeeder, Imperial Speeder Bike X-wing, Imperial shuttle
## 2
## 3
## 4 TIE Advanced x1
## 5 Imperial Speeder Bike
## 6
# vector indexing
v1 <- 10:20
v1[1:4]
## [1] 10 11 12 13
# data.frame
sw[[1,2]]
## [1] 172
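A data.frame can be indexed by position, by name or by a logical condition. The sketch below collects the common forms on a small made-up data frame, using base R only.

```r
veljekset <- data.frame(
  nimi = c("Juhani", "Tuomas", "Aapo"),
  ika  = c(25, 23, 23),
  stringsAsFactors = FALSE
)

# [row, column] by position; [[row, column]] extracts exactly one value
a <- veljekset[1, 2]
b <- veljekset[[1, 2]]

# Columns can also be addressed by name
c_nimi <- veljekset[2, "nimi"]
d_ika  <- veljekset$ika[3]

# A logical vector in the row position keeps the rows where it is TRUE
nuoremmat <- veljekset[veljekset$ika < 25, ]
```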
data.frame vs. tibble
R has been around for a long time, and alongside the conventional data.frame, similar but somewhat more modern classes have been developed, such as data.table and tibble. data.table is the class from the package of the same name for a similar structure, and the methods written for it are fast. The class differs from data.frame enough that base methods do not always work. Partly for that reason, in this course we use data in both the tibble and data.frame classes side by side.
stringsAsFactors = FALSE by default
# print the data as is
sw
## name height mass hair_color skin_color
## 1 Luke Skywalker 172 77.0 blond fair
## 2 C-3PO 167 75.0 <NA> gold
## 3 R2-D2 96 32.0 <NA> white, blue
## 4 Darth Vader 202 136.0 none white
## 5 Leia Organa 150 49.0 brown light
## 6 Owen Lars 178 120.0 brown, grey light
## 7 Beru Whitesun lars 165 75.0 brown light
## 8 R5-D4 97 32.0 <NA> white, red
## 9 Biggs Darklighter 183 84.0 black light
## 10 Obi-Wan Kenobi 182 77.0 auburn, white fair
## 11 Anakin Skywalker 188 84.0 blond fair
## 12 Wilhuff Tarkin 180 NA auburn, grey fair
## 13 Chewbacca 228 112.0 brown unknown
## 14 Han Solo 180 80.0 brown fair
## 15 Greedo 173 74.0 <NA> green
## 16 Jabba Desilijic Tiure 175 1358.0 <NA> green-tan, brown
## 17 Wedge Antilles 170 77.0 brown fair
## 18 Jek Tono Porkins 180 110.0 brown fair
## 19 Yoda 66 17.0 white green
## 20 Palpatine 170 75.0 grey pale
## 21 Boba Fett 183 78.2 black fair
## 22 IG-88 200 140.0 none metal
## 23 Bossk 190 113.0 none green
## 24 Lando Calrissian 177 79.0 black dark
## 25 Lobot 175 79.0 none light
## 26 Ackbar 180 83.0 none brown mottle
## 27 Mon Mothma 150 NA auburn fair
## 28 Arvel Crynyd NA NA brown fair
## 29 Wicket Systri Warrick 88 20.0 brown brown
## 30 Nien Nunb 160 68.0 none grey
## 31 Qui-Gon Jinn 193 89.0 brown fair
## 32 Nute Gunray 191 90.0 none mottled green
## 33 Finis Valorum 170 NA blond fair
## 34 Jar Jar Binks 196 66.0 none orange
## 35 Roos Tarpals 224 82.0 none grey
## 36 Rugor Nass 206 NA none green
## 37 Ric Olié 183 NA brown fair
## 38 Watto 137 NA black blue, grey
## 39 Sebulba 112 40.0 none grey, red
## 40 Quarsh Panaka 183 NA black dark
## 41 Shmi Skywalker 163 NA black fair
## 42 Darth Maul 175 80.0 none red
## 43 Bib Fortuna 180 NA none pale
## 44 Ayla Secura 178 55.0 none blue
## 45 Dud Bolt 94 45.0 none blue, grey
## 46 Gasgano 122 NA none white, blue
## 47 Ben Quadinaros 163 65.0 none grey, green, yellow
## 48 Mace Windu 188 84.0 none dark
## 49 Ki-Adi-Mundi 198 82.0 white pale
## 50 Kit Fisto 196 87.0 none green
## 51 Eeth Koth 171 NA black brown
## 52 Adi Gallia 184 50.0 none dark
## 53 Saesee Tiin 188 NA none pale
## 54 Yarael Poof 264 NA none white
## 55 Plo Koon 188 80.0 none orange
## 56 Mas Amedda 196 NA none blue
## 57 Gregar Typho 185 85.0 black dark
## 58 Cordé 157 NA brown light
## 59 Cliegg Lars 183 NA brown fair
## 60 Poggle the Lesser 183 80.0 none green
## 61 Luminara Unduli 170 56.2 black yellow
## 62 Barriss Offee 166 50.0 black yellow
## 63 Dormé 165 NA brown light
## 64 Dooku 193 80.0 white fair
## 65 Bail Prestor Organa 191 NA black tan
## 66 Jango Fett 183 79.0 black tan
## 67 Zam Wesell 168 55.0 blonde fair, green, yellow
## 68 Dexter Jettster 198 102.0 none brown
## 69 Lama Su 229 88.0 none grey
## 70 Taun We 213 NA none grey
## 71 Jocasta Nu 167 NA white fair
## 72 Ratts Tyerell 79 15.0 none grey, blue
## 73 R4-P17 96 NA none silver, red
## 74 Wat Tambor 193 48.0 none green, grey
## 75 San Hill 191 NA none grey
## 76 Shaak Ti 178 57.0 none red, blue, white
## eye_color birth_year gender homeworld species
## 1 blue 19.0 male Tatooine Human
## 2 yellow 112.0 <NA> Tatooine Droid
## 3 red 33.0 <NA> Naboo Droid
## 4 yellow 41.9 male Tatooine Human
## 5 brown 19.0 female Alderaan Human
## 6 blue 52.0 male Tatooine Human
## 7 blue 47.0 female Tatooine Human
## 8 red NA <NA> Tatooine Droid
## 9 brown 24.0 male Tatooine Human
## 10 blue-gray 57.0 male Stewjon Human
## 11 blue 41.9 male Tatooine Human
## 12 blue 64.0 male Eriadu Human
## 13 blue 200.0 male Kashyyyk Wookiee
## 14 brown 29.0 male Corellia Human
## 15 black 44.0 male Rodia Rodian
## 16 orange 600.0 hermaphrodite Nal Hutta Hutt
## 17 hazel 21.0 male Corellia Human
## 18 blue NA male Bestine IV Human
## 19 brown 896.0 male <NA> Yoda's species
## 20 yellow 82.0 male Naboo Human
## 21 brown 31.5 male Kamino Human
## 22 red 15.0 none <NA> Droid
## 23 red 53.0 male Trandosha Trandoshan
## 24 brown 31.0 male Socorro Human
## 25 blue 37.0 male Bespin Human
## 26 orange 41.0 male Mon Cala Mon Calamari
## 27 blue 48.0 female Chandrila Human
## 28 brown NA male <NA> Human
## 29 brown 8.0 male Endor Ewok
## 30 black NA male Sullust Sullustan
## 31 blue 92.0 male <NA> Human
## 32 red NA male Cato Neimoidia Neimodian
## 33 blue 91.0 male Coruscant Human
## 34 orange 52.0 male Naboo Gungan
## 35 orange NA male Naboo Gungan
## 36 orange NA male Naboo Gungan
## 37 blue NA male Naboo <NA>
## 38 yellow NA male Toydaria Toydarian
## 39 orange NA male Malastare Dug
## 40 brown 62.0 male Naboo <NA>
## 41 brown 72.0 female Tatooine Human
## 42 yellow 54.0 male Dathomir Zabrak
## 43 pink NA male Ryloth Twi'lek
## 44 hazel 48.0 female Ryloth Twi'lek
## 45 yellow NA male Vulpter Vulptereen
## 46 black NA male Troiken Xexto
## 47 orange NA male Tund Toong
## 48 brown 72.0 male Haruun Kal Human
## 49 yellow 92.0 male Cerea Cerean
## 50 black NA male Glee Anselm Nautolan
## 51 brown NA male Iridonia Zabrak
## 52 blue NA female Coruscant Tholothian
## 53 orange NA male Iktotch Iktotchi
## 54 yellow NA male Quermia Quermian
## 55 black 22.0 male Dorin Kel Dor
## 56 blue NA male Champala Chagrian
## 57 brown NA male Naboo Human
## 58 brown NA female Naboo Human
## 59 blue 82.0 male Tatooine Human
## 60 yellow NA male Geonosis Geonosian
## 61 blue 58.0 female Mirial Mirialan
## 62 blue 40.0 female Mirial Mirialan
## 63 brown NA female Naboo Human
## 64 brown 102.0 male Serenno Human
## 65 brown 67.0 male Alderaan Human
## 66 brown 66.0 male Concord Dawn Human
## 67 yellow NA female Zolan Clawdite
## 68 yellow NA male Ojom Besalisk
## 69 black NA male Kamino Kaminoan
## 70 black NA female Kamino Kaminoan
## 71 blue NA female Coruscant Human
## 72 unknown NA male Aleen Minor Aleena
## 73 red, blue NA female <NA> <NA>
## 74 unknown NA male Skako Skakoan
## 75 gold NA male Muunilinst Muun
## 76 black NA female Shili Togruta
## films
## 1 Revenge of the Sith, Return of the Jedi, The Empire Strikes Back, A New Hope, The Force Awakens
## 2 Attack of the Clones, The Phantom Menace, Revenge of the Sith, Return of the Jedi, The Empire Strikes Back, A New Hope
## 3 Attack of the Clones, The Phantom Menace, Revenge of the Sith, Return of the Jedi, The Empire Strikes Back, A New Hope, The Force Awakens
## 4 Revenge of the Sith, Return of the Jedi, The Empire Strikes Back, A New Hope
## 5 Revenge of the Sith, Return of the Jedi, The Empire Strikes Back, A New Hope, The Force Awakens
## 6 Attack of the Clones, Revenge of the Sith, A New Hope
## 7 Attack of the Clones, Revenge of the Sith, A New Hope
## 8 A New Hope
## 9 A New Hope
## 10 Attack of the Clones, The Phantom Menace, Revenge of the Sith, Return of the Jedi, The Empire Strikes Back, A New Hope
## 11 Attack of the Clones, The Phantom Menace, Revenge of the Sith
## 12 Revenge of the Sith, A New Hope
## 13 Revenge of the Sith, Return of the Jedi, The Empire Strikes Back, A New Hope, The Force Awakens
## 14 Return of the Jedi, The Empire Strikes Back, A New Hope, The Force Awakens
## 15 A New Hope
## 16 The Phantom Menace, Return of the Jedi, A New Hope
## 17 Return of the Jedi, The Empire Strikes Back, A New Hope
## 18 A New Hope
## 19 Attack of the Clones, The Phantom Menace, Revenge of the Sith, Return of the Jedi, The Empire Strikes Back
## 20 Attack of the Clones, The Phantom Menace, Revenge of the Sith, Return of the Jedi, The Empire Strikes Back
## 21 Attack of the Clones, Return of the Jedi, The Empire Strikes Back
## 22 The Empire Strikes Back
## 23 The Empire Strikes Back
## 24 Return of the Jedi, The Empire Strikes Back
## 25 The Empire Strikes Back
## 26 Return of the Jedi, The Force Awakens
## 27 Return of the Jedi
## 28 Return of the Jedi
## 29 Return of the Jedi
## 30 Return of the Jedi
## 31 The Phantom Menace
## 32 Attack of the Clones, The Phantom Menace, Revenge of the Sith
## 33 The Phantom Menace
## 34 Attack of the Clones, The Phantom Menace
## 35 The Phantom Menace
## 36 The Phantom Menace
## 37 The Phantom Menace
## 38 Attack of the Clones, The Phantom Menace
## 39 The Phantom Menace
## 40 The Phantom Menace
## 41 Attack of the Clones, The Phantom Menace
## 42 The Phantom Menace
## 43 Return of the Jedi
## 44 Attack of the Clones, The Phantom Menace, Revenge of the Sith
## 45 The Phantom Menace
## 46 The Phantom Menace
## 47 The Phantom Menace
## 48 Attack of the Clones, The Phantom Menace, Revenge of the Sith
## 49 Attack of the Clones, The Phantom Menace, Revenge of the Sith
## 50 Attack of the Clones, The Phantom Menace, Revenge of the Sith
## 51 The Phantom Menace, Revenge of the Sith
## 52 The Phantom Menace, Revenge of the Sith
## 53 The Phantom Menace, Revenge of the Sith
## 54 The Phantom Menace
## 55 Attack of the Clones, The Phantom Menace, Revenge of the Sith
## 56 Attack of the Clones, The Phantom Menace
## 57 Attack of the Clones
## 58 Attack of the Clones
## 59 Attack of the Clones
## 60 Attack of the Clones, Revenge of the Sith
## 61 Attack of the Clones, Revenge of the Sith
## 62 Attack of the Clones
## 63 Attack of the Clones
## 64 Attack of the Clones, Revenge of the Sith
## 65 Attack of the Clones, Revenge of the Sith
## 66 Attack of the Clones
## 67 Attack of the Clones
## 68 Attack of the Clones
## 69 Attack of the Clones
## 70 Attack of the Clones
## 71 Attack of the Clones
## 72 The Phantom Menace
## 73 Attack of the Clones, Revenge of the Sith
## 74 Attack of the Clones
## 75 Attack of the Clones
## 76 Attack of the Clones, Revenge of the Sith
## vehicles
## 1 Snowspeeder, Imperial Speeder Bike
## 2
## 3
## 4
## 5 Imperial Speeder Bike
## 6
## 7
## 8
## 9
## 10 Tribubble bongo
## 11 Zephyr-G swoop bike, XJ-6 airspeeder
## 12
## 13 AT-ST
## 14
## 15
## 16
## 17 Snowspeeder
## 18
## 19
## 20
## 21
## 22
## 23
## 24
## 25
## 26
## 27
## 28
## 29
## 30
## 31 Tribubble bongo
## 32
## 33
## 34
## 35
## 36
## 37
## 38
## 39
## 40
## 41
## 42 Sith speeder
## 43
## 44
## 45
## 46
## 47
## 48
## 49
## 50
## 51
## 52
## 53
## 54
## 55
## 56
## 57
## 58
## 59
## 60
## 61
## 62
## 63
## 64 Flitknot speeder
## 65
## 66
## 67 Koro-2 Exodrive airspeeder
## 68
## 69
## 70
## 71
## 72
## 73
## 74
## 75
## 76
## starships
## 1 X-wing, Imperial shuttle
## 2
## 3
## 4 TIE Advanced x1
## 5
## 6
## 7
## 8
## 9 X-wing
## 10 Jedi starfighter, Trade Federation cruiser, Naboo star skiff, Jedi Interceptor, Belbullab-22 starfighter
## 11 Trade Federation cruiser, Jedi Interceptor, Naboo fighter
## 12
## 13 Millennium Falcon, Imperial shuttle
## 14 Millennium Falcon, Imperial shuttle
## 15
## 16
## 17 X-wing
## 18 X-wing
## 19
## 20
## 21 Slave 1
## 22
## 23
## 24 Millennium Falcon
## 25
## 26
## 27
## 28 A-wing
## 29
## 30 Millennium Falcon
## 31
## 32
## 33
## 34
## 35
## 36
## 37 Naboo Royal Starship
## 38
## 39
## 40
## 41
## 42 Scimitar
## 43
## 44
## 45
## 46
## 47
## 48
## 49
## 50
## 51
## 52
## 53
## 54
## 55 Jedi starfighter
## 56
## 57 Naboo fighter
## 58
## 59
## 60
## 61
## 62
## 63
## 64
## 65
## 66
## 67
## 68
## 69
## 70
## 71
## 72
## 73
## 74
## 75
## 76
## [ reached 'max' / getOption("max.print") -- omitted 11 rows ]
# convert the data to a tibble and print it
sw_tb <- tibble::as_tibble(sw)
sw_tb
## # A tibble: 87 x 13
## name height mass hair_color skin_color eye_color birth_year gender
## <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr>
## 1 Luke… 172 77 blond fair blue 19 male
## 2 C-3PO 167 75 <NA> gold yellow 112 <NA>
## 3 R2-D2 96 32 <NA> white, bl… red 33 <NA>
## 4 Dart… 202 136 none white yellow 41.9 male
## 5 Leia… 150 49 brown light brown 19 female
## 6 Owen… 178 120 brown, gr… light blue 52 male
## 7 Beru… 165 75 brown light blue 47 female
## 8 R5-D4 97 32 <NA> white, red red NA <NA>
## 9 Bigg… 183 84 black light brown 24 male
## 10 Obi-… 182 77 auburn, w… fair blue-gray 57 male
## # … with 77 more rows, and 5 more variables: homeworld <chr>,
## # species <chr>, films <list>, vehicles <list>, starships <list>
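Besides the more compact printing, one practical difference is what single-column subsetting returns. A minimal sketch, assuming the tibble package is installed and using a made-up two-row data frame:

```r
df <- data.frame(
  nimi = c("Juhani", "Eero"),
  ika  = c(25, 17),
  stringsAsFactors = FALSE
)
tb <- tibble::as_tibble(df)

# A data.frame drops a single selected column to a plain vector
df_col <- df[, "ika"]
class(df_col)

# A tibble always stays a tibble when subset with [
tb_col <- tb[, "ika"]
class(tb_col)

# [[ extracts the column vector from either class
identical(df[["ika"]], tb[["ika"]])
```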
# first make the sw data a tibble
sw <- dplyr::starwars
# select all brown-haired characters
sw[sw$hair_color == "brown",]
## # A tibble: 23 x 13
## name height mass hair_color skin_color eye_color birth_year gender
## <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr>
## 1 <NA> NA NA <NA> <NA> <NA> NA <NA>
## 2 <NA> NA NA <NA> <NA> <NA> NA <NA>
## 3 Leia… 150 49 brown light brown 19 female
## 4 Beru… 165 75 brown light blue 47 female
## 5 <NA> NA NA <NA> <NA> <NA> NA <NA>
## 6 Chew… 228 112 brown unknown blue 200 male
## 7 Han … 180 80 brown fair brown 29 male
## 8 <NA> NA NA <NA> <NA> <NA> NA <NA>
## 9 <NA> NA NA <NA> <NA> <NA> NA <NA>
## 10 Wedg… 170 77 brown fair hazel 21 male
## # … with 13 more rows, and 5 more variables: homeworld <chr>,
## # species <chr>, films <list>, vehicles <list>, starships <list>
dplyr::filter(sw, hair_color == "brown") # dplyr
## # A tibble: 18 x 13
## name height mass hair_color skin_color eye_color birth_year gender
## <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr>
## 1 Leia… 150 49 brown light brown 19 female
## 2 Beru… 165 75 brown light blue 47 female
## 3 Chew… 228 112 brown unknown blue 200 male
## 4 Han … 180 80 brown fair brown 29 male
## 5 Wedg… 170 77 brown fair hazel 21 male
## 6 Jek … 180 110 brown fair blue NA male
## 7 Arve… NA NA brown fair brown NA male
## 8 Wick… 88 20 brown brown brown 8 male
## 9 Qui-… 193 89 brown fair blue 92 male
## 10 Ric … 183 NA brown fair blue NA male
## 11 Cordé 157 NA brown light brown NA female
## 12 Clie… 183 NA brown fair blue 82 male
## 13 Dormé 165 NA brown light brown NA female
## 14 Tarf… 234 136 brown brown blue NA male
## 15 Raym… 188 79 brown light brown NA male
## 16 Rey NA NA brown light hazel NA female
## 17 Poe … NA NA brown light brown NA male
## 18 Padm… 165 45 brown light brown 46 female
## # … with 5 more variables: homeworld <chr>, species <chr>, films <list>,
## # vehicles <list>, starships <list>
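The two results above differ (23 rows versus 18) because of how missing values are handled. In base R a comparison with NA yields NA, and an NA in a logical row index returns a row filled with NA, whereas dplyr::filter() keeps only the rows where the condition is TRUE. The following is a minimal sketch on a small, hypothetical data frame (toy and its values are made up for illustration and are not part of the course data); it shows the effect and two base R ways to drop the NA rows.

```r
# Hypothetical toy data; names and values are made up for illustration
toy <- data.frame(
  name = c("A", "B", "C", "D"),
  hair_color = c("brown", NA, "black", "brown"),
  stringsAsFactors = FALSE
)

# The comparison returns NA where hair_color is missing
is_brown <- toy$hair_color == "brown"
is_brown

# An NA in the row index produces a row of NAs
with_na <- toy[is_brown, ]

# Option 1: which() keeps only the positions that are TRUE
no_na_which <- toy[which(is_brown), ]

# Option 2: treat a missing value explicitly as "not a match"
no_na_explicit <- toy[!is.na(toy$hair_color) & toy$hair_color == "brown", ]

nrow(with_na)      # 3 rows, one of them all NA
nrow(no_na_which)  # 2 rows
```

Either option gives the same rows as dplyr::filter() would; which() is shorter, while the explicit is.na() check makes the intent visible in the condition itself.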
# select all characters under two metres tall who have blue eyes
sw[sw$height < 200 & sw$eye_color == "blue",]
## # A tibble: 17 x 13
## name height mass hair_color skin_color eye_color birth_year gender
## <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr>
## 1 Luke… 172 77 blond fair blue 19 male
## 2 Owen… 178 120 brown, gr… light blue 52 male
## 3 Beru… 165 75 brown light blue 47 female
## 4 Anak… 188 84 blond fair blue 41.9 male
## 5 Wilh… 180 NA auburn, g… fair blue 64 male
## 6 Jek … 180 110 brown fair blue NA male
## 7 Lobot 175 79 none light blue 37 male
## 8 Mon … 150 NA auburn fair blue 48 female
## 9 Qui-… 193 89 brown fair blue 92 male
## 10 Fini… 170 NA blond fair blue 91 male
## 11 Ric … 183 NA brown fair blue NA male
## 12 Adi … 184 50 none dark blue NA female
## 13 Mas … 196 NA none blue blue NA male
## 14 Clie… 183 NA brown fair blue 82 male
## 15 Lumi… 170 56.2 black yellow blue 58 female
## 16 Barr… 166 50 black yellow blue 40 female
## 17 Joca… 167 NA white fair blue NA female
## # … with 5 more variables: homeworld <chr>, species <chr>, films <list>,
## # vehicles <list>, starships <list>
dplyr::filter(sw, height < 200, eye_color == "blue") # dplyr
## # A tibble: 17 x 13
## name height mass hair_color skin_color eye_color birth_year gender
## <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr>
## 1 Luke… 172 77 blond fair blue 19 male
## 2 Owen… 178 120 brown, gr… light blue 52 male
## 3 Beru… 165 75 brown light blue 47 female
## 4 Anak… 188 84 blond fair blue 41.9 male
## 5 Wilh… 180 NA auburn, g… fair blue 64 male
## 6 Jek … 180 110 brown fair blue NA male
## 7 Lobot 175 79 none light blue 37 male
## 8 Mon … 150 NA auburn fair blue 48 female
## 9 Qui-… 193 89 brown fair blue 92 male
## 10 Fini… 170 NA blond fair blue 91 male
## 11 Ric … 183 NA brown fair blue NA male
## 12 Adi … 184 50 none dark blue NA female
## 13 Mas … 196 NA none blue blue NA male
## 14 Clie… 183 NA brown fair blue 82 male
## 15 Lumi… 170 56.2 black yellow blue 58 female
## 16 Barr… 166 50 black yellow blue 40 female
## 17 Joca… 167 NA white fair blue NA female
## # … with 5 more variables: homeworld <chr>, species <chr>, films <list>,
## # vehicles <list>, starships <list>
# select all characters whose name starts with M
sw[grepl("^M", sw$name),]
## # A tibble: 3 x 13
## name height mass hair_color skin_color eye_color birth_year gender
## <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr>
## 1 Mon … 150 NA auburn fair blue 48 female
## 2 Mace… 188 84 none dark brown 72 male
## 3 Mas … 196 NA none blue blue NA male
## # … with 5 more variables: homeworld <chr>, species <chr>, films <list>,
## # vehicles <list>, starships <list>
dplyr::filter(sw, grepl("^M", name)) # dplyr
## # A tibble: 3 x 13
## name height mass hair_color skin_color eye_color birth_year gender
## <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr>
## 1 Mon … 150 NA auburn fair blue 48 female
## 2 Mace… 188 84 none dark brown 72 male
## 3 Mas … 196 NA none blue blue NA male
## # … with 5 more variables: homeworld <chr>, species <chr>, films <list>,
## # vehicles <list>, starships <list>
# select rows 10 through 15
sw[10:15,]
## # A tibble: 6 x 13
## name height mass hair_color skin_color eye_color birth_year gender
## <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr>
## 1 Obi-… 182 77 auburn, w… fair blue-gray 57 male
## 2 Anak… 188 84 blond fair blue 41.9 male
## 3 Wilh… 180 NA auburn, g… fair blue 64 male
## 4 Chew… 228 112 brown unknown blue 200 male
## 5 Han … 180 80 brown fair brown 29 male
## 6 Gree… 173 74 <NA> green black 44 male
## # … with 5 more variables: homeworld <chr>, species <chr>, films <list>,
## # vehicles <list>, starships <list>
dplyr::slice(sw, 10:15)
## # A tibble: 6 x 13
## name height mass hair_color skin_color eye_color birth_year gender
## <chr> <int> <dbl> <chr> <chr> <chr> <dbl> <chr>
## 1 Obi-… 182 77 auburn, w… fair blue-gray 57 male
## 2 Anak… 188 84 blond fair blue 41.9 male
## 3 Wilh… 180 NA auburn, g… fair blue 64 male
## 4 Chew… 228 112 brown unknown blue 200 male
## 5 Han … 180 80 brown fair brown 29 male
## 6 Gree… 173 74 <NA> green black 44 male
## # … with 5 more variables: homeworld <chr>, species <chr>, films <list>,
## # vehicles <list>, starships <list>
Filtering data: selecting columns/variables
# select the variables "name", "height" and "mass"
sw[,c("name", "height","mass")]
## # A tibble: 87 x 3
## name height mass
## <chr> <int> <dbl>
## 1 Luke Skywalker 172 77
## 2 C-3PO 167 75
## 3 R2-D2 96 32
## 4 Darth Vader 202 136
## 5 Leia Organa 150 49
## 6 Owen Lars 178 120
## 7 Beru Whitesun lars 165 75
## 8 R5-D4 97 32
## 9 Biggs Darklighter 183 84
## 10 Obi-Wan Kenobi 182 77
## # … with 77 more rows
dplyr::select(sw, name, height, mass) # dplyr
## # A tibble: 87 x 3
## name height mass
## <chr> <int> <dbl>
## 1 Luke Skywalker 172 77
## 2 C-3PO 167 75
## 3 R2-D2 96 32
## 4 Darth Vader 202 136
## 5 Leia Organa 150 49
## 6 Owen Lars 178 120
## 7 Beru Whitesun lars 165 75
## 8 R5-D4 97 32
## 9 Biggs Darklighter 183 84
## 10 Obi-Wan Kenobi 182 77
## # … with 77 more rows
# select the first three variables
sw[,1:3]
## # A tibble: 87 x 3
## name height mass
## <chr> <int> <dbl>
## 1 Luke Skywalker 172 77
## 2 C-3PO 167 75
## 3 R2-D2 96 32
## 4 Darth Vader 202 136
## 5 Leia Organa 150 49
## 6 Owen Lars 178 120
## 7 Beru Whitesun lars 165 75
## 8 R5-D4 97 32
## 9 Biggs Darklighter 183 84
## 10 Obi-Wan Kenobi 182 77
## # … with 77 more rows
dplyr::select(sw, 1:3) # dplyr
## # A tibble: 87 x 3
## name height mass
## <chr> <int> <dbl>
## 1 Luke Skywalker 172 77
## 2 C-3PO 167 75
## 3 R2-D2 96 32
## 4 Darth Vader 202 136
## 5 Leia Organa 150 49
## 6 Owen Lars 178 120
## 7 Beru Whitesun lars 165 75
## 8 R5-D4 97 32
## 9 Biggs Darklighter 183 84
## 10 Obi-Wan Kenobi 182 77
## # … with 77 more rows
# select the first, fourth and sixth variables
sw[,c(1,4,6)]
## # A tibble: 87 x 3
## name hair_color eye_color
## <chr> <chr> <chr>
## 1 Luke Skywalker blond blue
## 2 C-3PO <NA> yellow
## 3 R2-D2 <NA> red
## 4 Darth Vader none yellow
## 5 Leia Organa brown brown
## 6 Owen Lars brown, grey blue
## 7 Beru Whitesun lars brown blue
## 8 R5-D4 <NA> red
## 9 Biggs Darklighter black brown
## 10 Obi-Wan Kenobi auburn, white blue-gray
## # … with 77 more rows
dplyr::select(sw, c(1,4,6)) # dplyr
## # A tibble: 87 x 3
## name hair_color eye_color
## <chr> <chr> <chr>
## 1 Luke Skywalker blond blue
## 2 C-3PO <NA> yellow
## 3 R2-D2 <NA> red
## 4 Darth Vader none yellow
## 5 Leia Organa brown brown
## 6 Owen Lars brown, grey blue
## 7 Beru Whitesun lars brown blue
## 8 R5-D4 <NA> red
## 9 Biggs Darklighter black brown
## 10 Obi-Wan Kenobi auburn, white blue-gray
## # … with 77 more rows
Filtering data: selecting both rows and columns
# select all characters under two metres tall who have blue eyes AND the columns name, eye_color and mass
sw[sw$height < 200 & sw$eye_color == "blue",c("name", "eye_color","mass")]
## # A tibble: 17 x 3
## name eye_color mass
## <chr> <chr> <dbl>
## 1 Luke Skywalker blue 77
## 2 Owen Lars blue 120
## 3 Beru Whitesun lars blue 75
## 4 Anakin Skywalker blue 84
## 5 Wilhuff Tarkin blue NA
## 6 Jek Tono Porkins blue 110
## 7 Lobot blue 79
## 8 Mon Mothma blue NA
## 9 Qui-Gon Jinn blue 89
## 10 Finis Valorum blue NA
## 11 Ric Olié blue NA
## 12 Adi Gallia blue 50
## 13 Mas Amedda blue NA
## 14 Cliegg Lars blue NA
## 15 Luminara Unduli blue 56.2
## 16 Barriss Offee blue 50
## 17 Jocasta Nu blue NA
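# library(dplyr) makes the pipe operator %>% available
library(dplyr)
# %>% passes the result on its left as the first argument of the function on its right
sw %>%
  dplyr::filter(height < 200, eye_color == "blue") %>%
  dplyr::select(name, eye_color, mass)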
## # A tibble: 17 x 3
## name eye_color mass
## <chr> <chr> <dbl>
## 1 Luke Skywalker blue 77
## 2 Owen Lars blue 120
## 3 Beru Whitesun lars blue 75
## 4 Anakin Skywalker blue 84
## 5 Wilhuff Tarkin blue NA
## 6 Jek Tono Porkins blue 110
## 7 Lobot blue 79
## 8 Mon Mothma blue NA
## 9 Qui-Gon Jinn blue 89
## 10 Finis Valorum blue NA
## 11 Ric Olié blue NA
## 12 Adi Gallia blue 50
## 13 Mas Amedda blue NA
## 14 Cliegg Lars blue NA
## 15 Luminara Unduli blue 56.2
## 16 Barriss Offee blue 50
## 17 Jocasta Nu blue NA
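Base R also offers subset(), which combines row and column selection in a single call and, like dplyr::filter(), drops rows where the condition evaluates to NA. The following is a minimal sketch on a hypothetical toy data frame (the names and values are made up for illustration).

```r
# Hypothetical toy data; values are made up for illustration
toy <- data.frame(
  name = c("A", "B", "C", "D"),
  height = c(172, 210, NA, 165),
  eye_color = c("blue", "blue", "blue", "brown"),
  mass = c(77, 120, 60, 75),
  stringsAsFactors = FALSE
)

# Rows: height below 200 and blue eyes; columns: name, eye_color, mass
res <- subset(toy, height < 200 & eye_color == "blue",
              select = c(name, eye_color, mass))
res
```

Only character A remains: B is too tall, C has a missing height and D has brown eyes. The subset() help page recommends the function for interactive use, so bracket indexing or dplyr may be a safer choice inside your own functions.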
Creating new variables by computing them from existing ones
# compute a new variable, the body mass index (BMI), for each character
sw$bmi <- sw$mass / (sw$height/100)^2
# and list only the characters who are at least mildly overweight (BMI >= 25)
# note: characters whose bmi is NA are returned as all-NA rows
sw[sw$bmi >= 25, c("name","bmi")]
## # A tibble: 53 x 2
## name bmi
## <chr> <dbl>
## 1 Luke Skywalker 26.0
## 2 C-3PO 26.9
## 3 R2-D2 34.7
## 4 Darth Vader 33.3
## 5 Owen Lars 37.9
## 6 Beru Whitesun lars 27.5
## 7 R5-D4 34.0
## 8 Biggs Darklighter 25.1
## 9 <NA> NA
## 10 Jabba Desilijic Tiure 443.
## # … with 43 more rows
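library(dplyr)
# same computation with dplyr; rows with a missing bmi are dropped and the result is sorted by descending BMI
sw %>%
  dplyr::mutate(bmi = mass / (height/100)^2) %>%
  dplyr::filter(bmi >= 25) %>%
  dplyr::select(name, bmi) %>%
  dplyr::arrange(dplyr::desc(bmi))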
## # A tibble: 25 x 2
## name bmi
## <chr> <dbl>
## 1 Jabba Desilijic Tiure 443.
## 2 Dud Bolt 50.9
## 3 Yoda 39.0
## 4 Owen Lars 37.9
## 5 IG-88 35
## 6 R2-D2 34.7
## 7 Grievous 34.1
## 8 R5-D4 34.0
## 9 Jek Tono Porkins 34.0
## 10 Darth Vader 33.3
## # … with 15 more rows
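The base R listing above has 53 rows because rows where bmi is NA come back as NA rows, while the dplyr pipeline drops them and also sorts the result. The following sketch shows one possible base R equivalent of the whole pipeline on hypothetical toy data (names and values are made up for illustration), using which() to drop the missing values and order() to sort.

```r
# Hypothetical toy data; values are made up for illustration
toy <- data.frame(
  name = c("A", "B", "C", "D"),
  height = c(172, 150, NA, 200),  # centimetres
  mass = c(77, 49, 80, 140),      # kilograms
  stringsAsFactors = FALSE
)

# BMI = mass in kg divided by squared height in metres
toy$bmi <- toy$mass / (toy$height / 100)^2

# which() drops both FALSE and NA, so C (missing height) is left out
heavy <- toy[which(toy$bmi >= 25), c("name", "bmi")]

# sort by descending BMI, as arrange(desc(bmi)) does
heavy <- heavy[order(heavy$bmi, decreasing = TRUE), ]
heavy
```

Here D (BMI 35) is listed before A (BMI about 26), and neither B (below the limit) nor C (missing height) appears.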
In this lecture we continue with data manipulation and start the graphics exercises with the ggplot2 package. In addition to the starwars data, we start using the datasets described in the data section of the site.
Reading data from disk is fairly straightforward, but do watch the beginning of the accompanying webinar (0:00-14:00): Hadley Wickham - Getting your data into R. Download the slides here.
Garrett Grolemund's Data wrangling with R and RStudio is a good introduction to the basics of data manipulation and to the concept of tidy data. Slides!
Watch the accompanying short video on the basics of ggplot2. If you like learning from videos, watch a more thorough introduction to the ggplot2 package while you are at it.
Download the data visualization cheat sheet here: Data Visualization Cheat Sheet
2017-2019 Markus Kainu.
This work is licensed under a Creative Commons Attribution 4.0 International license.