Question 1: Examining the structure of the iris dataset

library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5     v purrr   0.3.4
## v tibble  3.1.6     v dplyr   1.0.7
## v tidyr   1.2.0     v stringr 1.4.0
## v readr   2.1.2     v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
glimpse(iris)
## Rows: 150
## Columns: 5
## $ Sepal.Length <dbl> 5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9, 5.4, 4.~
## $ Sepal.Width  <dbl> 3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4, 3.4, 2.9, 3.1, 3.7, 3.~
## $ Petal.Length <dbl> 1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4, 1.5, 1.4, 1.5, 1.5, 1.~
## $ Petal.Width  <dbl> 0.2, 0.2, 0.2, 0.2, 0.2, 0.4, 0.3, 0.2, 0.2, 0.1, 0.2, 0.~
## $ Species      <fct> setosa, setosa, setosa, setosa, setosa, setosa, setosa, s~

The iris dataset has 150 observations and 5 variables.
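
The row and column counts reported by glimpse() can be cross-checked in base R. The sketch below is only a sanity check; the names iris_dims, n_obs, and n_vars are illustrative and not part of the original exercise.

```r
# iris ships with base R (datasets package), so no extra packages are needed.
iris_dims <- dim(iris)   # integer vector: c(rows, columns)
n_obs  <- nrow(iris)     # observations = rows
n_vars <- ncol(iris)     # variables = columns

cat("observations:", n_obs, "variables:", n_vars, "\n")

# str() is the base R counterpart of glimpse(): one line per column with its type.
str(iris)
```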

Question 2: Create a new data frame iris1 that contains only the species virginica and versicolor with sepal lengths greater than 6 cm and sepal widths greater than 2.5 cm.

iris1 <- iris %>%
  filter(Species == "virginica" | Species == "versicolor") %>%
  filter(Sepal.Length > 6 & Sepal.Width > 2.5)
glimpse(iris1)
## Rows: 56
## Columns: 5
## $ Sepal.Length <dbl> 7.0, 6.4, 6.9, 6.5, 6.3, 6.6, 6.1, 6.7, 6.1, 6.1, 6.4, 6.~
## $ Sepal.Width  <dbl> 3.2, 3.2, 3.1, 2.8, 3.3, 2.9, 2.9, 3.1, 2.8, 2.8, 2.9, 3.~
## $ Petal.Length <dbl> 4.7, 4.5, 4.9, 4.6, 4.7, 4.6, 4.7, 4.4, 4.0, 4.7, 4.3, 4.~
## $ Petal.Width  <dbl> 1.4, 1.5, 1.5, 1.5, 1.6, 1.3, 1.4, 1.4, 1.3, 1.2, 1.3, 1.~
## $ Species      <fct> versicolor, versicolor, versicolor, versicolor, versicolo~

The iris1 data frame has 56 observations and 5 variables.
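
As an alternative sketch, the same subset can probably be written as a single filter() call: %in% replaces the two == comparisons joined by |, and comma-separated conditions are combined with AND. The name iris1_alt is hypothetical and used only for comparison.

```r
library(dplyr)

# One filter() call; each comma-separated condition must hold for a row to be kept.
iris1_alt <- iris %>%
  filter(
    Species %in% c("virginica", "versicolor"),  # either of the two species
    Sepal.Length > 6,                           # sepal length above 6 cm
    Sepal.Width > 2.5                           # sepal width above 2.5 cm
  )

# The row count should match the stepwise version above.
nrow(iris1_alt)
```

Note that Species remains a factor with three levels after filtering; the setosa level simply has no rows.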

Question 3: Now, create an iris2 data frame from iris1 that contains only the columns for Species, Sepal.Length, and Sepal.Width. How many observations and variables are in the dataset?

iris2 <- select(iris1, c(Species, Sepal.Length, Sepal.Width))
glimpse(iris2)
## Rows: 56
## Columns: 3
## $ Species      <fct> versicolor, versicolor, versicolor, versicolor, versicolo~
## $ Sepal.Length <dbl> 7.0, 6.4, 6.9, 6.5, 6.3, 6.6, 6.1, 6.7, 6.1, 6.1, 6.4, 6.~
## $ Sepal.Width  <dbl> 3.2, 3.2, 3.1, 2.8, 3.3, 2.9, 2.9, 3.1, 2.8, 2.8, 2.9, 3.~

The iris2 data frame still has 56 observations, but now only the 3 selected variables.

Question 4: Create an iris3 data frame from iris2 that orders the observations from largest to smallest sepal length. Show the first 6 rows of this dataset.

iris3 <- arrange(iris2, desc(Sepal.Length))
head(iris3)
##     Species Sepal.Length Sepal.Width
## 1 virginica          7.9         3.8
## 2 virginica          7.7         3.8
## 3 virginica          7.7         2.6
## 4 virginica          7.7         2.8
## 5 virginica          7.7         3.0
## 6 virginica          7.6         3.0

Question 5: Create an iris4 data frame from iris3 that creates a column with a sepal area (length * width) value for each observation. How many observations and variables are in the dataset?

iris4 <- mutate(iris3, Sepal_Area = Sepal.Length*Sepal.Width)
head(iris4)
##     Species Sepal.Length Sepal.Width Sepal_Area
## 1 virginica          7.9         3.8      30.02
## 2 virginica          7.7         3.8      29.26
## 3 virginica          7.7         2.6      20.02
## 4 virginica          7.7         2.8      21.56
## 5 virginica          7.7         3.0      23.10
## 6 virginica          7.6         3.0      22.80

The iris4 data frame has 56 observations and 4 variables.

Question 6: Create iris5 that calculates the average sepal length, the average sepal width, and the sample size of the entire iris4 data frame and print iris5.

iris5 <- iris4 %>%
  summarize(mean_sepal_length = mean(Sepal.Length), mean_sepal_width = mean(Sepal.Width), n = n())
  
print(iris5)
##   mean_sepal_length mean_sepal_width  n
## 1          6.698214         3.041071 56

Question 7: Finally, create iris6 that calculates the average sepal length, the average sepal width, and the sample size for each species in the iris4 data frame and print iris6.

iris6 <- iris4 %>%
  group_by(Species) %>%
  summarize(mean_sepal_length = mean(Sepal.Length), mean_sepal_width = mean(Sepal.Width), n = n())

print(iris6)
## # A tibble: 2 x 4
##   Species    mean_sepal_length mean_sepal_width     n
##   <fct>                  <dbl>            <dbl> <int>
## 1 versicolor              6.48             2.99    17
## 2 virginica               6.79             3.06    39
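
When the same summary function is applied to several columns, dplyr's across() offers a more compact spelling. The following is a minimal sketch, assuming dplyr 1.0.0 or later; iris_subset and iris6_alt are illustrative names, and the resulting columns are named mean_Sepal.Length and mean_Sepal.Width rather than the names used above.

```r
library(dplyr)

# Rebuild the filtered subset so the example is self-contained.
iris_subset <- iris %>%
  filter(Species %in% c("virginica", "versicolor"),
         Sepal.Length > 6, Sepal.Width > 2.5)

# across() applies mean() to both sepal columns;
# .names controls how the output columns are labeled.
iris6_alt <- iris_subset %>%
  group_by(Species) %>%
  summarize(
    across(c(Sepal.Length, Sepal.Width), mean, .names = "mean_{.col}"),
    n = n()
  )

print(iris6_alt)
```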

Question 8: In these exercises, you have successively modified different versions of the data frame: iris1, iris2, iris3, iris4, iris5, iris6. At each stage, the output data frame from one operation serves as the input for the next. A more efficient way to do this is to use the pipe operator %>% (from the magrittr package, re-exported by dplyr and tidyr). See if you can rework all of your previous statements into an extended piping operation that uses iris as the input and generates iris6 as the output.

iris7 <- iris %>%
  filter(Species == "virginica" | Species == "versicolor") %>%
  filter(Sepal.Length > 6 & Sepal.Width > 2.5) %>%
  select(c(Species, Sepal.Length, Sepal.Width)) %>%
  arrange(desc(Sepal.Length)) %>%
  mutate(Sepal_Area = Sepal.Length*Sepal.Width) %>%
  group_by(Species) %>%
  summarize(mean_sepal_length = mean(Sepal.Length), mean_sepal_width = mean(Sepal.Width), n = n())

print(iris7)
## # A tibble: 2 x 4
##   Species    mean_sepal_length mean_sepal_width     n
##   <fct>                  <dbl>            <dbl> <int>
## 1 versicolor              6.48             2.99    17
## 2 virginica               6.79             3.06    39
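
Sorting the rows and adding a sepal area column do not change per-species means or counts, so a pipeline with those steps and one without them should produce the same summary. The sketch below checks that; full_pipe, short_pipe, and same_summary are hypothetical names introduced for the comparison.

```r
library(dplyr)

# Full pipeline: every step from Questions 2-7 chained together.
full_pipe <- iris %>%
  filter(Species %in% c("virginica", "versicolor")) %>%
  filter(Sepal.Length > 6 & Sepal.Width > 2.5) %>%
  select(Species, Sepal.Length, Sepal.Width) %>%
  arrange(desc(Sepal.Length)) %>%
  mutate(Sepal_Area = Sepal.Length * Sepal.Width) %>%
  group_by(Species) %>%
  summarize(mean_sepal_length = mean(Sepal.Length),
            mean_sepal_width = mean(Sepal.Width),
            n = n())

# Shortened pipeline: the sorting and area steps are skipped.
short_pipe <- iris %>%
  filter(Species %in% c("virginica", "versicolor")) %>%
  filter(Sepal.Length > 6 & Sepal.Width > 2.5) %>%
  group_by(Species) %>%
  summarize(mean_sepal_length = mean(Sepal.Length),
            mean_sepal_width = mean(Sepal.Width),
            n = n())

# Compare as plain data frames; all.equal() tolerates tiny
# floating-point differences that row order could introduce.
same_summary <- isTRUE(all.equal(as.data.frame(full_pipe),
                                 as.data.frame(short_pipe)))
same_summary
```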

Question 9: Create a ‘longer’ data frame with three columns named: Species, Measure, Value.

iris_long <-
  iris %>%
  pivot_longer(cols = Sepal.Length:Petal.Width, names_to = "Measure", values_to = "Value")

iris_long
## # A tibble: 600 x 3
##    Species Measure      Value
##    <fct>   <chr>        <dbl>
##  1 setosa  Sepal.Length   5.1
##  2 setosa  Sepal.Width    3.5
##  3 setosa  Petal.Length   1.4
##  4 setosa  Petal.Width    0.2
##  5 setosa  Sepal.Length   4.9
##  6 setosa  Sepal.Width    3  
##  7 setosa  Petal.Length   1.4
##  8 setosa  Petal.Width    0.2
##  9 setosa  Sepal.Length   4.7
## 10 setosa  Sepal.Width    3.2
## # ... with 590 more rows
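
The 600 rows follow from stacking 4 measurement columns for each of the 150 flowers (150 * 4 = 600). The sketch below rebuilds the long data frame with the value column named Value, as in the question prompt, checks that arithmetic, and shows one thing the long layout makes easy: a single grouped summary covering every measurement at once. The name measure_means is hypothetical.

```r
library(dplyr)
library(tidyr)

# Rebuild the long data frame: one row per flower per measurement.
iris_long <- iris %>%
  pivot_longer(cols = Sepal.Length:Petal.Width,
               names_to = "Measure", values_to = "Value")

# 150 flowers * 4 measurement columns
nrow(iris_long) == nrow(iris) * 4

# One grouped summary handles all four measurements.
measure_means <- iris_long %>%
  group_by(Species, Measure) %>%
  summarize(mean_value = mean(Value), .groups = "drop")

measure_means
```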