PRDV420: Tidyverse: Writing to a CSV File | Saylor Academy

readr also comes with two useful functions for writing data back to disk: write_csv() and write_tsv(). Both functions increase the chances of the output file being read back in correctly by:

Always encoding strings in UTF-8.
Saving dates and date-times in ISO8601 format so they are easily parsed elsewhere.

If you want to export a CSV file to Excel, use write_excel_csv(). It writes a special character (a "byte order mark") at the start of the file, which tells Excel that the file uses the UTF-8 encoding.
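
As a minimal sketch of the points above, the following formats a small made-up data frame as CSV text with format_csv() and then inspects the first bytes of a file written by write_excel_csv(). The data frame and its column names (day, stamp) are assumptions for illustration and do not come from the original text.

```r
library(readr)

# Hypothetical example data: one date column and one date-time column
df <- data.frame(
  day = as.Date("2023-01-09"),
  stamp = as.POSIXct("2023-01-09 15:52:00", tz = "UTC")
)

# format_csv() returns the CSV text that write_csv() would write to disk
csv_text <- format_csv(df)
cat(csv_text)

# write_excel_csv() prepends a UTF-8 byte order mark (bytes EF BB BF)
excel_path <- tempfile(fileext = ".csv")
write_excel_csv(df, excel_path)
first_bytes <- readBin(excel_path, what = "raw", n = 3)
print(first_bytes)
```

Printing the text rather than opening the file makes it easy to see how dates and date-times are serialized before anything is written to disk.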

The most important arguments are x (the data frame to save) and path (the location to save it; this argument is named file in readr 1.4.0 and later). You can also specify how missing values are written with na, and whether to add to an existing file with append.

write_csv(challenge, "challenge.csv")
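
The na and append arguments can be sketched as follows. The data frame here is hypothetical (it is not the challenge data), and the file location is passed by position so the example does not depend on whether the argument is called path or file in your version of readr.

```r
library(readr)

# Hypothetical example data with missing values in both columns
df <- data.frame(id = c(1, 2, NA), name = c("a", NA, "c"))

out <- tempfile(fileext = ".csv")

# Write missing values as empty strings instead of the default "NA"
write_csv(df, out, na = "")

# Add one more row to the same file; when appending, readr does not
# write the header again by default
write_csv(data.frame(id = 4, name = "d"), out, append = TRUE)

# Inspect the raw lines of the resulting file
lines <- readLines(out)
print(lines)
```

Choosing na = "" is only one option; whichever marker you pick, the program that reads the file back needs to be told to treat it as missing.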

Note that the type information is lost when you save to CSV:

challenge
#> # A tibble: 2,000 x 2
#>       x y         
#>   <dbl> <date>    
#> 1   404 NA        
#> 2  4172 NA        
#> 3  3004 NA        
#> 4   787 NA        
#> 5    37 NA        
#> 6  2332 NA        
#> # … with 1,994 more rows
write_csv(challenge, "challenge-2.csv")
read_csv("challenge-2.csv")
#> 
#> ── Column specification ────────────────────────────────────────────────────────
#> cols(
#>   x = col_double(),
#>   y = col_logical()
#> )
#> # A tibble: 2,000 x 2
#>       x y    
#>   <dbl> <lgl>
#> 1   404 NA   
#> 2  4172 NA   
#> 3  3004 NA   
#> 4   787 NA   
#> 5    37 NA   
#> 6  2332 NA   
#> # … with 1,994 more rows
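
One way to get the intended types back is to supply the column specification when reading the file. The sketch below uses a small hypothetical stand-in for challenge (three rows, with a date column that is entirely NA), so the values are assumptions chosen to mirror the printout above.

```r
library(readr)

# Hypothetical stand-in for `challenge`: the date column is entirely NA
df <- data.frame(
  x = c(404, 4172, 3004),
  y = as.Date(c(NA, NA, NA))
)

csv_path <- tempfile(fileext = ".csv")
write_csv(df, csv_path)

# Stating the column types explicitly stops readr from guessing
# a logical column for y
restored <- read_csv(
  csv_path,
  col_types = cols(x = col_double(), y = col_date())
)

print(class(restored$y))
```

The specification has to be repeated in every script that reads the file, since the CSV itself carries no type information.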

Because the type information is lost, CSVs are a little unreliable for caching interim results: you need to recreate the column specification every time you load the file. There are two alternatives:

write_rds() and read_rds() are uniform wrappers around the base functions saveRDS() and readRDS(). These store data in R's custom binary format called RDS:

write_rds(challenge, "challenge.rds")
read_rds("challenge.rds")
#> # A tibble: 2,000 x 2
#>       x y         
#>   <dbl> <date>    
#> 1   404 NA        
#> 2  4172 NA        
#> 3  3004 NA        
#> 4   787 NA        
#> 5    37 NA        
#> 6  2332 NA        
#> # … with 1,994 more rows
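
To check that nothing is lost in an RDS round trip, you can compare the object read back with the original. This is a minimal sketch on a hypothetical two-row data frame, not on the challenge data itself.

```r
library(readr)

# Hypothetical example data with a numeric and a date column
df <- data.frame(
  x = c(404, 4172),
  y = as.Date(c(NA, "2015-01-16"))
)

rds_path <- tempfile(fileext = ".rds")
write_rds(df, rds_path)

# The object read back should match the original, column types included
same_readr <- identical(read_rds(rds_path), df)

# The file is an ordinary RDS file, so base readRDS() can read it too
same_base <- identical(readRDS(rds_path), df)

print(c(same_readr, same_base))
```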

The feather package implements a fast binary file format that can be shared across programming languages:

library(feather)
write_feather(challenge, "challenge.feather")
read_feather("challenge.feather")
#> # A tibble: 2,000 x 2
#>       x      y
#>   <dbl> <date>
#> 1   404   <NA>
#> 2  4172   <NA>
#> 3  3004   <NA>
#> 4   787   <NA>
#> 5    37   <NA>
#> 6  2332   <NA>
#> # ... with 1,994 more rows

Feather tends to be faster than RDS and is usable outside of R. RDS supports list-columns (which are covered in the "Many models" chapter of the source text); feather currently does not.
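
To make the point about list-columns concrete, here is a small sketch that saves a tibble containing a list-column to RDS and reads it back. The tibble and its column names (id, vals) are hypothetical, and the example assumes the tibble package is installed, which is normally the case when readr is.

```r
library(readr)
library(tibble)

# Hypothetical tibble whose `vals` column is a list: each element
# can have its own length and type
nested <- tibble(
  id = 1:2,
  vals = list(1:3, c("a", "b"))
)

rds_path <- tempfile(fileext = ".rds")
write_rds(nested, rds_path)

# The list-column is still a list after the round trip
back <- read_rds(rds_path)
str(back$vals)
```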

Source: H. Wickham and G. Grolemund, https://r4ds.had.co.nz/data-import.html
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 3.0 License.

Last modified: Monday, January 9, 2023, 3:52 PM
