Managing File Formats

While there are many benefits to working with a range of software and hardware tools for writing, one complication is that you now need to deal with different file formats. Even restricting yourself to Microsoft Word or Google Docs will not save you from this issue in the long run: you will encounter people who cannot read your file properly, your software may update and leave your older files unreadable, or Google Docs may mangle a file sent to you by someone using Word or some other tool. It is better to learn to sort out and manage the many different file formats used by writing software.

General Overview

Learning to work with files is a key computer skill. However, it can be complex and hard to understand. For this reason, Windows and Mac computers, as well as Android and iOS devices, have tried to simplify things. Many years ago, these computer systems displayed files as icons with file names like this:

MyFile.doc

That is, a file name (MyFile) and a file extension (.doc). The extension tells you what kind of file it is and therefore what programs can open it. A .doc file was a Microsoft Word file.
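As a small illustration of the name and extension split described above, here is a minimal Python sketch. The function name split_file_name is hypothetical and chosen only for this example; the sketch relies on the standard library's pathlib module.

```python
from pathlib import PurePath


def split_file_name(file_name: str) -> tuple[str, str]:
    """Return (name, extension) for a file name such as 'MyFile.doc'."""
    path = PurePath(file_name)
    # .stem is the part before the last dot; .suffix is the last dot and what follows it
    return path.stem, path.suffix


name, extension = split_file_name("MyFile.doc")
print(name)       # MyFile
print(extension)  # .doc
```

A file name with no dot in it, such as notes, would come back with an empty extension, which is one reason an operating system may not know which program should open it.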

Now, files are displayed with only their name and an icon that gives a clue as to what kind of file it is.

Screenshot of file icons in the Apple Mac Finder.

However, this icon can sometimes be misleading or the system can become confused about which programs you want to open which files. If you install a new program, it may greedily become the new default for several file formats and start opening files that you would prefer to be opened in a different application. For this reason, it is useful to know how to view the file extension and change the default program that opens files of any format. Unfortunately, there are no settings for this on iOS (iPhones and iPads).

Screenshot of file icons showing file extensions in the Apple Mac Finder.

Because these settings are different for different versions of Windows, Android, and macOS, it’s best to search the web specifically for your version with the following words, replacing the name / version of the operating system at the end:

show file name extensions windows 10
show file name extensions mac mojave

Too often, unfortunately, the sites that rank high on a search like this are filled with weird pop-ups and other annoyances. This guide for Windows 10 and Android and this guide for Mac seemed OK at the time of this writing.

File Formats

There are two important distinctions to understand when considering file formats.

The first is the distinction between binary and plain text files. A binary file stores its contents as encoded data that is unreadable by a person. For example, an image file is a binary file whose data tells the computer how to display the right colors to depict the image on the screen. Because binary files consist of encoded data, only programs that know how to decode that particular format can open them. Older Microsoft Word files (the .doc files) are binary files as well. If you try to read an older Word file in an application that doesn’t understand the format, such as a basic text editor, it will look like gibberish.

A plain text file is simply that: a file that only contains text. This means that many different applications can open these files. The most common plain text file is a txt file. However, this is where things get tricky. Just because a file is made of plain text doesn’t mean that you will be able to understand what is in it. For example, if someone sends you a plain text file with paragraphs written in Spanish and you only know English, then you cannot understand that material even though you can open the file and see its contents as the author intended. Similarly, not every application will display plain text files in the same way. Websites are built out of plain text files called HTML and CSS files. These files contain “markup,” which looks a bit like code but is still readable by humans. Compare this HTML markup with the following binary code from an image and you will see the difference.

HTML code:

<meta property="site_name" content="Writing Workflow Guides">
<meta property="title" content="Generating Ideas, Plans, and First Drafts">
<meta property="url" content="https://kent.writingworkflows.com/guides/invention/">

Binary code:

ˇÿˇ·(ExifMM*bj(1$r2ñái¨ÿ
¸Ä'
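One rough way to see the difference between the two samples above is to ask whether a file's bytes can be decoded as text at all. The Python sketch below is a heuristic only, not a definitive test: the function name looks_like_plain_text is hypothetical, the check assumes the text is encoded as UTF-8 (older text files in other encodings could fail it), and the image bytes are an illustrative stand-in for the first bytes of a JPEG file rather than a real image.

```python
def looks_like_plain_text(data: bytes) -> bool:
    """Rough heuristic: True if the bytes contain no NUL byte and decode as UTF-8."""
    if b"\x00" in data:
        # NUL bytes are common in binary formats and rare in ordinary text
        return False
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


html_sample = '<meta property="site_name" content="Writing Workflow Guides">'.encode("utf-8")
# Stand-in for the opening bytes of a JPEG image with Exif data (not a complete image)
image_sample = bytes([0xFF, 0xD8, 0xFF, 0xE1]) + b"\x00\x28Exif\x00\x00MM\x00*"

print(looks_like_plain_text(html_sample))   # True
print(looks_like_plain_text(image_sample))  # False
```

A text editor that opens the image bytes anyway has to guess at characters for them, which is where gibberish like the sample above comes from.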

Even though websites are created in plain text files, different web browsers can display these files in different ways because they interpret the markup code differently. Newer Microsoft Word files (docx files) are like websites in that they are actually a collection of markup files (in the XML language) bundled together rather than a single document file (change the docx at the end of the file name to zip, then unzip the file and you will see the many XML files). Many different applications besides Word can open docx files, but they may interpret the markup differently than Word, so you may see different formatting or other issues.

Screenshot of the many files contained in a Word docx once it is unzipped.
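The same inspection can be done without renaming anything, because a docx file is a zip archive. The Python sketch below uses the standard library's zipfile module. To stay self-contained, it builds a tiny stand-in archive in memory with two member names that real docx files typically contain; the stand-in is not a valid Word document, and the function name list_archive_members is hypothetical.

```python
import io
import zipfile


def list_archive_members(archive_bytes: bytes) -> list[str]:
    """Return the names of the files stored inside a zip archive."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        return archive.namelist()


# Build a tiny stand-in archive in memory (assumption: placeholder XML, not a real document)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("[Content_Types].xml", "<Types/>")
    archive.writestr("word/document.xml", "<document/>")

for member in list_archive_members(buffer.getvalue()):
    print(member)
```

To try this on a document of your own, you could read its bytes from disk (for example with pathlib's Path.read_bytes) and pass them to the same function.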

Plain text files seem to have many advantages: they can be opened by a variety of programs and are future-proof, that is, no upgrade to a software application will suddenly make your file impossible to open. However, they also have downsides for writers. Because a txt file can literally only contain text, it cannot include any formatting or images. If you want formatting or images alongside your text, you will need a more complicated markup-based format (like docx) that can only be opened by particular programs.

The second distinction is between proprietary and open. A proprietary file format contains code that is secret and can typically only be opened by the software that created it. In other words, the ability to read and write that file is a trade secret of sorts. An open file format is the opposite. It may also contain code, but that code is “documented” or explained somewhere so that other people can build tools that can open or save those types of files. Years ago, Microsoft Word files (doc files) were a proprietary format and if you wanted to be able to keep opening your Word files, you needed to keep using Word. Now, they have transitioned to an open format (docx files) and many programs can open these files. However, some claim that it is open in name only and that the format is unnecessarily complex so that applications other than Word have difficulty opening these files and displaying them exactly as they look in Word.

In summary:

  • The plain text file format txt is future-proof, in that you will be able to open and read this file with any type of computer and many different applications well into the future. However, these files have limitations and do not allow formatting or images.
  • Plain text and open file formats provide the most flexibility because you can typically open them with different programs, not just one.
  • Proprietary file formats sometimes offer unique features that can be valuable, but come with the risk that these files can no longer be opened with future versions of the program, or that the program may no longer run on future computers you own, or that others may not use the same software so you cannot share files with them.

Choosing Between File Formats

Some applications (like Microsoft Word) can save files in different formats and present you with a choice. Others only write to a single format. This means you will need to choose whether you want to use the default file format or an alternative, and whether that will require using a different application.

To make this choice, you should consider the following:

Do I want to open or edit this file in 5 or 10 years?

Some files you know you will want to open down the road to read. Others you know you’ll want to edit, or maybe use as a template. Other times, the file is just a paper for a writing course and you’ll never want to see it again (although you might be surprised what you’d like to review in 10 years when you’re looking back).

For files that you know you want to have full control over 10 or 20 years into the future, you should try as much as possible to use an open, plain text format like txt. If the limitations of that format mean you will lose significant features, you could at least save one version as txt and keep the full version in the original format and hope for the best. If the rtf format (Rich Text Format) is available, this can also be a good choice, because rtf stands somewhere in between a plain txt and a fully-featured docx file. This format can store some text styling elements (like bold and italics) as well as images in JPG or PNG format.

If you know you just want to be able to open the file in the future to read it, a PDF can be a good choice. The PDF format is something of an outlier in the categories mentioned above: it is an open format, but something of a mix between plain text and binary. However, one of the key selling points of the PDF format is that the file displays exactly the same in every case (making it functionally equivalent to a printed piece of paper), which makes it more sustainable than a Word file. The main limitation with PDF files is that they cannot be edited without special software.

For cloud file formats like a Google Doc, while it seems easy to predict that Google will continue to exist for another decade, there is no guarantee that Google Docs will continue to be a key part of their business or that these files will continue to work. For important files, it is best to save a local copy in a format that makes sense given the considerations described above.

Screenshot demonstrating saving a Google Doc as a PDF file.

Do I want to open this file in multiple apps or on other devices?

While the previous question deals with sustainability for the long term, this question is about flexibility in immediate use. Based on your intentions, you may choose to work with files in one format, then archive them in a more open and sustainable format when you are finished. For example, you could draft in Google Docs so that you can open the file on any device that supports it (which is nearly anything at this point) but then download a PDF or rtf once the writing is complete, or periodically as a backup.

This guide, however, cannot give very specific advice about all the various ways you might want to view and edit files on different devices. There are too many different kinds of devices and applications to make that feasible. The best thing to do is simply experiment with the whole chain of apps and devices you plan to use before you get too deep into an important project just to make sure everything plays nicely together. Print a sample document from the device you plan to print from before you get started. Open the file created on your computer on the iPad you plan to use, and open it without WiFi to see if that affects anything (this is a known gotcha with Google Docs sometimes, where being disconnected from the internet can end up erasing text you wrote during that period).

Key tradeoffs or complications to explore:

  • If you add a lot of formatting to a Google Doc, when you download as a docx or PDF, does it look as you intended? It may be better to write in Google Docs, then do the formatting in Word later.
  • Does the app have a version on every device you want to use? If not, can you open the file on these other devices in other apps? What features are lost or different in these cases?
  • If you open the same file in different apps, does this cause the computer to become confused as to which is the “default” app for that file format? Does the file retain all of its data and formatting when saved by different apps and opened in the original again?
  • Does the sync service cause any complications? For example, Scrivener files can be synced with a cloud service like Dropbox, but users must be sure to completely close the file when they are done working before opening it on another computer. Additionally, while Dropbox works well with Scrivener files, Google Drive and OneDrive do not, according to their documentation.

Do I want to send this file to others?

In some ways, the answer here is the same as the question above. However, the complications only increase because now the main issue is uncertainty: do you know what software and device the other person will use to open the file, or save it and send back to you? When working collaboratively, it can be best to have an explicit discussion about this to avoid complications down the road (this is why so many collaborative writing projects now at least start in Google Docs even if they end in a Word file or PDF).

What makes this especially tricky is that there are few universal practices we all share with computers, even though it can seem that way. For example, some folks use Word all the time, people they send their files to use Word, and it can seem to them as if everyone uses Word–that it is the default writing tool. But then they change schools or work contexts, and now no one uses Word and they all use Google Docs, or they don’t use word processors at all. They may even encounter people that choose not to use Word for political reasons because they believe proprietary applications like Word restrict their freedoms.

Therefore, the best course of action is to communicate in advance with anyone before sending them documents to review. In fact, better even to communicate with a reviewer before you even begin the project, so that you can align the software you work in with the expected final format. This doesn’t mean, of course, that if your reviewer wants a Word file that you have to work in Word, only that you should know in advance that the text will have to end up there and you can plan accordingly.

If you are somehow not able to communicate in this way, sending a txt file or an rtf file is likely the best bet. If necessary, your recipient can use a word processor they like and the text will be automatically converted to another format if they want to use features specific to their software (e.g., track changes in Word).

If it is crucial that the formatting of the document you’re sending remain intact, but you’re not able to ask the recipient what software they have available, PDF can seem like the safest bet. Yet, even with PDF there are gotchas. Unfortunately, without careful attention it is difficult to make an accessible PDF, that is, one that can be easily read with a screen reader or other assistive technology.

At this time there is no single file format that is the universal “best” for distributing documents. Distributing documents for the web is beyond the scope of this guide, but HTML files are close to a best single file type and can easily be converted to a range of other types (PDF, Word, etc.). Otherwise, you can consider using multiple formats to reach readers with different needs and situations. A principle to follow in this case is “progressive enhancement,” which means providing the material in a fully readable and complete way in the simplest format possible (e.g., txt or rtf) and also providing enhanced versions in other formats (e.g., PDF) for readers who are able to use them. This way readers with the most challenges can still experience the complete text, but other readers can view enhancements to the basic text. This approach contrasts with “graceful degradation,” where you start with the complex text and try to produce a more basic format. Too often that approach results in an incomplete text missing key elements only visible or present in the enhanced version.

Common Formats and Features

docx - Microsoft Word

The current default format for Microsoft Word files. This file type supports “what you see is what you get” text editing–what you see on the screen is what would be printed on the page. This format supports multiple fonts, page layout features, graphics, and images. The file format is ostensibly open, in that the instructions for reading and writing files of this type are not secret, but there is some disagreement about whether they are well-documented.

Because Microsoft Word is an evolving software application that has gone through file format changes in its past, this format may not be fully supported in future iterations of Word, but because it is an open format, it is likely that there will still be some way to open these files in the future.

Many applications can save or open a docx file, including Word of course, but also Google Docs, Apple Pages, LibreOffice, OpenOffice, Scrivener, Ulysses, and others. However, be aware that some formatting or layout issues may arise when opening files created in different applications or different devices. That is, creating a docx file in Word on a Mac and opening it with Word on a Windows computer will likely introduce no changes, but saving a complex docx file created on Pages on the Mac and then opening it with LibreOffice on Windows may result in some formatting or layout changes (or it may not!).

Screenshots of saving a file as docx with Apple Pages, LibreOffice Writer, and Scrivener, depicting the way different applications can save files in the Microsoft Word docx format.

PDF

A PDF file is typically a “read-only” way of distributing documents that are somewhat equivalent to printed documents–they look the same on every screen, with page breaks and layout the same no matter the computer or printer used. While readers can add notes or highlights to PDFs with a wide range of software applications, they can only be edited with specialized, and often expensive, software programs (like the full version of Adobe Acrobat).

A PDF file may have a variety of features depending on how it was made. Some PDFs allow you to select the text in them to highlight, or to copy and paste elsewhere. Others do not. One reason for this is that a PDF file may simply be a bundle of image files from a scanner, that is, someone scanned a print document and produced a PDF. Just like you can’t copy and paste text out of a photograph of a sign, you can’t copy and paste text out of a scanned document without further processing.

In order to make text available for highlighting or copying, a scanned document needs to undergo a process called “optical character recognition” (OCR). The full version of Adobe Acrobat has this feature, as do some scanning applications (even apps you can get on a phone to scan documents), as well as online services (but, as with most online services, most require fees because their free versions are not useful or are tricks to get you to click on ads). After a PDF has undergone OCR, it now has a “text layer,” that is, a plain text representation of the text in the image. OCR is not perfect, and you may notice garbled text in poor-quality images when copying and pasting (or in PDFs that were OCR-d many years ago with older technology). Using an OCR service, you may be able to convert a PDF to a Word file for editing, but if the PDF has complicated tables or multiple columns, your results may be mixed.

Many writing applications (such as Microsoft Word, Google Docs, Apple Pages, etc.) can save to PDF, but typically they cannot then open these files for further editing.

txt - Plain Text

Plain text files are the simplest format and contain only text. They are very sustainable, and it is likely that any computing device will be able to open and edit them many decades into the future. These files cannot include images. Typically, these files contain only “plain” text, that is, text without any formatting elements (like italics, bold, different fonts, etc.), but see the note below for exceptions.

Typically, plain text files are saved with a txt file extension, but because this format is so widespread, there are many different kinds of extensions people use to indicate what “kind” of plain text is inside the file. For example, an HTML file (used to make websites) is a plain text file.

Many different types of applications can open and save plain text files. Word processors like Word or Pages can, but if you want to save a plain text file you will not be able to use any formatting or save images inside it. You can also use a text editor, which is an application like a word processor in that you can compose text within it, but without all the features that exist for more complex formats. Many text editors are specifically geared toward writing computer code.

Formatting in Plain Text: It is possible to include formatting elements in plain text, but these require using either a) a special application, or b) a special syntax called “markup”. What this means is that the formatting elements for the text need to be written into the file as instructions about how the text should be displayed. For example, here is the markup for displaying bold and italic text in a paragraph of HTML:

<p>This sentence demonstrates how to save formatting elements, like <strong>bolded text</strong> and <em>italicized text</em>, in a plain text file.</p>

That sentence would look like this in a web browser:

This sentence demonstrates how to save formatting elements, like bolded text and italicized text, in a plain text file.

Because markup is tedious to write and can make proofreading difficult, typically people prefer to use word processors that directly apply formatting elements to text. However, there are some markup syntaxes that are more readable. For example, the same sentence written in the markup syntax called “Markdown” would look like this:

This sentence demonstrates how to save formatting elements, like **bolded text** and *italicized text*, in a plain text file.

Markdown uses common symbols like asterisks and hash marks to indicate common formatting elements, so it is easy to produce and does not disrupt proofreading nearly as much as HTML. Furthermore, there are many applications that display Markdown plain text files in a “what you see is what you get” way, so that instead of seeing the asterisks you would see the bolded and italicized text directly. If you open the file in a different application, you would see those asterisks. In some ways, these applications provide the best of both worlds: “what you see is what you get” text display and editing, but a sustainable plain text file format that can be opened by a wide variety of applications long into the future. Popular Markdown applications include Typora (currently free for Mac and Windows), iA Writer (available on all platforms), and Ulysses (for Mac and iOS, described in the managing complex writing guide).
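To make the relationship between the Markdown sentence and its HTML equivalent concrete, here is a minimal Python sketch that converts only the two emphasis marks used in the example sentence. It is not a Markdown implementation: real Markdown applications handle many more elements and edge cases, and the function name markdown_emphasis_to_html is hypothetical.

```python
import re


def markdown_emphasis_to_html(text: str) -> str:
    """Convert **bold** and *italic* marks in one paragraph to HTML tags."""
    # Handle double asterisks first so they are not mistaken for two italic marks
    text = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", text)
    text = re.sub(r"\*(.+?)\*", r"<em>\1</em>", text)
    return f"<p>{text}</p>"


markdown_sentence = (
    "This sentence demonstrates how to save formatting elements, "
    "like **bolded text** and *italicized text*, in a plain text file."
)
print(markdown_emphasis_to_html(markdown_sentence))
```

The point of the sketch is that the formatting lives in the text itself as simple symbols, so any application that understands those symbols can display or convert it.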

rtf - Rich Text

The rich text format exists in between plain text and more full-featured formats like docx. This format can be opened and saved by many programs, such as Word and Pages. It can also include many formatting elements not included in txt files, such as bold and italicized text, different fonts and font sizes, and images saved in JPG or PNG format.

The rich text format has many aspects, so as with the docx format described above, different applications may display the text formatting or layout with small differences. However, for the most part, you can be sure these files will open in a wide range of applications for a long time into the future.
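Although rtf files carry formatting, the file itself is stored as text in which formatting instructions are written as control words (such as \b for bold and \i for italic) inside curly braces. The Python sketch below builds a very small rtf document to show the idea. It rests on several assumptions: the text is ASCII only, the short header used here is the bare minimum (real applications write a much longer header with font and color tables), and the function names escape_rtf and build_rtf are hypothetical.

```python
def escape_rtf(text: str) -> str:
    """Escape the three characters that have special meaning in rtf."""
    return text.replace("\\", "\\\\").replace("{", "\\{").replace("}", "\\}")


def build_rtf(runs: list[tuple[str, str]]) -> str:
    """Build a minimal rtf document from (text, style) pairs.

    style is 'plain', 'bold', or 'italic'. Assumes ASCII-only text.
    """
    control_words = {"bold": "\\b ", "italic": "\\i "}
    parts = []
    for text, style in runs:
        escaped = escape_rtf(text)
        if style in control_words:
            # A group in braces limits the control word to this run of text
            parts.append("{" + control_words[style] + escaped + "}")
        else:
            parts.append(escaped)
    return "{\\rtf1\\ansi " + "".join(parts) + "}"


document = build_rtf([
    ("like ", "plain"),
    ("bolded text", "bold"),
    (" and ", "plain"),
    ("italicized text", "italic"),
    (".", "plain"),
])
print(document)
```

Saving the printed text in a file with an rtf extension and opening it in a word processor is a quick way to check how a particular application interprets it.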

odt - Open Document Format Text

The open document format is similar to docx, in that it supports “what you see is what you get” file editing and can include a wide range of formatting and layout options, as well as images. Because it is a fully open format, it should be sustainable into the future although this sustainability is dependent on continued development of applications that support it (because it supports so many more features than rtf or txt it requires more advanced programs to open and save).

The open document format covers a number of different office file types, including slideshows and spreadsheets. These all have different file extensions. The word processing file extension is odt.

These files can be opened and saved by a range of word processors, including Microsoft Word, OpenOffice Writer, LibreOffice Writer, Google Docs, and TextEdit (but not Apple Pages).
