How to Read Files Character by Character in Python
While less common than reading line by line, reading a file character by character in Python is sometimes necessary, especially when dealing with specific file formats or parsing tasks.
This guide explains how to read a file one character at a time using a while loop and file.read(1), and we'll discuss why using nested for loops to achieve this isn't efficient or Pythonic.
Reading Character by Character with a while Loop
The most direct way to read a file one character at a time is to use a while loop and the file.read(1) method:
with open('example.txt', 'r', encoding='utf-8') as file:
    result = ''
    while True:
        char = file.read(1)  # Read one character
        if not char:  # End of file (empty string)
            print('Reached end of file')
            break
        print(char)
        result += char  # Append to string, you can also add it to a list.
    print(result)  # Prints the contents of the file
- with open(...) as file: Opens the file in read mode ('r') with UTF-8 encoding (always specify the encoding!). The with statement automatically closes the file.
- while True: Creates an infinite loop that we break out of when we reach the end of the file.
- char = file.read(1): This is the key. file.read(1) reads one character from the file. If the end of the file is reached, it returns an empty string (''), not None.
- if not char: This is how we detect the end of the file. An empty string is "falsy," so not char is True when we're at the end.
- break: Exits the while loop.
- result += char: Appends the character that was read to the result string. You could also append to a list (my_list.append(char)) if you prefer a list of characters.
- file.read() takes an integer giving the maximum number of characters to read and returns them. Because the file is opened in text mode, the unit is characters, not bytes. It is bytes only when the file is opened in binary mode ('rb').
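To make the character-versus-byte distinction concrete, here is a minimal sketch that reads the same file with read(1) in text mode and in binary mode. The helper name read_units and the temporary sample file are assumptions made for this illustration; they are not part of the original example.

```python
import os
import tempfile


def read_units(path, mode, **open_kwargs):
    """Read a file with repeated read(1) calls and return the pieces as a list."""
    pieces = []
    with open(path, mode, **open_kwargs) as file:
        while True:
            piece = file.read(1)
            if not piece:  # '' in text mode, b'' in binary mode
                break
            pieces.append(piece)
    return pieces


# Hypothetical sample file containing one non-ASCII character.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, 'sample.txt')
    with open(sample_path, 'w', encoding='utf-8') as file:
        file.write('hé')

    text_pieces = read_units(sample_path, 'r', encoding='utf-8')
    byte_pieces = read_units(sample_path, 'rb')

print(text_pieces)  # ['h', 'é']
print(byte_pieces)  # [b'h', b'\xc3', b'\xa9']
```

In text mode, 'é' comes back as a single character even though it occupies two bytes in UTF-8. In binary mode, each read(1) call returns exactly one byte.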
Adding Characters to a List
If you want to store the characters in a list instead of a string:
with open('example.txt', 'r', encoding='utf-8') as file:
    characters = []
    while True:
        char = file.read(1)
        if not char:
            print('Reached end of file')
            break
        characters.append(char)  # Append to list
        print(char)
    print(characters)  # Output: ['b', 'o', 'b', 'b', 'y', 'h', 'a', 'd', 'z', '.', 'c', 'o', 'm', '\n']
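As a more compact variant of the same loop, the two-argument form of the built-in iter() can call file.read(1) repeatedly until it returns the empty string. The sketch below assumes a hypothetical helper named read_chars and a temporary sample file. It also shows ''.join(), which is a common way to turn the list back into a string.

```python
import os
import tempfile
from functools import partial


def read_chars(path, encoding='utf-8'):
    """Return every character of a text file as a list, read one at a time."""
    with open(path, 'r', encoding=encoding) as file:
        # iter(callable, sentinel) keeps calling file.read(1)
        # and stops as soon as the call returns '' (end of file).
        return list(iter(partial(file.read, 1), ''))


# Hypothetical sample file used only for this demonstration.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, 'sample.txt')
    with open(sample_path, 'w', encoding='utf-8') as file:
        file.write('abc\n')

    characters = read_chars(sample_path)

text = ''.join(characters)  # Rebuild the original string from the list

print(characters)  # ['a', 'b', 'c', '\n']
print(repr(text))  # 'abc\n'
```

Collecting characters in a list and joining once at the end avoids creating a new string on every iteration, which can matter for larger files.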
Why Nested for Loops Are Inefficient for Character-by-Character Reading
You might see code like this:
with open('example.txt', 'r', encoding='utf-8') as file:
    for line in file:  # Reads line by line
        for char in line:  # Iterates through characters in the line
            print(char)
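- This code does not read from the file one character at a time. Each pass of the outer loop reads a whole line into memory, and the inner loop then iterates over that string.
- You should avoid nested loops when you need true character-level control, for example when the file has very long lines or no newlines at all, or when you must stop reading at an exact position.
- If you only need to process the text line by line, iterate over the file object directly with a single for loop. That is much more efficient and concise, because Python file objects are iterable by line.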
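If calling file.read(1) for every character feels too slow and whole lines may be too large to hold in memory, one possible middle ground is to read fixed-size chunks and yield their characters one by one. This is a sketch under stated assumptions, not a recommendation from the original article: the generator name iter_chars, the default chunk_size of 4096 and the sample file are all hypothetical.

```python
import os
import tempfile


def iter_chars(path, chunk_size=4096, encoding='utf-8'):
    """Yield the characters of a text file, reading chunk_size characters per call."""
    with open(path, 'r', encoding=encoding) as file:
        while True:
            chunk = file.read(chunk_size)  # At most chunk_size characters
            if not chunk:  # Empty string means end of file
                break
            yield from chunk  # Hand the characters out one at a time


# Hypothetical sample file used only for this demonstration.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = os.path.join(tmp_dir, 'sample.txt')
    with open(sample_path, 'w', encoding='utf-8') as file:
        file.write('one\ntwo')

    # A tiny chunk size shows that chunk boundaries do not affect the result.
    chars = list(iter_chars(sample_path, chunk_size=3))

print(chars)  # ['o', 'n', 'e', '\n', 't', 'w', 'o']
```

The caller still sees one character at a time, but memory use is bounded by chunk_size rather than by the length of the longest line.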