Python Pandas: How to Fix "ValueError: Expected object or value" with read_json()
When working with JSON data in Pandas, the pd.read_json() function is your primary tool for converting JSON structures into DataFrames. However, a common stumbling block is the ValueError: Expected object or value (or similar JSON decoding errors). This error almost invariably signals that the JSON string or file you're trying to parse is not correctly formatted according to strict JSON syntax rules.
This guide will dive deep into the common JSON formatting pitfalls that trigger this ValueError, providing clear examples of invalid and valid JSON. Furthermore, we'll explore crucial pd.read_json() parameters like orient, lines, and encoding that help Pandas correctly interpret even valid, but differently structured, JSON data or files with specific encodings.
Understanding the Error: What "Expected object or value" Means
The ValueError: Expected object or value is raised by the underlying JSON parser when it encounters a syntax error in your JSON data. Python's built-in json module reports the same kind of problem with messages such as JSONDecodeError: Expecting property name enclosed in double quotes. JSON (JavaScript Object Notation) has a strict syntax. The parser expects specific structures:
- An object (dictionary-like, enclosed in {}) with key-value pairs.
- An array (list-like, enclosed in []) of values.
- A value (string, number, boolean true/false, null, object, or array).
If the parser finds something that violates these rules (e.g., a missing quote, an extra comma, an improperly structured list of objects), it doesn't know how to interpret the subsequent characters and throws an error because it was "expecting" a valid JSON component but didn't find one.
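As a small illustration of the structures listed above, the sketch below uses Python's standard json module (not the parser Pandas uses internally) to show how each valid JSON construct maps to a Python type. The sample strings are made up for this example.

```python
import json

# Each sample is a complete, valid JSON document (illustrative values only).
samples = ['{"name": "Alice"}', '[1, 2, 3]', '"text"', '42', 'true', 'null']

# json.loads maps each JSON construct onto the matching Python type.
parsed = [json.loads(sample) for sample in samples]

for sample, value in zip(samples, parsed):
    print(f"{sample!r:22} -> {type(value).__name__}")
```

Anything that does not fit one of these shapes is rejected by a strict parser, which is the situation the rules below address.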
Common JSON Formatting Mistakes and Their Solutions
The vast majority of these errors stem from malformed JSON. Here are the most common culprits:
Rule 1: String Keys and String Values Must Use Double Quotes (")
In JSON, all string keys and all string values must be enclosed in double quotes. Single quotes are not allowed.
data_single_quotes.json (Problematic - single quotes for 'name' key):
{
'name': ["Alice", "Bob", "Carl"],
"experience": [10, 13, 15],
"salary": [175.1, 180.2, 190.3]
}
Python code triggering error:
import pandas as pd

try:
    # ⛔️ ValueError: Expected object or value
    df = pd.read_json('data_single_quotes.json')  # Assuming the above content
    print(df)
except ValueError as e:
    print(f"Error: {e}")
Output:
Error: Expected object or value
data_correct_quotes.json (Corrected - all keys and string values double-quoted):
{
"name": ["Alice", "Bob", "Carl"],
"experience": [10, 13, 15],
"salary": [175.1, 180.2, 190.3]
}
Python code (now works):
import pandas as pd
# Assuming the corrected data.json
df_correct = pd.read_json('data_correct_quotes.json')
print("DataFrame from correctly quoted JSON:")
print(df_correct)
Output:
DataFrame from correctly quoted JSON:
name experience salary
0 Alice 10 175.1
1 Bob 13 180.2
2 Carl 15 190.3
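Single-quoted JSON often comes from writing the text form of a Python dict (str(data)) to a file instead of serializing it. The sketch below, which assumes the data starts out as a Python dict and uses io.StringIO in place of a file, contrasts that with json.dumps, which always emits double quotes.

```python
import io
import json

import pandas as pd

# Hypothetical in-memory data matching the example above.
data = {
    "name": ["Alice", "Bob", "Carl"],
    "experience": [10, 13, 15],
    "salary": [175.1, 180.2, 190.3],
}

# str(data) yields a Python literal with single quotes, which is not JSON.
try:
    pd.read_json(io.StringIO(str(data)))
    repr_parsed = True
except ValueError:
    repr_parsed = False

# json.dumps serializes the same dict with double-quoted keys and strings.
df = pd.read_json(io.StringIO(json.dumps(data)))

print("str(data) parsed:", repr_parsed)
print(df)
```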
Rule 2: No Trailing Commas Allowed
Trailing commas (a comma after the last element in an object or array) are forbidden in JSON.
data.json (Problematic - trailing comma after salary array):
{
"name": ["Alice", "Bob", "Carl"],
"experience": [10, 13, 15],
"salary": [175.1, 180.2, 190.3],
}
Solution: Remove the trailing comma.
data.json (Corrected):
{
"name": ["Alice", "Bob", "Carl"],
"experience": [10, 13, 15],
"salary": [175.1, 180.2, 190.3]
}
Rule 3: Arrays of JSON Objects Must Be Enclosed in Square Brackets ([])
If you have a sequence of JSON objects that you intend to be an array (list), they must be enclosed in square brackets [] and separated by commas.
data.json (Problematic - multiple objects not in an array):
{"name": "Alice", "salary": 100},
{"name": "Bob", "salary": 50},
{"name": "Carl", "salary": 75}
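With this input, Pandas typically reports ValueError: Trailing data rather than Expected object or value, because the first object parses cleanly and the parser then finds unexpected content after it.
Solution: Wrap the objects in [].
data_array.json (Corrected - now a valid JSON array of objects):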
[
{"name": "Alice", "salary": 100},
{"name": "Bob", "salary": 50},
{"name": "Carl", "salary": 75}
]
Python code (for the corrected array of objects):
import pandas as pd
# Assuming the corrected data.json with an array of objects
df_array_of_objects = pd.read_json('data_array.json')
print("DataFrame from JSON array of objects:")
print(df_array_of_objects)
Output:
DataFrame from JSON array of objects:
name salary
0 Alice 100
1 Bob 50
2 Carl 75
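If you cannot edit the file by hand, the wrapping can be done in code before parsing. This is only a sketch for the exact case shown above (complete objects separated by commas); it assumes the text is already in memory and would not repair other kinds of malformed input.

```python
import io

import pandas as pd

# Comma-separated objects with no enclosing brackets, as in the problematic file.
raw = (
    '{"name": "Alice", "salary": 100},\n'
    '{"name": "Bob", "salary": 50},\n'
    '{"name": "Carl", "salary": 75}\n'
)

# Drop surrounding whitespace and any dangling comma, then add the brackets.
wrapped = "[" + raw.strip().rstrip(",") + "]"

df = pd.read_json(io.StringIO(wrapped))
print(df)
```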
Handling Different JSON Structures with the orient Parameter
Even if your JSON is syntactically valid, its structure might not match what pd.read_json() expects by default. The orient parameter tells Pandas how to interpret the JSON structure.
Default Orientation (Column-Oriented Dictionary)
By default (orient=None or orient='columns'), pd.read_json() expects a JSON object where keys are column names and values are dictionaries mapping index labels to cell values, or arrays/lists of cell values. The corrected example under Rule 1 fits this.
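The Rule 1 example uses arrays as the column values. The other column-oriented shape, where each column maps index labels to cell values, can be sketched like this (the labels row_a and row_b are made up for illustration):

```python
import io

import pandas as pd

# Column-oriented JSON: column name -> {index label -> cell value}.
column_json = (
    '{"name": {"row_a": "Alice", "row_b": "Bob"},'
    ' "salary": {"row_a": 100, "row_b": 50}}'
)

# No orient argument is needed; this is the default layout for a DataFrame.
df = pd.read_json(io.StringIO(column_json))
print(df)
```

The inner keys become the index of the resulting DataFrame.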
Record-Oriented JSON (orient='records')
This expects a JSON array where each element is an object representing a row. This is very common.
data_records.json:
[
{"COL1":"a","COL2":"b", "ID": 1},
{"COL1":"c","COL2":"d", "ID": 2}
]
Python code:
import pandas as pd
df_records = pd.read_json('data_records.json', orient='records')
print("DataFrame from records-oriented JSON:")
print(df_records)
Output:
DataFrame from records-oriented JSON:
COL1 COL2 ID
0 a b 1
1 c d 2
Split-Oriented JSON (orient='split')
This format explicitly defines columns, index, and data.
data_split.json:
{
"columns":["Product", "Price"],
"index":["Item1", "Item2"],
"data":[["Apple", 1.00], ["Banana", 0.50]]
}
Python code:
import pandas as pd
df_split = pd.read_json('data_split.json', orient='split')
print("DataFrame from split-oriented JSON:")
print(df_split)
Output:
DataFrame from split-oriented JSON:
Product Price
Item1 Apple 1.0
Item2 Banana 0.5
Other orient Options
Pandas supports other orient values like 'index' (dictionary where keys are index labels, values are row objects), 'values' (just an array of data), and 'table' (for table schema). Refer to the Pandas documentation for details if your JSON matches these specific structures.
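As a rough sketch of two of these options, the example below reads the same product data once as orient='index' and once as orient='values'. The JSON strings are written by hand for this example and passed through io.StringIO rather than files.

```python
import io

import pandas as pd

# orient='index': index label -> {column name -> cell value}.
index_json = (
    '{"Item1": {"Product": "Apple", "Price": 1.0},'
    ' "Item2": {"Product": "Banana", "Price": 0.5}}'
)
df_index = pd.read_json(io.StringIO(index_json), orient="index")

# orient='values': a bare array of rows, with no labels at all.
values_json = '[["Apple", 1.0], ["Banana", 0.5]]'
df_values = pd.read_json(io.StringIO(values_json), orient="values")

print(df_index)
print(df_values)
```

With orient='values' there are no labels in the JSON, so Pandas falls back to default integer labels for both rows and columns.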
Reading JSON Lines Format (lines=True)
JSON Lines (JSONL) format consists of multiple JSON objects, each on a new line. Each line is a complete, independent JSON object.
data_lines.jsonl:
{"name": "Alice", "city": "New York"}
{"name": "Bob", "city": "London"}
{"name": "Charlie", "city": "Paris"}
To read this, use lines=True:
import pandas as pd
df_lines = pd.read_json('data_lines.jsonl', lines=True)
print("DataFrame from JSON Lines format:")
print(df_lines)
Output:
DataFrame from JSON Lines format:
name city
0 Alice New York
1 Bob London
2 Charlie Paris
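Forgetting lines=True on a JSON Lines file is itself a common source of read_json() errors, because the file as a whole is not one JSON document. The sketch below reproduces both outcomes with an in-memory copy of the data. The exact error text depends on the Pandas version, so it is printed rather than assumed.

```python
import io

import pandas as pd

jsonl_text = (
    '{"name": "Alice", "city": "New York"}\n'
    '{"name": "Bob", "city": "London"}\n'
    '{"name": "Charlie", "city": "Paris"}\n'
)

# Without lines=True the parser sees extra content after the first object.
try:
    pd.read_json(io.StringIO(jsonl_text))
    error_without_lines = None
except ValueError as exc:
    error_without_lines = str(exc)

# With lines=True each line is parsed as its own JSON object (one row each).
df_lines = pd.read_json(io.StringIO(jsonl_text), lines=True)

print("Error without lines=True:", error_without_lines)
print(df_lines)
```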
Addressing File Encoding Issues (e.g., encoding='utf-8-sig' for BOM)
Sometimes, a JSON file might be valid but saved with an encoding that includes a Byte Order Mark (BOM), like UTF-8-BOM. The default UTF-8 decoding in pd.read_json() might struggle with this. If you suspect a BOM, try encoding='utf-8-sig'.
import pandas as pd

file_path_with_bom = 'data_with_bom.json'  # Assume this file has UTF-8-BOM
# (Content could be the same as any valid JSON example above, for example the data_split.json)
try:
    # Attempt with default encoding (might fail or parse incorrectly if BOM is present)
    # df_bom_issue = pd.read_json(file_path_with_bom)
    # ✅ Try with utf-8-sig to handle BOM
    df_bom_fixed = pd.read_json(file_path_with_bom, encoding='utf-8-sig')
    print("DataFrame read with encoding='utf-8-sig':")
    print(df_bom_fixed)
except FileNotFoundError:
    print(f"File not found, skipping BOM example: {file_path_with_bom}")
except ValueError as e:
    print(f"Still an error with BOM file, ensure JSON is valid: {e}")
Output:
DataFrame read with encoding='utf-8-sig':
columns index data
0 Product Item1 [Apple, 1.0]
1 Price Item2 [Banana, 0.5]
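Because orient='split' is not passed in this example, the columns, index, and data keys are read as ordinary columns. The BOM problem itself is less about JSON syntax and more about file encoding nuances.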
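If you are not sure whether a file starts with a BOM, you can check its first bytes before choosing an encoding. The sketch below writes a small temporary file with a BOM (the payload and file name are made up for this example), detects it with codecs.BOM_UTF8, and then reads it with encoding='utf-8-sig'.

```python
import codecs
import os
import tempfile

import pandas as pd

payload = '{"name": ["Alice", "Bob"], "salary": [100, 50]}'

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "data_with_bom.json")

    # The 'utf-8-sig' codec prepends the UTF-8 BOM when writing.
    with open(path, "w", encoding="utf-8-sig") as handle:
        handle.write(payload)

    # A UTF-8 BOM is the three bytes EF BB BF at the start of the file.
    with open(path, "rb") as handle:
        has_bom = handle.read(3) == codecs.BOM_UTF8

    # 'utf-8-sig' strips the BOM on read, so the parser sees plain JSON.
    df = pd.read_json(path, encoding="utf-8-sig")

print("File starts with a UTF-8 BOM:", has_bom)
print(df)
```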
General Debugging Tip: Validate Your JSON Externally
If you're unsure whether your JSON is valid, use an online JSON validator or a linter in your code editor. These tools can quickly pinpoint syntax errors like missing quotes, misplaced commas, or incorrect bracketing. This can save a lot of time before you even try to parse it with Pandas.
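A local alternative to an online validator is Python's standard json module, whose JSONDecodeError carries the line and column of the first problem. The helper below is a minimal sketch (find_json_error is a made-up name). Note that the json module is slightly more lenient than strict JSON in a few cases, for example it accepts NaN, so a clean result here is a strong hint rather than a guarantee.

```python
import json


def find_json_error(text):
    """Return None if text parses as JSON, else a message with its location."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        return f"{exc.msg} (line {exc.lineno}, column {exc.colno})"
    return None


valid_text = '{"name": ["Alice", "Bob"]}'
invalid_text = "{'name': [\"Alice\", \"Bob\"]}"  # single-quoted key

print("valid:  ", find_json_error(valid_text))
print("invalid:", find_json_error(invalid_text))
```

To check a file, read its contents first (for example with open(path, encoding='utf-8').read()) and pass the resulting string to the helper.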
Conclusion
The ValueError: Expected object or value when using pandas.read_json() is predominantly a signal of malformed JSON. The primary solution is to meticulously check your JSON data against standard syntax rules:
- Ensure all string keys and values use double quotes.
- Eliminate any trailing commas.
- Correctly structure arrays of objects using [].
Once your JSON is syntactically valid, parameters like orient (for different JSON object structures), lines=True (for JSON Lines format), and encoding (for file encoding issues like BOM) allow pd.read_json() to correctly interpret and load a wide variety of JSON data into a Pandas DataFrame.