How to filter a JSON array in Python
Filtering data within JSON arrays is a common task when processing structured data in Python. Whether you're working with data from APIs, configuration files, or databases, you need effective ways to extract the specific information you need.
This guide explores different methods for filtering JSON arrays in Python, providing practical examples and explanations.
Filtering JSON Arrays with List Comprehension
List comprehensions provide a concise way to filter a JSON array after parsing it into a Python list.
import json
json_array = json.dumps(
    [
        {'name': 'Alice', 'salary': 100},
        {'name': 'Bob', 'salary': 50},
        {'name': 'Carl', 'salary': 75}
    ]
)
a_list = json.loads(json_array)
filtered_list = [
    dictionary for dictionary in a_list
    if dictionary['salary'] > 50
]
print(filtered_list)
# Output: [{'name': 'Alice', 'salary': 100}, {'name': 'Carl', 'salary': 75}]
Explanation:
- First, we use json.dumps() to create a JSON string. This step exists only for the sake of the example; in a real scenario you would probably already have a JSON string or have loaded one from a file.
- json.loads(json_array) parses the JSON string into a native Python list of dictionaries.
- The list comprehension [dictionary for dictionary in a_list if dictionary['salary'] > 50] iterates through each dictionary in a_list.
- The condition if dictionary['salary'] > 50 checks whether the salary value is greater than 50 for each dictionary.
- Only the dictionaries that satisfy this condition are included in filtered_list.
This approach provides a very readable and Pythonic way to filter JSON data.
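Real-world arrays often contain objects that lack the key you filter on, in which case dictionary['salary'] raises a KeyError. The following is a minimal sketch of one way to handle that, assuming a hypothetical input where one record has no salary and assuming that such records should simply be excluded:

```python
import json

# Hypothetical input: the third record has no "salary" key.
json_array = (
    '[{"name": "Alice", "salary": 100},'
    ' {"name": "Bob", "salary": 50},'
    ' {"name": "Dana"}]'
)

a_list = json.loads(json_array)

# dict.get() returns the default (0 here) instead of raising KeyError,
# so records without a salary fail the condition and are left out.
filtered_list = [
    dictionary for dictionary in a_list
    if dictionary.get('salary', 0) > 50
]

print(filtered_list)
# Output: [{'name': 'Alice', 'salary': 100}]
```

Whether a missing salary should count as 0, be skipped, or be treated as an error depends on your data; the default of 0 is only an illustration.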
Filtering JSON Arrays Using a for Loop
Alternatively, you can use a traditional for loop for filtering. This method is useful for developers who prefer more explicit control over the filtering process.
import json
json_array = json.dumps(
    [
        {'name': 'Alice', 'salary': 100},
        {'name': 'Bob', 'salary': 50},
        {'name': 'Carl', 'salary': 75}
    ]
)
a_list = json.loads(json_array)
filtered_list = []
for dictionary in a_list:
    if dictionary['salary'] > 50:
        filtered_list.append(dictionary)
print(filtered_list)
# Output: [{'name': 'Alice', 'salary': 100}, {'name': 'Carl', 'salary': 75}]
Explanation:
- We parse the JSON string into a Python list using json.loads().
- An empty list, filtered_list, is created to store the filtered dictionaries.
- The for loop iterates through each dictionary in a_list.
- Inside the loop, the if condition dictionary['salary'] > 50 checks whether the salary is greater than 50.
- If the condition is true, the dictionary is appended to filtered_list.
This approach achieves the same filtering result as the list comprehension but is more explicit in terms of the steps involved.
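The extra control of a loop pays off when each record needs more than a single test. As a rough sketch, assuming a hypothetical input in which one salary is not a number, the loop below sets such records aside in a separate list (the name skipped is made up for this illustration) instead of failing on the comparison:

```python
import json

# Hypothetical input: Bob's salary is a string rather than a number.
json_array = (
    '[{"name": "Alice", "salary": 100},'
    ' {"name": "Bob", "salary": "n/a"},'
    ' {"name": "Carl", "salary": 75}]'
)

a_list = json.loads(json_array)

filtered_list = []
skipped = []  # records whose salary is missing or not numeric

for dictionary in a_list:
    salary = dictionary.get('salary')

    # Comparing a str with an int would raise TypeError, so set
    # non-numeric values aside before applying the filter condition.
    if not isinstance(salary, (int, float)):
        skipped.append(dictionary)
        continue

    if salary > 50:
        filtered_list.append(dictionary)

print(filtered_list)
# Output: [{'name': 'Alice', 'salary': 100}, {'name': 'Carl', 'salary': 75}]
print(skipped)
# Output: [{'name': 'Bob', 'salary': 'n/a'}]
```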
Filtering JSON Arrays from Files
Often, JSON data is stored in files. To filter data from a JSON file, first load the file content with the json.load() function.
import json
file_name = 'example.json' # Assuming 'example.json' exists
with open(file_name, 'r', encoding='utf-8') as f:
    a_list = json.load(f)
print(a_list)
filtered_list = [
    dictionary for dictionary in a_list
    if dictionary['salary'] > 50
]
print(filtered_list)
# Output:
# [{'name': 'Alice', 'salary': 100}, {'name': 'Bob', 'salary': 50}, {'name': 'Carl', 'salary': 75}]
# [{'name': 'Alice', 'salary': 100}, {'name': 'Carl', 'salary': 75}]
Explanation:
- The code opens the file using a with open(...) statement, which ensures that the file is closed correctly.
- json.load(f) parses the JSON data from the opened file into a Python list.
- The remaining steps are the same as in the list comprehension example: we filter the list of dictionaries based on the salary.
This allows you to filter JSON data directly from a file and save the results in a variable.
You'll need an example.json file that contains a valid JSON array for this example to work:
[
{"name": "Alice", "salary": 100},
{"name": "Bob", "salary": 50},
{"name": "Carl", "salary": 75}
]
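If you would rather not create the file by hand, the sketch below writes the same sample array to a temporary directory with json.dump() and then reads it back and filters it. The temporary directory and the shortened loop variable d are choices made for this illustration only:

```python
import json
import os
import tempfile

sample = [
    {"name": "Alice", "salary": 100},
    {"name": "Bob", "salary": 50},
    {"name": "Carl", "salary": 75},
]

# The directory and everything in it are deleted when the block exits.
with tempfile.TemporaryDirectory() as tmp_dir:
    file_name = os.path.join(tmp_dir, 'example.json')

    # json.dump() serializes the list directly into the open file.
    with open(file_name, 'w', encoding='utf-8') as f:
        json.dump(sample, f)

    # json.load() parses the file content back into a Python list.
    with open(file_name, 'r', encoding='utf-8') as f:
        a_list = json.load(f)

filtered_list = [d for d in a_list if d['salary'] > 50]

print(filtered_list)
# Output: [{'name': 'Alice', 'salary': 100}, {'name': 'Carl', 'salary': 75}]
```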
The json.load() function expects a file-like object, opened in either text or binary mode, that implements a .read() method and contains a JSON document.
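Because only a .read() method is required, in-memory streams work as well as files on disk. A small sketch, using io.StringIO and io.BytesIO as stand-ins for a text file and a binary file (the stream contents are made up for the example):

```python
import io
import json

# A text stream and a binary stream, each holding a JSON array.
text_stream = io.StringIO(
    '[{"name": "Alice", "salary": 100}, {"name": "Bob", "salary": 50}]'
)
binary_stream = io.BytesIO(b'[{"name": "Carl", "salary": 75}]')

# json.load() calls .read() on the object and parses the result;
# both str and bytes input are accepted.
from_text = json.load(text_stream)
from_bytes = json.load(binary_stream)

filtered_list = [d for d in from_text + from_bytes if d['salary'] > 50]

print(filtered_list)
# Output: [{'name': 'Alice', 'salary': 100}, {'name': 'Carl', 'salary': 75}]
```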
Filtering JSON Arrays with the filter() Function
The built-in filter() function provides another way to filter data based on a given condition. It's particularly useful with lambda functions.
import json
json_array = json.dumps(
    [
        {'name': 'Alice', 'salary': 100},
        {'name': 'Bob', 'salary': 50},
        {'name': 'Carl', 'salary': 75}
    ]
)
a_list = json.loads(json_array)
filtered_list = list(
    filter(
        lambda dictionary: dictionary['salary'] > 50,
        a_list
    )
)
print(filtered_list)
# Output: [{'name': 'Alice', 'salary': 100}, {'name': 'Carl', 'salary': 75}]
Explanation:
- We load the JSON string into a Python list using json.loads().
- The filter() function takes two arguments: a function (in this case a lambda function) that serves as the condition, and an iterable (a_list).
- The lambda function lambda dictionary: dictionary['salary'] > 50 checks whether the salary is greater than 50.
- The filter() function returns an iterator that yields the items of the list for which the lambda function returns a true value.
- We use list() to convert this iterator into a new list, filtered_list.
This method can be more concise than using a for loop for straightforward filtering conditions.
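filter() also accepts a regular named function, and the iterator it returns is lazy: items are tested only as they are requested, and each item is produced once. The sketch below illustrates both points; the function name earns_more_than_50 is invented for this example:

```python
import json

json_array = (
    '[{"name": "Alice", "salary": 100},'
    ' {"name": "Bob", "salary": 50},'
    ' {"name": "Carl", "salary": 75}]'
)

a_list = json.loads(json_array)


def earns_more_than_50(dictionary):
    """Return True when the record's salary is above 50."""
    return dictionary['salary'] > 50


# No record has been tested yet; filter() only builds an iterator.
filtered_iter = filter(earns_more_than_50, a_list)

# next() pulls the first match; list() then collects what is left.
first = next(filtered_iter)
rest = list(filtered_iter)

print(first)
# Output: {'name': 'Alice', 'salary': 100}
print(rest)
# Output: [{'name': 'Carl', 'salary': 75}]
```

If you need to traverse the result more than once, convert it to a list first, since the iterator is exhausted after a single pass.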