Python Pandas: Select DataFrame Rows Between Two Values (Numeric Range)
Filtering a Pandas DataFrame to select rows where values in a specific column fall within a certain numerical range (between two values) is a fundamental data selection task. This is crucial for tasks like isolating data points within specific thresholds, analyzing segments, or data cleaning.
This guide demonstrates several common and effective methods to select DataFrame rows based on a numerical range, primarily using Series.between()
, boolean indexing with AND (&
), and DataFrame.query()
.
The Goal: Filtering by a Numerical Range
Given a Pandas DataFrame and a specific column containing numerical data, we want to select only those rows where the value in that column lies between a lower bound and an upper bound. For example, selecting all employees with a salary between $50,000 and $70,000.
Example DataFrame
import pandas as pd
data = {
'EmployeeID': [101, 102, 103, 104, 105, 106],
'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank'],
'Department': ['HR', 'Engineering', 'Sales', 'HR', 'Engineering', 'Sales'],
'Salary': [55000, 85000, 70000, 48000, 90000, 65000],
'YearsExperience': [3, 7, 5, 2, 8, 4]
}
df = pd.DataFrame(data)
print("Original DataFrame:")
print(df)
Output:
EmployeeID Name Department Salary YearsExperience
0 101 Alice HR 55000 3
1 102 Bob Engineering 85000 7
2 103 Charlie Sales 70000 5
3 104 David HR 48000 2
4 105 Eve Engineering 90000 8
5 106 Frank Sales 65000 4
We will filter based on the 'Salary'
column.
Method 1: Using Series.between()
(Recommended)
The Series.between(left, right, inclusive="both")
method is specifically designed for this task. It returns a boolean Series indicating whether each element of the calling Series lies between left
and right
.
Basic Usage (Inclusive Boundaries)
By default, between()
includes both the left
and right
boundary values in the range.
import pandas as pd
df_example = pd.DataFrame({
'EmployeeID': [101, 102, 103, 104, 105, 106],
'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank'],
'Salary': [55000, 85000, 70000, 48000, 90000, 65000]
})
lower_bound = 50000
upper_bound = 70000
# Create a boolean mask using .between()
salary_in_range_mask = df_example['Salary'].between(lower_bound, upper_bound)
print("Boolean mask from .between():")
print(salary_in_range_mask)
print()
# ✅ Use the mask to filter the DataFrame
df_salary_between = df_example[salary_in_range_mask]
print(f"Rows where Salary is between {lower_bound} and {upper_bound} (inclusive):")
print(df_salary_between)
Output:
Boolean mask from .between():
0 True
1 False
2 True
3 False
4 False
5 True
Name: Salary, dtype: bool
Rows where Salary is between 50000 and 70000 (inclusive):
EmployeeID Name Salary
0 101 Alice 55000
2 103 Charlie 70000
5 106 Frank 65000
Controlling Inclusivity (inclusive
parameter)
The inclusive
parameter can take values:
"both"
(default):left <= series <= right
"neither"
:left < series < right
"left"
:left <= series < right
"right"
:left < series <= right
import pandas as pd
df_example = pd.DataFrame({
'EmployeeID': [101, 102, 103, 104, 105, 106],
'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank'],
'Salary': [55000, 85000, 70000, 48000, 90000, 65000]
})
lower_bound = 50000
upper_bound = 70000
# Example: Salary strictly between 50000 and 70000 (exclusive)
salary_strictly_between_mask = df_example['Salary'].between(lower_bound, upper_bound, inclusive="neither")
df_salary_strictly_between = df_example[salary_strictly_between_mask]
print(f"Rows where Salary is strictly between {lower_bound} and {upper_bound}:")
print(df_salary_strictly_between)
Output (Employee 101 and 103 are excluded as their salaries are on the boundaries):
Rows where Salary is strictly between 50000 and 70000:
EmployeeID Name Salary
0 101 Alice 55000
5 106 Frank 65000
Method 2: Using Boolean Indexing with Logical AND (&
)
You can construct the range check manually using two separate comparisons combined with the logical AND operator (&
). Remember to enclose each individual comparison in parentheses due to operator precedence.
import pandas as pd
df_example = pd.DataFrame({
'EmployeeID': [101, 102, 103, 104, 105, 106],
'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank'],
'Salary': [55000, 85000, 70000, 48000, 90000, 65000]
})
lower_bound = 50000
upper_bound = 70000
# ✅ Define two conditions and combine with &
condition_ge_lower = (df_example['Salary'] >= lower_bound)
condition_le_upper = (df_example['Salary'] <= upper_bound)
df_salary_between_and = df_example[condition_ge_lower & condition_le_upper]
print(f"Rows where Salary is between {lower_bound} and {upper_bound} (using AND):")
print(df_salary_between_and)
Output: (Same as .between() with inclusive="both")
Rows where Salary is between 50000 and 70000 (using AND):
EmployeeID Name Salary
0 101 Alice 55000
2 103 Charlie 70000
5 106 Frank 65000
This method is very explicit and flexible if you need to adjust the inclusivity (e.g., using >
instead of >=
).
Method 3: Using DataFrame.query()
The DataFrame.query()
method allows you to filter rows using a string expression. You can express "between" logic directly in the query string.
import pandas as pd
df_example = pd.DataFrame({
'EmployeeID': [101, 102, 103, 104, 105, 106],
'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank'],
'Salary': [55000, 85000, 70000, 48000, 90000, 65000]
})
lower_bound_var = 50000 # Using variables in query with @
upper_bound_var = 70000
# ✅ Query using chained comparison (Python-like syntax)
query_string = f"@lower_bound_var <= Salary <= @upper_bound_var"
# Or, using 'and': "Salary >= @lower_bound_var and Salary <= @upper_bound_var"
df_salary_between_query = df_example.query(query_string)
print(f"Rows where Salary is between {lower_bound_var} and {upper_bound_var} (using query()):")
print(df_salary_between_query)
Output: (Same as .between() with inclusive="both")
Rows where Salary is between 50000 and 70000 (using query()):
EmployeeID Name Salary
0 101 Alice 55000
2 103 Charlie 70000
5 106 Frank 65000
query()
can be very readable for complex conditions.- You can reference local variables within the query string by prefixing them with
@
. - If column names have spaces, wrap them in backticks:
`Column Name With Spaces`
.
Selecting Rows NOT Between Two Values (Negation)
To get rows where the column value is outside the specified range, you can negate the boolean mask created by Series.between()
or your AND condition using the tilde (~
) operator.
import pandas as pd
df_example = pd.DataFrame({
'EmployeeID': [101, 102, 103, 104, 105, 106],
'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Frank'],
'Salary': [55000, 85000, 70000, 48000, 90000, 65000]
})
lower_bound = 50000
upper_bound = 70000
salary_in_range_mask = df_example['Salary'].between(lower_bound, upper_bound)
# ✅ Negate the mask using ~
df_salary_not_between = df_example[~salary_in_range_mask]
print(f"Rows where Salary is NOT between {lower_bound} and {upper_bound}:")
print(df_salary_not_between)
print()
# Using boolean logic for "not between" (salary < lower OR salary > upper)
df_salary_not_between_alt = df_example[(df_example['Salary'] < lower_bound) | (df_example['Salary'] > upper_bound)]
print("Rows where Salary is NOT between (using OR logic):")
print(df_salary_not_between_alt)
Output:
Rows where Salary is NOT between 50000 and 70000:
EmployeeID Name Salary
1 102 Bob 85000
3 104 David 48000
4 105 Eve 90000
Rows where Salary is NOT between (using OR logic):
EmployeeID Name Salary
1 102 Bob 85000
3 104 David 48000
4 105 Eve 90000
Choosing the Right Method
Series.between(left, right, inclusive=...)
: Highly recommended for its clarity, conciseness, and explicit control over boundary inclusivity. It's specifically designed for this task.- Boolean Indexing with
&
: Very flexible and explicit. Good ifbetween()
isn't available (older Pandas) or if your conditions are more complex than a simple range. DataFrame.query()
: Can be very readable, especially with chained comparisons. Useful if you prefer string-based querying or need to reference local variables with@
.
All three methods are effective. Series.between()
is often the most direct and idiomatic choice for range filtering.
Conclusion
Selecting DataFrame rows where a column's values fall between two numerical limits is a fundamental Pandas operation.
- The
Series.between()
method is the most specialized and often the cleanest way to define such a range, with options to control boundary inclusion. - Traditional boolean indexing with the
&
(AND) operator (df[(df['col'] >= L) & (df['col'] <= R)]
) provides full control and explicitness. DataFrame.query()
offers an alternative string-based syntax (df.query('L <= col <= R')
) that many find readable.- To select rows outside a range, negate the boolean condition using the tilde (
~
) operator.
By applying these techniques, you can effectively filter your Pandas DataFrames to isolate data within specific numerical ranges for focused analysis.