This feature applies to data from November 1, 2025 (UTC) onward. Data before that date may not follow the same threshold logic and should be interpreted accordingly.
This article explains how Butlr indicates data usability in API responses using the fields data_incomplete and null_percentage.
These indicators help distinguish between:
Valid occupancy values (including zero)
Intervals where data collection was incomplete
This allows dashboards, analytics systems, and integrations to correctly interpret sensor data.
Why These Fields Matter
In some time windows, the system may receive incomplete data from one or more sensors.
Without additional indicators, it can be difficult to distinguish between:
A space that is truly empty (occupancy = 0)
A space that shows empty (occupancy = 0) but should not be used due to incomplete data
A space that shows occupancy (occupancy > 0) but should not be used due to incomplete data
To address this, the Butlr API provides data quality indicators that help determine whether the returned value is based on complete data coverage.
Data Quality Fields
data_incomplete
A boolean field that indicates whether the data for a time window is complete.
| Value | Meaning |
|---|---|
| false | Data coverage for the interval is complete |
| true | Some underlying data was not collected during the interval |
If data_incomplete = true, the value may still be computed from available inputs, but the interval should not be used for long-term analytics or reporting.
null_percentage (0–1)
null_percentage is a diagnostic field and should be interpreted using a threshold of 0.30:
≤ 0.30: the data can still be considered usable; a value in this range does not necessarily indicate poor quality
> 0.30: data quality is considered insufficient, and the data should generally not be treated as reliable
For deciding whether data should be used, Butlr recommends treating data_incomplete as the primary signal.
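The threshold rule can be sketched in Python as follows. This is a minimal illustration, not Butlr code: the function names and the returned labels are hypothetical, and the record is assumed to be a plain dictionary carrying the two fields described in this section.

```python
NULL_PERCENTAGE_THRESHOLD = 0.30  # threshold stated in this article


def describe_null_percentage(null_percentage):
    """Return a diagnostic label for a null_percentage value in the range 0-1.

    The labels ("usable", "insufficient", "unknown") are hypothetical names
    chosen for this sketch.
    """
    if null_percentage is None:
        return "unknown"
    if not 0.0 <= null_percentage <= 1.0:
        raise ValueError("null_percentage must be between 0 and 1")
    if null_percentage <= NULL_PERCENTAGE_THRESHOLD:
        return "usable"
    return "insufficient"


def is_usable(record):
    """Decide usability from data_incomplete only; null_percentage stays diagnostic."""
    return record.get("data_incomplete") is False
```

Keeping the two functions separate mirrors the recommendation above: data_incomplete drives the decision, while null_percentage is only logged or inspected when debugging.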
How to Interpret Results
null_on_incomplete is a filter parameter that controls how the API represents intervals with incomplete underlying data coverage.

| Scenario | Value | data_incomplete | Interpretation |
|---|---|---|---|
| Complete interval | > 0 | false | Value is reliable |
| Valid zero occupancy | 0 | false | Space was empty; value is reliable |
| Partial data coverage | 0 or > 0 | true | Some raw data missing, but value computed from available portion. |
| No data collected | null | true | No usable data in the window. |
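The table can be expressed as a small lookup function. The sketch below is illustrative only: the function name and the scenario labels are hypothetical, and it assumes a null value arrives in Python as None.

```python
def interpret_interval(value, data_incomplete):
    """Map (value, data_incomplete) to a scenario from the table above.

    The returned labels are hypothetical names used only in this sketch.
    """
    if data_incomplete:
        if value is None:
            return "no_data"   # no usable data in the window
        return "partial"       # computed from the available portion only
    if value is None:
        return "unexpected"    # combination not covered by the table
    if value == 0:
        return "valid_zero"    # space was empty; value is reliable
    return "complete"          # value is reliable
```

For example, interpret_interval(0, False) yields "valid_zero", while interpret_interval(0, True) yields "partial", which is the distinction the table is meant to make visible.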
Use Case 1 — Occupancy = 0 vs No Data
- Scenario
  - Occupancy value = 0, and you need to know whether it is real or missing data.
- Validation Rule
  - If data_incomplete = false → Valid occupancy zero.
  - If data_incomplete = true → Do not trust the value.
  - Do not rely on null_percentage alone.
- Expected Outcome
  - Clear distinction between “empty space” and “no reporting.”
  - Reduced false alarms in operational workflows.
Use Case 2 — Validating Data Before Using It in Charts
- Scenario
  - You want to build a dashboard chart and ensure the data is reliable.
- Recommended Approach
  - Query at 15-minute granularity.
  - If data_incomplete = true:
    - Exclude that 15-minute interval.
  - After validation, aggregate to hourly/daily if needed.
- Expected Outcome
  - Short degraded periods are visible at 15-minute resolution.
  - Aggregated charts (hourly/daily) reflect only validated intervals.
  - Occupancy = 0 with data_incomplete = false can be trusted as a valid zero.
Example request body:

```json
{
  "window": {
    "every": "15m",
    "function": "max",
    "timezone": "America/New_York"
  },
  "filter": {
    "start": "2026-02-01T05:00:00Z",
    "stop": "2026-02-13T05:00:00Z",
    "null_on_incomplete": true,
    "spaces": { "eq": ["space_XXXXXXXXX"] },
    "measurements": ["traffic_floor_occupancy"],
    "value": { "gte": 0 }
  }
}
```

Use Case 3 — Long-term Data Availability (e.g., Last 6 Months)
- Scenario
  - You need to evaluate data reliability over a long period (e.g., 6 months).
- Recommended Approach
  - Query data day-by-day.
  - Use UTC boundaries.
  - Keep query granularity between 15 minutes and 1 day.
  - Exclude intervals where data_incomplete = true.
  - Exclude known sleep-mode/offline hours.
  - Aggregate after validation.
- Expected Outcome
  - Localized degradation is preserved and not hidden by aggregation.
  - SLA metrics reflect actual degraded intervals.
  - Month-level summaries are built only from validated data.
```python
from datetime import datetime, timedelta, timezone

import pandas as pd
import requests

API_URL = "https://api.butlr.io/api/v3/reporting"
API_TOKEN = "YOUR_API_TOKEN"
SPACE_ID = "space_....."

# Note: data before 2025-11-01 (UTC) may not follow the same threshold logic.
START_DATE = datetime(2025, 8, 1, tzinfo=timezone.utc)
END_DATE = datetime(2026, 2, 1, tzinfo=timezone.utc)

SLEEP_START_UTC = 0  # Example: exclude 00:00–05:00 UTC
SLEEP_END_UTC = 5


def query_day(day_start):
    """Fetch one UTC day of 15-minute intervals for SPACE_ID."""
    day_end = day_start + timedelta(days=1)
    payload = {
        "window": {
            "every": "15m",
            "function": "max",
            "timezone": "UTC",
        },
        "filter": {
            "start": day_start.isoformat(),
            "stop": day_end.isoformat(),
            "null_on_incomplete": True,
            "spaces": {"eq": [SPACE_ID]},
            "measurements": ["traffic_floor_occupancy"],
            "value": {"gte": 0},
        },
    }
    headers = {
        "Authorization": f"Bearer {API_TOKEN}",
        "Content-Type": "application/json",
    }
    response = requests.post(API_URL, json=payload, headers=headers, timeout=30)
    response.raise_for_status()
    return response.json()


all_records = []
current = START_DATE
while current < END_DATE:
    # Assumes the response body is a list of interval records;
    # adjust if the records are nested under a key.
    all_records.extend(query_day(current))
    current += timedelta(days=1)

# Convert to DataFrame
df = pd.DataFrame(all_records)

# --- Validation Step ---
# Convert time to datetime
df["time"] = pd.to_datetime(df["time"], utc=True)

# 1. Exclude sleep-mode hours (example: 00:00–05:00 UTC) from the evaluation window
asleep = df["time"].dt.hour.between(SLEEP_START_UTC, SLEEP_END_UTC - 1)
df_awake = df[~asleep]

# 2. Exclude incomplete intervals
incomplete = df_awake["data_incomplete"].astype(bool)
df_valid = df_awake[~incomplete].copy()

# --- SLA Calculation Example ---
# Sleep-mode hours are left out of both counts so that scheduled sleep
# does not lower the availability figure.
total_intervals = len(df_awake)
valid_intervals = len(df_valid)
incomplete_intervals = int(incomplete.sum())
sla_percentage = (valid_intervals / total_intervals) * 100 if total_intervals else 0.0

print(f"Total intervals: {total_intervals}")
print(f"Incomplete intervals: {incomplete_intervals}")
print(f"SLA Availability: {sla_percentage:.2f}%")

# --- Monthly Summary (after validation) ---
# Drop the timezone (values are already UTC) before converting to monthly periods.
df_valid["month"] = df_valid["time"].dt.tz_localize(None).dt.to_period("M")
monthly_summary = df_valid.groupby("month").agg({"value": "mean"})
print(monthly_summary)
```
Recommended Usage
When processing data returned by the Butlr API, follow the recommended evaluation order below:
- Check data_incomplete
  - If data_incomplete = true, exclude the interval from analytics
  - If data_incomplete = false, treat the value as usable
- Use null_percentage only as an additional diagnostic
  - This field is intended for debugging data quality.
This is the recommended usage pattern for building reliable charts, summaries, and integrations.
API Behavior and Constraints
Supported Query Windows
This feature is intended for 15-minute intervals or larger.
Requests using granularity smaller than 15 minutes may return output, but those results are not considered meaningful for analysis. Webhook delivery also has a minimum interval of 15 minutes.
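A client can guard against unsupported windows before sending a request. The sketch below is an assumption-laden illustration: only the "15m" form of the window's every value appears in this article, so the "h" and "d" suffixes and both helper names are hypothetical.

```python
import re
from datetime import timedelta

MIN_WINDOW = timedelta(minutes=15)  # minimum meaningful window per this article

# Assumed duration suffixes; only "m" (as in "15m") appears in the article.
_UNITS = {"m": "minutes", "h": "hours", "d": "days"}


def parse_every(every):
    """Parse a duration string such as "15m" into a timedelta."""
    match = re.fullmatch(r"(\d+)([mhd])", every)
    if match is None:
        raise ValueError(f"unrecognized window size: {every!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


def is_supported_window(every):
    """Return True when the window is 15 minutes or larger."""
    return parse_every(every) >= MIN_WINDOW
```

Calling is_supported_window("5m") returns False, so a caller could reject or widen the window before querying.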
Sleep Mode Behavior
During scheduled sleep periods:
Data may not be collected
null_percentage and data_incomplete may vary depending on the query window
For long-term availability analysis, Butlr recommends excluding known sleep periods from the evaluation window.
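One way to exclude sleep periods is to filter records by hour before evaluating availability. The following is a minimal sketch under stated assumptions: each record is a dictionary whose "time" entry is already a datetime, the sleep schedule is expressed in whole hours in the same timezone as those timestamps, and the function names are hypothetical.

```python
from datetime import datetime, timezone


def in_sleep_window(ts, sleep_start_hour, sleep_end_hour):
    """Return True if ts falls in [sleep_start_hour, sleep_end_hour).

    The window may wrap past midnight (for example 22 to 5).
    """
    hour = ts.hour
    if sleep_start_hour == sleep_end_hour:
        return False  # empty window: nothing is excluded
    if sleep_start_hour < sleep_end_hour:
        return sleep_start_hour <= hour < sleep_end_hour
    return hour >= sleep_start_hour or hour < sleep_end_hour


def exclude_sleep(records, sleep_start_hour, sleep_end_hour):
    """Keep only records whose timestamp lies outside the sleep window."""
    return [
        record
        for record in records
        if not in_sleep_window(record["time"], sleep_start_hour, sleep_end_hour)
    ]


records = [
    {"time": datetime(2026, 2, 1, 23, 0, tzinfo=timezone.utc)},
    {"time": datetime(2026, 2, 1, 3, 0, tzinfo=timezone.utc)},
    {"time": datetime(2026, 2, 1, 12, 0, tzinfo=timezone.utc)},
]
awake = exclude_sleep(records, 22, 5)  # keeps only the 12:00 record
```

Handling the wrap-around case explicitly matters because sleep schedules often span midnight, which a simple start-to-end range check would miss.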
Long Offline Periods
If a Hive or sensor remains offline for more than two consecutive weeks, it may be excluded from certain internal calculation paths used in data aggregation.
When this happens:
The device may no longer be included in null_percentage calculations
The interval may no longer trigger data_incomplete
As a result, null_percentage may appear as 0%, even if the device previously had a prolonged outage
Because of this behavior, a “clean” null_percentage does not always guarantee that no prolonged outages occurred, particularly if the outage exceeded the two-week threshold.
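Because later intervals can look clean after a long outage, it may help to flag stretches where data_incomplete stayed true for about two weeks or more and to check device status separately for those periods. The sketch below is a heuristic suggested here, not a documented Butlr method: the function name is hypothetical, and records are assumed to be sorted by time with a datetime "time" and a boolean "data_incomplete".

```python
from datetime import datetime, timedelta, timezone


def long_incomplete_runs(records, min_duration=timedelta(days=14)):
    """Return (start, end) pairs for runs of consecutive incomplete intervals.

    A run is reported when the time between its first and last incomplete
    interval is at least min_duration.
    """
    runs = []
    run_start = None
    last_incomplete = None
    for record in records:
        if record["data_incomplete"]:
            if run_start is None:
                run_start = record["time"]
            last_incomplete = record["time"]
        else:
            if run_start is not None and last_incomplete - run_start >= min_duration:
                runs.append((run_start, last_incomplete))
            run_start = None
    # Close a run that extends to the end of the records.
    if run_start is not None and last_incomplete - run_start >= min_duration:
        runs.append((run_start, last_incomplete))
    return runs


# Synthetic daily records: days 2 through 17 are incomplete.
first_day = datetime(2026, 1, 1, tzinfo=timezone.utc)
sample = [
    {"time": first_day + timedelta(days=i), "data_incomplete": 2 <= i <= 17}
    for i in range(20)
]
runs = long_incomplete_runs(sample)
```

Intervals that follow a flagged run deserve extra scrutiny, since a clean null_percentage there may reflect the device being dropped from the calculation and not a recovery.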
Placeholder Sensors
If a space contains unused placeholder sensors in the configuration:
They may appear as offline
This can inflate null_percentage or trigger data_incomplete
Recommended action:
Remove unused placeholder sensors from the configuration to ensure accurate data quality metrics.
Virtually Mirrored or Cloned Sensors
In rare cases, a sensor may be “cloned” or “mirrored” in configuration.
When this occurs:
The cloned/mirrored entry may appear offline
The physical device itself may still be operating normally
This situation does not negatively affect null_percentage or data_incomplete.
Scope of This Feature
These data quality indicators apply to:
Presence sensor occupancy data
Traffic sensor IN/OUT measurements
They do not apply to floor-level occupancy estimates generated by traffic sensors, which follow a separate calibration process.
Best Practice for Analytics
For the most reliable reporting:
Query data at 15-minute granularity
Exclude intervals where data_incomplete = true
Aggregate the remaining validated intervals into hourly, daily, or longer summaries
This preserves short gaps in data quality while keeping long-term reporting based on usable intervals.
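The validate-then-aggregate order can be sketched with the standard library alone. This is an illustrative example under assumptions: records are dictionaries with a datetime "time", a numeric or None "value", and a boolean "data_incomplete"; the function name is hypothetical; and max is used as the aggregate only to match the window function in the request examples.

```python
from collections import defaultdict
from datetime import datetime, timezone


def hourly_max(records):
    """Drop incomplete 15-minute intervals, then take the max per hour.

    Hours with no validated interval are omitted from the result.
    """
    buckets = defaultdict(list)
    for record in records:
        if record["data_incomplete"] or record["value"] is None:
            continue  # validate before aggregating
        hour = record["time"].replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(record["value"])
    return {hour: max(values) for hour, values in sorted(buckets.items())}


def at(hour, minute):
    """Build a UTC timestamp on an arbitrary example day."""
    return datetime(2026, 2, 1, hour, minute, tzinfo=timezone.utc)


intervals = [
    {"time": at(9, 0), "value": 3, "data_incomplete": False},
    {"time": at(9, 15), "value": 10, "data_incomplete": True},
    {"time": at(9, 30), "value": 5, "data_incomplete": False},
    {"time": at(10, 0), "value": None, "data_incomplete": True},
]
summary = hourly_max(intervals)
```

Here the 09:15 interval is dropped even though it carries the largest value, and the 10:00 hour is absent from the summary because it has no validated interval.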