Project
More than just a step
Data Science · Data Visualization · Accelerometry · Harmonic Ratio
Overview
Smart watches provide step counts based on the accelerometers they contain. But they can be more than simple pedometers. Looking into the harmonic ratio during walking (or exercise) can lend insight into gait rhythmicity and symmetry. Its not just about how many steps, but what kind of steps.
Approach
This project was completed in Python, using packages such as:
scipy.signal
scipy.fft
matplotlib
There’s more information in a single step than most wearables know what to do with.
Mostly, they summarize these data as step counts. Yet the data can be analyzed to quantify other features of movement as well, such as its rhythmicity or symmetry. And these deeper insights can be used to understand how the gait of one person or group compares to another.
Harmonics are components of a signal (such as accelerations collected by a smartwatch or research-grade accelerometer) that occur at integer multiples of a base frequency. In walking, that base frequency is your step rate—known as the fundamental frequency. It represents the overall rhythm of your gait. Higher harmonics arise because human movement isn't perfectly smooth or uniform; they capture the finer, more complex oscillations within each step. These additional frequencies reflect characteristics like balance, coordination, and control. One commonly used summary metric is the harmonic ratio, which compares the energy in even and odd harmonics to quantify gait symmetry and stability. These signals can reveal subtle changes in movement that step counts alone can't detect.
I recently let a clinical cohort study that used harmonic ratio to understand gait differences in those with concussion vs. healthy controls using a tri-axial waist-worn accelerometer (the Actigraph GT3x) during a standardized sub-maximal aerobic exercise treadmill test. The publication is under review, and the data cannot be reproduced here (owing to privacy protections). What I have done is mirror the analysis pipeline with synthetic data. The approah and the overall narrative remain the same.
Let’s start with synthetic data generation at 30Hz.
# synthetic data settings
sampling_rate = 30 # Hz
duration_minutes = 10
samples_per_subject = sampling_rate * 60 * duration_minutes
timestamps = np.arange(0, samples_per_subject) / sampling_rate
num_concussion = 20
num_control = 20
output_dir = "synthetic_data"
os.makedirs(output_dir, exist_ok=True)
def generate_sensor_data(label):
# additional noise is added to the concussion data, as observed in "real life"
base_gravity = 9.8
noise_scale = 0.3 if label == "control" else 1.2
drift_factor = 0.0 if label == "control" else 0.002
tremor_amp = 0.0 if label == "control" else 0.4
time = timestamps
drift = drift_factor * np.arange(samples_per_subject)
# simulate each axis
def axis_data(base=0.0):
signal = np.random.normal(loc=base, scale=noise_scale, size=samples_per_subject)
if label == "concussion":
tremor = tremor_amp * np.sin(2 * np.pi * 3 * time)
signal += tremor + drift
return signal
data = {
"Timestamp": time,
"X": axis_data(),
"Y": axis_data(),
"Z": axis_data(base=base_gravity),
"label": [label] * samples_per_subject
}
return pd.DataFrame(data)
# generate synthetic data
for i in range(num_control):
df = generate_sensor_data("control")
df.to_csv(os.path.join(output_dir, f"control_{i+1:02d}.csv"), index=False)
for i in range(num_concussion):
df = generate_sensor_data("concussion")
df.to_csv(os.path.join(output_dir, f"concussion_{i+1:02d}.csv"), index=False)
print(f"Synthetic data with realistic abnormalities saved in '{output_dir}'")
We will now Fast Fourier Transform the accelerometer data to convert it from the time domain (it is recorded across time, 10 minutes in our example) to the frequency domain (so we can begin our harmonic ratio exploration).
# fast fourier transform
def compute_fft(signal, fs=30.1):
N = len(signal)
freq = fftfreq(N, d=1/fs)[:N // 2]
fft_values = np.abs(fft(signal))[:N // 2]
return fft_values, freq
Next, we will apply a bandpass filter to exclude frequencies outside of those involved with regular walking at a brisk speed (as this is what the clinical treadmill test requires). Based on precedent and findings from other gait-accelerometer publications, the following parameters were selected: lowcut
= 0.5, highcut
= 15, fs
= 30.1, order
= 3.
# bandpass filter
def bandpass_filter(data, lowcut, highcut, fs, order):
nyquist = 0.5 * fs
low, high = lowcut / nyquist, highcut / nyquist
b, a = butter(order, [low, high], btype='band', analog=False)
return filtfilt(b, a, data)
Then the harmonic ratio was calculated as the sum of even amplitudes divided by the sum of the odd amplitudes. Higher harmonic ratios suggest more rhythmic, symmetrical gait.
# harmonic ratio calculation
def harmonic_ratio(signal, fs=30.1):
fft_values, freq = compute_fft(signal, fs)
# fundamental frequency based on peaks
peaks, _ = find_peaks(fft_values, height=np.max(fft_values) * 0.05)
if len(peaks) == 0:
return None, []
step_freq = freq[peaks[0]]
# harmonics integer multiples of fundamental frequencies
harmonics = np.arange(step_freq, max(freq), step_freq)
harmonic_amplitudes = [fft_values[np.argmin(np.abs(freq - h))] for h in harmonics[:10]]
even_harmonics = harmonic_amplitudes[1::2]
odd_harmonics = harmonic_amplitudes[0::2]
HR = np.sum(even_harmonics) / np.sum(odd_harmonics) if np.sum(odd_harmonics) > 0 else None
return HR, harmonic_amplitudes
The data are now processed through the following two functions:
# process harmonic folder
def process_folder_harmonic(folder_path, axes={'X': 'AP', 'Y': 'VT', 'Z': 'ML'}):
results = []
harmonics_data = []
sampling_rate = 30.1
for filename in os.listdir(folder_path):
if filename.endswith('.csv'):
file_path = os.path.join(folder_path, filename)
data = pd.read_csv(file_path)
data = data.dropna(subset=['Timestamp'])
data = data.sort_values(by='Timestamp')
start_time = data['Timestamp'].iloc[0]
end_time = start_time + pd.Timedelta(minutes=10)
data = data[data['Timestamp'] <= end_time]
for axis, axis_type in axes.items():
if axis in data.columns:
filtered_signal = bandpass_filter(data[axis].dropna().values, 0.3, 15, sampling_rate, order=3)
harmonic_ratio_value, harmonic_amplitudes = harmonic_ratio(filtered_signal, sampling_rate)
result[f'Harmonic_Ratio_{axis}'] = harmonic_ratio_value
for i, amplitude in enumerate(harmonic_amplitudes, start=1):
harmonics_data.append({
'Filename': filename,
'subject': subject_name,
'Axis': axis,
'Harmonic': i,
'Amplitude': amplitude
})
results.append(result)
results_df = pd.DataFrame(results)
harmonics_df = pd.DataFrame(harmonics_data)
return results_df, harmonics_df
def prepare_harmonic_ratio_data(harmonics_df_dem):
harmonics_df_dem = harmonics_df_dem.copy()
# keep only the first 10 harmonics
harmonics_df_dem = harmonics_df_dem[harmonics_df_dem["Harmonic"] <= 10]
fundamental_amplitude = (
harmonics_df_dem
.loc[harmonics_df_dem["Harmonic"] == 1]
.set_index(["Axis", "group"])["Amplitude"]
)
harmonics_df_dem = harmonics_df_dem.merge(
fundamental_amplitude.rename("Fundamental_Amplitude"),
on=["Axis", "group"],
how="left"
)
# compute the ratio
harmonics_df_dem["Harmonic_Ratio"] = (
harmonics_df_dem["Amplitude"]
/ (harmonics_df_dem["Fundamental_Amplitude"] + 1e-6)
return harmonics_df_dem
These cells above largely prepare the data for analysis. Here, plots of the synthetic data will be produced to demonstrate how the harmonics, from 1 to 10, differ by group and axis.
First the code:
# plot harmonic ratios
def plot_harmonic_ratios(harmonics_df_dem):
axes = ['X', 'Y', 'Z']
fig, axs = plt.subplots(1, 3, figsize=(18, 5), sharey=True)
for i, axis in enumerate(axes):
subset = harmonics_df_dem[harmonics_df_dem['Axis'] == axis]
if subset.empty:
print(f"Warning: No data available for {axis}-axis.")
continue
avg_subset = subset.groupby(["Harmonic", "group"], as_index=False).mean(numeric_only=True)
std_subset = subset.groupby(["Harmonic", "group"], as_index=False).std(numeric_only=True)
sns.lineplot(ax=axs[i], data=avg_subset, x="Harmonic", y="Harmonic_Ratio", hue="group")
axs[i].set_title(f"Harmonic Ratios for {axis}-Axis")
axs[i].set_xlabel("Harmonic Number")
axs[i].set_ylabel("Harmonic Ratio")
axs[i].grid(True)
plt.tight_layout()
plt.show()
And now the plot:
And we can also produce a plot to visualize the average harmonic ratio (across the first 10 harmonics, often used as a summary gait metric) by axis, by group.
# harmonic ratio summary by axis
def plot_average_harmonic_ratio_per_axis(harmonics_df_dem):
avg_harmonic_ratio = harmonics_df_dem.groupby(["Axis", "group"])[["Harmonic_Ratio"]].mean().reset_index()
plt.figure(figsize=(12, 6))
sns.barplot(data=avg_harmonic_ratio, x="group", y="Harmonic_Ratio", hue="Axis")
plt.title("Average Harmonic Ratio Across First 10 Harmonics by Group and Axis")
plt.xlabel("Group")
plt.ylabel("Average Harmonic Ratio")
plt.legend(title="Axis")
plt.grid(True)
plt.show()
The publication involves statistical testing and testing for interaction effects based on other salient variables. But this example produced with synthetic data is designed to tell a similar story as to what is told in the academic publication: gait is less rhythmic or symmetric (per the harmonic ratios) in those with concussion vs. healthy controls during a standardized treadmill test.
Maybe our wearables do collect a richness of information that can have diagnostic or prognostic utility. While this clinical study was completed using a waist-worn science-grade accelerometer, it could be that wrist-based consumer devices (that collect multi-axis data) can nonetheless provide information on gait harmonics. I could see apps on watches that ask users to walk for 10 minutes at a standardized speed on a set incline, data which could then be processed to understand gait rhythmicity and symmetry. This could open doors for not only device-based clinical assessment, but also running efficiency metrics for those interested in sports performance.
Our steps really are filled with richness of information.
Outcomes
Accelerometers capture rich data about our gait. Calculating the harmonic ratio of these data can provide non-obvious insights into features of our gait, such as its rhythmicity and symmetry. This has implications for sports performance and clinical diagnostics and prognostics.
Credits
My source publication is currently under review.