accelerometer package

Submodules

accelerometer.accClassification module

Module to support machine learning of activity states from acc data

accelerometer.accClassification.activityClassification(epochFile, activityModel='walmsley')

Perform classification of activity states from epoch feature data

Based on a balanced random forest, with a Hidden Markov Model capturing transitions between predicted activity states and emission probabilities trained on free-living ground truth data, to identify pre-defined classes of behaviour from accelerometer data.

Parameters:
  • epochFile (str) – Input csv file of processed epoch data
  • activityModel (str) – Input tar model file which contains random forest pickle model, HMM priors/transitions/emissions npy files, and npy file of METs for each activity state
Returns:

Pandas dataframe of activity epoch data with one-hot encoded labels

Return type:

pandas.DataFrame

Returns:

Activity state labels

Return type:

list(str)
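
Example (illustrative sketch; the epoch file name is hypothetical):
>>> from accelerometer import accClassification
>>> epochData, labels = accClassification.activityClassification("epoch.csv.gz")
<returns epoch dataframe with one-hot encoded activity labels, and the label list>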

accelerometer.accClassification.addReferenceLabelsToNewFeatures(featuresFile, referenceLabelsFile, outputFile, featuresTxt='activityModels/features.txt', labelCol='label', participantCol='participant', atomicLabelCol='annotation', metCol='MET')

Append reference annotations to newly extracted feature data

This method helps add existing curated labels (from referenceLabelsFile) to a file with newly extracted features (both pre-sorted by participant and time).

Parameters:
  • featuresFile (str) – Input csv file of new features data, pre-sorted by time
  • referenceLabelsFile (str) – Input csv file of reference labelled data, pre-sorted by time
  • outputFile (str) – Output csv file of new features data with reference labels
  • featuresTxt (str) – Input txt file listing feature column names
  • labelCol (str) – Input label column
  • participantCol (str) – Input participant column
  • atomicLabelCol (str) – Input ‘atomic’ annotation e.g. ‘walking with dog’ vs. ‘walking’
  • metCol (str) – Input MET column
Returns:

New csv file written to <outputFile>

Return type:

void

Example:
>>> from accelerometer import accClassification
>>> accClassification.addReferenceLabelsToNewFeatures("newFeats.csv",
        "refLabels.csv", "newFeatsPlusLabels.csv")
<file written to newFeatsPlusLabels.csv>
accelerometer.accClassification.downloadModel(model)
accelerometer.accClassification.getFileFromTar(tarArchive, targetFile)

Read file from tar

This is currently trickier than it should be; see https://github.com/numpy/numpy/issues/7989

Parameters:
  • tarArchive (str) – Input tar archive filename
  • targetFile (str) – Target individual file within .tar
Returns:

file object byte stream

Return type:

object
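
Example (illustrative sketch; the tar archive name is hypothetical, and the member name follows the model format described for saveModelsToTar below):
>>> import numpy as np
>>> from accelerometer.accClassification import getFileFromTar
>>> priors = np.load(getFileFromTar("activityModels/sample-model.tar", "hmmPriors.npy"))
<numpy array of HMM prior probabilities>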

accelerometer.accClassification.getListFromTxtFile(inputFile)

Read list of items from txt file and return as list

Parameters:inputFile (str) – Input file listing items
Returns:list of items
Return type:list
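Example (illustrative sketch):
>>> from accelerometer import accClassification
>>> featureCols = accClassification.getListFromTxtFile("activityModels/features.txt")
<list of feature column names>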
accelerometer.accClassification.perParticipantSummaryHTML(dfParam, yTrueCol, yPredCol, pidCol, outHTML)

Provide HTML summary of how well activity classification model works at the per-participant level

Parameters:
  • dfParam (dataframe) – Input pandas dataframe
  • yTrueCol (str) – Input for y_true column label
  • yPredCol (str) – Input for y_pred column label
  • pidCol (str) – Input for participant ID column label
  • outHTML (str) – Output file to print HTML summary to
Returns:

HTML file reporting kappa, accuracy, and confusion matrix

Return type:

void
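
Example (illustrative sketch; the dataframe and column names are hypothetical):
>>> from accelerometer import accClassification
>>> accClassification.perParticipantSummaryHTML(df, "label", "predicted",
        "participant", "participantSummary.html")
<HTML summary written to "participantSummary.html">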

accelerometer.accClassification.resolveModelPath(pathOrModelName)
accelerometer.accClassification.saveModelsToTar(tarArchive, featureCols, rfModel, priors, transitions, emissions, METs, featuresTxt='featureCols.txt', rfModelFile='rfModel.pkl', hmmPriors='hmmPriors.npy', hmmEmissions='hmmEmissions.npy', hmmTransitions='hmmTransitions.npy', hmmMETs='METs.npy')

Save random forest and hidden markov models to tarArchive file

Note we must use the same version of Python and scikit-learn as in the intended deployment environment

Parameters:
  • tarArchive (str) – Output tarfile
  • featureCols (list) – Input list of feature columns
  • rfModel (sklearn.RandomForestClassifier) – Input random forest model
  • priors (numpy.array) – Input prior probabilities for each activity state
  • transitions (numpy.array) – Input probability matrix of transitioning from one activity state to another
  • emissions (numpy.array) – Input probability matrix of RF prediction being true
  • METs (numpy.array) – Input array of average METs per activity state
  • featuresTxt (str) – Intermediate output txt file of features
  • rfModelFile (str) – Intermediate output random forest pickle model
  • hmmPriors (str) – Intermediate output HMM priors npy
  • hmmEmissions (str) – Intermediate output HMM emissions npy
  • hmmTransitions (str) – Intermediate output HMM transitions npy
  • hmmMETs (str) – Intermediate output HMM METs npy
Returns:

tar file of RF + HMM written to tarArchive

Return type:

void
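
Example (illustrative sketch; assumes featureCols, rfModel, priors, transitions, emissions, and METs have already been produced by training):
>>> from accelerometer import accClassification
>>> accClassification.saveModelsToTar("activityModels/custom-model.tar", featureCols,
        rfModel, priors, transitions, emissions, METs)
<tar file of RF + HMM written to "activityModels/custom-model.tar">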

accelerometer.accClassification.trainClassificationModel(trainingFile, labelCol='label', participantCol='participant', atomicLabelCol='annotation', metCol='MET', featuresTxt='activityModels/features.txt', trainParticipants=None, testParticipants=None, rfThreads=1, rfTrees=1000, rfFeats=None, rfDepth=None, outputPredict='activityModels/test-predictions.csv', outputModel=None)

Train model to classify activity states from epoch feature data

Based on a balanced random forest, with a Hidden Markov Model capturing transitions between predicted activity states and emission probabilities trained on the input training file, to identify pre-defined classes of behaviour from accelerometer data.

Parameters:
  • trainingFile (str) – Input csv file of training data, pre-sorted by time
  • labelCol (str) – Input label column
  • participantCol (str) – Input participant column
  • atomicLabelCol (str) – Input ‘atomic’ annotation e.g. ‘walking with dog’ vs. ‘walking’
  • metCol (str) – Input MET column
  • featuresTxt (str) – Input txt file listing feature column names
  • trainParticipants (str) – Input comma separated list of participant IDs to train on.
  • testParticipants (str) – Input comma separated list of participant IDs to test on.
  • rfThreads (int) – Input num threads to use when training random forest
  • rfTrees (int) – Input num decision trees to include in random forest
  • outputPredict (str) – Output CSV of person, label, predicted
  • outputModel (str) – Output tarfile object which contains random forest pickle model, HMM priors/transitions/emissions npy files, and npy file of METs for each activity state. The trained model is only written if this is not None, e.g. “activityModels/sample-model.tar”
Returns:

New model written to <outputModel> OR csv of test predictions written to <outputPredict>

Return type:

void
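
Example (illustrative sketch; the file names are hypothetical):
>>> from accelerometer import accClassification
>>> accClassification.trainClassificationModel("trainingData.csv",
        outputModel="activityModels/custom-model.tar")
<trained model written to "activityModels/custom-model.tar">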

accelerometer.accClassification.train_HMM(rfModel, y_trainF, labelCol)

Train Hidden Markov Model

Use data not considered in the construction of the random forest to estimate the probabilities of: i) starting in a given state; ii) transitioning from one state to another; and iii) the random forest being correct when predicting a given class (emission probabilities)

Parameters:
  • rfModel (sklearn.RandomForestClassifier) – Input random forest object
  • y_trainF (dataframe.Column) – Input ground truth label for each instance
  • labelCol (str) – Input label column
Returns:

states - List of unique activity state labels

Return type:

numpy.array

Returns:

priors - Prior probabilities for each activity state

Return type:

numpy.array

Returns:

transitions - Probability matrix of transitioning from one activity state to another

Return type:

numpy.array

Returns:

emissions - Probability matrix of RF prediction being true

Return type:

numpy.array
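
Example (illustrative sketch; assumes rfModel is a fitted random forest and y_trainF holds held-out ground truth labels):
>>> from accelerometer import accClassification
>>> states, priors, transitions, emissions = accClassification.train_HMM(rfModel,
        y_trainF, "label")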

accelerometer.accClassification.viterbi(observations, states, priors, transitions, emissions, probabilistic=False)

Perform HMM smoothing over observations via the Viterbi algorithm

Parameters:
  • observations (list(str)) – List/sequence of activity states
  • states (numpy.array) – List of unique activity state labels
  • priors (numpy.array) – Prior probabilities for each activity state
  • transitions (numpy.array) – Probability matrix of transitioning from one activity state to another
  • emissions (numpy.array) – Probability matrix of RF prediction being true
  • probabilistic (bool) – Write probabilistic output for each state, rather than the most likely state for each prediction.
Returns:

Smoothed list/sequence of activity states

Return type:

list(str)
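
Example (illustrative sketch; assumes the HMM parameters were produced by train_HMM and predicted is a list of per-epoch RF predictions):
>>> from accelerometer import accClassification
>>> smoothed = accClassification.viterbi(predicted, states, priors, transitions,
        emissions)
<smoothed list of activity states>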

accelerometer.accClassification.wristListToTxtFile(inputList, outputFile)

Write list of items to txt file

Parameters:
  • inputList (list) – input list
  • outputFile (str) – Output txt file
Returns:

New txt file written to <outputFile>

Return type:

void
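
Example (illustrative sketch; assumes featureCols is a list of feature column names):
>>> from accelerometer import accClassification
>>> accClassification.wristListToTxtFile(featureCols, "featureCols.txt")
<list written to "featureCols.txt">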

accelerometer.accUtils module

Module to provide generic utilities for other accelerometer modules.

accelerometer.accUtils.collateJSONfilesToSingleCSV(inputJsonDir, outputCsvFile)

Read all summary *.json files and convert into one large CSV file

Each json file represents summary data for one participant. Therefore, the output CSV file contains a summary of all participants.

Parameters:
  • inputJsonDir (str) – Directory containing JSON files
  • outputCsvFile (str) – Output CSV filename
Returns:

New file written to <outputCsvFile>

Return type:

void

Example:
>>> import accUtils
>>> accUtils.collateJSONfilesToSingleCSV("data/", "data/summary-all-files.csv")
<summary CSV of all participants/files written to "data/summary-all-files.csv">
accelerometer.accUtils.createDirIfNotExists(folder)

Create directory if it doesn’t currently exist

Parameters:folder (str) – Directory to be checked/created
Returns:Dir now exists (created if didn’t exist before, otherwise untouched)
Return type:void
Example:
>>> import accUtils
>>> accUtils.createDirIfNotExists("/myStudy/summary/dec18/")
    <folder "/myStudy/summary/dec18/" now exists>
accelerometer.accUtils.date_parser(t)

Parse a date string of the form e.g. 2020-06-14 19:01:15.123+0100 [Europe/London]

accelerometer.accUtils.date_strftime(t)

Convert a datetime to a string of the form e.g. 2020-06-14 19:01:15.123+0100 [Europe/London]
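
Example (illustrative sketch of the round trip between the two functions; assumes date_parser returns a timezone-aware timestamp):
>>> import accUtils
>>> t = accUtils.date_parser("2020-06-14 19:01:15.123+0100 [Europe/London]")
>>> accUtils.date_strftime(t)
<timestamp formatted back into the same string form>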

accelerometer.accUtils.formatNum(num, decimalPlaces)

Return str of number formatted to <decimalPlaces> decimal places

When writing out tens of thousands of files, formatting the output to n decimal places is a useful space-saving measure.

Parameters:
  • num (float) – Float number to be formatted.
  • decimalPlaces (int) – Number of decimal places for output format
Returns:

Number formatted to number of decimalPlaces

Return type:

str

Example:
>>> import accUtils
>>> accUtils.formatNum(2.567, 2)
2.57
accelerometer.accUtils.identifyUnprocessedFiles(filesCsv, summaryCsv, outputFilesCsv)

Identify files that have not been processed

Look through all processed accelerometer files, and find participants who do not have records in the summary csv file. This indicates there was a problem in processing their data. The output will therefore be a new .csv file to support reprocessing of these files

Parameters:
  • filesCsv (str) – CSV listing acc files in study directory
  • summaryCsv (str) – Summary CSV of processed dataset
  • outputFilesCsv (str) – Output csv listing files to be reprocessed
Returns:

New file written to <outputCsvFile>

Return type:

void

Example:
>>> import accUtils
>>> accUtils.identifyUnprocessedFiles("study/files.csv", "study/summary-all-files.csv",
    "study/files-reprocess.csv")
<Output csv listing files to be reprocessed written to "study/files-reprocess.csv">
accelerometer.accUtils.meanCIstr(mean, std, n, numDecimalPlaces)

Return str of mean and 95% confidence interval formatted to <numDecimalPlaces> decimal places

Parameters:
  • mean (float) – Mean number to be formatted.
  • std (float) – Standard deviation number to be formatted.
  • n (int) – Number of observations
  • numDecimalPlaces (int) – Number of decimal places for output format
Returns:

String formatted to number of decimalPlaces

Return type:

str

Example:
>>> import accUtils
>>> accUtils.meanCIstr(2.567, 0.089, 100, 2)
2.57 (2.55 - 2.58)
accelerometer.accUtils.meanSDstr(mean, std, numDecimalPlaces)

Return str of mean and stdev formatted to <numDecimalPlaces> decimal places

Parameters:
  • mean (float) – Mean number to be formatted.
  • std (float) – Standard deviation number to be formatted.
  • numDecimalPlaces (int) – Number of decimal places for output format
Returns:

String formatted to number of decimalPlaces

Return type:

str

Example:
>>> import accUtils
>>> accUtils.meanSDstr(2.567, 0.089, 2)
2.57 (0.09)
accelerometer.accUtils.toScreen(msg)

Print msg str prepended with current time

Parameters:msg (str) – Message to be printed to screen
Returns:Print msg str prepended with current time
Return type:void
Example:
>>> import accUtils
>>> accUtils.toScreen("hello")
2018-11-28 10:53:18    hello
accelerometer.accUtils.updateCalibrationCoefs(inputCsvFile, outputCsvFile)

Read summary .csv file and update coefs for those with poor calibration

Look through all processed accelerometer files, and find participants that did not have good calibration data. Then assign the calibration coefs from a previous good use of the same device. Output will be a new .csv file to support reprocessing of uncalibrated files with new pre-specified calibration coefs.

Parameters:
  • inputCsvFile (str) – Summary CSV of processed dataset
  • outputCsvFile (str) – Output CSV of files to be reprocessed with new calibration info
Returns:

New file written to <outputCsvFile>

Return type:

void

Example:
>>> import accUtils
>>> accUtils.updateCalibrationCoefs("data/summary-all-files.csv", "study/files-recalibration.csv")
<CSV of files to be reprocessed written to "study/files-recalibration.csv">
accelerometer.accUtils.writeFilesWithCalibrationCoefs(inputCsvFile, outputCsvFile)

Read summary .csv file and write files.csv with calibration coefs

Look through all processed accelerometer files, and write a new .csv file to support reprocessing of files with pre-specified calibration coefs.

Parameters:
  • inputCsvFile (str) – Summary CSV of processed dataset
  • outputCsvFile (str) – Output CSV of files to process with calibration info
Returns:

New file written to <outputCsvFile>

Return type:

void

Example:
>>> import accUtils
>>> accUtils.writeFilesWithCalibrationCoefs("data/summary-all-files.csv",
>>>     "study/files-calibrated.csv")
<CSV of files to be reprocessed written to "study/files-calibrated.csv">
accelerometer.accUtils.writeStudyAccProcessCmds(accDir, outDir, cmdsFile='processCmds.txt', accExt='cwa', cmdOptions=None, filesCSV='files.csv')

Read files to process and write out list of processing commands

This creates the following output directory structure containing all processing results:

<outDir>/
    summary/      # to store outputSummary.json
    epoch/        # to store feature output for 30sec windows
    timeSeries/   # simple csv time series output (VMag, activity binary predictions)
    nonWear/      # bouts of nonwear episodes
    stationary/   # temp store for features of stationary data for calibration
    clusterLogs/  # to store terminal output for each processed file

If a filesCSV exists in accDir/, process the files listed there; if not, all files in accDir/ are processed.

An acc processing command is then written to <cmdsFile> for each file.

Parameters:
  • accDir (str) – Directory containing accelerometer files to process
  • outDir (str) – Output directory to be created containing the processing results
  • cmdsFile (str) – Output .txt file listing all processing commands
  • accExt (str) – Acc file type e.g. cwa, CWA, bin, BIN, gt3x…
  • cmdOptions (str) – String of processing options e.g. “--epochPeriod 10”. Type ‘python3 accProcess.py -h’ for a full list of options
  • filesCSV (str) – Name of .csv file listing acc files to process
Returns:

New file written to <cmdsFile>

Return type:

void

Example:
>>> import accUtils
>>> accUtils.writeStudyAccProcessCmds("myAccDir/", "myResults/", "myProcessCmds.txt")
<cmd options written to "myProcessCmds.txt">
accelerometer.accUtils.writeTimeSeries(e, labels, tsFile)

Write activity timeseries file

Parameters:
  • e (pandas.DataFrame) – Pandas dataframe of epoch data. Must contain activity classification columns with missing rows imputed.
  • labels (list(str)) – Activity state labels
  • tsFile (str) – Output CSV filename
Returns:

None

Return type:

void
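
Example (illustrative sketch; assumes epochData and labels come from getActivitySummary, and the output filename is hypothetical):
>>> import accUtils
>>> accUtils.writeTimeSeries(epochData, labels, "timeSeries.csv.gz")
<time series written to "timeSeries.csv.gz">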

accelerometer.circadianRhythms module

Module to support calculation of metrics of circadian rhythm from acc data

accelerometer.circadianRhythms.calculateFourierFreq(e, epochPeriod, fourierWithAcc, labels, summary)

Calculate the most prevalent frequency in a Fourier analysis

Parameters:
  • e (pandas.DataFrame) – Pandas dataframe of epoch data
  • epochPeriod (int) – Size of epoch time window (in seconds)
  • fourierWithAcc (bool) – If True, the Fourier analysis is performed on acceleration data instead of sleep data
  • labels (list(str)) – Activity state labels
  • summary (dict) – Output dictionary containing all summary metrics

Returns:

Write dict <summary> keys ‘fourier frequency-<1/days>’
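
Example (illustrative sketch; assumes epochData and labels come from an earlier processing step, with the default 30s epoch period):
>>> from accelerometer import circadianRhythms
>>> summary = {}
>>> circadianRhythms.calculateFourierFreq(epochData, 30, False, labels, summary)
<dict "summary" updated with fourier frequency outcome>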

accelerometer.circadianRhythms.calculateM10L5(e, epochPeriod, summary)

Calculate the M10 L5 relative amplitude from the average acceleration of the ten most active hours (M10) and the five least active hours (L5)

Parameters:
  • e (pandas.DataFrame) – Pandas dataframe of epoch data
  • epochPeriod (int) – Size of epoch time window (in seconds)
  • summary (dict) – Output dictionary containing all summary metrics
Returns:

Write dict <summary> keys ‘M10 L5-<rel amp>’

accelerometer.circadianRhythms.calculatePSD(e, epochPeriod, fourierWithAcc, labels, summary)

Calculate the power spectral density from a Fourier analysis at a 1-day frequency

Parameters:
  • e (pandas.DataFrame) – Pandas dataframe of epoch data
  • epochPeriod (int) – Size of epoch time window (in seconds)
  • fourierWithAcc (bool) – If True, the Fourier analysis is performed on acceleration data instead of sleep data
  • labels (list(str)) – Activity state labels
  • summary (dict) – Output dictionary containing all summary metrics
Returns:

Write dict <summary> keys ‘PSD-<W/Hz>’

accelerometer.device module

Module to process raw accelerometer files into epoch data.

accelerometer.device.getAxivityDeviceId(cwaFile)

Get serial number of Axivity device

Parses the unique serial code from the header of an Axivity accelerometer file

Parameters:cwaFile (str) – Input raw .cwa accelerometer file
Returns:Device ID
Return type:int
accelerometer.device.getCalibrationCoefs(staticBoutsFile, summary)

Identify calibration coefficients from Java-processed file

Get axes offset/gain/temp calibration coefficients through linear regression of stationary episodes

Parameters:
  • staticBoutsFile (str) – Output/temporary file for calibration
  • summary (dict) – Output dictionary containing all summary metrics
Returns:

Calibration summary values written to dict <summary>

Return type:

void

accelerometer.device.getDeviceId(inputFile)

Get serial number of device

First decides which DeviceId parsing method to use for <inputFile>.

Parameters:inputFile (str) – Input raw accelerometer file
Returns:Device ID
Return type:int
accelerometer.device.getGT3XDeviceId(gt3xFile)

Get serial number of Actigraph device

Parses the unique serial code from the header of a GT3X accelerometer file

Parameters:gt3xFile (str) – Input raw .gt3x accelerometer file
Returns:Device ID
Return type:int
accelerometer.device.getGeneaDeviceId(binFile)

Get serial number of GENEActiv device

Parses the unique serial code from the header of a GENEActiv accelerometer file

Parameters:binFile (str) – Input raw .bin accelerometer file
Returns:Device ID
Return type:int
accelerometer.device.getOmconvertInfo(omconvertInfoFile, summary)

Identify calibration coefficients for omconvert processed file

Get axes offset/gain/temp calibration coeffs from omconvert info file

Parameters:
  • omconvertInfoFile (str) – Output information file from omconvert
  • summary (dict) – Output dictionary containing all summary metrics
Returns:

Calibration summary values written to dict <summary>

Return type:

void

accelerometer.device.processInputFileToEpoch(inputFile, timeZone, timeShift, epochFile, stationaryFile, summary, skipCalibration=False, stationaryStd=13, xyzIntercept=[0.0, 0.0, 0.0], xyzSlope=[1.0, 1.0, 1.0], xyzTemp=[0.0, 0.0, 0.0], meanTemp=20.0, rawDataParser='AccelerometerParser', javaHeapSpace=None, useFilter=True, sampleRate=100, resampleMethod='linear', epochPeriod=30, activityClassification=True, rawOutput=False, rawFile=None, npyOutput=False, npyFile=None, startTime=None, endTime=None, verbose=False, csvStartTime=None, csvSampleRate=None, csvTimeFormat="yyyy-MM-dd HH:mm:ss.SSSxxxx '['VV']'", csvStartRow=1, csvTimeXYZTempColsIndex=None)

Process raw accelerometer file, writing summary epoch stats to file

This is usually achieved by:
  1. identify 10sec stationary epochs
  2. record calibrated axes scale/offset/temp vals + static point stats
  3. use calibration coefficients and then write filtered avgVm epochs to <epochFile> from <inputFile>

Parameters:
  • inputFile (str) – Input <cwa/cwa.gz/bin/gt3x> raw accelerometer file
  • epochFile (str) – Output csv.gz file of processed epoch data
  • stationaryFile (str) – Output/temporary file for calibration
  • summary (dict) – Output dictionary containing all summary metrics
  • skipCalibration (bool) – Skip software calibration step (otherwise the data is processed twice)
  • stationaryStd (int) – Gravity threshold (in mg units) for stationary vs not
  • xyzIntercept (list(float)) – Calibration offset [x, y, z]
  • xyzSlope (list(float)) – Calibration slope [x, y, z]
  • xyzTemp (list(float)) – Calibration temperature coefficient [x, y, z]
  • meanTemp (float) – Calibration mean temperature in file
  • rawDataParser (str) – External helper process to read raw acc file. If a java class, it must omit .class ending.
  • javaHeapSpace (str) – Amount of heap space allocated to java subprocesses. Useful for limiting RAM usage.
  • useFilter (bool) – Filter ENMOtrunc signal
  • sampleRate (int) – Resample data to n Hz
  • epochPeriod (int) – Size of epoch time window (in seconds)
  • activityClassification (bool) – Extract features for machine learning
  • rawOutput (bool) – Output calibrated and resampled raw data to a .csv.gz file? requires ~50MB/day.
  • rawFile (str) – Output raw data “.csv.gz” filename
  • npyOutput (bool) – Output calibrated and resampled raw data to a .npy file? requires ~60MB/day.
  • npyFile (str) – Output raw data “.npy” filename
  • startTime (datetime) – Remove data before this time in analysis
  • endTime (datetime) – Remove data after this time in analysis
  • verbose (bool) – Print verbose output
  • csvStartTime (datetime) – start time for csv file when time column is not available
  • csvSampleRate (float) – sample rate for csv file when time column is not available
  • csvTimeFormat (str) – time format for csv file when time column is available
  • csvStartRow (int) – start row for accelerometer data in csv file
  • csvTimeXYZTempColsIndex (str) – index of column positions for XYZT columns, e.g. “1,2,3,0”
Returns:

Raw processing summary values written to dict <summary>

Return type:

void

Example:
>>> import device
>>> summary = {}
>>> device.processInputFileToEpoch('inputFile.cwa', 'Europe/London', 0,
        'epochFile.csv.gz', 'stationary.csv.gz', summary)
<epoch file written to "epochFile.csv.gz", and calibration points to
    'stationary.csv.gz'>
accelerometer.device.storeCalibrationInformation(summary, bestIntercept, bestSlope, bestTemp, meanTemp, initError, bestError, xMin, xMax, yMin, yMax, zMin, zMax, nStatic, calibrationSphereCriteria=0.3)

Store calibration information to output summary dictionary

Parameters:
  • summary (dict) – Output dictionary containing all summary metrics
  • bestIntercept (list(float)) – Best x/y/z intercept values
  • bestSlope (list(float)) – Best x/y/z slope values
  • bestTemp (list(float)) – Best x/y/z temperature values
  • meanTemp (float) – Calibration mean temperature in file
  • initError (float) – Root mean square error (in mg) before calibration
  • bestError (float) – Root mean square error (in mg) after calibration
  • xMin (float) – xMin information on spread of stationary points
  • xMax (float) – xMax information on spread of stationary points
  • yMin (float) – yMin information on spread of stationary points
  • yMax (float) – yMax information on spread of stationary points
  • zMin (float) – zMin information on spread of stationary points
  • zMax (float) – zMax information on spread of stationary points
  • nStatic (int) – number of stationary points used for calibration
  • calibrationSphereCriteria (float) – Threshold to check how well file was calibrated
Returns:

Calibration summary values written to dict <summary>

Return type:

void

accelerometer.device.storeCalibrationParams(summary, xyzOff, xyzSlope, xyzTemp, meanTemp)

Store calibration parameters to output summary dictionary

Parameters:
  • summary (dict) – Output dictionary containing all summary metrics
  • xyzOff (list(float)) – intercept [x, y, z]
  • xyzSlope (list(float)) – slope [x, y, z]
  • xyzTemp (list(float)) – temperature [x, y, z]
  • meanTemp (float) – Calibration mean temperature in file
Returns:

Calibration summary values written to dict <summary>

Return type:

void

accelerometer.summariseEpoch module

Module to generate overall activity summary from epoch data.

accelerometer.summariseEpoch.calculateECDF(x, summary)

Calculate activity intensity empirical cumulative distribution

The input data must not be imputed, as the ECDF requires different imputation where nan/non-wear data segments are IMPUTED FOR EACH INTENSITY LEVEL. Here, the average of similar time-of-day values is imputed with one minute granularity on different days of the measurement. The following intensity levels are calculated:
  • 1mg bins from 1-20mg
  • 5mg bins from 25-100mg
  • 25mg bins from 125-500mg
  • 100mg bins from 500-2000mg

Parameters:
  • x (pandas.Series) – Column of epoch data to calculate the intensity distribution on
  • summary (dict) – Output dictionary containing all summary metrics
Returns:

Write dict <summary> keys ‘<x.name>-ecdf-<level…>mg’

Return type:

void
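
Example (illustrative sketch; assumes epochData holds non-imputed epoch data, and the "acc" column name is hypothetical):
>>> import summariseEpoch
>>> summary = {}
>>> summariseEpoch.calculateECDF(epochData["acc"], summary)
<dict "summary" updated with "acc-ecdf-..." intensity distribution keys>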

accelerometer.summariseEpoch.checkQuality(e, summary)
accelerometer.summariseEpoch.getActivitySummary(epochFile, nonWearFile, summary, activityClassification=True, timeZone='Europe/London', startTime=None, endTime=None, epochPeriod=30, stationaryStd=13, minNonWearDuration=60, mgCutPointMVPA=100, mgCutPointVPA=425, activityModel='walmsley', intensityDistribution=False, imputation=True, psd=False, fourierFrequency=False, fourierWithAcc=False, m10l5=False)

Calculate overall activity summary from <epochFile> data

Get overall activity summary from input <epochFile>. This is achieved by:
  1. get interrupt and data error summary vals
  2. check if data occurs at a daylight savings crossover
  3. calculate wear-time statistics, and write nonWear episodes to file
  4. predict activity from features, and add label column
  5. calculate imputation values to replace nan PA metric values
  6. calculate empirical cumulative distribution function of vector magnitudes
  7. derive main movement summaries (overall, weekday/weekend, and hour)

Parameters:
  • epochFile (str) – Input csv.gz file of processed epoch data
  • nonWearFile (str) – Output filename for non wear .csv.gz episodes
  • summary (dict) – Output dictionary containing all summary metrics
  • activityClassification (bool) – Perform machine learning of activity states
  • timeZone (str) – timezone in country/city format to be used for daylight savings crossover check
  • startTime (datetime) – Remove data before this time in analysis
  • endTime (datetime) – Remove data after this time in analysis
  • epochPeriod (int) – Size of epoch time window (in seconds)
  • stationaryStd (int) – Threshold (in mg units) for stationary vs not
  • minNonWearDuration (int) – Minimum duration of nonwear events (minutes)
  • mgCutPointMVPA (int) – Milli-gravity threshold for moderate intensity activity
  • mgCutPointVPA (int) – Milli-gravity threshold for vigorous intensity activity
  • activityModel (str) – Input tar model file which contains random forest pickle model, HMM priors/transitions/emissions npy files, and npy file of METs for each activity state
  • intensityDistribution (bool) – Add intensity outputs to dict <summary>
  • imputation (bool) – Impute missing data using data from other days around the same time
Returns:

Pandas dataframe of activity epoch data

Return type:

pandas.DataFrame

Returns:

Activity prediction labels (empty if <activityClassification>==False)

Return type:

list(str)

Returns:

Write .csv.gz non wear episodes file to <nonWearFile>

Return type:

void

Returns:

Movement summary values written to dict <summary>

Return type:

void

Example:
>>> import summariseEpoch
>>> summary = {}
>>> epochData, labels = summariseEpoch.getActivitySummary( "epoch.csv.gz",
        "nonWear.csv.gz", summary)
<nonWear file written to "nonWear.csv.gz" and dict "summary" updated with outcomes>
accelerometer.summariseEpoch.imputeMissing(e)

Impute missing/nonwear segments

Impute non-wear data segments using the average of similar time-of-day values with one minute granularity on different days of the measurement. This imputation accounts for potential wear time diurnal bias where, for example, if the device was systematically less worn during sleep in an individual, the crude average vector magnitude during wear time would be a biased overestimate of the true average. See https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0169649#sec013

Parameters:
  • e (pandas.DataFrame) – Pandas dataframe of epoch data
Returns:

Update DataFrame <e> columns nan values with time-of-day imputation

Return type:

void
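
Example (illustrative sketch; assumes epochData is a dataframe of epoch data with nan segments):
>>> import summariseEpoch
>>> summariseEpoch.imputeMissing(epochData)
<nan values in "epochData" replaced with time-of-day imputed values>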

accelerometer.summariseEpoch.resolveInterrupts(e, epochPeriod, summary)

Fix any read interrupts by resampling and filling with NaNs

Parameters:
  • e (pandas.DataFrame) – Pandas dataframe of epoch data
  • epochPeriod (int) – Size of epoch time window (in seconds)
  • summary (dict) – Dictionary containing summary metrics
Returns:

Write dict <summary> keys ‘err-interrupts-num’ & ‘errs-interrupt-mins’

Return type:

void

accelerometer.summariseEpoch.resolveNonWear(e, epochPeriod, maxStd, minDuration, nonWearFile, summary)

Calculate nonWear time, write episodes to file, and return wear statistics

Parameters:
  • e (pandas.DataFrame) – Pandas dataframe of epoch data
  • epochPeriod (int) – Size of epoch time window (in seconds)
  • maxStd (int) – Threshold (in mg units) for stationary vs not
  • minDuration (int) – Minimum duration of nonwear events (minutes)
  • nonWearFile (str) – Output filename for non wear .csv.gz episodes
  • summary (dict) – Output dictionary containing all summary metrics
Returns:

Write dict <summary> keys ‘wearTime-numNonWearEpisodes(>1hr)’, ‘wearTime-overall(days)’, ‘nonWearTime-overall(days)’, ‘wearTime-diurnalHrs’, ‘wearTime-diurnalMins’, ‘quality-goodWearTime’, ‘wearTime-<day…>’, and ‘wearTime-hourOfDay-<hr…>’

Return type:

void

Returns:

Write .csv.gz non wear episodes file to <nonWearFile>

Return type:

void
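
Example (illustrative sketch; parameter values follow the defaults described in getActivitySummary above):
>>> import summariseEpoch
>>> summary = {}
>>> summariseEpoch.resolveNonWear(epochData, 30, 13, 60, "nonWear.csv.gz", summary)
<nonWear episodes written to "nonWear.csv.gz" and wear-time stats added to "summary">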

accelerometer.summariseEpoch.writeMovementSummaries(e, labels, summary)

Write overall summary stats for each activity type to summary dict

Parameters:
  • e (pandas.DataFrame) – Pandas dataframe of epoch data
  • labels (list(str)) – Activity state labels
  • summary (dict) – Output dictionary containing all summary metrics
Returns:

Write dict <summary> keys for each activity type ‘overall-<avg/sd>’, ‘week<day/end>-avg’, ‘<day..>-avg’, ‘hourOfDay-<hr..>-avg’, ‘hourOfWeek<day/end>-<hr..>-avg’

Return type:

void

Module contents