GestureRecognitionToolkit
Version: 0.1.0
The Gesture Recognition Toolkit (GRT) is a cross-platform, open-source, C++ machine learning library for real-time gesture recognition.
#include <TimeSeriesClassificationData.h>
Public Member Functions | |
TimeSeriesClassificationData (UINT numDimensions=0, std::string datasetName="NOT_SET", std::string infoText="") | |
TimeSeriesClassificationData (const TimeSeriesClassificationData &rhs) | |
virtual | ~TimeSeriesClassificationData () |
TimeSeriesClassificationData & | operator= (const TimeSeriesClassificationData &rhs) |
TimeSeriesClassificationSample & | operator[] (const UINT &i) |
const TimeSeriesClassificationSample & | operator[] (const UINT &i) const |
void | clear () |
bool | setNumDimensions (const UINT numDimensions) |
bool | setDatasetName (const std::string datasetName) |
bool | setInfoText (const std::string infoText) |
bool | setClassNameForCorrespondingClassLabel (const std::string className, const UINT classLabel) |
bool | setAllowNullGestureClass (const bool allowNullGestureClass) |
bool | addSample (const UINT classLabel, const MatrixFloat &trainingSample) |
bool | removeLastSample () |
UINT | eraseAllSamplesWithClassLabel (const UINT classLabel) |
bool | relabelAllSamplesWithClassLabel (const UINT oldClassLabel, const UINT newClassLabel) |
bool | setExternalRanges (const Vector< MinMax > &externalRanges, const bool useExternalRanges=false) |
bool | enableExternalRangeScaling (const bool useExternalRanges) |
bool | scale (const Float minTarget, const Float maxTarget) |
bool | scale (const Vector< MinMax > &ranges, const Float minTarget, const Float maxTarget) |
bool | save (const std::string &filename) const |
bool | load (const std::string &filename) |
bool | saveDatasetToFile (const std::string filename) const |
bool | loadDatasetFromFile (const std::string filename) |
bool | saveDatasetToCSVFile (const std::string &filename) const |
bool | loadDatasetFromCSVFile (const std::string &filename) |
bool | printStats () const |
std::string | getStatsAsString () const |
TimeSeriesClassificationData | partition (const UINT partitionPercentage, const bool useStratifiedSampling=false) |
bool | merge (const TimeSeriesClassificationData &labelledData) |
bool | spiltDataIntoKFolds (const UINT K, const bool useStratifiedSampling=false) |
TimeSeriesClassificationData | getTrainingFoldData (const UINT foldIndex) const |
TimeSeriesClassificationData | getTestFoldData (const UINT foldIndex) const |
TimeSeriesClassificationData | getClassData (const UINT classLabel) const |
UnlabelledData | reformatAsUnlabelledData () const |
std::string | getDatasetName () const |
std::string | getInfoText () const |
UINT | getNumDimensions () const |
UINT | getNumSamples () const |
UINT | getNumClasses () const |
UINT | getMinimumClassLabel () const |
UINT | getMaximumClassLabel () const |
UINT | getClassLabelIndexValue (const UINT classLabel) const |
std::string | getClassNameForCorrespondingClassLabel (const UINT classLabel) const |
Vector< MinMax > | getRanges () const |
Vector< ClassTracker > | getClassTracker () const |
Vector< TimeSeriesClassificationSample > | getClassificationData () const |
MatrixFloat | getDataAsMatrixFloat () const |
Protected Attributes | |
std::string | datasetName |
The name of the dataset. | |
std::string | infoText |
Some infoText about the dataset. | |
UINT | numDimensions |
The number of dimensions in the dataset. | |
UINT | totalNumSamples |
The total number of samples in the dataset. | |
UINT | kFoldValue |
The number of folds the dataset has been split into for cross validation. | |
bool | crossValidationSetup |
A flag to show if the dataset is ready for cross validation. | |
bool | useExternalRanges |
A flag to show if the dataset should be scaled using the externalRanges values. | |
bool | allowNullGestureClass |
A flag that enables/disables a user from adding new samples with a class label matching the default null gesture label. | |
Vector< MinMax > | externalRanges |
A vector containing a set of externalRanges set by the user. | |
Vector< ClassTracker > | classTracker |
A vector of ClassTracker, which keeps track of the number of samples of each class. | |
Vector< TimeSeriesClassificationSample > | data |
The labelled time series classification data. | |
Vector< Vector< UINT > > | crossValidationIndexs |
A vector to hold the indices of the dataset for the cross validation. | |
DebugLog | debugLog |
Default debugging log. | |
ErrorLog | errorLog |
Default error log. | |
WarningLog | warningLog |
Default warning log. | |
GRT MIT License Copyright (c) <2012> <Nicholas Gillian, Media Lab, MIT>
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Definition at line 42 of file TimeSeriesClassificationData.h.
TimeSeriesClassificationData::TimeSeriesClassificationData | ( | UINT | numDimensions = 0 , |
std::string | datasetName = "NOT_SET" , |
||
std::string | infoText = "" |
||
) |
Constructor, sets the name of the dataset and the number of dimensions of the training data. The name of the dataset should not contain any spaces.
numDimensions | the number of dimensions of the training data, should be an unsigned integer greater than 0 |
datasetName | the name of the dataset, should not contain any spaces |
infoText | some info about the data in this dataset, this can contain spaces |
Definition at line 25 of file TimeSeriesClassificationData.cpp.
TimeSeriesClassificationData::TimeSeriesClassificationData | ( | const TimeSeriesClassificationData & | rhs | ) |
Copy Constructor, copies the TimeSeriesClassificationData from the rhs instance to this instance
rhs | another instance of the TimeSeriesClassificationData class from which the data will be copied to this instance |
Definition at line 42 of file TimeSeriesClassificationData.cpp.
TimeSeriesClassificationData::~TimeSeriesClassificationData | ( | ) |
virtual |
Default Destructor
Definition at line 51 of file TimeSeriesClassificationData.cpp.
bool TimeSeriesClassificationData::addSample | ( | const UINT | classLabel, |
const MatrixFloat & | trainingSample | ||
) |
Adds a new labelled timeseries sample to the dataset. The dimensionality of the sample should match the number of dimensions in the dataset. The class label should be greater than zero (as zero is used as the default null rejection class label).
classLabel | the class label of the corresponding sample |
trainingSample | the new sample you want to add to the dataset. The dimensionality of this sample (i.e. Matrix columns) should match the number of dimensions in the dataset, the rows of the Matrix represent time and do not have to be any specific length |
Definition at line 131 of file TimeSeriesClassificationData.cpp.
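The checks described for addSample can be sketched without the library. The following is a minimal stand-alone illustration, not the GRT implementation: Matrix, LabelledSeries and SketchDataset are hypothetical stand-ins, and rejecting an empty sample is an assumption that the description above does not state.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for illustration only; these are not GRT types.
// Rows represent time steps, columns represent dimensions.
using Matrix = std::vector<std::vector<double>>;

struct LabelledSeries {
    unsigned int classLabel;
    Matrix sample;
};

struct SketchDataset {
    unsigned int numDimensions = 0;
    bool allowNullGestureClass = false;
    std::vector<LabelledSeries> data;

    // Returns false (and stores nothing) if the sample is rejected.
    bool addSample(unsigned int classLabel, const Matrix &sample) {
        // Assumption: an unconfigured dataset or an empty sample is rejected.
        if (numDimensions == 0 || sample.empty()) return false;
        // Label 0 is the default null class; it is rejected unless explicitly allowed.
        if (classLabel == 0 && !allowNullGestureClass) return false;
        // Every row must have exactly numDimensions columns; the row count is free.
        for (const auto &row : sample) {
            if (row.size() != numDimensions) return false;
        }
        data.push_back({classLabel, sample});
        return true;
    }
};
```

In this sketch a 2-row sample and a 200-row sample are both accepted as long as each row has numDimensions columns, which mirrors the statement that the rows represent time and need not have a specific length.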
void TimeSeriesClassificationData::clear | ( | ) |
Clears any previous training data and counters
Definition at line 73 of file TimeSeriesClassificationData.cpp.
bool TimeSeriesClassificationData::enableExternalRangeScaling | ( | const bool | useExternalRanges | ) |
Sets if the dataset should be scaled using an external range (if useExternalRanges == true) or the ranges of the dataset (if false). The external ranges need to be set FIRST before calling this function, otherwise it will return false.
useExternalRanges | sets if these ranges should be used to scale the dataset |
Definition at line 278 of file TimeSeriesClassificationData.cpp.
UINT TimeSeriesClassificationData::eraseAllSamplesWithClassLabel | ( | const UINT | classLabel | ) |
Deletes from the dataset all the samples with a specific class label.
classLabel | the class label of the samples you wish to delete from the dataset |
Definition at line 168 of file TimeSeriesClassificationData.cpp.
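Deleting every sample with a given label is a natural fit for the erase-remove idiom. The sketch below is a stand-alone illustration with a hypothetical Sample struct, not the GRT code. It assumes the UINT return value is the number of samples removed, which the description above does not confirm.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical sample record for illustration; not the GRT sample class.
struct Sample {
    unsigned int classLabel;
    std::vector<std::vector<double>> series;
};

// Removes every sample carrying classLabel and returns how many were removed
// (assumed meaning of the return value).
std::size_t eraseAllWithLabel(std::vector<Sample> &data, unsigned int classLabel) {
    const std::size_t before = data.size();
    data.erase(std::remove_if(data.begin(), data.end(),
                              [classLabel](const Sample &s) {
                                  return s.classLabel == classLabel;
                              }),
               data.end());
    return before - data.size();
}
```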
TimeSeriesClassificationData TimeSeriesClassificationData::getClassData | ( | const UINT | classLabel | ) | const |
Returns all the data with the class label set by classLabel. The classLabel should be a valid classLabel, otherwise the dataset returned will be empty.
classLabel | the class label of the class you want the data for |
Definition at line 967 of file TimeSeriesClassificationData.cpp.
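Vector< TimeSeriesClassificationSample > TimeSeriesClassificationData::getClassificationData | ( | ) | const |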
inline |
Gets the classification data.
Definition at line 445 of file TimeSeriesClassificationData.h.
UINT TimeSeriesClassificationData::getClassLabelIndexValue | ( | const UINT | classLabel | ) | const |
Gets the index of the class label from the class tracker.
Definition at line 1021 of file TimeSeriesClassificationData.cpp.
std::string TimeSeriesClassificationData::getClassNameForCorrespondingClassLabel | ( | const UINT | classLabel | ) | const |
Gets the name of the class with a given class label. If the class label does not exist then the std::string "CLASS_LABEL_NOT_FOUND" will be returned.
Definition at line 1031 of file TimeSeriesClassificationData.cpp.
Vector< ClassTracker > TimeSeriesClassificationData::getClassTracker | ( | ) | const |
inline |
Gets the class tracker for each class in the dataset.
Definition at line 438 of file TimeSeriesClassificationData.h.
MatrixFloat TimeSeriesClassificationData::getDataAsMatrixFloat | ( | ) | const |
Gets the data as a MatrixFloat. This returns just the data, not the labels. This will be an M by N MatrixFloat, where M is the number of samples and N is the number of dimensions.
Definition at line 1062 of file TimeSeriesClassificationData.cpp.
std::string TimeSeriesClassificationData::getDatasetName | ( | ) | const |
inline |
Gets the name of the dataset.
Definition at line 368 of file TimeSeriesClassificationData.h.
std::string TimeSeriesClassificationData::getInfoText | ( | ) | const |
inline |
Gets the infoText for the dataset.
Definition at line 375 of file TimeSeriesClassificationData.h.
UINT TimeSeriesClassificationData::getMaximumClassLabel | ( | ) | const |
Gets the maximum class label in the dataset. If there are no values in the dataset then the value 0 will be returned.
Definition at line 1009 of file TimeSeriesClassificationData.cpp.
UINT TimeSeriesClassificationData::getMinimumClassLabel | ( | ) | const |
Gets the minimum class label in the dataset. If there are no values in the dataset then the value 99999 will be returned.
Definition at line 996 of file TimeSeriesClassificationData.cpp.
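The empty-dataset return values (0 for the maximum, 99999 for the minimum) behave like sentinels. A minimal stand-alone sketch of that behaviour, using a plain vector of labels rather than the GRT class tracker (the function names are hypothetical):

```cpp
#include <algorithm>
#include <vector>

// Returns the smallest label, or the sentinel 99999 when there are no labels.
unsigned int minimumClassLabel(const std::vector<unsigned int> &labels) {
    unsigned int minLabel = 99999;
    for (unsigned int label : labels) minLabel = std::min(minLabel, label);
    return minLabel;
}

// Returns the largest label, or 0 when there are no labels.
unsigned int maximumClassLabel(const std::vector<unsigned int> &labels) {
    unsigned int maxLabel = 0;
    for (unsigned int label : labels) maxLabel = std::max(maxLabel, label);
    return maxLabel;
}
```

With sentinels of this kind, an empty dataset cannot be told apart from one whose only label equals the sentinel, so checking getNumSamples() or getNumClasses() first is the safer way to detect an empty dataset.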
UINT TimeSeriesClassificationData::getNumClasses | ( | ) | const |
inline |
Gets the number of classes.
Definition at line 396 of file TimeSeriesClassificationData.h.
UINT TimeSeriesClassificationData::getNumDimensions | ( | ) | const |
inline |
Gets the number of dimensions of the labelled classification data.
Definition at line 382 of file TimeSeriesClassificationData.h.
UINT TimeSeriesClassificationData::getNumSamples | ( | ) | const |
inline |
Gets the number of samples in the classification data across all the classes.
Definition at line 389 of file TimeSeriesClassificationData.h.
Vector< MinMax > TimeSeriesClassificationData::getRanges | ( | ) | const |
Gets the ranges of the classification data.
Definition at line 1041 of file TimeSeriesClassificationData.cpp.
std::string TimeSeriesClassificationData::getStatsAsString | ( | ) | const |
Gets the dataset info (such as its name and infoText) and the stats (such as the number of examples, number of dimensions, number of classes, etc.) as a std::string.
Definition at line 668 of file TimeSeriesClassificationData.cpp.
TimeSeriesClassificationData TimeSeriesClassificationData::getTestFoldData | ( | const UINT | foldIndex | ) | const |
Returns the test dataset for the k-th fold for cross validation. The spiltDataIntoKFolds(UINT K) function should have been called once before using this function. The foldIndex should be in the range [0, K-1], where K is the number of folds the data was split into.
foldIndex | the index of the fold you want the test data for, this should be in the range [0, K-1], where K is the number of folds the data was split into |
Definition at line 947 of file TimeSeriesClassificationData.cpp.
TimeSeriesClassificationData TimeSeriesClassificationData::getTrainingFoldData | ( | const UINT | foldIndex | ) | const |
Returns the training dataset for the k-th fold for cross validation. The spiltDataIntoKFolds(UINT K) function should have been called once before using this function. The foldIndex should be in the range [0, K-1], where K is the number of folds the data was split into.
foldIndex | the index of the fold you want the training data for, this should be in the range [0, K-1], where K is the number of folds the data was split into |
Definition at line 919 of file TimeSeriesClassificationData.cpp.
bool TimeSeriesClassificationData::load | ( | const std::string & | filename | ) |
Loads the data from a file. If the filename ends in '.csv' then the function will try to load the data in CSV format. If this fails then it will try to load the data as a custom GRT file.
filename | the name of the file the data will be loaded from |
Definition at line 317 of file TimeSeriesClassificationData.cpp.
bool TimeSeriesClassificationData::loadDatasetFromCSVFile | ( | const std::string & | filename | ) |
Loads the classification data from a CSV file. This assumes the data is formatted with each row representing a sample. The first column should represent the timeseries counter. The class label should be the second column followed by the sample data as the following N columns, where N is the number of dimensions in the data.
filename | the name of the file the data will be loaded from |
Definition at line 584 of file TimeSeriesClassificationData.cpp.
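The CSV layout described above (timeseries counter, class label, then N values per row) can be illustrated with a small stand-alone parser. This is a sketch under stated assumptions, not the GRT loader: ParsedSeries and parseTimeSeriesCsv are hypothetical names, and it is assumed that a change in the counter value marks the start of a new time series and that the label of a series is taken from its first row.

```cpp
#include <cstddef>
#include <exception>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container for one parsed time series; not a GRT type.
struct ParsedSeries {
    unsigned int classLabel;
    std::vector<std::vector<double>> rows; // one row per time step
};

// Parses CSV text laid out as: counter,label,v1,...,vN
// Returns false if any row is malformed.
bool parseTimeSeriesCsv(const std::string &text, std::size_t numDimensions,
                        std::vector<ParsedSeries> &out) {
    out.clear();
    std::istringstream lines(text);
    std::string line;
    bool haveSeries = false;
    unsigned long currentCounter = 0;
    while (std::getline(lines, line)) {
        if (line.empty()) continue;
        std::istringstream cells(line);
        std::string cell;
        std::vector<double> values;
        while (std::getline(cells, cell, ',')) {
            try {
                values.push_back(std::stod(cell));
            } catch (const std::exception &) {
                return false; // non-numeric cell
            }
        }
        // Two leading columns (counter, label) plus N data columns.
        if (values.size() != numDimensions + 2) return false;
        if (values[0] < 0.0 || values[1] < 0.0) return false;
        const unsigned long counter = static_cast<unsigned long>(values[0]);
        const unsigned int label = static_cast<unsigned int>(values[1]);
        // Assumption: a new counter value starts a new time series.
        if (!haveSeries || counter != currentCounter) {
            out.push_back({label, {}});
            currentCounter = counter;
            haveSeries = true;
        }
        out.back().rows.emplace_back(values.begin() + 2, values.end());
    }
    return true;
}
```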
bool TimeSeriesClassificationData::loadDatasetFromFile | ( | const std::string | filename | ) |
Loads the labelled timeseries classification data from a custom file format.
filename | the name of the file the data will be loaded from |
Definition at line 377 of file TimeSeriesClassificationData.cpp.
bool TimeSeriesClassificationData::merge | ( | const TimeSeriesClassificationData & | labelledData | ) |
Adds the data in the labelledData set to the current instance of the TimeSeriesClassificationData. The number of dimensions in both datasets must match. The names of the classes from the labelledData will be added to the current instance.
labelledData | the dataset to add to this dataset |
Definition at line 789 of file TimeSeriesClassificationData.cpp.
TimeSeriesClassificationData & TimeSeriesClassificationData::operator= | ( | const TimeSeriesClassificationData & | rhs | ) |
Assignment operator, copies the data from the rhs instance to this instance.
rhs | another instance of the TimeSeriesClassificationData class from which the data will be copied to this instance |
Definition at line 53 of file TimeSeriesClassificationData.cpp.
TimeSeriesClassificationSample & TimeSeriesClassificationData::operator[] | ( | const UINT & | i | ) |
inline |
Array Subscript Operator, returns the TimeSeriesClassificationSample at index i. It is up to the user to ensure that i is within the range of [0, totalNumSamples-1].
i | the index of the training sample you want to access. Must be within the range of [0, totalNumSamples-1] |
Definition at line 82 of file TimeSeriesClassificationData.h.
const TimeSeriesClassificationSample & TimeSeriesClassificationData::operator[] | ( | const UINT & | i | ) | const |
inline |
Const Array Subscript Operator, returns the TimeSeriesClassificationSample at index i. It is up to the user to ensure that i is within the range of [0, totalNumSamples-1].
i | the index of the training sample you want to access. Must be within the range of [0, totalNumSamples-1] |
Definition at line 93 of file TimeSeriesClassificationData.h.
TimeSeriesClassificationData TimeSeriesClassificationData::partition | ( | const UINT | partitionPercentage, |
const bool | useStratifiedSampling = false |
||
) |
Partitions the dataset into a training dataset (which is kept by this instance of the TimeSeriesClassificationData) and a testing/validation dataset (which is returned as a new instance of a TimeSeriesClassificationData).
partitionPercentage | sets the percentage of data which remains in this instance, the remaining percentage of data is then returned as the testing/validation dataset |
useStratifiedSampling | sets if the dataset should be broken into homogeneous groups first before randomly being split, default value is false |
Definition at line 701 of file TimeSeriesClassificationData.cpp.
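To make the role of partitionPercentage concrete, the following stand-alone sketch splits sample indices into a kept (training) part and a held-out (test/validation) part. It is an illustration only and covers just the non-stratified case. The function name, the explicit seed parameter, and rounding the kept count down are assumptions, not details taken from GRT.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Returns {kept indices, held-out indices} for a random, non-stratified split.
// partitionPercentage is the share (0..100) of samples that stays in the kept part.
std::pair<std::vector<std::size_t>, std::vector<std::size_t>>
partitionIndices(std::size_t numSamples, unsigned int partitionPercentage,
                 unsigned int seed) {
    std::vector<std::size_t> order(numSamples);
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::mt19937 rng(seed);
    std::shuffle(order.begin(), order.end(), rng);

    // Assumption: percentages above 100 are clamped and the count is rounded down.
    const unsigned int pct = std::min(partitionPercentage, 100u);
    const std::size_t numKept = numSamples * pct / 100;
    const auto splitPoint = order.begin() + static_cast<std::ptrdiff_t>(numKept);

    std::vector<std::size_t> kept(order.begin(), splitPoint);
    std::vector<std::size_t> heldOut(splitPoint, order.end());
    return {kept, heldOut};
}
```

With 10 samples and partitionPercentage set to 80, this sketch keeps 8 indices and returns 2 as the held-out set. A stratified variant would apply the same split separately within each class so that class proportions are preserved.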
bool TimeSeriesClassificationData::printStats | ( | ) | const |
Prints the dataset info (such as its name and infoText) and the stats (such as the number of examples, number of dimensions, number of classes, etc.) to standard output.
Definition at line 661 of file TimeSeriesClassificationData.cpp.
UnlabelledData TimeSeriesClassificationData::reformatAsUnlabelledData | ( | ) | const |
Reformats the TimeSeriesClassificationData as UnlabelledData so the data can be used to train unsupervised learning algorithms such as K-Means Clustering and Gaussian Mixture Models.
Definition at line 977 of file TimeSeriesClassificationData.cpp.
bool TimeSeriesClassificationData::relabelAllSamplesWithClassLabel | ( | const UINT | oldClassLabel, |
const UINT | newClassLabel | ||
) |
Relabels all the samples that have the class label oldClassLabel with the new class label newClassLabel.
oldClassLabel | the class label of the samples you want to relabel |
newClassLabel | the class label the samples will be relabelled with |
Definition at line 223 of file TimeSeriesClassificationData.cpp.
bool TimeSeriesClassificationData::removeLastSample | ( | ) |
Removes the last training sample added to the dataset.
Definition at line 197 of file TimeSeriesClassificationData.cpp.
bool TimeSeriesClassificationData::save | ( | const std::string & | filename | ) | const |
Saves the data to a file. If the filename ends in '.csv' then the data will be saved as comma-separated values, otherwise it will be saved to a custom GRT file (which contains the CSV data with an additional header).
filename | the name of the file the data will be saved to |
Definition at line 306 of file TimeSeriesClassificationData.cpp.
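The choice between the two file formats depends only on the filename suffix. A minimal stand-alone sketch of such a dispatch follows. The enum and function names are hypothetical, and treating the match as case-sensitive is an assumption.

```cpp
#include <string>

// Hypothetical format tag for illustration; not part of GRT.
enum class DatasetFormat { Csv, GrtCustom };

// True if s ends with suffix (case-sensitive).
bool endsWith(const std::string &s, const std::string &suffix) {
    return s.size() >= suffix.size() &&
           s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// '.csv' selects the CSV format, anything else the custom GRT format.
DatasetFormat formatForFilename(const std::string &filename) {
    return endsWith(filename, ".csv") ? DatasetFormat::Csv
                                      : DatasetFormat::GrtCustom;
}
```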
bool TimeSeriesClassificationData::saveDatasetToCSVFile | ( | const std::string & | filename | ) | const |
Saves the data to a CSV file. This will save the timeseries counter as the first column, the class label as the second column, and the sample data as the following N columns, where N is the number of dimensions in the data. Each row will represent a sample.
filename | the name of the file the data will be saved to |
Definition at line 555 of file TimeSeriesClassificationData.cpp.
bool TimeSeriesClassificationData::saveDatasetToFile | ( | const std::string | filename | ) | const |
Saves the labelled timeseries classification data to a custom file format.
filename | the name of the file the data will be saved to |
Definition at line 328 of file TimeSeriesClassificationData.cpp.
bool TimeSeriesClassificationData::scale | ( | const Float | minTarget, |
const Float | maxTarget | ||
) |
Scales the dataset to the new target range.
minTarget | the minimum range you want to scale the data to |
maxTarget | the maximum range you want to scale the data to |
Definition at line 286 of file TimeSeriesClassificationData.cpp.
bool TimeSeriesClassificationData::scale | ( | const Vector< MinMax > & | ranges, |
const Float | minTarget, | ||
const Float | maxTarget | ||
) |
Scales the dataset to the new target range, using the vector of ranges as the min and max source ranges.
ranges | a vector of source ranges, should have the same dimensions as your data |
minTarget | the minimum range you want to scale the data to |
maxTarget | the maximum range you want to scale the data to |
Definition at line 291 of file TimeSeriesClassificationData.cpp.
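Scaling to a target range is commonly done with a per-dimension linear (min-max) mapping. The sketch below shows that mapping on a single time series. It is a stand-alone illustration that assumes this is the formula in use. Range, scaleValue and scaleSeries are hypothetical names, and mapping a zero-width source range to minTarget is an arbitrary choice made here.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical source range for one dimension; stands in for a MinMax entry.
struct Range {
    double minValue;
    double maxValue;
};

// Linearly maps x from [src.minValue, src.maxValue] to [minTarget, maxTarget].
double scaleValue(double x, const Range &src, double minTarget, double maxTarget) {
    const double span = src.maxValue - src.minValue;
    if (span == 0.0) return minTarget; // assumption: degenerate range maps to minTarget
    return (x - src.minValue) / span * (maxTarget - minTarget) + minTarget;
}

// Scales every row of a time series in place, one source range per column.
// Returns false if any row's column count does not match the number of ranges.
bool scaleSeries(std::vector<std::vector<double>> &series,
                 const std::vector<Range> &ranges,
                 double minTarget, double maxTarget) {
    for (const auto &row : series) {
        if (row.size() != ranges.size()) return false;
    }
    for (auto &row : series) {
        for (std::size_t j = 0; j < row.size(); ++j) {
            row[j] = scaleValue(row[j], ranges[j], minTarget, maxTarget);
        }
    }
    return true;
}
```

Passing ranges measured on the training data when scaling later test data keeps both in the same coordinate system, which is the usual reason for supplying external ranges.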
bool TimeSeriesClassificationData::setAllowNullGestureClass | ( | const bool | allowNullGestureClass | ) |
Sets if the user can add samples to the dataset with the label matching the GRT_DEFAULT_NULL_CLASS_LABEL. If the allowNullGestureClass is set to true, then the user can add labels matching the default null class label (which is normally 0). If the allowNullGestureClass is set to false, then the user will not be able to add samples that have a class label matching the default null class label.
allowNullGestureClass | true if you want to use the default null gesture as a label |
Definition at line 126 of file TimeSeriesClassificationData.cpp.
bool TimeSeriesClassificationData::setClassNameForCorrespondingClassLabel | ( | const std::string | className, |
const UINT | classLabel | ||
) |
Sets the name of the class with the given class label. There should not be any spaces in the className. Will return true if the name is set, or false if the class label does not exist.
className | the new name for the class |
classLabel | the label ID that you want to set the class name for |
Definition at line 114 of file TimeSeriesClassificationData.cpp.
bool TimeSeriesClassificationData::setDatasetName | ( | const std::string | datasetName | ) |
Sets the name of the dataset. There should not be any spaces in the name. Will return true if the name is set, or false otherwise.
datasetName | the new name of the dataset |
Definition at line 97 of file TimeSeriesClassificationData.cpp.
bool TimeSeriesClassificationData::setExternalRanges | ( | const Vector< MinMax > & | externalRanges, |
const bool | useExternalRanges = false |
||
) |
Sets the external ranges of the dataset, also sets if the dataset should be scaled using these values. The dimensionality of the externalRanges vector should match the number of dimensions of this dataset.
externalRanges | an N dimensional vector containing the min and max values of the expected ranges of the dataset. |
useExternalRanges | sets if these ranges should be used to scale the dataset, default value is false. |
Definition at line 268 of file TimeSeriesClassificationData.cpp.
bool TimeSeriesClassificationData::setInfoText | ( | const std::string | infoText | ) |
Sets the info text. This can be any std::string with information about the dataset, for example how the training data was recorded.
infoText | the infoText |
Definition at line 109 of file TimeSeriesClassificationData.cpp.
bool TimeSeriesClassificationData::setNumDimensions | ( | const UINT | numDimensions | ) |
Sets the number of dimensions in the training data. This should be an unsigned integer greater than zero. This will clear any previous training data and counters. This function needs to be called before any new samples can be added to the dataset, unless the numDimensions variable was set in the constructor or some data was already loaded from a file.
numDimensions | the number of dimensions of the training data. Must be an unsigned integer greater than zero |
Definition at line 79 of file TimeSeriesClassificationData.cpp.
bool TimeSeriesClassificationData::spiltDataIntoKFolds | ( | const UINT | K, |
const bool | useStratifiedSampling = false |
||
) |
This function prepares the dataset for k-fold cross validation and should be called prior to calling the getTrainingFoldData(UINT foldIndex) or getTestFoldData(UINT foldIndex) functions. It will split the dataset into K folds, as long as K < M, where M is the number of samples in the dataset.
K | the number of folds the dataset will be split into, K should be less than the number of samples in the dataset |
useStratifiedSampling | sets if the dataset should be broken into homogeneous groups first before randomly being split, default value is false |
Definition at line 814 of file TimeSeriesClassificationData.cpp.
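The relationship between the K folds and the per-fold training and test sets can be sketched with plain index lists. This stand-alone illustration assigns indices to folds round-robin, which is a simplification: the descriptions above refer to random and optionally stratified splitting. All function names are hypothetical.

```cpp
#include <cstddef>
#include <vector>

// Assigns indices 0..numSamples-1 to K folds round-robin.
// Returns an empty result unless 0 < K < numSamples.
std::vector<std::vector<std::size_t>> splitIntoKFolds(std::size_t numSamples,
                                                      std::size_t K) {
    std::vector<std::vector<std::size_t>> folds;
    if (K == 0 || K >= numSamples) return folds;
    folds.resize(K);
    for (std::size_t i = 0; i < numSamples; ++i) {
        folds[i % K].push_back(i);
    }
    return folds;
}

// Test data for a fold: exactly the indices stored in that fold.
std::vector<std::size_t> testFoldIndices(
    const std::vector<std::vector<std::size_t>> &folds, std::size_t foldIndex) {
    if (foldIndex >= folds.size()) return {};
    return folds[foldIndex];
}

// Training data for a fold: the indices of every other fold.
std::vector<std::size_t> trainingFoldIndices(
    const std::vector<std::vector<std::size_t>> &folds, std::size_t foldIndex) {
    std::vector<std::size_t> out;
    if (foldIndex >= folds.size()) return out;
    for (std::size_t k = 0; k < folds.size(); ++k) {
        if (k == foldIndex) continue;
        out.insert(out.end(), folds[k].begin(), folds[k].end());
    }
    return out;
}
```

For each foldIndex in [0, K-1], the training and test index lists are disjoint and together cover the whole dataset, so every sample is used for testing exactly once across the K folds.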