SalmaSameh02/Signal-Processing

Signal-Processing

This repository holds a signal processing project that classifies respiratory sound files as normal or abnormal. The project supports automatic diagnosis of the different types of respiratory diseases, using event-level labels (wheeze, crackle, etc.) and record-level labels (CAS, DAS, CAS & DAS).

Dataset:

SPRSound: Open-Source SJTU Paediatric Respiratory Sound Database (IEEE Xplore).

The record files are in ".wav" format. The labels are in JSON files; for poor quality records the JSON files are empty.

Reading data

The dataset is a set of records, and each record contains multiple events. A single record can contain both normal and abnormal events. After importing the needed libraries and setting the paths to the records folder and the labels folder, we listed the files in the labels directory and created two dataframes. The records dataframe holds the path of each record and its label annotation. The events dataframe holds the path, start, end, type and time of each event. We then looped over the JSON files, read each one, and extracted its information into the dataframes. For poor quality records, the type was set to poor quality and the time to 0.
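The reading loop described above can be sketched as follows. This is a minimal illustration, not the project's code: the key names `record_annotation` and `event_annotation`, the millisecond units of `start` and `end`, and the function name `read_annotations` are all assumptions. It returns plain lists of dictionaries, which can be passed to `pandas.DataFrame` to obtain the two dataframes.

```python
import json
from pathlib import Path


def read_annotations(label_dir, wav_dir):
    """Build one row per record and one row per event from the JSON labels."""
    records, events = [], []
    for json_path in sorted(Path(label_dir).glob("*.json")):
        wav_path = str(Path(wav_dir) / (json_path.stem + ".wav"))
        text = json_path.read_text().strip()
        data = json.loads(text) if text else {}
        # Assumed keys: "record_annotation" and "event_annotation".
        event_list = data.get("event_annotation", [])
        if not event_list:
            # Empty annotation: mark the record as poor quality with time = 0.
            records.append({"path": wav_path, "label": "Poor Quality"})
            events.append({"path": wav_path, "start": 0, "end": 0,
                           "type": "Poor Quality", "time": 0.0})
            continue
        records.append({"path": wav_path,
                        "label": data.get("record_annotation", "")})
        for ev in event_list:
            # Assumed units: start and end are given in milliseconds.
            start, end = int(ev["start"]), int(ev["end"])
            events.append({"path": wav_path, "start": start, "end": end,
                           "type": ev["type"], "time": (end - start) / 1000.0})
    return records, events
```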

Preprocessing

Removing outliers

Event durations vary: some last about a second, while others range from six to seven seconds. We drew a boxplot of event duration for every disease to determine the outliers. The outliers start at roughly 3.7 seconds, so we removed events longer than 3.7 seconds.
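As a rough sketch of this step, the snippet below computes the usual boxplot upper whisker limit (Q3 + 1.5 × IQR) and then drops events above a cutoff. The 3.7 second cutoff comes from the text; the function names and the `time` key (event duration in seconds) are assumptions made for illustration.

```python
import statistics


def upper_whisker(durations):
    """Upper boxplot whisker limit: Q3 + 1.5 * IQR."""
    q1, _, q3 = statistics.quantiles(durations, n=4)
    return q3 + 1.5 * (q3 - q1)


def drop_long_events(events, max_seconds=3.7):
    """Keep only events whose duration does not exceed max_seconds."""
    return [ev for ev in events if ev["time"] <= max_seconds]
```

Computing `upper_whisker` per disease type is one way to check where the outliers begin before fixing a single cutoff.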

Label encoding

Next came label encoding. In both events and records, the normal class is encoded as "0", all diseases as "1", and poor quality as "2". The new dataframe was saved to a CSV file.
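A minimal sketch of this encoding, using only the standard library. The exact label strings ("Normal", "Poor Quality"), the `type` key and the function names are assumptions; the real annotation spelling may differ.

```python
import csv


def encode_label(label):
    """Map an annotation to 0 (normal), 1 (any disease) or 2 (poor quality)."""
    name = label.strip().lower()
    if name == "normal":
        return 0
    if name == "poor quality":
        return 2
    return 1


def save_encoded(rows, csv_path):
    """Add an integer "label" column to each row and write the rows to CSV.

    Expects a non-empty list of dictionaries that all share the same keys.
    """
    encoded = [dict(row, label=encode_label(row["type"])) for row in rows]
    with open(csv_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(encoded[0].keys()))
        writer.writeheader()
        writer.writerows(encoded)
    return encoded
```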

Dropping poor quality

Poor quality files would have been a problem had we kept them in the dataset: their events are empty, and even specialists could not classify them, so we dropped them. The dataframe index was then reset and the rows were shuffled. We decided to work with events instead of records because there is much more event data than record data, as each record has three events.

Data filtering

A bandpass filter was applied to keep only a certain frequency band. It is implemented as two functions: the first builds the bandpass filter from a given lowcut and highcut, and the second applies that filter to our data.
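The two-function layout could look like the sketch below, written with SciPy. The Butterworth design, the filter order and the zero-phase filtering are assumptions, since the text does not say which filter type or cutoff values were used.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def butter_bandpass(lowcut, highcut, fs, order=5):
    """Design a Butterworth bandpass filter as second-order sections."""
    return butter(order, [lowcut, highcut], btype="band", fs=fs, output="sos")


def apply_bandpass(signal, lowcut, highcut, fs, order=5):
    """Filter a 1-D signal forwards and backwards (zero phase shift)."""
    sos = butter_bandpass(lowcut, highcut, fs, order=order)
    return sosfiltfilt(sos, np.asarray(signal, dtype=float))
```

For example, `apply_bandpass(samples, 100, 1800, 8000)` would keep roughly the 100 Hz to 1800 Hz band of a signal sampled at 8 kHz; these numbers are placeholders, not the project's settings.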

Records splitting

The records were split into events, three events per record as mentioned before, according to the start and end of each event given in the JSON files. We merged the filtering step with the splitting step, so each resulting event is already filtered.
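A sketch of how the combined step might work: the record is optionally filtered, then sliced by the start and end of each event. Filtering the whole record before cutting is an assumption (the project may filter each event after cutting instead), as are the millisecond units and the names `cut_event` and `split_record`.

```python
def cut_event(samples, fs, start_ms, end_ms):
    """Return the samples between start_ms and end_ms of one record."""
    first = int(start_ms * fs / 1000)
    last = int(end_ms * fs / 1000)
    return samples[first:last]


def split_record(samples, fs, events, filter_fn=None):
    """Cut one record into its annotated events, optionally filtering first."""
    if filter_fn is not None:
        samples = filter_fn(samples)
    return [cut_event(samples, fs, ev["start"], ev["end"]) for ev in events]
```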

Data balancing

The difference in size between class "0" and class "1" was computed, and that many samples were deleted from class "0", the larger class, so that the two classes are balanced. The result was plotted as a pie chart to confirm that the data is balanced.
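The balancing step amounts to undersampling the larger class. The sketch below picks the surviving class "0" rows at random, which is an assumption: the text does not say which rows were deleted. The `label` key and the function name are also illustrative.

```python
import random


def balance_by_undersampling(rows, majority=0, minority=1, seed=42):
    """Drop random rows of the majority class until both classes match."""
    rng = random.Random(seed)
    major = [r for r in rows if r["label"] == majority]
    minor = [r for r in rows if r["label"] == minority]
    if len(major) > len(minor):
        # Keep as many majority rows as there are minority rows.
        major = rng.sample(major, len(minor))
    balanced = major + minor
    rng.shuffle(balanced)
    return balanced
```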
A class called "MyDataset" was created. It takes the list of audio file paths, the class labels, and a transform that defaults to "None". The class generates a spectrogram for each ".wav" file and resizes it to the input size of the model. It returns the spectrogram of the event and its label.
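To make the shape of such a class concrete, here is a sketch that uses only NumPy and SciPy. It is not the project's implementation: the STFT window length, the log scaling, the nearest-neighbour resize and the default `size` are all assumptions, and mono audio is assumed. In PyTorch the class would typically subclass `torch.utils.data.Dataset` and return tensors.

```python
import numpy as np
from scipy.io import wavfile
from scipy.signal import stft


class MyDataset:
    """Map-style dataset: one spectrogram and label per event file."""

    def __init__(self, paths, labels, transform=None, size=(224, 224)):
        self.paths = list(paths)
        self.labels = list(labels)
        self.transform = transform
        self.size = size  # assumed model input size (height, width)

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, index):
        fs, samples = wavfile.read(self.paths[index])
        samples = samples.astype(np.float32)
        # Magnitude of the short-time Fourier transform, on a log scale.
        _, _, z = stft(samples, fs=fs, nperseg=256)
        spec = np.log1p(np.abs(z))
        # Nearest-neighbour resize to the requested height and width.
        rows = np.linspace(0, spec.shape[0] - 1, self.size[0]).round().astype(int)
        cols = np.linspace(0, spec.shape[1] - 1, self.size[1]).round().astype(int)
        spec = spec[np.ix_(rows, cols)]
        if self.transform is not None:
            spec = self.transform(spec)
        return spec, self.labels[index]
```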

Train_Validation_Test splitting

The data is split into 67.5% train, 12.5% validation and 20% test. Each split is stored in a dataframe whose columns are passed as parameters to the "MyDataset" class, and the PyTorch DataLoader then loads it into the model. To summarize the pipeline: poor quality records are dropped, the records are split into events, and each event is labeled using the included JSON file, which contains the start, end and label (for poor quality records the JSON file is empty). After applying the bandpass filter and the STFT, converting to spectrograms, and splitting into train, validation and test datasets, the processed datasets are ready to be loaded into the model.
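The 67.5% / 12.5% / 20% proportions can be reproduced with a simple shuffled split, sketched below. The random seed, the function name and the absence of stratification are assumptions; the project may split differently, for example in two stages.

```python
import random


def train_val_test_split(rows, val_frac=0.125, test_frac=0.20, seed=42):
    """Shuffle the rows and cut them into train, validation and test lists."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = round(len(rows) * test_frac)
    n_val = round(len(rows) * val_frac)
    test = rows[:n_test]
    val = rows[n_test:n_test + n_val]
    train = rows[n_test + n_val:]
    return train, val, test
```

With 1000 events this gives 675 for training, 125 for validation and 200 for testing.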

Model

Some hyperparameters that are used in various places in the code were defined first. The model used is inception resnetV3, implemented in PyTorch, and it classifies the automatically generated spectrograms. The train accuracy is 72.38%, while the validation accuracy is 63.83%.

The confusion matrix for validation


The confusion matrix for train


ROC

AUC for validation is 0.8654 and for test is 0.891.
The sensitivity and specificity for both validation and test are 0.99 and 0.32 respectively. The model classifies normal with a precision of 0.96 for validation and test, while the abnormal precision is 0.57 for validation and test.
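For reference, these metrics follow from the four cells of a binary confusion matrix. The sketch below assumes that abnormal (class "1") is the positive class, which the text does not state, and the counts in the usage example are invented for illustration; they are not the project's confusion matrix.

```python
def binary_metrics(tp, fn, fp, tn):
    """Compute sensitivity, specificity and per-class precision from counts."""
    return {
        "sensitivity": tp / (tp + fn),         # positives correctly detected
        "specificity": tn / (tn + fp),         # negatives correctly detected
        "precision_positive": tp / (tp + fp),  # predicted positives that are right
        "precision_negative": tn / (tn + fn),  # predicted negatives that are right
    }


# Made-up counts: 100 abnormal and 100 normal events.
example = binary_metrics(tp=99, fn=1, fp=68, tn=32)
```

A high sensitivity paired with a low specificity, as in this example, means that most events are predicted as the positive class.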