Sat, Jun 2, 2018

Today I decided to take part in a Kaggle competition (an older one) and to implement my own pipeline, from downloading the dataset to submitting the answer.

First I installed the Kaggle API so that I could download the dataset more easily. I chose this competition first because it was the first one taught in the fast.ai class.

After downloading the dataset I looked at the files and found two folders, train and test. In the train folder, the images of cats and dogs are mixed together. It is better to separate those images into different folders and use each folder name as the label. It took me a while to figure out how, since I didn't know the best way to split the files and copy them into separate folders.
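A minimal sketch of such a split using pathlib and shutil, assuming the raw training images sit in one flat folder and are named like `cat.0.jpg` / `dog.0.jpg`. The function name `split_train_valid`, the 20% validation fraction and the fixed seed are illustrative choices, not the exact code used in this post.

```python
import random
import shutil
from pathlib import Path


def split_train_valid(src: Path, dst: Path, valid_frac: float = 0.2, seed: int = 42) -> dict:
    """Copy a flat folder of 'cat.N.jpg' / 'dog.N.jpg' files into
    dst/train/{cats,dogs} and dst/valid/{cats,dogs}.

    Hypothetical helper: assumes the label is the filename prefix before the first dot.
    Returns a count of copied files per (split, label).
    """
    files = sorted(p for p in src.iterdir() if p.is_file())
    rng = random.Random(seed)          # fixed seed so the split is reproducible
    rng.shuffle(files)
    n_valid = int(len(files) * valid_frac)

    counts = {}
    for i, path in enumerate(files):
        split = "valid" if i < n_valid else "train"
        label = path.name.split(".")[0] + "s"   # 'cat' -> 'cats', 'dog' -> 'dogs'
        out_dir = dst / split / label
        out_dir.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, out_dir / path.name)  # copy, leaving the original data untouched
        counts[(split, label)] = counts.get((split, label), 0) + 1
    return counts
```

Copying instead of moving keeps the downloaded data intact, so the split can be redone with a different seed or validation fraction. The unlabeled test images do not need this step.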

After experimenting I decided to use pathlib and shutil for this task. The data was divided into train (cats, dogs), valid (cats, dogs) and test (unlabeled).

Then I started training with image augmentation (side-on transforms and a zoom of 1.1): I fit the model for 1 epoch, fit it again with a cyclical learning rate, and then unfroze the layers and fit with differential learning rates and learning-rate annealing for each layer group.
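To make the last two ideas concrete, here is a small framework-free sketch of a cosine-annealed learning rate that restarts every cycle (SGDR-style), combined with differential learning rates per layer group. This is not fast.ai's implementation; the 1/9, 1/3, 1 ratios between layer groups and the per-iteration cycle length are assumptions for illustration.

```python
import math


def cosine_annealed_lr(max_lr: float, iteration: int, cycle_len: int, min_lr: float = 0.0) -> float:
    """Cosine annealing from max_lr down toward min_lr, restarting every cycle_len iterations."""
    t = iteration % cycle_len          # position inside the current cycle
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * t / cycle_len))


def differential_lrs(base_lr: float, multipliers=(1 / 9, 1 / 3, 1.0)) -> list:
    """Smaller rates for earlier (more generic) layer groups, the full rate for the new head."""
    return [base_lr * m for m in multipliers]


def schedule(base_lr: float, n_iters: int, cycle_len: int) -> list:
    """Learning rate for every layer group at every iteration."""
    group_lrs = differential_lrs(base_lr)
    return [[cosine_annealed_lr(lr, i, cycle_len) for lr in group_lrs] for i in range(n_iters)]


if __name__ == "__main__":
    for i, lrs in enumerate(schedule(base_lr=0.01, n_iters=8, cycle_len=4)):
        print(i, [f"{lr:.5f}" for lr in lrs])
```

Each group follows the same cosine shape scaled by its multiplier, so the pretrained early layers are only nudged while the last layers adapt faster; the restart at the start of each cycle is the "cyclical" part.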

Even though I used all those techniques, I still couldn't improve my model. The best score I could get was 0.6, which put me at rank 129. After some research I remembered that many Kaggle winners use techniques such as ensembling, so I decided to try it. First I trained resnet50 with dropout 0.4 and got a validation loss of 0.21. Next I trained densenet121 with dropout 0.4 (validation loss 0.3) and inception4 with dropout 0.3 (validation loss 0.22).

Once all of the models were trained, I ran each one on the test set, ensembled them to get the final probabilities, and submitted my answer to Kaggle. To my surprise, my score improved from 0.6 to 0.46 (lower is better), moving me up to rank 36. After a few more epochs trying to improve the score, I decided to add one more model, vgg19 with dropout 0.2, which got a validation loss of 0.27.
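One simple way to ensemble is to average each model's predicted probability per image. The sketch below does that and scores the result with binary log loss, assuming the leaderboard metric is log loss (consistent with lower scores ranking higher). Plain averaging is an assumption here; the post does not say how the probabilities were combined, and the model names and numbers in the example are made up.

```python
import math
from typing import Dict, List


def ensemble_mean(preds: Dict[str, List[float]]) -> List[float]:
    """Average the predicted probability (e.g. P(dog)) of several models, image by image."""
    models = list(preds.values())
    n = len(models[0])
    return [sum(m[i] for m in models) / len(models) for i in range(n)]


def log_loss(y_true: List[int], p: List[float], eps: float = 1e-15) -> float:
    """Mean binary log loss; probabilities are clipped so log(0) never happens."""
    total = 0.0
    for y, q in zip(y_true, p):
        q = min(max(q, eps), 1 - eps)
        total += -(y * math.log(q) + (1 - y) * math.log(1 - q))
    return total / len(y_true)


if __name__ == "__main__":
    # Hypothetical validation predictions from three models.
    y = [1, 0, 1, 0]
    preds = {
        "resnet50": [0.95, 0.10, 0.70, 0.05],
        "densenet121": [0.80, 0.30, 0.90, 0.20],
        "inception4": [0.90, 0.05, 0.60, 0.15],
    }
    for name, p in preds.items():
        print(name, round(log_loss(y, p), 4))
    print("ensemble", round(log_loss(y, ensemble_mean(preds)), 4))
```

Because log loss punishes confident mistakes very heavily, averaging tends to help: one model's overconfident wrong answer gets pulled toward the others.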

After ensembling 4 models my public score improved from 0.46 to 0.44, at rank 27. I kept trying to improve it by training for more epochs, but the score only moved up or down by a small margin.

After some research on the fast.ai forum I found out that I could also do transfer learning with other models such as ResNeXt, by downloading the weight file and adding it to the weights folder. I used resnext50 and inception_resnet2 as new models for the ensemble. The score improved a bit, from 0.44 to 0.42, but to me that was not a big gain.

I realized that I had too many models in the ensemble and decided to try something similar to an ablation study: remove the models one at a time and record the score. Doing this, I found that removing the densenet and vgg models improved my ensemble from 0.44 to 0.39, which put me at rank 16 on the public leaderboard.
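The remove-one-at-a-time idea can be automated as greedy backward elimination: at each step, drop the model whose removal lowers the ensemble loss the most, and stop when no removal helps. This sketch assumes averaged probabilities and log loss on a held-out validation set (the post used the public leaderboard score instead); the model names "a", "b", "c" in the test are placeholders.

```python
import math
from typing import Dict, List, Tuple


def ensemble_mean(preds: Dict[str, List[float]]) -> List[float]:
    """Average predicted probabilities across models, image by image."""
    models = list(preds.values())
    return [sum(m[i] for m in models) / len(models) for i in range(len(models[0]))]


def log_loss(y_true: List[int], p: List[float], eps: float = 1e-15) -> float:
    """Mean binary log loss with clipped probabilities."""
    total = 0.0
    for y, q in zip(y_true, p):
        q = min(max(q, eps), 1 - eps)
        total += -(y * math.log(q) + (1 - y) * math.log(1 - q))
    return total / len(y_true)


def backward_elimination(y_true: List[int], preds: Dict[str, List[float]]) -> Tuple[List[str], list]:
    """Greedily remove models while doing so lowers the ensemble's log loss.

    Returns the kept model names and a history of (models, score) after each step.
    """
    kept = dict(preds)
    best = log_loss(y_true, ensemble_mean(kept))
    history = [(sorted(kept), best)]
    while len(kept) > 1:
        # Score the ensemble with each remaining model left out.
        trials = {
            name: log_loss(y_true, ensemble_mean({k: v for k, v in kept.items() if k != name}))
            for name in kept
        }
        name, score = min(trials.items(), key=lambda kv: kv[1])
        if score >= best:          # no single removal helps any more
            break
        del kept[name]
        best = score
        history.append((sorted(kept), best))
    return sorted(kept), history
```

Choosing models on the same data you report the score on can overfit that data, so a separate validation set (or cross-validation) gives a more honest picture than the public leaderboard alone.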