save time with a shell script: unzipping multiple zip files
ubuntu shell bash utility javascript browser-addon newbie
15 Dec 2016
I got carried away and downloaded lots of pdf files packed in zip files, so I needed a way to extract them all into one directory. I had to brush up on shell scripting to get the job done, because extracting one file at a time was becoming too tedious. Most of these zip files held either pdf, epub or mobi files, and I prioritized the pdf files. My goal was to get the pdf file; if there wasn’t one, I would take the epub, and if that wasn’t included either I would extract the whole zip archive into a specific directory (out).
First off, I checked how many zip files I was dealing with: about 148 zip archives.
ls *.zip | wc -l
Guided by the assumption that an archive with only one file had to hold a pdf, I hacked together a script that fetches the file count of a zip file using zipinfo and then greps the number. If an archive had more than one file I would extract only the pdf file; otherwise I would extract the single file. I started by testing commands to see if I could successfully get the file count.
ls *.zip | xargs zipinfo | grep -Eo '[0-9]{1,4} file'
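That didn’t work: zipinfo expects one archive per invocation and treats any further arguments as member names to look up, so I added the -l flag to xargs (the non-deprecated spelling is -L 1) to run zipinfo once per input line.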
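The same per-archive count can be sketched with a plain loop instead of xargs. This is only an illustration, not the script from float.sh: the function names `file_count_from_summary` and `count_zip_entries` are made up here, and the sketch assumes `zipinfo -t` prints a totals line of the form `3 files, 1200 bytes uncompressed, 900 bytes compressed: 25.0%`.

```shell
#!/usr/bin/env bash
# Read a zipinfo totals line on stdin and print only the entry count.
# Matches both "1 file" and "3 files".
file_count_from_summary() {
  grep -Eo '[0-9]+ files?' | grep -Eo '[0-9]+' | head -n 1
}

# Print "<count> <archive>" for every zip file in the current directory.
# zipinfo is called once per archive, which is what xargs -L 1 achieves.
count_zip_entries() {
  local zip
  for zip in ./*.zip; do
    [ -e "$zip" ] || continue   # no zip files at all: the glob stays literal
    printf '%s %s\n' "$(zipinfo -t "$zip" | file_count_from_summary)" "$zip"
  done
}
```

Calling `count_zip_entries` from the download directory prints one line per archive, and the loop copes with spaces in file names, which `ls | xargs` does not.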
Testing it out showed that it had problems with duplicate file names (the copies ending in "(1)"), so I wrote another line to get rid of the duplicates.
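# delete only the "(1)"-style copies; a bare "*[0-9]*" would match any name containing a digit
find . -type f -name "*([0-9])*" -exec rm {} +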
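Before deleting anything it can help to list what a pattern matches. Here is a small sketch, assuming the duplicates are browser-style copies named like `book (1).zip`; the function name `list_duplicate_copies` is hypothetical.

```shell
#!/usr/bin/env bash
# List files under a directory whose names contain a single-digit "(N)"
# copy marker, e.g. "book (1).zip". Names that merely contain a digit,
# such as "python3.zip", are not matched.
list_duplicate_copies() {
  find "$1" -type f -name '*([0-9])*' -print
}
```

Once the list from `list_duplicate_copies .` looks right, `-print` can be swapped for `-exec rm {} +` to do the actual cleanup.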
Things were moving along smoothly, so I moved the snippet to float.sh.
Finally I ran the code with ` ./float.sh ` and checked whether the number of files I had extracted matched the number of archives: ` cd out && ls | wc -l `. Surprise, surprise: I had 145 files. Something wasn’t right, so I wrote another script that uses the zip file names to check whether the files had already been extracted. That’s how I noticed that some zip files held a single file whose extension was either misspelled, e.g. .Epub instead of .epub, or was a lone .mobi file, so it wasn’t getting extracted.
checking if files had already been extracted
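A check along these lines might look like the sketch below. It is a reconstruction, not the original script: the function name `list_unextracted` is made up, and it assumes that an extracted file carries the same base name as its archive (for example `Book.zip` producing `Book.pdf`).

```shell
#!/usr/bin/env bash
# Print every archive in zip_dir that has no file in out_dir starting
# with the archive's base name followed by a dot.
list_unextracted() {
  local zip_dir="$1" out_dir="$2" zip stem candidate found
  for zip in "$zip_dir"/*.zip; do
    [ -e "$zip" ] || continue
    stem=$(basename "$zip" .zip)
    found=no
    for candidate in "$out_dir/$stem".*; do
      if [ -e "$candidate" ]; then
        found=yes
        break
      fi
    done
    if [ "$found" = no ]; then
      printf '%s\n' "$zip"
    fi
  done
}
```

Running `list_unextracted . out` from the download directory lists the archives that still need attention, which is quicker than comparing two `wc -l` counts by eye.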
I ended up switching the logic to check whether pdf or epub files exist in the zip and to extract them to the correct folder.
improved logic
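The pdf-first, epub-second, everything-else fallback can be sketched as follows. This is an illustration under stated assumptions rather than the original script: `pick_target` and `extract_preferred` are hypothetical names, the member list comes from `unzip -Z1`, and the case-insensitive match relies on unzip's `-C` option so that `.Epub` and `.PDF` are caught too.

```shell
#!/usr/bin/env bash
# Read archive member names on stdin and print what to extract:
# "pdf" if any pdf is present, else "epub", else "all".
pick_target() {
  local names
  names=$(cat)
  if printf '%s\n' "$names" | grep -qi '\.pdf$'; then
    echo pdf
  elif printf '%s\n' "$names" | grep -qi '\.epub$'; then
    echo epub
  else
    echo all
  fi
}

# Extract the preferred format from one archive into out_dir.
extract_preferred() {
  local zip="$1" out_dir="$2" target
  target=$(unzip -Z1 "$zip" | pick_target)
  case $target in
    # No pdf or epub (for example a lone .mobi): unpack the whole archive.
    all) unzip -o -q "$zip" -d "$out_dir" ;;
    # -j drops directory paths, -C matches the extension case-insensitively.
    *)   unzip -o -q -j -C "$zip" "*.$target" -d "$out_dir" ;;
  esac
}
```

A driver loop would then be `for z in ./*.zip; do extract_preferred "$z" out; done`. Keeping the decision in `pick_target` means it can be tried on a hand-written list of names before touching any real archive.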
Last but not least, rename the files to remove the (www.ebook-dl.com) tag.
rename -v 's/\(.*\)//' ./*
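The Perl-based `rename` is not installed on every system, so here is a fallback sketch using only sed and mv. It assumes the tag is exactly `(www.ebook-dl.com)` as mentioned above; the function names `strip_site_tag` and `rename_tagged_files` are made up for this example.

```shell
#!/usr/bin/env bash
# Remove the "(www.ebook-dl.com)" tag, and any spaces right before it,
# from a single file name.
strip_site_tag() {
  printf '%s\n' "$1" | sed 's/ *(www\.ebook-dl\.com)//'
}

# Rename every tagged file in a directory and print each rename.
# A file is skipped if the cleaned name is already taken.
rename_tagged_files() {
  local dir="$1" path name clean
  for path in "$dir"/*"(www.ebook-dl.com)"*; do
    [ -e "$path" ] || continue
    name=$(basename "$path")
    clean=$(strip_site_tag "$name")
    if [ -e "$dir/$clean" ]; then
      echo "skip: $clean already exists" >&2
    else
      mv -- "$path" "$dir/$clean" && echo "$name -> $clean"
    fi
  done
}
```

Unlike the greedy `\(.*\)` pattern, this only touches the one known tag, so a title that legitimately contains parentheses is left alone.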
The biggest takeaway from this is that you should test your scripts on a few files first and always carry out sanity checks. Test each command on its own, because one bad command will ruin the whole pipeline.
links:
- fixing xargs error
- use zipinfo to read filecount
- fetching numbers in grep
- remove files after calling find
- shell scripting basics
- bash arithmetic
- bash cheatsheet
- unzip to particular directory
- unzip specific extensions only
- finding strings within a string
- cutting strings in bash
- mass rename files
- case insensitive grep