save time through shell script unzipping multiple zip files

I got carried away and downloaded lots of PDF files packed in zip files, so I needed a way to extract each one into a single directory. Extracting them one at a time was becoming too tedious, so I had to brush up on shell scripting to get the job done. Most of these zip files contained a pdf, epub or mobi file. I prioritized pdf files: my goal was to get the pdf if there was one; if there wasn’t, I would take the epub; and if that wasn’t included either, I would extract the whole zip archive into a specific directory (out).

First off, I checked how many zip files I was dealing with: about 148 zip archives.

ls *.zip | wc -l 
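Counting through `ls | wc -l` works for ordinary names but miscounts if a file name ever contains a newline. A minimal sketch of a glob-based count follows; `count_zips` is a hypothetical helper name, not something from the original workflow.

```shell
# count_zips DIR: print how many *.zip files DIR contains (default: current dir).
count_zips() {
	# Subshell so that resetting the positional parameters stays local
	(
		set -- "${1:-.}"/*.zip
		# An unmatched glob is left as the literal pattern; treat that as zero
		[ -e "$1" ] || set --
		echo "$#"
	)
}

count_zips .
```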

Guided by the assumption that if there was only one file in an archive it had to be a pdf, I hacked together a script to fetch the file count of each zip file using zipinfo and then grep the number. If an archive had more than one file I would extract only the pdf; otherwise I would extract the single file. I started by testing commands to see if I could get the file count correctly.

ls *.zip | xargs zipinfo | grep -Eo '[0-9]{1,4} file'

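That didn’t work: zipinfo expects one archive per invocation (any further arguments are treated as member names to look up inside the first archive), so the file names had to be passed to it one at a time. I added the -l flag to xargs to do that.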
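The difference is easy to see without any archives by letting `echo` stand in for zipinfo. This is only a sketch: it uses `-n 1` (one argument per run) as a stand-in for the `-l` flag, and the function names `batched` and `one_each` are made up for the illustration.

```shell
# Default xargs behaviour: all names are appended to a single command line
batched() { xargs echo zipinfo; }

# One command per name, which is what zipinfo needs
one_each() { xargs -n 1 echo zipinfo; }

printf '%s\n' a.zip b.zip c.zip | batched
# prints: zipinfo a.zip b.zip c.zip

printf '%s\n' a.zip b.zip c.zip | one_each
# prints three lines, one "zipinfo <name>" per archive
```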

Testing it out showed that it had problems with duplicate file names (the copies marked with "(1)"), so I wrote another line to get rid of the duplicates.

find . -iname "*([0-9])*" -exec rm {} +
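Because `rm` is irreversible, it may be worth listing the matches before deleting anything. The sketch below assumes the duplicates carry a browser-style "(1)" marker in their names; `list_duplicates` is a hypothetical helper that only prints and never removes.

```shell
# list_duplicates DIR: print regular files whose names contain "(digit)".
# Parentheses are literal in find's -name patterns; [0-9] matches one digit.
list_duplicates() {
	find "${1:-.}" -type f -name '*([0-9])*' -print
}

list_duplicates .
```

Once the printed list looks right, the same pattern can be handed to the deleting command.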

Things were moving along smoothly, so I moved the snippet into float.sh.

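#!/bin/bash
# float.sh
for i in *.zip
do
	# the last line of zipinfo output is the summary, e.g. "3 files, ..."
	myvar=$(zipinfo "$i" | tail -n 1 | grep -Eo '^[0-9]+')
	if [ "${myvar:-0}" -gt 1 ]
	then
		# more than one entry: extract only the pdf
		unzip "$i" -d out "*.pdf"
	else
		# single entry: extract it as-is
		unzip "$i" -d out
	fi
done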
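The count parsing can be checked on its own, without any archives, by feeding it text shaped like zipinfo's summary line. The sketch below assumes that summary is the last line of the output and starts with the entry count; `entry_count` is a hypothetical helper name. It also shows why a single-digit pattern is fragile once an archive has ten or more entries.

```shell
# entry_count: read zipinfo-style output on stdin and print the number
# that starts its last line (the "N files, ..." summary).
entry_count() {
	tail -n 1 | grep -Eo '^[0-9]+'
}

# A single-digit pattern keeps only the last digit of a larger count:
echo '12 files, 3456 bytes uncompressed' | grep -Eo '[0-9] file' | grep -Eo '[0-9]'
# → 2

# Reading the whole leading number avoids that:
echo '12 files, 3456 bytes uncompressed' | entry_count
# → 12
```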

Finally, I ran the code with `./float.sh` and checked whether the number of extracted files matched the number of archives with `cd out && ls | wc -l`. Surprise, surprise: I had 145 files. Something wasn’t right, so I wrote another script that uses the zip file names to check whether each archive had already been extracted. That’s how I noticed that some zip files held a single file whose extension was either misspelled (e.g. .Epub instead of .epub) or was .mobi, so it wasn’t getting extracted.

checking if files had already been extracted

for i in *.zip
do
	# first 33 characters of the archive name
	str=$(echo "$i" | cut -c1-33)
	found=$(find out -iname "$str*")

	# find prints the matching path (e.g. out/file-name.pdf) or nothing;
	# [[ -z ... ]] is true when that output is empty
	if [[ -z $found ]]
	then
		echo "$i"
	fi

done
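One caveat with a fixed 33-character prefix is that a shorter archive name keeps its ".zip" ending inside the prefix, so it can never match an extracted .pdf or .epub. A possible variant, sketched here with the hypothetical helper `already_extracted`, strips the extension before cutting the prefix.

```shell
# already_extracted ZIP DIR: succeed if DIR holds a file whose name starts
# with the first 33 characters of ZIP's name (".zip" suffix removed first).
already_extracted() {
	local base prefix
	base=${1%.zip}
	prefix=$(printf '%s' "$base" | cut -c1-33)
	[ -n "$(find "$2" -iname "${prefix}*" | head -n 1)" ]
}

# Possible use inside the loop:  already_extracted "$i" out || echo "$i"
```

Names containing glob characters such as `[` or `*` would still confuse the find pattern; this sketch does not handle them.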

I ended up switching the logic to check whether a pdf or epub file exists in the zip and, if so, extract it to the correct folder.

improved logic

for i in *.zip
do
	# count listing lines that end in the extension ("." escaped so it is literal)
	haspdf=$(unzip -l "$i" | grep -c '\.pdf$')
	hasepub=$(unzip -l "$i" | grep -c '\.epub$')
	if [ "$haspdf" -gt 0 ]; then
		echo "pdf: "
		unzip "$i" -d out "*.pdf"
	elif [ "$hasepub" -gt 0 ]; then
		echo "epub:"
		unzip "$i" -d out "*.epub"
	else
		echo "some: $i"
		unzip "$i" -d out
	fi
done
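The decision itself (pdf first, then epub, then everything) can be separated from the unzip calls so it can be tested against canned listings. This is a sketch under the assumption that each entry name ends its line in the `unzip -l` listing; `pick_pattern` is a hypothetical helper name.

```shell
# pick_pattern: read an `unzip -l` style listing on stdin and print the
# pattern to extract: '*.pdf' if a pdf is listed, else '*.epub', else '*'.
pick_pattern() {
	local listing
	listing=$(cat)
	if printf '%s\n' "$listing" | grep -q '\.pdf$'; then
		echo '*.pdf'
	elif printf '%s\n' "$listing" | grep -q '\.epub$'; then
		echo '*.epub'
	else
		echo '*'
	fi
}

printf '  100  2020-01-01 10:00   book.epub\n  200  2020-01-01 10:00   book.mobi\n' | pick_pattern
# → *.epub
```

The match is case-sensitive on purpose: an entry such as book.Epub falls through to `*`, which mirrors the fallback of extracting the whole archive.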

Last but not least, I renamed the files to remove the (www.ebook-dl.com) tag.

rename -v 's/\(.*\)//' ./*
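Before renaming in bulk, the substitution can be previewed on a single name. The sketch below applies the same idea with sed, where unescaped parentheses are literal in a basic regular expression; `strip_tag` and the sample file name are made up for the illustration.

```shell
# strip_tag NAME: print NAME with everything from the first "(" to the
# last ")" removed. It only prints the result and renames nothing.
strip_tag() {
	printf '%s\n' "$1" | sed 's/(.*)//'
}

strip_tag 'Some Book (www.ebook-dl.com).pdf'
# → Some Book .pdf   (the space before the tag is left behind)
```

Since `.*` is greedy, a name with two parenthesised parts loses everything between the first "(" and the last ")".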

The biggest takeaway from this is that you should test your scripts on a few files first and always carry out sanity checks. Test each command on its own, because one bad command will ruin the whole pipeline.

End of story