Recreating Adobe Bridge Keywords.xml file from Image metadata

I haven’t used Adobe Bridge for quite a long time, and when I came to use it recently I found that all my hierarchical keywords had been lost. This was because I’ve had a new computer since last using it, and Bridge stores the hierarchical keywords in a location I hadn’t been backing up.

All the hierarchical keywords that had been assigned to images were still in the metadata of those images; they just weren’t in the list Bridge keeps for assigning keywords to new images. So I needed some way of pulling the hierarchical keywords from all the images and then using them to rebuild the list of keywords available in Bridge.

In this post I’ll go over how I did this. Note that all my processing is done in a Linux environment, but you could probably adapt my code to Windows equivalents or run it from WSL. Also note that this is not a particularly robust solution, but rather one that was ‘good enough’ for me.

Extracting the metadata

First off, we need some way of extracting the hierarchical keyword data from the metadata of the images. ExifTool is great for this. There were a few other metadata tags I needed to rebuild lists for in Bridge, so I extracted these at the same time.

exiftool -config exiftool-config.txt -r -sep '#MultipleValueSeparator#' -XMP-lr:HierarchicalSubject -LensMfrMdl -XMP-dk:WebsiteName -XMP-dk:Software -XMP-dk:AdditionalLenses -XMP-dk:Filters -s3 -fast -W+ %t.txt ../photos/

Explanation of this command:

-config exiftool-config.txt
Use our custom config file in which we define the composite tag LensMfrMdl. This was just for some of the other information I wanted to build a list for, nothing to do with the Hierarchical keywords.
-r
Recurse through subdirectories
-sep '#MultipleValueSeparator#'
Separator to use where a value is an array. You can’t use a newline, so we have to use a placeholder we can replace later with a newline.
-s3
Output tag values only
-W+
Write each tag value to a file, append
%t.txt
Write the value to {tagname}.txt
-E
Output entities. Not actually used here, but depending on how you convert the output to the Hierarchical Keywords list for Bridge later, you may want this.

Note that if you have a lot of images, you may want to run ExifTool in batches of, say, 1000 images at a time. I didn’t look into how to do this, but I did read that ExifTool can get slower the more images it processes in one run. I just let it run for a few hours until it finished.
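One possible way to batch the run, offered as an untested-at-scale sketch rather than a recommendation, is to let find and xargs hand the command a limited number of files per invocation. The helper name run_in_batches is made up for illustration, and -r is a GNU xargs option. Because -W+ appends, each batch should add to the same {tagname}.txt files.

```shell
# run_in_batches DIR SIZE CMD [ARGS...]
# Hypothetical helper: runs CMD once per batch, appending at most SIZE
# file paths found under DIR to each invocation.
run_in_batches() {
    batch_dir=$1
    batch_size=$2
    shift 2
    # -print0 / -0 keep file names with spaces or newlines intact;
    # -r (GNU xargs) skips running CMD when no files are found.
    find "$batch_dir" -type f -print0 | xargs -0 -r -n "$batch_size" "$@"
}

# Example call (assumes exiftool is installed and the config file exists):
# run_in_batches ../photos 1000 exiftool -config exiftool-config.txt \
#     -sep '#MultipleValueSeparator#' -XMP-lr:HierarchicalSubject \
#     -s3 -fast -W+ %t.txt
```

Unlike pointing ExifTool at a directory with -r, this passes every file it finds, so you may want to narrow the find expression (for example with -name) to the image types you care about.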

After running this you end up with one file for each tag that was found in at least one image. So in my case I ended up with HierarchicalSubject.txt, LensMfrMdl.txt, Software.txt, AdditionalLenses.txt, and Filters.txt. There was no WebsiteName.txt, so presumably none of the images had that metadata.

Cleaning the data

The next step is removing all the duplicate values and sorting the list(s) we extracted:

sed 's/#MultipleValueSeparator#/\n/g' HierarchicalSubject.txt | sort | uniq > HierarchicalSubject-unique.txt

Here we use sed to replace the separator we specified when extracting the metadata with a newline. This ensures that images with multiple Hierarchical Subject values have each value on its own line.

We then sort the list, and finally filter out the duplicate values and save the result as HierarchicalSubject-unique.txt.

In my case, I repeated this for each of the tag files I had.
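Repeating the step for every tag file could be wrapped in a small loop. This is a minimal sketch: the function name split_and_dedupe is invented for illustration, it uses sort -u in place of sort | uniq, and, like the command above, it relies on GNU sed understanding \n in the replacement.

```shell
# split_and_dedupe FILE
# Hypothetical helper: splits multi-value lines on the placeholder
# separator, sorts, removes duplicates, and writes NAME-unique.txt
# next to the input file NAME.txt.
split_and_dedupe() {
    out="${1%.txt}-unique.txt"
    sed 's/#MultipleValueSeparator#/\n/g' "$1" | sort -u > "$out"
}

# Process whichever of the extracted tag files actually exist.
for f in HierarchicalSubject.txt LensMfrMdl.txt Software.txt \
         AdditionalLenses.txt Filters.txt; do
    if [ -f "$f" ]; then
        split_and_dedupe "$f"
    fi
done
```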

Building the Hierarchical Keyword list

The final stage is taking our file and converting it and adding it into the "Adobe Bridge Keywords.xml" file. I’ll cover how I did this below, but I think this would be much better done in a language that has an XML library for traversing the XML, creating nodes, etc.

The other thing to note is that my code does not do any escaping of the values when trying to find them via xpath. In my case this meant some hierarchical keywords that started with * got missed out, but I didn’t want those anyway (they were probably picked up from some stock images).
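As an illustration of what escaping for XPath might involve: XPath 1.0 string literals have no escape character, so a keyword containing a double quote cannot simply be dropped into @name="...". The sketch below builds a safe literal, falling back to concat() when a value contains both quote types. The function name xpath_literal is hypothetical, and it only addresses quote characters; whether it helps with any other problem characters is an open question.

```shell
# xpath_literal VALUE
# Hypothetical helper: prints VALUE as an XPath 1.0 string literal.
xpath_literal() {
    case $1 in
        *\"*\'* | *\'*\"*)
            # Both quote types present: split on each double quote and
            # rejoin the pieces with concat("...", '"', "...").
            printf 'concat("%s")' \
                "$(printf '%s' "$1" | sed "s/\"/\", '\"', \"/g")"
            ;;
        *\"*)
            # Only double quotes present: wrap in single quotes.
            printf "'%s'" "$1"
            ;;
        *)
            # No double quotes: wrap in double quotes.
            printf '"%s"' "$1"
            ;;
    esac
}
```

In a script like the one below, the predicate could then be written as "$XMLPATH/set[@name=$(xpath_literal "${KEYWORDS[$i]}")]" instead of embedding the value between hand-written quotes.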

Copy the Adobe Bridge Keywords.xml file from (in Windows) %appdata%\Adobe\Bridge 2024\ to your working directory where you have the extracted keyword file, then run:

BRIDGEKEYWORDSFILE='Adobe Bridge Keywords.xml'
IMAGESKEYWORDSFILE='HierarchicalSubject-unique.txt'
FILELINENO=0
while read -r line; do
	echo processing line $((++FILELINENO)) "$line"
	# If a higher depth of the current hierarchical category exists, then we can skip adding the current record since it will be created when the higher depth one is added
	if grep -Fq -- "$line|" "$IMAGESKEYWORDSFILE"; then echo "skipping as higher depth exists: $line"; continue; fi
	# Split the string into tokens based on the | separator
	IFS='|' read -ra KEYWORDS <<< "$line"
	# Get the count minus 1 of the keywords
	LEN=$((${#KEYWORDS[@]} -1))
	# If there was only one keyword (i.e. it is not really hierarchical), then we can skip it
	if [[ $LEN -lt 1 ]]; then echo "skipping $line"; continue; fi
	# Set the root of the xpath we need to add the keywords to
	XMLPATH='/keywords'
	# Loop through the keywords except the final one, adding them to the XML
	for ((i=0; i<$LEN; i++)); do
		# Check if the path to the current keyword already exists (-z means test if the command has zero length output)
		if [[ -z $(xmlstarlet sel -t -v "$XMLPATH/set[@name=\"${KEYWORDS[$i]}\"]/@name" "$BRIDGEKEYWORDSFILE") ]]; then
			# If it didn't exist, then create it. We have to create the element first and then set its name attribute.
			xmlstarlet ed --inplace -s "$XMLPATH" -t elem -n 'set' -i "$XMLPATH/set[not(@name)]" -t attr -n 'name' -v "${KEYWORDS[$i]}" "$BRIDGEKEYWORDSFILE"
		fi
		# Update our xpath with the node we've just added
		XMLPATH="$XMLPATH/set[@name=\"${KEYWORDS[$i]}\"]"
	done
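	# Finally add an <item> with the last keyword, unless one with that name already exists at this level
	if [[ -z $(xmlstarlet sel -t -v "$XMLPATH/item[@name=\"${KEYWORDS[$LEN]}\"]/@name" "$BRIDGEKEYWORDSFILE") ]]; then
		xmlstarlet ed --inplace -s "$XMLPATH" -t elem -n 'item' -i "$XMLPATH/item[not(@name)]" -t attr -n 'name' -v "${KEYWORDS[$LEN]}" "$BRIDGEKEYWORDSFILE"
	fi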
done < "$IMAGESKEYWORDSFILE"

We loop through the extracted (and cleaned) keywords file, using xmlstarlet to check if the hierarchical keyword path exists in the Bridge Keywords XML file, and create it if not.
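One weak spot in the "higher depth exists" check is that grep -F matches the text anywhere in a line, so a keyword path such as B would be skipped if A|B|C exists, even though B is not the start of that path. A stricter check that only matches at the start of a line could look like the sketch below. The function name has_deeper_path is made up for illustration, and it rescans the whole file for every call, so it will be slow on very large lists.

```shell
# has_deeper_path PATH FILE
# Hypothetical helper: succeeds if some line in FILE starts with
# "PATH|", i.e. a deeper entry exists under the same keyword path.
has_deeper_path() {
    while IFS= read -r candidate; do
        case $candidate in
            # Quoting the prefix makes it match literally, so characters
            # like * or [ in a keyword are not treated as patterns.
            "$1|"*) return 0 ;;
        esac
    done < "$2"
    return 1
}
```

It could be used in place of the grep test, for example: if has_deeper_path "$line" "$IMAGESKEYWORDSFILE"; then continue; fi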

The last stage was just copying the adjusted ‘Adobe Bridge Keywords.xml’ file back to %appdata%\Adobe\Bridge 2024\ (though in my case I now have that directory symlinked to a different directory that does get backed up). Then open Bridge and you should see your hierarchical keywords displayed in the Keywords panel. I then went through and deleted quite a few that had just been picked up from stock photos.
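Since the script edits the XML in place, it may be worth keeping a copy of whatever file is currently in the Bridge directory before overwriting it. A minimal sketch, with a hypothetical function name and a simple .bak suffix chosen purely for illustration:

```shell
# install_with_backup SRC_FILE DEST_DIR
# Hypothetical helper: copies SRC_FILE into DEST_DIR, first saving any
# existing file of the same name as NAME.bak.
install_with_backup() {
    name=${1##*/}    # file name without its directory part
    if [ -f "$2/$name" ]; then
        cp -p "$2/$name" "$2/$name.bak"
    fi
    cp "$1" "$2/$name"
}

# Example call (the destination is wherever your Bridge settings
# directory is reachable from your shell):
# install_with_backup 'Adobe Bridge Keywords.xml' /path/to/bridge/settings
```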

You could also do something like add all non-hierarchical keywords to the ‘Other Keywords’ section, rather than skipping them, but I’m not bothered about those.

For the other tags I extracted, the lists needed to be in JSON format. I just formatted these manually, or used CONCAT in a spreadsheet on the values to get the needed JSON.
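If you would rather script that step, something like the following could turn one of the -unique.txt files into JSON. The exact JSON shape Bridge expects is not covered here, so the flat array of strings is purely an assumption, as is the function name lines_to_json_array. It escapes backslashes and double quotes only, not control characters.

```shell
# lines_to_json_array FILE
# Hypothetical helper: prints the lines of FILE as a JSON array of
# strings on a single line.
lines_to_json_array() {
    # 1. double any backslashes, 2. escape double quotes,
    # 3. wrap each line in quotes, then join the lines with commas
    # and wrap the result in square brackets.
    sed -e 's/\\/\\\\/g' -e 's/"/\\"/g' -e 's/.*/"&"/' "$1" |
        paste -sd, - |
        sed 's/.*/[&]/'
}

# Example call:
# lines_to_json_array Filters-unique.txt > Filters.json
```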

Anyway, hopefully this is somewhat helpful if you have the same problem and need to regenerate the Adobe Bridge hierarchical keywords file.

Posted by xoogu