The MetaData class is designed to be high level interface to all the underlying meta data stored within different sections, themselves within different property set streams.
With this class, you can simply get properties using their names, without needing to know about the underlying guids, property ids etc.
Example:
Ole::Storage.open('test.doc') { |ole| p ole.meta_data.doc_author }
TODO:
add write support
fix some of the missing type coercion (eg FileTime)
maybe add back the ability to access individual property sets as a unit
directly. ie ole.summary_information
. Is this useful?
full key support, for unknown keys, like ole.meta_data[myguid,
myid]
. probably needed for user-defined properties too.
# File lib/ole/storage/meta_data.rb, line 63 def initialize ole @ole = ole end
# File lib/ole/storage/meta_data.rb, line 115 def [] key pair = Types::PropertySet::PROPERTY_MAP[key.to_s] or return nil file = FILE_MAP[pair.first] or return nil dirent = @ole.root[file] or return nil dirent.open { |io| return Types::PropertySet.new(io)[key] } end
# File lib/ole/storage/meta_data.rb, line 122 def []= key, value raise NotImplementedError, 'meta data writes not implemented' end
# File lib/ole/storage/meta_data.rb, line 126 def each(&block) FILE_MAP.values.each do |file| dirent = @ole.root[file] or next dirent.open { |io| Types::PropertySet.new(io).each(&block) } end end
# File lib/ole/storage/meta_data.rb, line 93 def file_format comp_obj[:file_format] end
# File lib/ole/storage/meta_data.rb, line 137 def method_missing name, *args, &block return super unless args.empty? pair = Types::PropertySet::PROPERTY_MAP[name.to_s] or return super self[name] end
# File lib/ole/storage/meta_data.rb, line 97 def mime_type # based on the CompObj stream contents type = FORMAT_MAP[file_format] return MIME_TYPES[type] if type # based on the root clsid type = CLSID_MAP[Types::Clsid.load(@ole.root.clsid)] return MIME_TYPES[type] if type # fallback to heuristics has_file = Hash[*@ole.root.children.map { |d| [d.name.downcase, true] }.flatten] return MIME_TYPES[:msg] if has_file['__nameid_version1.0'] or has_file['__properties_version1.0'] return MIME_TYPES[:doc] if has_file['worddocument'] or has_file['document'] return MIME_TYPES[:xls] if has_file['workbook'] or has_file['book'] MIME_TYPES[nil] end
# File lib/ole/storage/meta_data.rb, line 133 def to_h inject({}) { |hash, (name, value)| hash.update name.to_sym => value } end
i'm thinking of making #file_format and #mime_type available through #[], each, and to_h also, as calculated meta data (not assignable)
# File lib/ole/storage/meta_data.rb, line 70 def comp_obj return {} unless dirent = @ole.root["\0001CompObj"] data = dirent.read # see - https://gnunet.org/svn/Extractor/doc/StarWrite_File_Format.html # compobj_version: 0x0001 # byte_order: 0xffe # windows_version: 0x00000a03 (win31 apparently) # marker: 0xffffffff compobj_version, byte_order, windows_version, marker, clsid = data.unpack("vvVVa#{Types::Clsid::SIZE}") strings = [] i = 28 while i < data.length len = data[i, 4].unpack('V').first i += 4 strings << data[i, len - 1] i += len end # in the unknown chunk, you usually see something like 'Word.Document.6' {:username => strings[0], :file_format => strings[1], :unknown => strings[2..-1]} end