Monday, 15 April 2024

Either Excel or Time Machine or Apple Desktop Broke my Spreadsheets and then fixed one of them



This is scary - where's my data?

*** This is a descriptive account of a data corruption event. As of April 2024 it does not contain a definitive root cause analysis of the problem described, and it offers no fix for the "data corruption" that appears. If you are having a similar problem, I suggest you get onto Apple support. ***

Background

The storage on modern computers allows us to save many thousands of files adding up to terabytes of data. However, in my experience of working in IT support and on computers generally over the years, only a few files are really, really important to people, and they're often not that big. I certainly fit that pattern: two of my most important files are Excel spreadsheets that together weigh just over a megabyte. I've had these two spreadsheets for at least seven or eight years, working across a variety of machines. I have upgraded the machine from a Mac PowerBook to a Mac mini to a Mac Studio, and I've also updated Microsoft Office when prompted, from 2007 to the latest versions.

These spreadsheets have stood the test of time quite well, although I have occasionally had to recover them when they became corrupted. I had assumed that Excel caused the corruption, but now I see there may have been another cause. Such corruption has not occurred in quite a while, and prior to this year the Mac mini and these spreadsheets had been stable for at least 18 months.

On about the 2nd of April I moved from a Mac mini M1 to a Mac Studio M2. Using the Time Machine facility to restore my backups, the entire environment moved smoothly from the Mac mini to the Mac Studio. There seemed to be very few glitches or errors, though I did notice that Excel was not able to open files from its recent history list, even though, according to the file names alongside the entries, the files did exist. Double-clicking to open files from the Finder worked as expected. The suggested remedy for this was to delete the cache file in ~/library/blah blah blah.

Technically, how is the data stored?

These files are important to me and they have a couple of layers of password encryption.

There are two Excel documents in .xlsx format, one called DoshNSav_piv and the other DoshNsav_Trackers. They are both multi-sheet workbooks and have been saved in the Microsoft Excel password-protected format. I have worked on these spreadsheets over many years; they contain pivot tables but no macro code.

Those two Excel files are held within a re-mountable .sparseimage disk image, called FinanceDocs.sparseimage, that also has a password.
The disk image is held in the normal Mac file system. I double-click it when I want to use the files, and it then mounts as though it were external media. When finished, I unmount the disk image. This is known as encryption at rest, but it's fairly basic, created with the standard Apple application Disk Utility.


Time Machine 

Data needs to be backed up, and as a long-time user of Time Machine I rely on it for backup services. It's not the only backup I do, but these backups are automatic and regularly saved to an external NAS storage box. Time Machine does a synthetic backup, creating a new disk image that contains the entire contents of the drive, based on the previous entire contents of the drive. In essence it saves only the data that belongs to new or changed files and retains pointers to the data for older files. When restoring data, the backups can be accessed either through a GUI mechanism, where you look back through previous folder listings, or by browsing the historical disk images. The crucial mechanism is that pulling a file from one of the backups should return the entire contents of that file, and its attributes, exactly as they were when the backup was run.
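
To make the "pointers to older data" idea concrete, here is a toy model in Python. It is only a sketch of the general technique and is not how Time Machine is actually implemented: the class name ToyBackupStore and the use of SHA-256 digests as "pointers" are my own assumptions for illustration.

```python
import hashlib


class ToyBackupStore:
    """Toy model of incremental backups (hypothetical, not Time Machine)."""

    def __init__(self):
        self.blobs = {}      # digest -> file contents, stored only once
        self.snapshots = []  # each snapshot maps path -> digest ("pointer")

    def backup(self, files):
        """Record a snapshot of {path: bytes}; return how many new blobs were stored."""
        snapshot = {}
        stored = 0
        for path, data in files.items():
            digest = hashlib.sha256(data).hexdigest()
            if digest not in self.blobs:
                # Only contents not seen before take up new space.
                self.blobs[digest] = data
                stored += 1
            snapshot[path] = digest
        self.snapshots.append(snapshot)
        return stored

    def restore(self, index, path):
        """Return the file exactly as it was when snapshot `index` was taken."""
        return self.blobs[self.snapshots[index][path]]
```

Every snapshot looks like a full copy of the drive, yet unchanged files cost nothing extra, and restoring from any snapshot should hand back the bytes that were present when it was taken.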

Here is a listing from the Time Machine backups showing each backup with its date_timestamp.
 

Opening one of the backup images shows the files backed up. The disk image file containing the spreadsheets is listed.




So what went wrong?

I decided to open the spreadsheets to do a couple of updates, in the same way that I have done for many, many years. After mounting the disk image, the spreadsheets would not open, giving the error message shown at the top of this page. This was ugly. According to the Finder listings, both files showed the type Microsoft Excel workbook (.xlsx). I tried changing the file type (on a copy) to something else and back again, but to no avail; if anything it just broke the file contents even more. I tried opening them on another machine with a similar configuration, but that also failed with the same error message. This shows that the spreadsheets themselves are corrupted, rather than one machine's setup being at fault. The fact that they would have been closed correctly (no machine crashes to speak of) points towards data corruption.


What should be seen when opening a password-encrypted Excel document. Top: when opened in Excel. Bottom: when using the Finder "Space bar to sample" view.



What's in the backups?

As the contents of the Mac Studio were entirely ported over from the previous Mac mini using the Time Machine procedure for setting up a new Apple machine, I thought there might have been some trouble during that migration. I had a collection of backups from the previous machine as well as backups from this machine. Restoring the disk images from previous backups worked reliably, but the Excel spreadsheets extracted from those disk images failed to open. There were two sets of backups: one from the previous Mac mini that ran up to the end of March this year, and the other from the new Mac Studio. I extracted a set of the disk images from various backups, rolling back over time, and then extracted the Excel sheets from those disk images to get the following listing.



Clicking on each of the Excel spreadsheets quickly showed which ones worked and which ones didn't. It was entirely unclear at first glance why some of them would open and some wouldn't. For some reason some old skills kicked back in, and I thought I'd try the UNIX command "file" to see if there was any difference in the file typing.


clive@BBComp Special measures % file */*

2024-0105-225736/DoshNsav_Trackers_piv.xlsx:                                 CDFV2 Encrypted
2024-0105-225736/DoshNsav_pivZ.xlsx:                                         Composite Document File V2 Document, Cannot read short stream
2024-0105-225736/FinanceDocs_2024-0105-225736.sparseimage:                   data

2024-0201-071633/DoshNsav_Trackers_piv.xlsx:                                 CDFV2 Encrypted
2024-0201-071633/DoshNsav_pivZ.xlsx:                                         CDFV2 Encrypted
2024-0201-071633/FinanceDocs_2024-0201.sparseimage:                          data

2024-0303-085151/DoshNsav_Trackers_piv-2024-0303-08515_OK.xlsx:              Microsoft Excel 2007+
2024-0303-085151/DoshNsav_Trackers_piv.xlsx:                                 CDFV2 Encrypted
2024-0303-085151/DoshNsav_pivZ.xlsx:                                         CDFV2 Encrypted
2024-0303-085151/FinanceDocs_2024-0303-08515.sparseimage:                    data

2024-0314-153641/DoshNsav_Trackers_piv.xlsx:                                 CDFV2 Encrypted
2024-0314-153641/DoshNsav_pivZ.xlsx:                                         Apple Desktop Services Store
2024-0314-153641/FinanceDocs-2024-0314-153641.sparseimage:                   data

2024-0320-065313/DoshNsav_Trackers_piv.xlsx:                                 CDFV2 Encrypted
2024-0320-065313/DoshNsav_pivZ.xlsx:                                         Apple Desktop Services Store
2024-0320-065313/FinanceDocs_2024-0320-065313.sparseimage:                   data

2024-0324-011945/DoshNsav_Trackers_piv.xlsx:                                 Apple Desktop Services Store
2024-0324-011945/DoshNsav_pivZ.xlsx:                                         Apple Desktop Services Store
2024-0324-011945/FinanceDocs_2024-0324-011945.sparseimage:                   data

2024-0327-134121/DoshNsav_Trackers_piv.xlsx:                                 Apple Desktop Services Store
2024-0327-134121/DoshNsav_pivZ.xlsx:                                         CDFV2 Encrypted
2024-0327-134121/FinanceDocs-2024-0327-134121.sparseimage:                   data

2024-0331-214023/DoshNsav_Trackers_piv.xlsx:                                 Apple Desktop Services Store
2024-0331-214023/DoshNsav_pivZ.xlsx:                                         CDFV2 Encrypted
2024-0331-214023/FinanceDocs_2024-0331-214023.sparseimage:                   data

2024-0401-211118_MMlast/DoshNsav_Trackers_piv.xlsx:                          Apple Desktop Services Store
2024-0401-211118_MMlast/DoshNsav_pivZ.xlsx:                                  CDFV2 Encrypted
2024-0401-211118_MMlast/FinanceDocs-2024-0401-211118.sparseimage:            data

2024-0402-082024_BBCFirst/DoshNsav_Trackers_piv.xlsx:                        Apple Desktop Services Store
2024-0402-082024_BBCFirst/DoshNsav_pivZ.xlsx:                                CDFV2 Encrypted
2024-0402-082024_BBCFirst/FinanceDocs_2404-0402-082024_BBCFIRST.sparseimage: data

2024-0415-000021/DoshNsav_Trackers_piv.xlsx:                                 Apple Desktop Services Store
2024-0415-000021/DoshNsav_pivZ.xlsx:                                         CDFV2 Encrypted
2024-0415-000021/FinanceDocs_2024-0415-000021.sparseimage:                   data



These results were very useful. It turns out that where the file is of type "CDFV2 Encrypted" it opens correctly, and where it is marked "Apple Desktop Services Store" opening fails. During the entire history of these spreadsheets I have never knowingly changed their type. Each has always been an encrypted Excel spreadsheet, saved with a password.
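
The "file" command makes this call mostly from the first few bytes of each file. A minimal Python sketch of the same check is below. The three byte signatures are the standard ones for these formats (they are not quoted anywhere in this post), and the function names and short labels are my own, chosen for illustration.

```python
# Leading bytes ("magic numbers") for the three types seen in the listing.
SIGNATURES = [
    # Compound Document File V2: what a password-protected .xlsx looks like
    (bytes.fromhex("d0cf11e0a1b11ae1"), "cdfv2"),
    # ZIP container: what an unencrypted .xlsx looks like
    (b"PK\x03\x04", "zip"),
    # .DS_Store: four header bytes followed by the string Bud1
    (b"\x00\x00\x00\x01Bud1", "ds_store"),
]


def classify_header(header):
    """Return a short label for the leading bytes of a file."""
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    return "unknown"


def classify_file(path):
    """Read the first 16 bytes of `path` and classify them."""
    with open(path, "rb") as fh:
        return classify_header(fh.read(16))
```

Calling classify_file() on each extracted .xlsx should give "cdfv2" for the copies that open and "ds_store" for the ones that do not, if the corruption is the same as described here.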

What is very peculiar is how the files change type and then change back again. See the lines marked in yellow and red. In the yellow lines, the file DoshNsav_pivZ changes type between backups (it shows as Apple Desktop Services Store from the 2024-0314-153641 backup), and then in the red lines it changes back again (CDFV2 Encrypted from the 2024-0327-134121 backup). Meanwhile the _Trackers file changes to Apple Desktop Services Store (from the 2024-0324-011945 backup) but not back again, and remains inoperable. The file with type "Microsoft Excel 2007+" is a version saved without a password.
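
Spotting these flips by eye is tedious, so here is a small Python sketch that reads saved output of "file */*" and reports every point where a file's type differs from the previous backup. It assumes, as in my listing, that each line looks like "snapshot-folder/filename: type" and that the snapshot folder names sort in date order; the function name type_changes is my own.

```python
from collections import defaultdict


def type_changes(file_output):
    """Return (name, snapshot, old_type, new_type) for each change of type."""
    history = defaultdict(list)
    for line in file_output.splitlines():
        if ":" not in line or "/" not in line:
            continue  # skip blank lines, prompts and anything unexpected
        path, kind = line.split(":", 1)
        snapshot, name = path.strip().split("/", 1)
        history[name].append((snapshot, kind.strip()))

    changes = []
    for name, entries in history.items():
        entries.sort()  # snapshot names start with the date, so this is date order
        for (_, old), (snapshot, new) in zip(entries, entries[1:]):
            if old != new:
                changes.append((name, snapshot, old, new))
    return changes
```

Each tuple names the file, the first backup in which the new type appears, and the old and new types. Files whose names differ between backups (such as the dated .sparseimage copies) are treated as separate files and never compared.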




I first thought that the Excel files had lost the marker that indicates they are encrypted, so that Excel tried to open them as plain files and failed some kind of structural integrity test. I could not think of any explanation as to why this marker would change, or change back, during the process of doing a backup. This is a data integrity issue, as files and their associated attributes should not be changed through the process of doing a backup and restore. There may have been a macOS or Microsoft update in that time span.

Looking at the first readable data in each of the files shows the following.
For a readable Excel file:

BBComp % strings 2024-0402-082024_BBCFirst/DoshNsav_pivZ.xlsx | head
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<encryption xmlns="http://schemas.microsoft.com/office/2006/encryption" xmlns:p="http://schemas.microsoft.com/office/2006/keyEncryptor/password" xmlns:c="http://schemas.microsoft.com/office/2006/keyEncryptor/certificate"><keyData saltSize="16" blockSize="16" keyBits="128" hashSize="20" cipherAlgorithm="AES" cipherChaining="ChainingModeCBC" hashAlgorithm="SHA1" saltValue="cfDF3XFob
T&,9
"&Z#
ae+B
'd/#
[)ps
Yjx5<
Y!
  b
1Bi2


For one of the broken files:


BBComp % strings 2024-0402-082024_BBCFirst/DoshNsav_Trackers_piv.xlsx | head
Bud1
nIlocblob
gIlocblob
smodDdutc
sdsclbool
slsvCblob
Bbplist00
WXWY
    [XiconSize_
showIconPreview_
calculateAllSizesWcolumns_
BBComp %



Further investigations

Running "Disk First Aid" (known as fsck in Unix circles) on the disk image did give some indication that there may be a problem with the disk image container.


Disk First Aid showing errors within the directory structure that were repaired

Disk First Aid - a clean run


But this has not fixed the broken files.

 % file /Volumes/PoundsDocsClive/Dosh*
/Volumes/PoundsDocsClive/DoshNsav_GameOver.xlsx:      CDFV2 Encrypted
/Volumes/PoundsDocsClive/DoshNsav_May_2017_Copy.xlsx: CDFV2 Encrypted
/Volumes/PoundsDocsClive/DoshNsav_Trackers_piv.xlsx:  Apple Desktop Services Store
/Volumes/PoundsDocsClive/DoshNsav_old.xlsx:           CDFV2 Encrypted
/Volumes/PoundsDocsClive/DoshNsav_pivZ.xlsx:          Apple Desktop Services Store
/Volumes/PoundsDocsClive/Dosh_Nasdaq_Pivot.xlsx:      Microsoft Excel 2007+



But the damage may have been done by then. This is a very specific use case: opening an extendable volume, changing one file and closing it again.
Such a working practice would involve extending the volume, though not by very much, and redoing the encryption. It's my understanding that Time Machine then has to back up the whole of the disk image because part of it has changed. Time Machine works at the file level, not at the block level.

Anyway, it's all rather annoying, but I think I might have been ignoring this for a while, as you can see from the names of the files. Occasionally I've had to rebuild these two spreadsheets, probably because of some minor corruption I had noticed.

A quick look to see if I could spot any corruption within the Excel spreadsheet proved fruitful. Using the side-by-side graphical compare program provided with the Xcode kit, with the spreadsheet on the left side and the .DS_Store on the right-hand side, we can see that the spreadsheet has been overwritten with the first part of the store. The magic number for a .DS_Store is the string Bud1, which can be seen near the start of both files. This also explains why the corrupted files are described as "Apple Desktop Services Store". This is a classic case of file corruption, where the contents of one file appear within the contents of another.
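
The same comparison can be roughed out without a graphical tool. The Python sketch below counts how many leading bytes two files have in common, which gives a lower bound on how much of the spreadsheet was overwritten. It assumes the overwritten region is one contiguous run from the start of the file and that the .DS_Store has not changed since the damage happened; neither is something I have confirmed. The function names and the 1 MiB read limit are my own choices.

```python
def common_prefix_length(a, b):
    """Number of leading bytes that two byte strings share."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


def overwritten_bytes(spreadsheet_path, ds_store_path, limit=1 << 20):
    """Compare up to `limit` leading bytes of the two files."""
    with open(spreadsheet_path, "rb") as sheet, open(ds_store_path, "rb") as store:
        return common_prefix_length(sheet.read(limit), store.read(limit))
```

A result of more than a handful of bytes would suggest that real .DS_Store content, rather than just a stray header, landed at the front of the spreadsheet.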

The mechanism of how this occurred remains to be determined.



Who needs to fix this?


My original conclusion, before I discovered that the corruption was data from the .DS_Store file, was that the problem fell somewhere between Apple and Microsoft. I have now concluded that this is a problem with the handling of disk images in the Apple infrastructure.

That's an interesting question, and during my career in tech support I have been caught between vendors, each of which says the problem belongs to the other one. For me there needs to be some kind of mechanism to see and/or change the type category of a file, to ensure that Excel will open it as an encrypted/password file if it is one. I understand that Microsoft has used a number of different encryption schemes for its spreadsheets. Somehow coordination has been lost with the file type handling in Time Machine.

Personally I would love to help, but I bet if I phoned up either tech support team they would say please send over your files and the passwords used, and to be honest, because this is personal financial information, that's not gonna happen. The other way is to re-create the problem, but I'm sure it would take quite a while to find the edge condition in historical backups that is causing it.

If you think you see this same problem, feel free to run strings <your file> | head -10 in the Terminal app to see if you have the same symptoms: Bud1 showing at the top of the output for the corrupted file. Then put in a support case with Apple. Let me know in the comments below if you've seen this problem, or if you have a fix or more information about it.
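
If you have a lot of files to check, a short Python sketch like the one below walks a folder and lists every .xlsx file that starts with the .DS_Store signature. The signature bytes are the standard .DS_Store header (four bytes followed by Bud1); the function name find_overwritten_spreadsheets is my own, and I have only reasoned this through against the symptoms described here, so treat it as a starting point.

```python
import os

# Standard .DS_Store header: 00 00 00 01 followed by the string Bud1.
DS_STORE_MAGIC = b"\x00\x00\x00\x01Bud1"


def find_overwritten_spreadsheets(root):
    """Return sorted paths of .xlsx files under `root` that start like a .DS_Store."""
    hits = []
    for folder, _dirs, names in os.walk(root):
        for name in names:
            if not name.lower().endswith(".xlsx"):
                continue
            path = os.path.join(folder, name)
            try:
                with open(path, "rb") as fh:
                    if fh.read(len(DS_STORE_MAGIC)) == DS_STORE_MAGIC:
                        hits.append(path)
            except OSError:
                continue  # unreadable file: skip it rather than stop the scan
    return sorted(hits)
```

Point it at a mounted disk image or a folder of restored copies. It only reads the first eight bytes of each file and changes nothing.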

In Summary
  • This is a classic case of data corruption at the file level. The data at the start of the Excel file has been overwritten by data that should be in the .DS_Store file.
  • It's very hard to spot when such data corruption has occurred, as there is no externally visible marker (except in this case the Unix file type). The problem only becomes apparent when the Excel file is opened.
  • I suspect, but cannot prove, that the file did not self-repair or become unbroken, but that I recovered it from backup after sensing some corruption within the file.
  • Some file types are identified by the three or four letters after the dot at the end of the name; others are identified by data markers within the file. Excel appears to use a combination of both to identify whether a file has password encryption or not. In this case, the corrupted contents of a spreadsheet prevented it from being opened.
  • What's needed now is a script that reliably recreates the issue.
  • I logged this as a support case with Apple, but unless I can recreate the issue easily there is not much chance it will progress, given that it occurred on the previous machine.
  • AAAARGH - lucky I have backups and know how to use them.
  • Read more about .DS_Store files and some of the problems they cause here.  














