Writing and Data Analysis Tips n' Tricks
Here is some good advice for data analysis.
How can I analyze a big dataset?
A classical method for big-data matrices is principal component analysis (PCA) with an eigenvalue-greater-than-1 cutoff, followed by Varimax rotation; many stat programs offer this as a one-shot option. The intent is to get some idea of the number of component dimensions, their relative magnitudes, and perhaps some understanding of their nature.

Two of my early mentors (from different traditions) did this routinely, together with scans for extreme values and for variables with unity or near-unity correlations, as a check for gross errors and inadvertent redundancies in the input variables. It was often quite enlightening to see gross values hidden in the data that had been ignored when a previous analyst jumped straight into a complex learning algorithm. Impossible values, or the odd inadvertent redundancy, kept the algorithm from reaching a stable result, after profound costs in computer time.

Likewise, they often unexpectedly found that one or two components accounted for virtually all the relationships in the data (e.g., 95% of the variation on a single component in one case), or, in another case, that a primary variable of interest had virtually no commonality (component variation) with the rest. Both cases went unpublished because of the political embarrassment of months spent chasing the respective data sets.

So start by scanning the data, perhaps with a classical PCA plus the screenings mentioned above, as a first step.
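The screening described above can be sketched in a few lines of Python with NumPy. This is only an illustration under stated assumptions, not the procedure any particular stat package runs: the function names `screen_and_pca` and `varimax`, the thresholds (a z-score of 6 for "extreme", an absolute correlation of 0.99 for "near unity"), and the synthetic demo data are all made up for the example. The PCA is done on the correlation matrix, and the Varimax step uses the common SVD-based iteration.

```python
import numpy as np

def screen_and_pca(X, corr_threshold=0.99, z_threshold=6.0):
    """Screen a (rows x variables) array, then run PCA on its correlation matrix.

    Thresholds are illustrative assumptions. Constant columns are not handled
    (they would produce NaNs) and should be dropped beforehand.
    """
    X = np.asarray(X, dtype=float)
    n_vars = X.shape[1]

    # Extreme values: cells far from their column mean, in standard deviations.
    z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    extreme = [(int(r), int(c)) for r, c in np.argwhere(np.abs(z) > z_threshold)]

    # Near-unity correlations: candidate redundant variable pairs.
    corr = np.corrcoef(X, rowvar=False)
    redundant = [(i, j) for i in range(n_vars) for j in range(i + 1, n_vars)
                 if abs(corr[i, j]) >= corr_threshold]

    # PCA via eigendecomposition of the correlation matrix, largest first.
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    # Eigenvalue-greater-than-1 cutoff for the number of components to keep.
    n_keep = int(np.sum(eigvals > 1.0))
    loadings = eigvecs[:, :n_keep] * np.sqrt(eigvals[:n_keep])

    return {"extreme": extreme, "redundant": redundant, "eigvals": eigvals,
            "explained": eigvals / eigvals.sum(), "n_keep": n_keep,
            "loadings": loadings}

def varimax(loadings, max_iter=100, tol=1e-6):
    """Orthogonal Varimax rotation of a (variables x components) loading matrix."""
    p, k = loadings.shape
    if k < 2:
        return loadings.copy()  # nothing to rotate with fewer than two components
    R = np.eye(k)
    total = 0.0
    for _ in range(max_iter):
        L = loadings @ R
        grad = loadings.T @ (L ** 3 - L @ np.diag((L ** 2).sum(axis=0)) / p)
        u, s, vt = np.linalg.svd(grad)
        R = u @ vt
        if s.sum() < total * (1 + tol):
            break
        total = s.sum()
    return loadings @ R

# Synthetic demo: two latent factors, one duplicated variable, one impossible value.
rng = np.random.default_rng(0)
f1, f2 = rng.normal(size=(2, 500))
X = np.column_stack([f1 + 0.3 * rng.normal(size=500) for _ in range(3)]
                    + [f2 + 0.3 * rng.normal(size=500) for _ in range(3)])
X = np.column_stack([X, 2.54 * X[:, 0]])  # column 6 repeats column 0 in other units
X[10, 4] = 999.0                           # a gross data-entry error

report = screen_and_pca(X)
rotated = varimax(report["loadings"])
print("extreme cells:", report["extreme"])
print("redundant pairs:", report["redundant"])
print("components kept:", report["n_keep"])
print("share of variance:", np.round(report["explained"], 3))
```

In practice the flagged cells and redundant pairs would be fixed or dropped first and the PCA rerun, since a single impossible value can distort the correlations that the components are built from.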