If you know all about Sentiment Analysis, you’ve come to the right place. Because I don’t have a clue if what I know about it is accurate or not.
I started to do a bit of research into this Sentiment Analysis lark, in particular with the theoretical idea of using it to analyse and draw conclusions from comments on Pulse, assuming that this is what it can be used for.
To begin at the beginning, which is a good place to start, I read the piece on Wikipedia, and this was how it began:
“Sentiment analysis (also known as opinion mining) refers to the use of natural language processing, text analysis and computational linguistics to identify and extract subjective information in source materials.
Generally speaking, sentiment analysis aims to determine the attitude of a speaker or a writer with respect to some topic or the overall contextual polarity of a document. The attitude may be his or her judgment or evaluation (see appraisal theory), affective state (that is to say, the emotional state of the author when writing), or the intended emotional communication (that is to say, the emotional effect the author wishes to have on the reader).” Source: Wikipedia, http://en.wikipedia.org/wiki/Sentiment_analysis
Well, that’s a fairly intuitive description. I could almost have guessed as much.
But back to the aim of analysing sentiment in Pulse comments: where to start, and what to do?
What would sentiment analysis make of these:
On the death of an IT-business celebrity. What would sentiment analysis make of the very emotive comments of desolation, sadness and poignancy from people who didn’t personally know the departed, even remotely, or maybe didn’t even know of them until after they had ‘shuffled off life’s mortal coil’? How would that work? What would sentiment analysis make of the maudlin aphorisms, surrogate grief and bizarre sorrow of people separated by more degrees than Kofi Annan and Mork from Ork? What additional insight does sentiment analysis give us when these comments are analysed along with the body of the text and the other comments that trigger them?
In a similar vein, how does sentiment analysis catch instances of sycophancy? Especially considering that some of it is so ‘in your face’ and blatant that it oftentimes seems to be a bad parody of a bad parody. “Oh, Ricky, why are you such a sexy brainbox?” How does it work in those situations?
Worse than that are the preening, gushing and obtuse texts of massive, errm… fabulators[i]. If it wasn’t about Big Data or Strategy or IT, it would be about something else, usually about the writers themselves. “I give Rafa and Rodge tips on tennis! I went to the University of the Universe and got a first! I challenged Superman to a race, and won! I have read the entire works of Dan Brown, 25 times…Neeeh!” What would sentiment analysis do with that sort of gold?
Also, what does sentiment analysis do with texts so ambiguously daft that they could mean anything? Okay, it might be able to pick up a few trigger words here or there, “rubbish”, “of”, “load”, “a”, “what”, etc. However, how does it know when “excellent” is being used in a way that means anything but excellent? For example, “Excellent Big Data job there”, with the silent “if you want a job doing properly then do it yourself”.
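To make the “trigger words” point concrete, here is a deliberately naive lexicon-based scorer: count the positive words, subtract the negative ones, and read off a label. It is a minimal sketch, not how any particular product works; the tiny word lists are hypothetical, and real lexicons carry thousands of weighted entries plus rules for negation.

```python
import re

# Hypothetical, tiny sentiment lexicons, for illustration only.
POSITIVE = {"excellent", "great", "good", "brilliant"}
NEGATIVE = {"rubbish", "bad", "awful", "terrible"}


def lexicon_score(text: str) -> int:
    """Return (number of positive words) minus (number of negative words)."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def label(text: str) -> str:
    """Map the raw score to a coarse polarity label."""
    score = lexicon_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    # The sarcastic remark scores as positive: nothing in a word count
    # captures the silent "if you want a job doing properly...".
    print(label("Excellent Big Data job there"))  # positive
    print(label("What a load of rubbish"))        # negative
```

The sarcastic “Excellent” is counted at face value, which is exactly the blind spot described above.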
Finally, for the purpose of this little piece, what would sentiment analysis do with term abuse, if it could actually identify it? Going back to the use of the terms such as Big Data or Strategy, how can sentiment analysis discern between the dopey and wrong-headed use of the term, and when it is actually being used in a coherent, cohesive and consistent way, in line more or less with its formal definition? I suppose we can always write a mountain of rules to help us out:
If topic in focus of piece is strategy
And context of topic is business
And author of piece is Richard Rumelt
Then the credibility of text is good (with a certainty of 100%)
But just you try to maintain a rule base with instances like that. It soon becomes a management nightmare.
Alternatively, maybe it could be used to analyse this text. It’ll have its work cut out, that’s for sure. Does sentiment analysis do sarcasm and cynicism?
Anyway! I bet you might know how this sentiment analysis works, don’t you? On the other hand, if not, then it will be someone else who ‘knows’. But of course, all will not be revealed, because it’s a secret so powerful that in the wrong hands it could be used to dominate the entire galaxy.
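Written out as code, the single rule above might look like the following minimal sketch. The `Piece` and `Rule` types, their field names and the exact string matching are hypothetical, invented for illustration; this is not any real rule engine.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Piece:
    topic: str
    context: str
    author: str


@dataclass
class Rule:
    condition: Callable[[Piece], bool]
    credibility: str
    certainty: float  # 0.0 to 1.0; 1.0 stands for "100%"


# The one rule from the text. Every new author or topic needs another entry.
RULES: List[Rule] = [
    Rule(
        condition=lambda p: p.topic == "strategy"
        and p.context == "business"
        and p.author == "Richard Rumelt",
        credibility="good",
        certainty=1.0,
    ),
]


def assess(piece: Piece, rules: List[Rule] = RULES) -> Optional[Tuple[str, float]]:
    """Return (credibility, certainty) from the first matching rule, else None."""
    for rule in rules:
        if rule.condition(piece):
            return rule.credibility, rule.certainty
    return None
```

A piece by anyone else, or on a neighbouring topic, falls through to `None` until someone writes yet another rule, which is how the rule base balloons.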
Only joking; and many thanks for reading.
[i]To engage in the composition of fables or stories, especially those featuring a strong element of fantasy: “a land which … had given itself up to dreaming, to fabulating, to tale-telling” (Lawrence Durrell).
Subject Oriented: Operational databases, such as order processing and payroll databases and ERP databases, are organized around business processes or functional areas. These databases grew out of the applications they served. Thus, the data was relative to the order processing application or the payroll application. Data on a particular subject, such as products or employees, was maintained separately (and usually inconsistently) in a number of different databases. In contrast, a data warehouse is organized around subjects. This subject orientation presents the data in a much easier-to-understand format for end users and non-IT business analysts.
Integrated: Integration of data within a warehouse is accomplished by making the data consistent in format, naming and other aspects. Operational databases, for historic reasons, often have major inconsistencies in data representation. For example, a set of operational databases may represent “male” and “female” by using codes such as “m” and “f”, by “1” and “2”, or by “b” and “g”. Often, the inconsistencies are more complex and subtle. In a data warehouse, on the other hand, data is always maintained in a consistent fashion.
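The gender-code example above can be sketched as a conforming step applied while loading the warehouse. This is a minimal illustration only: the source-system names (`payroll`, `orders`, `crm`), which system uses which code set, and the choice of “male”/“female” as the warehouse standard are all hypothetical assumptions.

```python
# Hypothetical mapping from each operational source's codes to the one
# representation the warehouse standardises on.
GENDER_CODE_MAPS = {
    "payroll": {"m": "male", "f": "female"},
    "orders": {"1": "male", "2": "female"},
    "crm": {"b": "male", "g": "female"},
}


def conform_gender(source: str, raw_code: str) -> str:
    """Translate a source system's gender code into the warehouse representation."""
    try:
        return GENDER_CODE_MAPS[source][raw_code.strip().lower()]
    except KeyError:
        # Unknown source or code: flag it rather than guess.
        return "unknown"


if __name__ == "__main__":
    rows = [("payroll", "M"), ("orders", "1"), ("crm", "g")]
    print([conform_gender(src, code) for src, code in rows])
    # ['male', 'male', 'female']
```

Keeping the mappings in one table means the inconsistency is resolved once, at load time, rather than in every query against the warehouse.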
Time Variant: Data warehouses are time variant in the sense that they maintain both historical and (nearly) current data. Operational databases, in contrast, contain only the most current, up-to-date data values. Furthermore, they generally maintain this information for no more than a year (and often much less). In contrast, data warehouses contain data that is generally loaded from the operational databases daily, weekly, or monthly, which is then typically maintained for a period of 3 to 10 years. This is a major difference between the two types of environments.
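The contrast between an operational store that overwrites values and a warehouse that keeps dated history can be shown with a toy, in-memory sketch. The class and method names are hypothetical, and the retention window approximates a year as 365 days; a real warehouse would use proper tables and a date dimension.

```python
from datetime import date, timedelta


class TimeVariantStore:
    """Toy contrast between an operational table and a warehouse table."""

    def __init__(self) -> None:
        self.operational = {}  # key -> current value only (overwritten)
        self.warehouse = []    # (load_date, key, value) rows, append-only

    def update(self, key: str, value: float) -> None:
        # The operational system simply replaces the old value.
        self.operational[key] = value

    def load_snapshot(self, load_date: date) -> None:
        # A periodic (daily, weekly or monthly) load appends a dated copy.
        for key, value in self.operational.items():
            self.warehouse.append((load_date, key, value))

    def purge_before(self, today: date, retention_years: int) -> None:
        # Keep only rows inside the retention window (e.g. 3 to 10 years).
        cutoff = today - timedelta(days=365 * retention_years)
        self.warehouse = [row for row in self.warehouse if row[0] >= cutoff]

    def history(self, key: str):
        return [(d, v) for d, k, v in self.warehouse if k == key]


if __name__ == "__main__":
    store = TimeVariantStore()
    store.update("price:widget", 10.0)
    store.load_snapshot(date(2014, 1, 31))
    store.update("price:widget", 12.0)  # overwrites the operational value
    store.load_snapshot(date(2014, 2, 28))
    print(store.operational)  # {'price:widget': 12.0}
    print(store.history("price:widget"))
```

The operational view only ever knows the latest price, while the warehouse can still answer what the price was at the end of January.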
“Data Doghouse, meet Pig Data.”



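36 Big Data (36大数据) exclusive. Original author: Martyn Jones. This article was translated by Ou Xiandong of Yihaodian (1号店), submitted to 36 Big Data, and licensed to 36 Big Data for exclusive publication. Reproduction requires the consent of this site and the author; any reproduction that does not credit the author and source is prohibited.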




It’s not about big
It’s not about variety
It’s not even about velocity
It’s not about the manageability of Big Data
The new analytics aren’t new
And the value is questionable
What we’ve been told
It’s not about big

