Overview
API calls in both JSON and curl are included in this pane.
Spinn3r provides APIs for social media, weblogs, news, video, and live web content to our customers in any language and in large volumes.
We provide several main APIs for accessing this content, as well as a number of other secondary APIs.
Search API
Our full-text search API is based on Elasticsearch and provides advanced search facilities on top of a high quality content index.
If you’re getting up and running with Spinn3r for the first time you probably want to be using our Search API.
This API allows you to search for arbitrary text strings, search with complex boolean logic, apply filters, and use other advanced features such as aggregations. The results are then returned as ordinary JSON documents.
Classifier API
The Classifier API allows developers to submit text (or URLs) and receive labels for this content based on our machine learning platform. For example, if you submit a news story about the US Presidential election you would get back labels for the candidates or other topics representing that article.
Parser API
The Parser API provides ad hoc parsing and metadata handling of arbitrary URLs on the web. Additionally, we perform data augmentation of the metadata including gender detection, sentiment detection, etc.
Firehose API
Our Firehose API is designed for bulk access to massive amounts of content, on the order of 200-500GB per day. We support some basic filtering on the content, but this would still generate a lot of data for new applications.
If you’re just getting started it might make sense to use the full-text search API and then graduate to the firehose API once you’re indexing more than 10M posts per day.
Support
support@spinn3r.com
If you have any questions on this document, you can send inquiries to support@spinn3r.com or create a ticket by visiting our website.
It’s our goal to get you up and running ASAP, and written documentation may not cover every issue.
Authentication
Run a basic full-text search now, using your credentials, which will return a single document:
curl -XPOST 'http://{{vendor}}.elasticsearch.spinn3r.com/content*/_search' -H "X-vendor: {{vendor}}" -H "X-vendor-auth: {{vendor_auth}}" -d '{
"size": 1,
"query" : {
"term" : { "main" : "firefox" }
}
}'
Spinn3r uses simple HTTP headers for authentication in all our APIs.
When a new account is created we send your credentials in your account creation email.
Your authentication headers are:
header | value |
---|---|
X-vendor | {{vendor}} |
X-vendor-auth | {{vendor_auth}} |
If your keys aren’t shown, please log in (or register) to Spinn3r for new keys.
Authentication is performed via HTTP headers provided in the request.
When provisioned you’ll be given a vendor code and authentication code that need to be specified for each request.
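As a minimal sketch of how the two headers attach to a request, the following Python builds the same search request as the curl example above using only the standard library. The function name `build_search_request` and the placeholder credentials are illustrative assumptions, not part of Spinn3r's API. The request is constructed but not sent.

```python
import json
import urllib.request


def build_search_request(vendor, vendor_auth, query, size=1):
    """Build (but do not send) an authenticated Search API request."""
    # URL pattern copied from the curl example; the vendor code is the subdomain.
    url = f"http://{vendor}.elasticsearch.spinn3r.com/content*/_search"
    body = json.dumps({"size": size, "query": query}).encode("utf-8")
    # Both authentication headers must be present on every request.
    headers = {"X-vendor": vendor, "X-vendor-auth": vendor_auth}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


# Placeholder credentials; substitute the values from your account email.
request = build_search_request("my-vendor", "my-vendor-auth",
                               {"term": {"main": "firefox"}})
# To actually send it: urllib.request.urlopen(request).read()
```

Keeping the header construction in one helper makes it harder to forget one of the two headers on a later call.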
Failed authentication
We include a JSON body in the response with a human readable message string on authentication failure:
{
"success" : false,
"message" : "Please check your vendor code. It may be invalid. Contact support@spinn3r.com if you would like a new one."
}
If HTTP authentication fails we will return either:
HTTP 401 Unauthorized when the vendor and/or vendor auth are incorrect.
HTTP 402 Payment Required when your payment isn’t up to date and your account has expired.
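A client can turn these two status codes and the JSON body into a single diagnostic. The sketch below assumes the failure body has the `success` and `message` keys shown in the sample response. The helper name `explain_auth_failure` is hypothetical.

```python
import json

# Meanings of the two authentication-related status codes described above.
AUTH_FAILURES = {
    401: "vendor code and/or vendor auth are incorrect",
    402: "payment is not up to date and the account has expired",
}


def explain_auth_failure(status, body):
    """Return a readable explanation for a 401/402 response, else None."""
    if status not in AUTH_FAILURES:
        return None
    try:
        message = json.loads(body).get("message", "")
    except (ValueError, TypeError, AttributeError):
        # Body was missing, not JSON, or not a JSON object.
        message = ""
    return f"HTTP {status}: {AUTH_FAILURES[status]}. {message}".strip()
```

For example, `explain_auth_failure(401, '{"success": false, "message": "Please check your vendor code."}')` returns a string combining the status meaning with the server's message.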
Connectivity
High throughput access to Spinn3r is required for your application to be performant.
This applies to our Firehose API, which can fetch hundreds of gigabytes per day, but also to other APIs including search. A single request might be small, but a properly configured network can deliver a 2-3x performance boost.
Speed Test
To run the speed test just run:
wget --output-document=/dev/null http://api.artemis.spinn3r.com/speed-test
Outputs:
--2014-09-13 21:44:16-- http://api.artemis.spinn3r.com/speed-test
Resolving api.artemis.spinn3r.com (api.artemis.spinn3r.com)... 108.168.183.21
Connecting to api.artemis.spinn3r.com (api.artemis.spinn3r.com)|108.168.183.21|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 819200000 (781M)
Saving to: `speed-test'
100%[==========================================================>] 819,200,000 101.5M/s in 66s
2014-09-13 21:45:21 (101.9 MB/s) - `speed-test' saved [819200000/819200000]
Having a reliable network connection to our datacenters is critical to solid API performance.
Fortunately, we have a speed test URL that you can use to quickly measure your connection speed without having to worry about the latency of database or API calls.
Required Throughput
In this example, we’re indexing at about 100MB/s … or 800Mbit per second.
When using the firehose API, this will allow you to keep up with data in real time, and to catch up quickly should your client fall behind.
For the search API this will allow you to retrieve documents much faster.
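The arithmetic behind these figures can be checked with a short sketch. It assumes decimal units (1 GB = 1000 MB) and a feed that arrives at a steady average rate. Both are simplifications, and the function names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def mbytes_to_mbits(mbytes_per_s):
    """Convert MB/s to Mbit/s (8 bits per byte)."""
    return mbytes_per_s * 8


def average_mbytes_per_s(gb_per_day):
    """Average rate needed to keep up with a daily volume (1 GB = 1000 MB)."""
    return gb_per_day * 1000 / SECONDS_PER_DAY


def catch_up_hours(hours_behind, link_mbytes_per_s, gb_per_day):
    """Hours needed to clear a backlog while new data keeps arriving."""
    feed = average_mbytes_per_s(gb_per_day)
    spare = link_mbytes_per_s - feed  # capacity left over for the backlog
    if spare <= 0:
        raise ValueError("link is too slow to ever catch up")
    return hours_behind * feed / spare


print(mbytes_to_mbits(100))                  # the 100MB/s speed test result in Mbit/s
print(round(average_mbytes_per_s(500), 2))   # steady rate for 500GB per day
print(round(catch_up_hours(6, 100, 500), 2)) # hours to recover from a 6 hour outage
```

Under these assumptions, even the top of the 200-500GB per day range averages only a few MB/s, so a 100MB/s link leaves most of its capacity free for catching up.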
Request latency
You can measure the datacenter latency by running the following command, which measures the “TTFB”, or time to first byte, of the connection:
curl -o /dev/null -w "Connect: %{time_connect} TTFB: %{time_starttransfer} Total time: %{time_total} \n" http://api.artemis.spinn3r.com/speed-test
Request latency depends on your datacenter location.
If you want to improve performance for full-text searches you can use persistent HTTP connections to keep your connections active to the server.
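One way to keep a connection open across several searches is Python's `http.client`, which reuses the underlying socket for HTTP/1.1 as long as each response is read fully before the next request is issued. This is a sketch, not Spinn3r client code: `run_searches` is a hypothetical helper, and whether the server keeps the connection open is up to the server.

```python
import http.client
import json


def run_searches(conn, path, headers, queries):
    """Send several search bodies over one connection and return parsed JSON."""
    results = []
    for query in queries:
        conn.request("POST", path, body=json.dumps(query), headers=headers)
        response = conn.getresponse()
        # The response must be read completely before the connection is reused.
        results.append(json.loads(response.read()))
    return results


# Usage sketch with placeholder credentials (not executed here):
# conn = http.client.HTTPConnection("my-vendor.elasticsearch.spinn3r.com")
# headers = {"X-vendor": "my-vendor", "X-vendor-auth": "my-vendor-auth"}
# bodies = [{"size": 1, "query": {"term": {"main": "firefox"}}}]
# results = run_searches(conn, "/content*/_search", headers, bodies)
# conn.close()
```

Passing the connection in as a parameter lets the caller decide how long it lives and makes the helper easy to exercise with a stub.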
Improving throughput
On Linux you can run the following to increase the TCP buffer size. Please cat the files beforehand to record the default values.
cat /proc/sys/net/ipv4/tcp_rmem
cat /proc/sys/net/ipv4/tcp_wmem
echo "16384 1048576 33554432" > /proc/sys/net/ipv4/tcp_rmem
echo "16384 1048576 33554432" > /proc/sys/net/ipv4/tcp_wmem
If you’re not receiving optimal throughput, we suggest increasing the OS TCP buffer size.
This is needed if your datacenter is far from our datacenter. If TCP drops a packet, or the reordering of the packets is too high, TCP has to slow down or data corruption will result.
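A common rule of thumb for sizing these buffers is the bandwidth-delay product: the amount of data that must be in flight to fill a link is its bandwidth multiplied by the round-trip time. The sketch below applies that general networking rule. It is not a Spinn3r-specific formula, and the 100 ms round-trip time is a hypothetical value.

```python
def bandwidth_delay_product_bytes(mbit_per_s, rtt_ms):
    """Bytes that must be in flight to keep a link of this speed full."""
    bytes_per_s = mbit_per_s * 1_000_000 / 8
    return int(bytes_per_s * rtt_ms / 1000)


MAX_BUFFER = 33554432  # the maximum written to tcp_rmem / tcp_wmem above

# 800 Mbit/s (the speed test figure) over a hypothetical 100 ms round trip.
needed = bandwidth_delay_product_bytes(800, 100)
print(needed, needed <= MAX_BUFFER)
```

If the computed value exceeds the configured maximum buffer, a single TCP connection cannot fill the link at that distance, regardless of the raw bandwidth available.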
Content
{
"bucket" : 0,
"resource" : "http://cnn.com/2014/10/15/health/texas-ebola-outbreak",
"date_found" : "2014-06-22T01:08:52Z",
"index_method" : "PERMALINK_TASK",
"html" : "<html><body><p>Full HTML of the content</p></body></html>",
"html_length" : 57,
"source_hashcode" : "COH0cFU4G1sMlRHd9gEvS-n3FFI",
"source_resource" : "http://cnn.com",
"source_link" : "http://cnn.com",
"source_publisher_type" : "MAINSTREAM_NEWS",
"source_publisher_subtype" : "MAINSTREAM_NEWS",
"source_date_found" : "2014-06-22T01:08:52Z",
"source_update_interval" : 900000,
"source_setting_update_strategy" : "CYCLICAL",
"source_setting_index_strategy" : "DEFAULT",
"source_title" : "CNN",
"permalink" : "http://www.cnn.com/2014/10/15/health/texas-ebola-outbreak/index.html",
"canonical" : "http://www.cnn.com/2014/10/15/health/texas-ebola-outbreak/index.html",
"main" : "<p>Full HTML of the content</p>",
"main_length" : 31,
"main_checksum" : "I7QyvW_g9AjGg3vWjmcxwo7wjXs",
"main_format" : "HTML",
"summary_text" : "Another nurse who contracted Ebola after caring for a man who died of the virus was on a flight from Cleveland to Dallas.",
"title" : "CDC: Nurse with Ebola should not have traveled",
"publisher" : "CNN",
"section" : "Technology",
"description" : "Another nurse who contracted Ebola after caring for a man who died of the virus was on a flight from Cleveland to Dallas.",
"tags" : [ "nurse", "outbreak", "ebola" ],
"published" : "2014-06-22T01:08:52Z",
"author_name" : "Holly Yan",
"author_link" : "https://twitter.com/HollyYanCNN",
"lang" : "en"
}
Nearly all Spinn3r APIs strive to produce the same schema and core set of fields. This includes the search, firehose, and parser APIs.
Core fields
The schema returned by Spinn3r has a large number of fields produced by our indexing system.
This ranges from basic fields like title, link, description, and article body, to author information, and all the way to NLP analysis including near duplicates, gender, language, etc.
You may want to review our schema for the full list of fields.
Basic fields
All posts will have a permalink. Most will have a title, except for MICROBLOG and some PHOTO publisher types.
If a summary_text is available you may wish to use this for displaying content along with optional extract or main content.
You may want to review our content cards for how to display content in a UI.
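The field guidance above can be sketched as a small fallback chain. The preference order (summary_text, then extract, then main) and the use of the permalink when a title is missing are assumptions made for illustration. The documentation only says these fields may be used together.

```python
def display_text(post):
    """Pick the first available text field for display, or None."""
    # Assumed preference order: shortest, most display-ready field first.
    for field in ("summary_text", "extract", "main"):
        value = post.get(field)
        if value:
            return value
    return None


def headline(post):
    """Use the title when present; every post has a permalink to fall back on."""
    return post.get("title") or post["permalink"]


post = {
    "permalink": "http://www.cnn.com/2014/10/15/health/texas-ebola-outbreak/index.html",
    "title": "CDC: Nurse with Ebola should not have traveled",
    "main": "<p>Full HTML of the content</p>",
}
print(headline(post))
print(display_text(post))
```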
Content Cards
Here are some basic examples for displaying content. How this is accomplished in practice is entirely up to your web designer.
Example card types: SUMMARY, SUMMARY_LARGE_IMAGE, PLAYER.
Spinn3r includes a number of text fields that include the full post of the content. This can be used to build out full-text search applications or build NLP models for classification of content.
However, it’s not always ideal for presenting to users:
The full HTML is generally too long for displaying more than 1 or 2 posts within a web application.
It’s unclear where to start displaying content. Usually one would want to build a summary of the content, but this is easier said than done (there’s a whole branch of machine learning dedicated to document summarization).
Content cards can help with this issue.
Spinn3r supports ‘cards’ which allow you to present content to your user in a rich format. We properly handle extracting image, video, and text metadata so you don’t have to.
Fields
card
The card field specifies how you can include content within your web application.
We support the following types of cards:
name | description |
---|---|
SUMMARY | Basic summary of the content with optional image |
SUMMARY_LARGE_IMAGE | Basic summary of the content using a large image |
PHOTO | The content is a photo and is the primary content |
GALLERY | The content is a photo gallery with multiple images |
PLAYER | The content is an embedded video player |
Main fields
The title, permalink, and description fields provide the main content for a card. These will provide the core backbone of your content. When the card field is set, it can be assumed that the title and description will ALWAYS be available for displaying to the user.
Image
When set, this is an image representing the content. It can optionally have an image_width and image_height.
Player
When set, this is a video URL designed to be used within an iframe. This only applies when the card is set to PLAYER.
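A minimal rendering sketch for the card fields follows. The JSON keys `image` and `player` are assumed from the section headings above (only `card`, `image_width` and `image_height` are named explicitly), and the HTML layout is an arbitrary illustration. How cards look in practice is up to your designer.

```python
from html import escape


def render_card(post):
    """Render a post as a simple HTML snippet based on its card type."""
    card = post.get("card")
    if card is None:
        return None  # no card metadata, so title/description are not guaranteed
    parts = [
        f'<h3><a href="{escape(post["permalink"])}">{escape(post["title"])}</a></h3>',
        f'<p>{escape(post["description"])}</p>',
    ]
    if card == "PLAYER" and post.get("player"):
        # Assumed key "player": a video URL meant to be embedded in an iframe.
        parts.append(f'<iframe src="{escape(post["player"])}"></iframe>')
    elif post.get("image"):
        # Assumed key "image"; width and height are optional.
        size = ""
        if post.get("image_width") and post.get("image_height"):
            size = f' width="{post["image_width"]}" height="{post["image_height"]}"'
        parts.append(f'<img src="{escape(post["image"])}"{size}>')
    return "\n".join(parts)
```

Escaping every field before interpolation matters here because titles and descriptions come from arbitrary web content.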
Shared Content
Many social networks and blog publishing platforms support the concept of ‘shared content’ whereby a user can easily share a piece of content with their followers if they deem it noteworthy.
Spinn3r supports this by flagging content with a shared field which is true when the content is a piece of shared content.
Additionally, we add a few more fields including:
shared_author_link
shared_author_name
shared_author_user_id
shared_identifier
shared_permalink
shared_author_handle
See the documentation for the content schema for these fields.
Note that if you would like to use our search API to find nested/shared content, this is easily accomplished by taking an identifier and then searching for the shared_identifier to fetch everyone who has shared an article.
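The lookup described here could be expressed as a search body like the one sketched below. It assumes shared_identifier can be matched with an Elasticsearch `term` query, as the authentication example does for `main`. The identifier value and the helper name are placeholders.

```python
def shared_by_query(identifier, size=10):
    """Build a search body matching posts that share the given identifier."""
    return {
        "size": size,
        # Assumption: shared_identifier is indexed as an exact-match term.
        "query": {"term": {"shared_identifier": identifier}},
    }


# Placeholder identifier taken from an original post.
body = shared_by_query("example-identifier")
print(body)
```

The resulting dictionary can be serialized with `json.dumps` and sent as the request body in place of the query shown in the curl examples.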
Additionally, no like or share values are present on shared content, as this content can’t itself be liked or shared.
Search
Here’s an example of searching for the term ‘Obama’ using a query string.
We provide this as a raw curl command for ease of use. The examples after this will use JSON. You can simply use this curl command as a template to execute the examples individually.
curl -XPOST 'http://{{vendor}}.elasticsearch.spinn3r.com/content_*/_search?pretty=true' \
-H "X-vendor: {{vendor}}" \
-H "X-vendor-auth: {{vendor_auth}}" \
-d '{
"size": 1,
"query": {
"query_string" : {
"query" : "main:Obama"
}
}
}
'
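For programmatic use, the request body in the curl command above can be generated rather than hand-written. The sketch below builds an Elasticsearch `query_string` body from field and value pairs joined with a boolean operator. The helper name is hypothetical and the values are not escaped, so it suits simple single-word terms only.

```python
import json


def query_string_body(clauses, operator="AND", size=1):
    """Build a query_string search body from (field, value) pairs."""
    query = f" {operator} ".join(f"{field}:{value}" for field, value in clauses)
    return {"size": size, "query": {"query_string": {"query": query}}}


# Reproduces the body sent by the curl command.
print(json.dumps(query_string_body([("main", "Obama")]), indent=2))

# Boolean logic: English-language posts mentioning Obama.
print(query_string_body([("main", "Obama"), ("lang", "en")])["query"])
```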
JSON result for this query. You can see the full schema for the resulting JSON in our content schema.
{
"bucket" : 1459618800097,
"sequence" : 1459618891000015420,
"sequence_range" : 9297,
"hashcode" : "cry8h2SbQBlpllWRnDo1NBwgBJE",
"resource" : "http://politico.com/playbook/2016/04/bernie-gets-under-hillarys-skin-manafot-rising-trumps-new-guru-builds-empire-including-veterans-of-gops-last-contested-convention-bdays-brent-colburn-meridith-webster-213545",
"date_found" : "2016-04-02T17:41:31Z",
"index_method" : "PERMALINK_TASK",
"detection_method" : "SOURCE",
"version" : "5.1.684",
"source_hashcode" : "bFiShGib138jwIeU8ryJQWH7CFA",
"source_resource" : "http://politico.com",
"source_link" : "http://www.politico.com/",
"source_publisher_type" : "MAINSTREAM_NEWS",
"source_date_found" : "2016-03-13T04:44:59Z",
"source_last_updated" : "2016-04-02T17:26:28Z",
"source_last_published" : "2016-04-02T17:41:28Z",
"source_last_posted" : "2016-04-02T17:11:28Z",
"source_update_interval" : 900000,
"source_http_status" : 200,
"source_content_length" : 244718,
"source_content_checksum" : "cQrtB2o9SVvDK1TACcGvSDKQ8tw",
"source_assigned_tags" : [ "#7ZdROb4JaZaz4uGuG_9csN3QDlg" ],
"source_setting_update_strategy" : "CYCLICAL",
"source_setting_index_strategy" : "DEFAULT",
"source_title" : "@politico",
"source_description" : "POLITICO covers political news with a focus on national politics, Congress, Capitol Hill, lobbying, advocacy, and more. POLITICO's in-depth coverage includes video features, regular blogs, photo galleries, cartoons, and political forums.",
"source_feed_href" : "http://www.politico.com/rss/politicopicks.xml",
"source_feed_title" : "POLITICO - TOP Stories",
"source_feed_format" : "RSS",
"permalink" : "http://www.politico.com/playbook/2016/04/bernie-gets-under-hillarys-skin-manafot-rising-trumps-new-guru-builds-empire-including-veterans-of-gops-last-contested-convention-bdays-brent-colburn-meridith-webster-213545",
"permalink_redirect" : "http://www.politico.com/playbook/2016/04/bernie-gets-under-hillarys-skin-manafot-rising-trumps-new-guru-builds-empire-including-veterans-of-gops-last-contested-convention-bdays-brent-colburn-meridith-webster-213545",
"permalink_redirect_domain" : "politico.com",
"permalink_redirect_site" : "politico.com",
"canonical" : "http://www.politico.com/playbook/2016/04/bernie-gets-under-hillarys-skin-manafot-rising-trumps-new-guru-builds-empire-including-veterans-of-gops-last-contested-convention-bdays-brent-colburn-meridith-webster-213545",
"domain" : "politico.com",
"site" : "politico.com",
"main" : "<div> \n <div> \n <div> \n <section> \n <div> \n <div> \n </div> \n </div> \n </section> \n </div> \n </div> \n</div> \n<div> \n <div> \n <div> \n <section> \n <section> \n <div> \n <ul> \n <li> \n <article> \n <div> \n <a href=\"http://www.politico.com/story/2016/03/ted-cruz-christian-evangelical-vote-221349\"><img alt=\"160329_ted_cruz_ap_1160.jpg\" title=\"160329_ted_cruz_ap_1160.jpg\" /></a>\n </div> \n <div> \n <h3> <a href=\"http://www.politico.com/story/2016/03/ted-cruz-christian-evangelical-vote-221349\">Ted Cruz’s evangelical problem </a></h3> \n </div> \n </article> </li> \n <li> \n <article> \n <div> \n <a href=\"http://www.politico.com/story/2016/04/donald-trump-delegates-north-dakota-gop-221480\"><img alt=\"160330_donald_trump_gty_1160.jpg\" title=\"160330_donald_trump_gty_1160.jpg\" /></a>\n </div> \n <div> \n <h3> <a href=\"http://www.politico.com/story/2016/04/donald-trump-delegates-north-dakota-gop-221480\">How the North Dakota GOP is freezing out Trump</a></h3> \n </div> \n </article> </li> \n <li> \n <article> \n <div> \n <a href=\"http://www.politico.com/magazine/story/2016/03/donald-trump-2016-terrorist-attack-foreign-policy-213784\"><img alt=\"160401.jpg\" title=\"160401.jpg\" /></a>\n </div> \n <div> \n <h3> <a href=\"http://www.politico.com/magazine/story/2016/03/donald-trump-2016-terrorist-attack-foreign-policy-213784\">9/11: What Would Trump Do? 
</a></h3> \n </div> \n </article> </li> \n <li> \n <article> \n <div> \n <a href=\"http://www.politico.com/story/2016/04/obama-donald-trump-presser-221486\"><img alt=\"GettyImages-518602178.jpg\" title=\"GettyImages-518602178.jpg\" /></a>\n </div> \n <div> \n <h3> <a href=\"http://www.politico.com/story/2016/04/obama-donald-trump-presser-221486\">Obama goes radioactive on Trump</a></h3> \n </div> \n </article> </li> \n <li> \n <div> \n </div> </li> \n <li> \n <article> \n <div> \n <a href=\"http://www.politico.com/story/2016/04/hillary-clinton-bernie-sanders-attacks-221484\"><img alt=\"160401_hillary_clinton_ap_1160.jpg\" title=\"160401_hillary_clinton_ap_1160.jpg\" /></a>\n </div> \n <div> \n <h3> <a href=\"http://www.politico.com/story/2016/04/hillary-clinton-bernie-sanders-attacks-221484\">Sanders gets under Clinton's skin in New York</a></h3> \n </div> \n </article> </li> \n </ul> \n </div> \n </section> \n </section> \n </div> \n </div> \n</div> \n<div> \n <div> \n <div> \n <section> \n <div> \n <div> \n <div>\n <div> \n <div> \n <a href=\"http://www.politico.com/playbook\"><img alt=\"Playbook\" /></a>\n </div> \n <div> \n <div> \n <p> Politico</p> \n <h2> <a href=\"http://www.politico.com/playbook\">Playbook</a></h2> \n <p>Mike Allen's must-read briefing on what's driving the day in Washington </p> \n </div> \n <div> \n <b></b> Subscribe \n </div> \n </div> \n </div> \n </div>\n </div> \n </div> \n </section> \n </div> \n </div> \n</div> \n<div> \n <div> \n <article> \n <section> \n <div> \n <div> \n <div> \n <aside> \n <ul> \n <li> Shares </li> \n <li> <a href=\"http://api.addthis.com/oexchange/0.8/forward/facebook/offer?pco=tbx32nj-1.0&url=http://www.politico.com/playbook/2016/04/bernie-gets-under-hillarys-skin-manafot-rising-trumps-new-guru-builds-empire-including-veterans-of-gops-last-contested-convention-bdays-brent-colburn-meridith-webster-213545&pubid=politico.com\"> <b></b> Facebook </a> </li> \n <li> <a 
href=\"http://api.addthis.com/oexchange/0.8/forward/twitter/offer?pco=tbx32nj-1.0&url=http://www.politico.com/playbook/2016/04/bernie-gets-under-hillarys-skin-manafot-rising-trumps-new-guru-builds-empire-including-veterans-of-gops-last-contested-convention-bdays-brent-colburn-meridith-webster-213545&pubid=politico.com&text=BERNIE+GETS+UNDER+HILLARY%E2%80%99S+SKIN+--+MANAFOT+RISING%3A+Trump%E2%80%99s+new+guru+builds+empire%2C+including+veterans+of+GOP%E2%80%99s+last+contested+convention+%E2%80%93+B%E2%80%99DAYS%3A+Brent+Colburn%2C+Meridith+Webster\"> <b></b> Twitter </a> </li> \n <li> <a href=\"http://api.addthis.com/oexchange/0.8/forward/googleplus/offer?pco=tbxnj-1.0&url=http://www.politico.com/playbook/2016/04/bernie-gets-under-hillarys-skin-manafot-rising-trumps-new-guru-builds-empire-including-veterans-of-gops-last-contested-convention-bdays-brent-colburn-meridith-webster-213545&pubid=politico.com&title=BERNIE+GETS+UNDER+HILLARY%E2%80%99S+SKIN+--+MANAFOT+RISING%3A+Trump%E2%80%99s+new+guru+builds+empire%2C+including+veterans+of+GOP%E2%80%99s+last+contested+convention+%E2%80%93+B%E2%80%99DAYS%3A+Brent+Colburn%2C+Meridith+Webster\"> <b></b> Google + </a> </li> \n </ul> \n <ul> \n <li> <a href=\"mailto:?subject=POLITICO Playbook, presented by the Embassy of the United Arab Emirates: BERNIE GETS UNDER HILLARY’S SKIN -- MANAFOT RISING: Trump’s new guru builds empire, including veterans of GOP’s last contested convention – B’DAYS: Brent Colburn, Meridith Webster&body=http://www.politico.com/playbook/2016/04/bernie-gets-under-hillarys-skin-manafot-rising-trumps-new-guru-builds-empire-including-veterans-of-gops-last-contested-convention-bdays-brent-colburn-meridith-webster-213545\"> <b></b> Email </a> </li> \n <li> <a href=\"http://www.politico.com/playbook/2016/04/bernie-gets-under-hillarys-skin-manafot-rising-trumps-new-guru-builds-empire-including-veterans-of-gops-last-contested-convention-bdays-brent-colburn-meridith-webster-213545#superComments\"> <b></b> Comment 
</a> </li> \n <li> <a href=\"http://www.politico.com/playbook/2016/04/bernie-gets-under-hillarys-skin-manafot-rising-trumps-new-guru-builds-empire-including-veterans-of-gops-last-contested-convention-bdays-brent-colburn-meridith-webster-213545#\"> <b></b> Print </a> </li> \n </ul>\n </aside> \n </div> \n </div> \n </div> \n </section> \n <section> \n <div> \n <div> \n <div></div> \n <div> \n <div> \n <h1>BERNIE GETS UNDER HILLARY’S SKIN -- MANAFOT RISING: Trump’s new guru builds empire, including veterans of GOP’s last contested convention – B’DAYS: Brent Colburn, Meridith Webster</h1> \n <p>04/02/16 01:32 PM EDT</p> \n </div> \n </div> \n <p><b>By Mike Allen </b>(@mikeallen; <a href=\"mailto:mallen@politico.com\">mallen@politico.com</a>)<b> and Daniel Lippman </b>(@dlippman; <a href=\"mailto:dlippman@politico.com\">dlippman@politico.com</a>)</p> \n <p><b>Happy Saturday! </b>Obama alumnus Brent Colburn asked that we include this in celebration of his birthday: Today “is World Autism Awareness Day, and in celebration of my nephew Cordis and his amazing parents, Brian & Andrea Colburn, I’d like to encourage every Playbooker to take a minute and learn more about ... the autism community.” <a href=\"http://www.autismspeaks.org\">www.autismspeaks.org</a></p>\n <p>Story Continued Below</p> \n <div> \n <div> \n <div> \n </div> \n </div> \n </div> \n <p><b>INSIDE THE CAMPAIGNS – “Trump campaign shrinks Lewandowski’s role: </b>Despite the billionaire’s staunch defense, his embattled campaign manager is losing clout,” by Ben Schreckinger and Ken Vogel, with Hadas Gold: “Trump’s just-named convention manager, Paul Manafort, is expected to take a leading role not just in the selection of delegates, but in the remaining primaries themselves. ... [A] person involved in Trump’s campaign [said:] ... ‘Mr. Trump’s listening to other people now. The crew’s expanding.’ ... </p> \n <p><b>“[T]his winter,</b> ... National Political Director Michael Glassner ... 
was [promoted] to deputy campaign manager ... On March 2, the campaign promoted Stuart Jolly ... to national field director, giving him primary authority over ... hiring ... field staff. ... <b>Manafort has quickly taken charge of his own fiefdom in Washington, and is planning to hire a team of his own,</b> which is likely to include several veterans of the 1976 Republican National Convention – the party’s last convention at which the presidential nomination was contested.” <a href=\"http://politi.co/22YORlU\">http://politi.co/22YORlU</a></p> \n <p><b>**SUBSCRIBE to Playbook</b>: <a href=\"http://politi.co/1M75UbX\">http://politi.co/1M75UbX</a></p> \n <p><b>CHASER – “Trump touts his loyalty in defending campaign manager</b>,” by AP’s Jill Colvin in Appleton, Wis.: “Trump [said in a phoner Thu. evening that] his decision to stand behind his campaign manager ... is a sign of loyalty — a trait that Trump has displayed, for better or worse, through much of his career.” <a href=\"http://apne.ws/1RS7xLX\">http://apne.ws/1RS7xLX</a> </p> \n <p><b>ALEX BURNS, who turns 3-0 tomorrow, </b>coins a memorable phrase on N.Y. Times p. A9, “G.O.P. Fears Trump as <b>Zombie Candidate: Damaged but Unstoppable”:</b> “Republicans who once worried that Mr. Trump might gain overwhelming momentum ... are now becoming preoccupied with a different grim prospect: that Mr. Trump might become a kind of zombie candidate — damaged beyond the point of repair, but too late for any of his rivals to stop him.” <a href=\"http://nyti.ms/1ZTJn6E\">http://nyti.ms/1ZTJn6E</a></p> \n <p><b>VICE PRESIDENT BIDEN </b>and Dr. Jill Biden will be at tonight’s NCAA Final Four semifinal games in Houston to promote the It’s On Us campaign to end sexual assault on campus. The two will appear for a pregame interview on TBS. They return to D.C. 
on Sunday.</p> \n <p><b>--Other Washingtonians at the Final Four:</b> Jonathan Martin, who’s a birthday boy tomorrow, and Betsy Fischer Martin; John Feinstein; and Danielle and Jeff Jones.</p> \n <p><b>AP for SUNDAY PAPERS – “Clinton’s frustration grows, as primary race drags on,”</b> by Lisa Lerer in Syracuse and Ken Thomas in N.Y.: “Hillary Clinton snapped at a Greenpeace protester. She linked Bernie Sanders and tea party Republicans. And she bristled with anger when nearly two dozen Sanders supporters marched out of an event near her home outside New York City ... After a year of campaigning, months of debates and 35 primary elections, Sanders is finally getting under Clinton’s skin ... Clinton has spent weeks largely ignoring Sanders and trying to focus on ... Trump. Now, after several primary losses and with a tough fight in New York on the horizon, Clinton is showing flashes of frustration with the Vermont senator ...</p> \n <p><b>“According to Democrats close to Hillary</b> and former President Bill Clinton, both are frustrated by Sanders' ability to cast himself as above politics-as-usual even while firing off what they consider to be misleading attacks. The Clintons are even more annoyed that Sanders' approach seems to be rallying ... young voters by his side. While Hillary Clinton's team contends her lock on the nomination as ‘nearly insurmountable,’ the campaign frequently grumbles that Sanders hasn't faced the same level of scrutiny ... Her aides complain about Sanders' rhetoric, claiming he's broken his pledge to avoid character attacks ... </p> \n <p><b>“Clinton hopes that big victories</b> in New York on April 19 and five Northeastern states a week later will allow her to wrap up the nomination by the end of the month. But aides acknowledge that Sanders ... is unlikely to feel significant political or financial pressure to drop out of the race, even if it becomes clear he cannot win ... 
Sanders must win 67 percent of the remaining delegates and uncommitted superdelegates ... through June to be able to clinch the Democratic nomination. So far he's only winning 37 percent. </p> \n <p><b>“Joel Benenson,</b> Clinton’s chief strategist, said: ‘We’re going to get to a point at the end of April where there just isn’t enough real estate for him to overcome the lead that we’ve built.’ Still, any kind of truce is probably weeks, if not months, away. ... Sanders is costing Clinton significant time, money and political capital [and] is drawing sizable crowds in New York.” <a href=\"http://apne.ws/1pUb1Uo\">http://apne.ws/1pUb1Uo</a></p> \n <p><b>FRIENDS NOW CALLING SPICER “Mr. Chairman” ... “Backstage maneuvering begins in wide-open GOP chairman’s race,” by The Hill’s Scott Wong:</b> “Two ... RNC senior officials also have been mentioned as potential Priebus successors: John Ryder, the RNC’s general counsel, and Sean Spicer, the RNC’s chief strategist and communications director.<b> </b>... [and] the RNC’s top communicator since 2011.” <a href=\"http://bit.ly/1UyiRAk\">http://bit.ly/1UyiRAk</a> <br /> <b>--@seanspicer</b>: “Sunday add @GOP’s @Reince to list of ppl that have done ‘full Ginsburg’ @ThisWeekABC @meetthepress @FoxNewsSunday @FaceTheNation @CNNSotu”</p> \n <p><b>--TELLY LOVELACE</b> named RNC’s national director of African American initiatives and media -- Release: “Telly joins the RNC from IR+ Media ... where he served as Managing Director. Previously, Telly served as a senior member of Maryland Governor Larry Hogan’s communications team.”</p> \n <p><b>** A message from the Embassy of the United Arab Emirates:</b> The UAE stands with the US and President Obama in a shared commitment to stopping the proliferation of nuclear weapons. This is one of many areas where the UAE and US work together to strengthen stability and security in the Mideast and around the world. 
Learn more: <a href=\"http://bit.ly/1WFzIyE\">bit.ly/1WFzIyE</a> **</p> \n <p><b>PIC DU JOUR: </b>Colleagues of Jen Friedman in the White House press office played an April Fool’s joke on her yesterday by putting her birthday in Playbook. That led to dozens of happy birthday emails to her, including from senior staff, even though the deputy press secretary's real birthday is Nov. 7. They also decorated her office with a “Happy Birthday” banner and a balloon. <b><i>Pic of her decorated desk <a href=\"http://bit.ly/21XVFdO\">http://bit.ly/21XVFdO</a></i></b></p> \n <p><b>LIFE ONLINE – “Snapchat’s Ultimate Goal Isn’t Just Chat—It’s Total Media Domination,” by Fortune’s Mathew Ingram:</b> “The latest iteration came this week with the addition of new features including video calling, audio and video messaging, GIFs, and stickers. Unlike a lot of other messaging apps, all of the new features are blended together—users can seamlessly toggle between video and audio, send short notes, and draw on top of shared photos.” <a href=\"http://for.tn/1VjJ7gB\">http://for.tn/1VjJ7gB</a> </p> \n <p><b>TOMORROW’S TIMES TODAY -- “Navy SEALs Split Over Members’ Benefiting From Hard-Earned Brand,” by Nicholas Kulish, Christopher Drew, and Sean D. Naylor: </b>“[F]ormer members ... are increasingly giving paid speeches, sounding off on politics on Fox News and stamping the force’s name on hats, backpacks, vitamins ... [A] half-dozen books are scheduled to roll off the presses in coming months, adding to the 100-plus published by former SEALs since 2001.<b> </b>... 
Far more SEALs have gone public than their more reticent Army counterparts in Delta Force and the Rangers ...</p> \n <p><b>“One author, Matt Bissonnette, </b>earned millions for ‘No Easy Day,’ a firsthand narrative of the Bin Laden raid, but had to forfeit the profits for failing to submit it for Pentagon review of classified information.” <a href=\"http://nyti.ms/1RS6BqQ\">http://nyti.ms/1RS6BqQ</a></p> \n <p><b>WHITE HOUSE DEPARTURE LOUNGE:</b> Noah Schwartz left the National Security Council on Friday where he was advisor to the deputy national security advisor for int’l econ; he’s headed to the Office of the Secretary of Defense, where he’ll be working on South and Southeast Asia policy. He emails friends: “It has been a great privilege to serve on the NSC staff these past few years, and I will always be grateful for the experience. Special thanks to everyone (past and present) on the international economics team.”</p> \n <p><b>POLITICO MAGAZINE FRIDAY COVER – “9/11: What Would Trump Do?”:</b> “Politico Magazine asked foreign policy and counterterrorism experts, historians, Trump biographers, even psychologists to take a serious guess at how he’d handle the days after a terrorist attack in the United States—all based on what they know about Trump the candidate and what he’ll be facing if he gets elected.” <i>With hot takes from Jacob Heilbrunn, Ian Bremmer, Amb. Dennis Ross, Aaron David Miller, Andrew Bacevich and more</i> <a href=\"http://politi.co/1UJ0n0k\">http://politi.co/1UJ0n0k</a> </p> \n <p><b>STUFF TRUMP SAYS -- “How Donald Trump sees himself,” by CNN’s Scott Glover and Maeve Reston with a video by Brenna Williams:</b> “He considers himself a member of ‘the lucky sperm club.’<b> </b>He trusts no one, and places a premium on revenge. 
(‘If you do not get even, you are just a schmuck!’)<b> </b>He treats every decision he makes ‘like a lover,’ sometimes thinking with his head, other times with other parts of his body, because it reminds him to ‘keep in touch with my basic impulses.’<b> </b>And to make creative choices, he writes: ‘I try to step back and remember my first shallow reaction. The day I realized it can be smart to be shallow was, for me, a deep experience.’” <a href=\"http://cnn.it/1SGvhCL\">http://cnn.it/1SGvhCL</a> ... <i><b>2-min. video</b></i> <a href=\"http://cnn.it/1SsWF4w\">http://cnn.it/1SsWF4w</a> </p> \n <p><b>SUBJECT LINE DU JOUR – </b>Trump’s menacing campaign email sent Friday:<b> </b>“We’re Coming For You Wisconsin!” <i><b>Text of his email</b></i> <a href=\"http://bit.ly/25D0qOL\">http://bit.ly/25D0qOL</a> </p> \n <p><b>VIDEO DU JOUR – “Donald Trump’s Love/Hate Relationship With Women”</b> – Politico Magazine – <i><b>2-min. video </b></i><a href=\"http://politi.co/25Bwouu\">http://politi.co/25Bwouu</a><i><b> </b></i></p> \n <p><b>CLICKERS – “The nation’s cartoonists on the week in politics,” edited by Matt Wuerker – </b><i>11 keepers</i> <a href=\"http://politi.co/22WoUUd\">http://politi.co/22WoUUd</a> ... 
<i><b>Matt’s thirteen March cartoons</b></i> <a href=\"http://politi.co/1RRWsxo\">http://politi.co/1RRWsxo</a> </p> \n <p><b>GREAT WEEKEND READS,</b> curated by Daniel Lippman:</p> \n <p>--“<b>Jesus of Nazareth, Whose Messianic Message Captivated Thousands, Dies at About 33,” by Sam Roberts in Vanity Fair: </b>“Roberts, an obituary writer for The New York Times, imagines how, given the facts available then, his predecessors might have reported the aftermath of an execution in the Middle East one Friday two millennia ago.” <a href=\"http://bit.ly/1UZby4C\">http://bit.ly/1UZby4C</a> </p> \n <p>--<b>“The Men Who Gave Trump His Brutal Worldview,” by Michael D’Antonio, </b>author of “Never Enough: Donald Trump and the Pursuit of Success,” on Politico Magazine: “Tutored by his fiercely ambitious father and tough-as-nails high school coach, the GOP frontrunner has only one ethical code: life is combat.” <a href=\"http://politi.co/1LYiY5C\">http://politi.co/1LYiY5C</a><b><i> </i></b><i>...<b> $15.16 on Amazon</b></i> <a href=\"http://amzn.to/1g8Rlak\">http://amzn.to/1g8Rlak</a> </p> \n <p><b>--“This Professor Knows Why You Hate Ted Cruz’s Face” --Washingtonian Staff</b>: “(He read the other candidates’ faces, too.)” <a href=\"http://bit.ly/1M1RDiU\">http://bit.ly/1M1RDiU</a></p> \n <p><b>--“Emma Smith on The Best Plays of Shakespeare” – interviewed by Beatrice Wilford on FiveBooks.com:</b> “In the first of a series marking the 400th year since the playwright’s death, we ask Shakespearean scholar Emma Smith to pick her five favourite plays.” <a href=\"http://bit.ly/1MHUeyx\">http://bit.ly/1MHUeyx</a></p> \n <p><b>--“Murder in Mayfair,” by Peter Pomerantsev in The London Review of Books,</b> reviewing “A Very Expensive Poison: The Definitive Story of the Murder of Litvinenko and Russia’s War with the West,” by Luke Harding: “As he lay dying Alexander Litvinenko ... 
found it increasingly hard to open his mouth to talk, as he became yellow and shrivelled, he cursed himself for letting his guard down: he had assumed he was safe after receiving asylum and citizenship in the UK.” <a href=\"http://bit.ly/1MHUqhg\">http://bit.ly/1MHUqhg</a> ... <i><b>$12.93 on Amazon</b></i> <a href=\"http://amzn.to/1RS7lfM\">http://amzn.to/1RS7lfM</a> (h/t TheBrowser.com)</p> \n <p><b>--“The Strange Case of a Nazi Who Became an Israeli Hitman,” by Forward’s Dan Raviv and Yossi Melman in Haaretz:</b> “Otto Skorzeny, one of the Mossad’s most valuable assets, was a former lieutenant colonel in Nazi Germany’s Waffen-SS and one of Adolf Hitler’s favorites.” <a href=\"http://bit.ly/1RRUhds\">http://bit.ly/1RRUhds</a> </p> \n <p><b>--“How to Hack an Election,” by Jordan Robertson, Michael Riley, and Andrew Willis on Bloomberg Businessweek’s</b> international cover: “Andrés Sepúlveda rigged elections throughout Latin America for almost a decade. He tells his story for the first time.” <a href=\"http://bloom.bg/1RRUqxm\">http://bloom.bg/1RRUqxm</a> ... <i><b>The cover</b></i> <a href=\"http://bit.ly/1qkEUOn\">http://bit.ly/1qkEUOn</a></p> \n <p><b>--“How Meryl Streep Battled Dustin Hoffman, Retooled Her Role, and Won Her First Oscar,” by Michael Schulman </b>on the cover of April’s Vanity Fair, in an adaptation of his upcoming biography “Her Again: Becoming Meryl Streep” (out April 26):<b> </b>“At 29, Meryl Streep was grieving for a dead lover, falling for her future husband, and starting work on Kramer vs. Kramer, the movie that would make her a star and sweep the 1980 Oscars. ... Schulman recounts the struggles—physical, emotional, and intellectual—that launched Streep’s legend.” <a href=\"http://bit.ly/1RRUw8l\">http://bit.ly/1RRUw8l</a> ... <i><b>The cover </b></i><a href=\"http://bit.ly/1RuAL4x\">http://bit.ly/1RuAL4x</a> ... 
<i><b>$20.35 pre-order on Amazon </b></i><a href=\"http://amzn.to/1pUgx9P\">http://amzn.to/1pUgx9P</a> (h/t Longform.org)</p> \n <p><b>--“#Jihad: Why ISIS is winning the social media war,” by Brendan I. Koerner in Wired: </b>“The group’s closest peers are not just other terrorist organizations, then, but also the Western brands, marketing firms, and publishing outfits—from PepsiCo to BuzzFeed—who ply the Internet with memes and messages in the hopes of connecting with customers.” <a href=\"http://bit.ly/1Y4E0jF\">http://bit.ly/1Y4E0jF</a> ... <i><b>Video of</b></i> <i><b>Koerner on "CBS This Morning: Saturday” </b></i> <a href=\"http://cbsn.ws/1Y6qImE\">http://cbsn.ws/1Y6qImE</a> </p> \n <p><b>--“The Longform Guide to the Dark Side of Hollywood”: </b>8 pieces on “Corruption, venality, and tragedy: a collection of picks on what lies beneath the glitter.” <a href=\"http://bit.ly/1M7kPVN\">http://bit.ly/1M7kPVN</a><b> </b></p> \n <p><b>--“Crowd Source,” by Davy Rothbart in California Sunday Magazine</b>: “Inside the company that provides fake paparazzi, pretend campaign supporters, and counterfeit protesters.” <a href=\"http://bit.ly/1SGsWYC\">http://bit.ly/1SGsWYC</a> (h/t Longreads.com)</p> \n <p><b>MEDIAWATCH – “Andrew Sullivan joins New York Magazine,” by Hadas Gold:</b> “Sullivan is joining New York Magazine [as] a contributing editor, the publication’s editor-in-chief Adam Moss announced on Friday. Sullivan ... will write features throughout the year and cover politics, including the 2016 Democratic and Republican National Conventions. In a note on Facebook, Sullivan said his first piece is on Donald Trump. ... Sullivan’s standalone blog, The Dish, stopped publishing in February 2015, with Sullivan citing financial and personal difficulties. ... Prior to his career as a blogger, Sullivan was editor of The New Republic and a writer for New York Times Magazine.” <a href=\"http://politi.co/1RRWYvm\">http://politi.co/1RRWYvm</a> ... 
<i><b>His letter to “Dishheads” </b></i><a href=\"http://bit.ly/1Y4FDh8\">http://bit.ly/1Y4FDh8</a><i><b> </b></i></p> \n <p><b>FINAL FOUR --<i> </i>“‘One Shining Moment’ gets team-specific twist with Ne-Yo” –AP/Houston</b>: “‘One Shining Moment’ is getting a new Grammy-winning voice and some team-specific highlights at the end of the NCAA Final Four. The song has been the backdrop for the highlight piece to wrap up NCAA Tournaments for three decades, and the version by the late Luther Vandross will continue to be used for the national broadcast on TBS. For TNT and truTV, a rendition by three-time Grammy Award winner Ne-Yo will be used after the first-ever team-specific broadcasts of the national championship game. Ne-Yo’s performance will accompany team-centric highlights of the schools being featured in the Team Stream presentations, following their quest leading up to and during the title game.” <a href=\"http://yhoo.it/1qb6FsM\">http://yhoo.it/1qb6FsM</a> <i>...<b> Last year’s NCAA highlight piece</b></i> <a href=\"http://bit.ly/21ZpB9r\">http://bit.ly/21ZpB9r</a> ... <i><b>Charles Barkley’s 30-second rendition </b></i><a href=\"http://bit.ly/1pUoSKC\">http://bit.ly/1pUoSKC</a> </p> \n <p><b>BIRTHWEEK (was yesterday):</b> Noah Schwartz … Susan Pisano</p> \n <p><b>BIRTHDAYS:</b> Dr. Jud Feldman (hat tip: MBF) ... Brent Colburn ... Meridith Webster, director of comms. and public affairs at Bloomberg and an Obama West Wing alum -- fun fact: Favorite color is orange! (h/ts Ben Chang) ... Politico’s Dana Rubinstein and Josefa Velasquez … Lynda Tran, 270 Strategies founding partner and CBS News political contributor (h/t Heather Purcell and Eleis Brennan) ... Brian Austin, consultant at Kaiser Associates (h/t wife Emily Stephenson) ... Emily Steel, a TV and media reporter at the NYT and a WSJ and FT alum ... Carl Kasell, formerly of NPR (whose voice is on HIS answering machine?) 
… Robby Zirkelbach, SVP of comms at PhRMA and AHIP alum, mediocre golfer, avid Hawkeyes fan and great friend (h/t Joe Brettell) ... Dan Sallick, partner and co-founder of Subject Matter (h/t Peter Cherukuri) ... </p> \n <p><b>... Joe Hack</b>, Sen. Deb. Fischer’s chief of staff and one of the youngest (if not the youngest) chiefs on the Senate side, is 29 (h/t colleague Brianna Puccini) ... Michael David Morgan, former deputy campaign manager for WA State Sen. Michael Baumgartner and current operations analyst for Optimus (h/t Cara Mathis) ... Sean Long, son of two former staffers, the pride of Clarendon, VA and an up and coming star at the DOJ, is 23 (h/t colleague John Sheehy) ... John McCauley, deputy comms. director at the Truman National Security Project ... Jim O’Grady, WNYC reporter and professional storyteller ... Nikhil Joshi, OFA and Obama WH alum now senior manager for biz ops and strategy at Lending Club in San Fran ... Emily Hartmann of Global Strategy Group ...</p> \n <p><b>... Danny Kanner, </b>now of NBA comms and alum of DGA and OFA ... Dan Reilly, NEA’s senior campaigns and elections specialist and a Teamsters alum … Margo McNabb … Christy Agner … Greg Boatright … Rep. Chellie Pingree (D-Me.) (h/ts Teresa Vilmain) … Abe Dyk … Kimberly Woodard … Sarah Fenn ... Kristina Clara Konrad-Williams ... Clare Osdene Schapiro ... Carole Chouinard ... Alex Rosenwald ... singer Emmylou Harris is 69 ... social critic and author Camille Paglia is 69 ... Jesse Carmichael (Maroon 5) is 37 ... Lee Dewyze (“American Idol”) is 30 ... Aaron Kelly (“American Idol”) is 23 (h/ts AP)</p> \n <p><b>MATT MACKOWIAK,</b> who for years has faithfully provided Playbookers with Sunday TV listings each weekend, is a candidate for the Man of the Year Campaign benefitting the Leukemia & Lymphoma Society (LLS), whose mission is to fund cutting-edge research and eradicate blood cancers. <i><b>Please thank Matt by checking out his fundraising page. 
</b></i><a href=\"http://bit.ly/1SuSlSs\">http://bit.ly/1SuSlSs</a> </p> \n <p><b>And here are THE SHOWS,</b> which @MattMackowiak filed from Austin:</p> \n <p>--<b>NBC’s “Meet the Press”</b>: Hillary Clinton ... Reince Priebus; Ron Johnson; roundtable: WTMJ-TV’s Charles Benson, David Brooks, Helene Cooper and Amy Walter</p> \n <p>--<b>ABC’s “This Week”</b>: John Kasich; Bernie Sanders; Reince Priebus; roundtable: Donna Brazile, Matthew Dowd, Hugh Hewitt and Juan Williams</p> \n <p>--<b>CBS’s “Face the Nation”</b>: Donald Trump; Reince Priebus; roundtable: Peggy Noonan, Ed O’Keefe, Mark Leibovich and Ruth Marcus; new results from the CBS News 2016 Battleground Tracker Poll from Wisconsin, Pennsylvania and New York with CBS News’ Anthony Salvanto</p> \n <p>--<b>“Fox News Sunday”</b>: Donald Trump; Reince Priebus; roundtable: George Will, Julie Pace, Stephen Dinan and Charles Lane; “Power Player of the Week” with John Ficklin, the 10<sup>th</sup> member of his family to work in the White House</p> \n <p>--<b>CNN’s “State of the Union” </b>(9am ET / 12pm ET): Reince Priebus; Bernie Sanders; roundtable: Bakari Sellers, Amanda Carpenter, Nina Turner and Andre Bauer</p> \n <p>-<b>-Univision’s “Al Punto”</b> (SUN 10am ET / 1pm PT) Jorge Casta<b>ñ</b>eda; Venezuelan opposition leader and wife of political prisoner (Leopoldo L<b>ó</b>pez) Lilian Tintori; Republican political analysts Helen Aguirre and Adryana Boyne; FARC leader Rodrigo Londo<b>ñ</b>o Echeverri “Timochenko”; Peruvian presidential candidate (Peruvians for Change) Pedro Pablo Kuczynski; Peruvian presidential candidate (Popular Action) Alfredo Barnechea (substitute anchor: Enrique Acevedo)</p> \n <p>--<b>CNN’s “Inside Politics” with John King </b>(SUN 8am ET): Roundtable: Jackie Kucinich, Ed O’Keefe, Jeff Zeleny and Abby Phillip</p> \n <p>--<b>Fox News’ “Sunday Morning Futures” </b>(10am ET / 9am CT): Peter King; Allianz chief economic adviser Mohamed El-Erian; actor Scott Baio; roundtable: WSJ’s Jon 
Hilsenrath, Ed Rollins and Scott Brown</p> \n <p>--<b>CNN’s “Reliable Sources”</b>: (SUN 11am ET): Author Ariana Huffington (“The Sleep Revolution”); Connie Chung and Maury Povich; Matthew Dowd; former People Magazine managing editor Larry Hackett</p> \n <p>--<b>Fox News’ “MediaBuzz” </b>(SUN 11am ET / 10am CT): Trump campaign spokesperson Katrina Pierson; Andrea Tantaros; Meghan McCain; Kennedy; Sandra Smith; Susan Ferrechio; Amy Holmes; Kirsten Powers; tech analyst Shana Glenzer</p> \n <p>--<b>CNN’s “Fareed Zakaria GPS”</b>: (SUN 11am ET): National security roundtable: RAND’s Seth Jones, NYPD’s John Miller and former CIA and FBI counterterrorism analyst Phil Mudd; roundtable: E.J. Dionne, author and Republican Main Street Partnership’s Geoffrey Kabaservice (“The Downfall of Moderation and the Destruction of the Republican Party”), author and Princeton University’s Nell Irvin Painter (“Southern History Across the Color Line”) and David Remnick</p> \n <p><b>--C-SPAN</b>: <b>“The Communicators”</b> (SAT 6:30pm ET): American Cable Association president & CEO Matthew Polka and American Cable Association board chair Robert Gessner ... <b>“Newsmakers”</b> (SUN 10am ET): Club for Growth president David McIntosh, questioned by Washington Examiner’s Gabby Morrongiello and Politico’s James Hohmann ... <b>“Q&A”</b> (SUN 8pm & 11pm ET): Discussion with high school students attending the U.S. Senate Youth Program</p> \n <p>--<b>MSNBC’s “PoliticsNation with Rev. 
Al Sharpton”</b>: (SUN 8-9am ET): Ben Carson; Clinton campaign national political director Amanda Renteria; Reuters’ Erin McPike; Amy Holmes; author Michael D’Antonio (“Never Enough”); One Wisconsin Now executive director Scot Ross</p> \n <p>--<b>MSNBC’s “The Place for Politics”</b>: (SUN 9-10am ET): Capital Times’ Jessie Opoien; The Boston Globe’s James Pindell; Trump campaign senior advisor Barry Bennett; Politico’s Gabe Debenedetti (hosted by MSNBC’s Chris Jansing live from Wisconsin)</p> \n <p>--<b>MSNBC’s “The Place for Politics”</b>: (SUN 10am-12pm ET): MSNBC’s Irin Carmon; Clinton campaign national spokeswoman Karen Finney; Michael Steele; Public Religion Research Institute’s Robert Jones; Michael Eric Dyson; Nina Turner; John Dean; Wisconsin-based radio host Charlie Sykes; The Nation Magazine’s John Nichols; One Wisconsin Now’s Scot Ross (hosted by MSNBC’s Joy Reid live from New York)</p> \n <p>--<b>MSNBC’s “The Place for Politics”</b>: (SUN 12-2pm ET): Wisconsin-based radio host Charlie Sykes; Elise Jordan; Howard Dean; Kasich supporter Jack Voight; USA Today’s Paul Singer; RCP’s Rebecca Berg; State Rep. Melissa Sargent (D-WI); former Romney and Rubio advisor Avik Roy; author and Milwaukee Journal-Sentinel’s Jason Stein (“Dropping the Bomb: Scott Walker, Unions and the Fight for a State”) (hosted by MSNBC’s Alex Witt live from New York)</p> \n <p>--<b>MSNBC’s “The Place for Politics”</b>: (SUN 3-4pm ET): The University of Wisconsin’s Mordecai Lee; former Wisconsin Lt. Gov. 
Barbara Lawton; WaPo’s Robert Costa; Milwaukee Mayor Tom Barrett (hosted by MSNBC’s Chris Jansing live from Wisconsin)</p> \n <p>--<b>PBS’s “To the Contrary” with Bonnie Erbé</b>: Roundtable: Eleanor Holmes Norton; former Judge and federal prosecutor Debra Carnahan; Washington Examiner columnist Ashe Schow; Republican strategist Rina Shah Bharara and Concerned Women for America’s Penny Nance</p> \n <p>--<b>SiriusXM’s “No Labels Radio” </b>(SAT 10am ET & 6pm ET, SUN 1PM ET): Host Jon Huntsman, with co-hosts No Labels co-founder Stuart Holliday and Politico’s Daniel Lippman, will discuss the Puerto Rico debt crisis with Puerto Rico’s lead restructuring advisor Jim Millstein, the 2016 election cycle with pollster Frank Luntz and the Wisconsin primary with Ron Johnson. <i><b>Pic of the hosts this weekend</b></i> <a href=\"http://bit.ly/1qdGxOa\">http://bit.ly/1qdGxOa</a> </p> \n <p>--<b>Sinclair’s “Full Measure” with Sharyl Attkisson</b> (SUN 9:30am ET on WJLA and airing on Sinclair stations nationwide): the dangerous underground world of smuggling, of both drugs and humans; correspondent Scott Thuman examines how Morocco, one of the most beautiful countries on Earth, is being associated with the most recent terrorist attacks. </p> \n <p><b>** A message from the Embassy of the United Arab Emirates:</b> Together, the UAE and US are working to make the Middle East more secure and prevent the proliferation of nuclear weapons.</p> \n <p>The UAE is the only country in the Arabian Gulf to have a civilian nuclear energy accord (also known as a 123 Agreement) with the US. Called the “gold standard” by nonproliferation experts and government leaders, the UAE has boldly pledged not to enrich uranium or extract plutonium.</p> \n <p>Next year, the UAE’s first nuclear energy station will begin generating safe, clean electricity. 
A leader in developing new technologies for renewable and sustainable energy, the UAE is also home to the International Renewable Energy Agency and the Masdar Institute.</p> \n <p>Learn about how the UAE and US are united for a better future: <a href=\"http://bit.ly/1WFzIyE\">bit.ly/1WFzIyE</a> **</p> \n <p><b>SUBSCRIBE</b> to the Playbook family: <b>POLITICO Playbook </b> <a href=\"http://go.politicoemail.com/?qs=b48cc63abe2b3bfdf6a1974229c1d9ebdd76fec80e3e0b9c4f389a6aebb15331\">http://politi.co/1M75UbX</a> ... <b>New York Playbook </b> <a href=\"http://go.politicoemail.com/?qs=b48cc63abe2b3bfdd95a477c264b940d3d68e1bf06d895022c7331c2d59002ba\">http://politi.co/1ON8bqW</a> ... <b>Florida Playbook </b> <a href=\"http://go.politicoemail.com/?qs=b48cc63abe2b3bfdf6adbc9a5e2ca9fb024caa7dd707c64d9f8d05282a7904f0\">http://politi.co/1JDm23W</a> ... <b>New Jersey Playbook </b><a href=\"http://go.politicoemail.com/?qs=b48cc63abe2b3bfd43ea324572eabb35237a2c8ff38089b777628beacc2d4337\">http://politi.co/1HLKltF</a> ...<b> Massachusetts Playbook </b><a href=\"http://go.politicoemail.com/?qs=b48cc63abe2b3bfd4109bd5463ef8db58d28d5151d70170c0d641b0aeacfb453\">http://politi.co/1Nhtq5v</a> ... <b>Illinois Playbook </b> <a href=\"http://go.politicoemail.com/?qs=b48cc63abe2b3bfd6d4bff376738b3cc9b795cde953f294cf858b75e52b1dffc\">http://politi.co/1N7u5sb</a> ... <b>California Playbook</b> <a href=\"http://go.politicoemail.com/?qs=b48cc63abe2b3bfdd58947052fda88f4ea11d09a07f7f5eb6c84873711479543\">http://politi.co/1N8zdJU</a> ... 
<b>Brussels Playbook </b> <a href=\"http://go.politicoemail.com/?qs=b48cc63abe2b3bfdc6d26c38b244d39857441567733e7f3775cbd8ffde507bef\">http://politi.co/1FZeLcw</a></p>\n <div> \n <div> \n <a href=\"http://www.politico.com/tipsheets/playbook/archive\">« View Archives</a>\n </div> \n </div> \n <div> \n <div> \n <div> \n <div> \n </div> \n </div> \n <aside> \n <div> \n <h2></h2> \n <div> \n <a href=\"http://www.politico.com/story/2016/04/donald-trump-corey-lewandowski-shrinking-role-campaign-221487\"><img width=\"4047\" height=\"2194\" /></a>\n </div> \n <div> \n <ol> \n <li> \n <article> \n <div> \n <h3><a href=\"http://www.politico.com/story/2016/04/donald-trump-corey-lewandowski-shrinking-role-campaign-221487\">Trump campaign shrinks Lewandowski's role</a></h3> \n </div> \n </article> </li> \n <li> \n <article> \n <div> \n <h3><a href=\"http://www.politico.com/story/2016/04/donald-trump-tennessee-gop-delegates-221489\">Tennessee GOP delegate fight erupts ahead of party meeting</a></h3> \n </div> \n </article> </li> \n <li> \n <article> \n <div> \n <h3><a href=\"http://www.politico.com/story/2016/04/hillary-clinton-bernie-sanders-attacks-221484\">Sanders gets under Clinton's skin in New York</a></h3> \n </div> \n </article> </li> \n <li> \n <article> \n <div> \n <h3><a href=\"http://www.politico.com/story/2016/04/donald-trump-delegates-north-dakota-gop-221480\">How the North Dakota GOP is freezing out Trump</a></h3> \n </div> \n </article> </li> \n <li> \n <article> \n <div> \n <h3><a href=\"http://www.politico.com/story/2016/04/hillary-clinton-fbi-strategy-emails-221435\">Clinton aides unite on FBI legal strategy</a></h3> \n </div> \n </article> </li> \n </ol> \n </div> \n </div> \n </aside> \n <aside> \n <div> \n <h2><a href=\"http://www.politico.com/tipsheets/playbook/archive\">Playbook - POLITICO Archive</a></h2> \n <div> \n <ul> \n <li> \n <article> \n <div> \n <h3> <a 
href=\"http://www.politico.com/playbook/2016/04/bernie-gets-under-hillarys-skin-manafot-rising-trumps-new-guru-builds-empire-including-veterans-of-gops-last-contested-convention-bdays-brent-colburn-meridith-webster-213545\">Saturday, 4/2/16 </a></h3> \n </div> \n </article> </li> \n <li> \n <article> \n <div> \n <h3> <a href=\"http://www.politico.com/playbook/2016/04/trump-would-be-most-unpopular-major-party-nominee-in-32-years-playbook-breakfast-with-white-house-counsel-neil-eggleston-and-senior-adviser-brian-deese-livestreams-7-57-am-213524\">Friday, 4/1/16 </a></h3> \n </div> \n </article> </li> \n <li> \n <article> \n <div> \n <h3> <a href=\"http://www.politico.com/playbook/2016/03/how-will-bill-take-trump-attacks-on-hillary-playbook-breakfast-on-the-court-tomorrow-white-house-counsel-neil-eggleston-senior-adviser-brian-deese-remembering-jennifer-frey-213503\">Thursday, 3/31/16 </a></h3> \n </div> \n </article> </li> \n <li> \n <article> \n <div> \n <h3> <a href=\"http://www.politico.com/playbook/2016/03/trump-to-people-im-the-highest-level-of-smart-4-hours-of-sleep-on-working-out-dont-have-to-when-youre-making-america-great-again-you-get-a-lot-of-exercise-213479\">Wednesday, 3/30/16 </a></h3> \n </div> \n </article> </li> \n <li> \n <article> \n <div> \n <h3> <a href=\"http://www.politico.com/playbook/2016/03/great-mentioner-trumps-cabinet-jill-abramsons-new-column-david-gregorys-tv-gig-bday-peter-cherukuri-paul-farhi-robert-gibbs-steve-peoples-roger-simon-peter-velz-213455\">Tuesday, 3/29/16 </a></h3> \n </div> \n </article> </li> \n </ul> \n </div> \n <ul> \n <li> <a href=\"http://www.politico.com/tipsheets/playbook/archive\">View the Full Playbook Archives »</a></li> \n </ul> \n </div> \n </aside> \n <aside> \n <div> \n <h2> <b></b> <b></b> Politico Magazine </h2> \n <div> \n <ul> \n <li> \n <article> \n <div> \n <a href=\"http://www.politico.com/magazine/story/2016/04/the-next-donald-trumps-213786\"><img alt=\"Donald Trump and Martha Stewart "fire" up 
NBC's promo campaign for both "Apprentice" shows. The two icons came together to shoot multiple spots for NBC. The promos, which began airing on Aug. 9, show off their lighter sides.\" /></a>\n </div> \n <div> \n <h3><a href=\"http://www.politico.com/magazine/story/2016/04/the-next-donald-trumps-213786\">The Next Donald Trumps</a></h3> \n <p> By Luke O'Neil</p> \n </div> \n </article> </li> \n <li> \n <article> \n <div> \n <a href=\"http://www.politico.com/magazine/story/2016/03/2016-election-defense-military-industry-contractors-donations-money-contributions-presidential-hillary-clinton-bernie-sanders-republican-ted-cruz-213783\"><img alt=\"160331_CPI_defense_fighter_gty.jpg\" /></a>\n </div> \n <div> \n <h3><a href=\"http://www.politico.com/magazine/story/2016/03/2016-election-defense-military-industry-contractors-donations-money-contributions-presidential-hillary-clinton-bernie-sanders-republican-ted-cruz-213783\">The Defense Industry’s Surprising 2016 Favorites: Bernie & Hillary</a></h3> \n <p> By Alexander Cohen</p> \n </div> \n </article> </li> \n <li> \n <article> \n <div> \n <a href=\"http://www.politico.com/magazine/story/2016/03/doug-sosnik-memo-2016-is-over-213753\"><img alt=\"160321_sosnik_ap.jpg\" /></a>\n </div> \n <div> \n <h3><a href=\"http://www.politico.com/magazine/story/2016/03/doug-sosnik-memo-2016-is-over-213753\">Here’s How You Know 2016 Is Already Decided</a></h3> \n <p> By Doug Sosnik</p> \n </div> \n </article> </li> \n <li> \n <article> \n <div> \n <a href=\"http://www.politico.com/magazine/story/2016/04/donald-trump-ted-cruz-2016-911-muslims-tom-ridge-213785\"><img alt=\"160401_ridge_getty.jpg\" /></a>\n </div> \n <div> \n <h3><a href=\"http://www.politico.com/magazine/story/2016/04/donald-trump-ted-cruz-2016-911-muslims-tom-ridge-213785\">What Trump and Cruz’s Clueless Muslim Rhetoric Will Cost America</a></h3> \n <p> By Tom Ridge</p> \n </div> \n </article> </li> \n </ul> \n </div> \n </div> \n </aside> \n <div> \n <div> \n </div> \n 
</div> \n </div> \n </div> \n </div> \n </div> \n </section> \n </article> \n </div> \n</div>",
"main_length" : 43943,
"main_checksum" : "wuwwljNDSjPAjO8RpuUDy5yCmxI",
"main_format" : "HTML",
"extract" : "<a href=\"http://www.politico.com/playbook/2016/04/bernie-gets-under-hillarys-skin-manafot-rising-trumps-new-guru-builds-empire-including-veterans-of-gops-last-contested-convention-bdays-brent-colburn-meridith-webster-213545\">BERNIE GETS UNDER HILLARY’S SKIN -- MANAFOT RISING: Trump’s new guru builds empire, including veterans of GOP’s last contested convention – B’DAYS: Brent Colburn, Meridith Webster</a><p>Inside the campaigns and Corey Lewandowski's shrinking role.</p>04/02/16 01:05 PM EDT<p>The camps are quarreling over when the debate should take place.</p>04/02/16 12:57 PM EDT<a href=\"http://www.politico.com/magazine/story/2016/04/the-next-donald-trumps-213786\">The Next Donald Trumps</a><p> By Luke O'Neil</p><p>7 celebrities who’ve got what it takes to follow in the brazen billionaire’s footsteps in 2020.</p>04/02/16 07:56 AM EDT<p>Despite billionaire's staunch defense, embattled campaign manager is losing clout.</p><p> Updated </p><p>The skirmish is the latest in the increasingly fierce battle for delegates to the Republican National Convention in Cleveland.</p>04/01/16 11:30 PM EDT<p>The Republican front-runner's suggestion to blow up decades of non-proliferation policy provokes the president.</p><p> Updated </p><p>Obama also urged Iran not to violate the spirit of the nuclear deal, even if it is technically abiding by it.</p>04/01/16 07:32 PM EDT<p>The real estate mogul also said he agrees with the notion that abortion is murder.</p>04/01/16 07:20 PM EDT<p>He's making an aggressive push for Jewish support, casting himself as a steadfast supporter of Israel.</p><p> Updated </p><p>“I think she probably owes the senator an apology for that because the senator is not lying about her record.”</p>04/01/16 06:48 PM EDT<p>Republican candidates are fighting for 25 delegates doled out during one chaotic weekend, but state party rules put outsiders at a steep disadvantage.</p>04/01/16 05:47 PM EDT<p>"I can’t even imagine what’s in those emails. 
But I’m sure I would probably be mortified."</p>04/01/16 04:51 PM EDT<a href=\"http://www.politico.com/tipsheets/the-2016-blast/2016/04/cruzs-evangelical-problem-democrats-lying-spat-clinton-aides-all-four-one-kasich-doesnt-like-lyin-213544\">Cruz’s evangelical problem </a><p> By Henry C. Jackson</p><p>Unless Cruz can quickly make inroads with non-evangelical voters who so far have mostly rejected him, he has little chance of stopping Trump.</p>04/01/16 04:22 PM EDT<p>Amid the fanfare over President Barack Obama’s visit to Havana, U.S. officials and executives from major food companies are eyeing the island as a potential...</p>04/01/16 04:17 PM EDT<p>U.S. manufacturers have something to say to Donald Trump, Bernie Sanders and Hillary Clinton: Stop trying to “protect” us with slams on free trade.</p>04/01/16 04:12 PM EDT<p>The business group said a recession would set in during the first year under the Republican front-runner's proposed tariffs because China and Mexico would...</p>04/01/16 02:31 PM EDT<p>The Texan surged to second place as the candidate of Christian conservatives. His challenge now is how few are left to vote.</p>04/01/16 02:00 PM EDT<p>NCEC returns — GM's Bhuta to Bockorny</p>04/01/16 01:33 PM EDT<p>"And no, this is not an April Fools' [joke]," he wrote on Facebook.</p><p> Updated </p><p>"There isn’t anybody except one candidate who has a higher favorable than unfavorable rating."</p>04/01/16 01:23 PM EDT<p>Obama has commuted the sentences of 248 federal prisoners, mostly low-level drug offenders affected by mandatory minimum drug sentences, including 61 on...</p>04/01/16 12:30 PM EDT<p>The Republican front-runner risks an exodus of delegates if he fails to clinch the nomination outright.</p>04/01/16 11:07 AM EDT<p>Donald Trump's rocky week is likely to force his longtime backers, especially women, to reconsider their support of him, former Texas Gov. Rick Perry said...</p>04/01/16 10:59 AM EDT<p>"We haven't talked about any specific positions. 
What we have talked about is the fact that we both have a strong desire to heal this country."</p>04/01/16 10:56 AM EDT<p>"Garland was a member of the panel at the time the case was argued but did not participate in this opinion," the opinion says in a footnote.</p>04/01/16 10:53 AM EDT<p>Obama’s personal appeal will come as Garland continues to meet with senators individually.</p>04/01/16 10:48 AM EDT<p>Hillary Clinton owes Bernie Sanders' campaign an apology, the campaign said Friday.</p>04/01/16 10:00 AM EDT<p>What we learned about opioids yesterday — More bad news for Theranos</p>04/01/16 10:00 AM EDT<p>Bayer seeks ALJ review of EPA pesticide cancellation — Small number of dairy producers to get payments under MPP</p>04/01/16 10:00 AM EDT<p>Tesla’s tax benefits — Trump and Cruz find agreement (on the carbon tax)</p><h4>You're All Caught Up</h4><p>We're working on more stories right now</p><h5>Check out today's hot topics</h5><p>Mike Allen's must-read briefing on what's driving the day in Washington </p>Subscribe<h1>BERNIE GETS UNDER HILLARY’S SKIN -- MANAFOT RISING: Trump’s new guru builds empire, including veterans of GOP’s last contested convention – B’DAYS: Brent Colburn, Meridith Webster</h1>04/02/16 01:32 PM EDT<b>By Mike Allen </b><p>(@mikeallen; <a href=\"mailto:mallen@politico.com\">mallen@politico.com</a>)<b> and Daniel Lippman </b>(@dlippman; <a href=\"mailto:dlippman@politico.com\">dlippman@politico.com</a>)</p><b>Happy Saturday! </b><p>Obama alumnus Brent Colburn asked that we include this in celebration of his birthday: Today “is World Autism Awareness Day, and in celebration of my nephew Cordis and his amazing parents, Brian & Andrea Colburn, I’d like to encourage every Playbooker to take a minute and learn more about ... 
the autism community.” <a href=\"http://www.autismspeaks.org\">www.autismspeaks.org</a></p><p>Story Continued Below</p><b>INSIDE THE CAMPAIGNS – “Trump campaign shrinks Lewandowski’s role: </b><p>Despite the billionaire’s staunch defense, his embattled campaign manager is losing clout,” by Ben Schreckinger and Ken Vogel, with Hadas Gold: “Trump’s just-named convention manager, Paul Manafort, is expected to take a leading role not just in the selection of delegates, but in the remaining primaries themselves. ... [A] person involved in Trump’s campaign [said:] ... ‘Mr. Trump’s listening to other people now. The crew’s expanding.’ ... </p><b>“[T]his winter,</b><p> ... National Political Director Michael Glassner ... was [promoted] to deputy campaign manager ... On March 2, the campaign promoted Stuart Jolly ... to national field director, giving him primary authority over ... hiring ... field staff. ... <b>Manafort has quickly taken charge of his own fiefdom in Washington, and is planning to hire a team of his own,</b> which is likely to include several veterans of the 1976 Republican National Convention – the party’s last convention at which the presidential nomination was contested.” <a href=\"http://politi.co/22YORlU\">http://politi.co/22YORlU</a></p><b>**SUBSCRIBE to Playbook</b><p>: <a href=\"http://politi.co/1M75UbX\">http://politi.co/1M75UbX</a></p><b>CHASER – “Trump touts his loyalty in defending campaign manager</b><p>,” by AP’s Jill Colvin in Appleton, Wis.: “Trump [said in a phoner Thu. evening that] his decision to stand behind his campaign manager ... is a sign of loyalty — a trait that Trump has displayed, for better or worse, through much of his career.” <a href=\"http://apne.ws/1RS7xLX\">http://apne.ws/1RS7xLX</a></p><b>ALEX BURNS, who turns 3-0 tomorrow, </b><p>coins a memorable phrase on N.Y. Times p. A9, “G.O.P. Fears Trump as <b>Zombie Candidate: Damaged but Unstoppable”:</b> “Republicans who once worried that Mr. 
Trump might gain overwhelming momentum ... are now becoming preoccupied with a different grim prospect: that Mr. Trump might become a kind of zombie candidate — damaged beyond the point of repair, but too late for any of his rivals to stop him.” <a href=\"http://nyti.ms/1ZTJn6E\">http://nyti.ms/1ZTJn6E</a></p><b>VICE PRESIDENT BIDEN </b><p>and Dr. Jill Biden will be at tonight’s NCAA Final Four semifinal games in Houston to promote the It’s On Us campaign to end sexual assault on campus. The two will appear for a pregame interview on TBS. They return to D.C. on Sunday.</p><b>--Other Washingtonians at the Final Four:</b><p> Jonathan Martin, who’s a birthday boy tomorrow, and Betsy Fischer Martin; John Feinstein; and Danielle and Jeff Jones.</p><b>AP for SUNDAY PAPERS – “Clinton’s frustration grows, as primary race drags on,”</b><p> by Lisa Lerer in Syracuse and Ken Thomas in N.Y.: “Hillary Clinton snapped at a Greenpeace protester. She linked Bernie Sanders and tea party Republicans. And she bristled with anger when nearly two dozen Sanders supporters marched out of an event near her home outside New York City ... After a year of campaigning, months of debates and 35 primary elections, Sanders is finally getting under Clinton’s skin ... Clinton has spent weeks largely ignoring Sanders and trying to focus on ... Trump. Now, after several primary losses and with a tough fight in New York on the horizon, Clinton is showing flashes of frustration with the Vermont senator ...</p><b>“According to Democrats close to Hillary</b><p> and former President Bill Clinton, both are frustrated by Sanders' ability to cast himself as above politics-as-usual even while firing off what they consider to be misleading attacks. The Clintons are even more annoyed that Sanders' approach seems to be rallying ... young voters by his side. 
While Hillary Clinton's team contends her lock on the nomination as ‘nearly insurmountable,’ the campaign frequently grumbles that Sanders hasn't faced the same level of scrutiny ... Her aides complain about Sanders' rhetoric, claiming he's broken his pledge to avoid character attacks ... </p><b>“Clinton hopes that big victories</b><p> in New York on April 19 and five Northeastern states a week later will allow her to wrap up the nomination by the end of the month. But aides acknowledge that Sanders ... is unlikely to feel significant political or financial pressure to drop out of the race, even if it becomes clear he cannot win ... Sanders must win 67 percent of the remaining delegates and uncommitted superdelegates ... through June to be able to clinch the Democratic nomination. So far he's only winning 37 percent. </p><b>“Joel Benenson,</b><p> Clinton’s chief strategist, said: ‘We’re going to get to a point at the end of April where there just isn’t enough real estate for him to overcome the lead that we’ve built.’ Still, any kind of truce is probably weeks, if not months, away. ... Sanders is costing Clinton significant time, money and political capital [and] is drawing sizable crowds in New York.” <a href=\"http://apne.ws/1pUb1Uo\">http://apne.ws/1pUb1Uo</a></p><b>FRIENDS NOW CALLING SPICER “Mr. Chairman” ... “Backstage maneuvering begins in wide-open GOP chairman’s race,” by The Hill’s Scott Wong:</b><p> “Two ... RNC senior officials also have been mentioned as potential Priebus successors: John Ryder, the RNC’s general counsel, and Sean Spicer, the RNC’s chief strategist and communications director.... 
[and] the RNC’s top communicator since 2011.” <a href=\"http://bit.ly/1UyiRAk\">http://bit.ly/1UyiRAk</a><b>--@seanspicer</b>: “Sunday add @GOP’s @Reince to list of ppl that have done ‘full Ginsburg’ @ThisWeekABC @meetthepress @FoxNewsSunday @FaceTheNation @CNNSotu”</p><b>--TELLY LOVELACE</b><p> named RNC’s national director of African American initiatives and media -- Release: “Telly joins the RNC from IR+ Media ... where he served as Managing Director. Previously, Telly served as a senior member of Maryland Governor Larry Hogan’s communications team.”</p><b>** A message from the Embassy of the United Arab Emirates:</b><p> The UAE stands with the US and President Obama in a shared commitment to stopping the proliferation of nuclear weapons. This is one of many areas where the UAE and US work together to strengthen stability and security in the Mideast and around the world. Learn more: <a href=\"http://bit.ly/1WFzIyE\">bit.ly/1WFzIyE</a> **</p><b>PIC DU JOUR: </b><p>Colleagues of Jen Friedman in the White House press office played an April Fool’s joke on her yesterday by putting her birthday in Playbook. That led to dozens of happy birthday emails to her, including from senior staff, even though the deputy press secretary's real birthday is Nov. 7. They also decorated her office with a “Happy Birthday” banner and a balloon. </p><i>Pic of her decorated desk <a href=\"http://bit.ly/21XVFdO\">http://bit.ly/21XVFdO</a></i><b>LIFE ONLINE – “Snapchat’s Ultimate Goal Isn’t Just Chat—It’s Total Media Domination,” by Fortune’s Mathew Ingram:</b><p> “The latest iteration came this week with the addition of new features including video calling, audio and video messaging, GIFs, and stickers. 
Unlike a lot of other messaging apps, all of the new features are blended together—users can seamlessly toggle between video and audio, send short notes, and draw on top of shared photos.” <a href=\"http://for.tn/1VjJ7gB\">http://for.tn/1VjJ7gB</a></p><b>TOMORROW’S TIMES TODAY -- “Navy SEALs Split Over Members’ Benefiting From Hard-Earned Brand,” by Nicholas Kulish, Christopher Drew, and Sean D. Naylor: </b><p>“[F]ormer members ... are increasingly giving paid speeches, sounding off on politics on Fox News and stamping the force’s name on hats, backpacks, vitamins ... [A] half-dozen books are scheduled to roll off the presses in coming months, adding to the 100-plus published by former SEALs since 2001.... Far more SEALs have gone public than their more reticent Army counterparts in Delta Force and the Rangers ...</p><b>“One author, Matt Bissonnette, </b><p>earned millions for ‘No Easy Day,’ a firsthand narrative of the Bin Laden raid, but had to forfeit the profits for failing to submit it for Pentagon review of classified information.” <a href=\"http://nyti.ms/1RS6BqQ\">http://nyti.ms/1RS6BqQ</a></p><b>WHITE HOUSE DEPARTURE LOUNGE:</b><p> Noah Schwartz left the National Security Council on Friday where he was advisor to the deputy national security advisor for int’l econ; he’s headed to the Office of the Secretary of Defense, where he’ll be working on South and Southeast Asia policy. He emails friends: “It has been a great privilege to serve on the NSC staff these past few years, and I will always be grateful for the experience. 
Special thanks to everyone (past and present) on the international economics team.”</p><b>POLITICO MAGAZINE FRIDAY COVER – “9/11: What Would Trump Do?”:</b><p> “Politico Magazine asked foreign policy and counterterrorism experts, historians, Trump biographers, even psychologists to take a serious guess at how he’d handle the days after a terrorist attack in the United States—all based on what they know about Trump the candidate and what he’ll be facing if he gets elected.” <i>With hot takes from Jacob Heilbrunn, Ian Bremmer, Amb. Dennis Ross, Aaron David Miller, Andrew Bacevich and more</i><a href=\"http://politi.co/1UJ0n0k\">http://politi.co/1UJ0n0k</a></p><b>STUFF TRUMP SAYS -- “How Donald Trump sees himself,” by CNN’s Scott Glover and Maeve Reston with a video by Brenna Williams:</b><p> “He considers himself a member of ‘the lucky sperm club.’He trusts no one, and places a premium on revenge. (‘If you do not get even, you are just a schmuck!’)He treats every decision he makes ‘like a lover,’ sometimes thinking with his head, other times with other parts of his body, because it reminds him to ‘keep in touch with my basic impulses.’And to make creative choices, he writes: ‘I try to step back and remember my first shallow reaction. The day I realized it can be smart to be shallow was, for me, a deep experience.’” <a href=\"http://cnn.it/1SGvhCL\">http://cnn.it/1SGvhCL</a> ... <a href=\"http://cnn.it/1SsWF4w\">http://cnn.it/1SsWF4w</a></p><b>2-min. video</b><b>SUBJECT LINE DU JOUR – </b><p>Trump’s menacing campaign email sent Friday:“We’re Coming For You Wisconsin!” <a href=\"http://bit.ly/25D0qOL\">http://bit.ly/25D0qOL</a></p><b>Text of his email</b><b>VIDEO DU JOUR – “Donald Trump’s Love/Hate Relationship With Women”</b><p> – Politico Magazine – <a href=\"http://politi.co/25Bwouu\">http://politi.co/25Bwouu</a></p><b>2-min. 
video </b><b>CLICKERS – “The nation’s cartoonists on the week in politics,” edited by Matt Wuerker – </b><i>11 keepers</i><a href=\"http://politi.co/22WoUUd\">http://politi.co/22WoUUd</a><p> ... <a href=\"http://politi.co/1RRWsxo\">http://politi.co/1RRWsxo</a></p><b>Matt’s thirteen March cartoons</b><b>GREAT WEEKEND READS,</b><p> curated by Daniel Lippman:</p><p>--“<b>Jesus of Nazareth, Whose Messianic Message Captivated Thousands, Dies at About 33,” by Sam Roberts in Vanity Fair: </b>“Roberts, an obituary writer for The New York Times, imagines how, given the facts available then, his predecessors might have reported the aftermath of an execution in the Middle East one Friday two millennia ago.” <a href=\"http://bit.ly/1UZby4C\">http://bit.ly/1UZby4C</a></p><p>--<b>“The Men Who Gave Trump His Brutal Worldview,” by Michael D’Antonio, </b>author of “Never Enough: Donald Trump and the Pursuit of Success,” on Politico Magazine: “Tutored by his fiercely ambitious father and tough-as-nails high school coach, the GOP frontrunner has only one ethical code: life is combat.” <a href=\"http://politi.co/1LYiY5C\">http://politi.co/1LYiY5C</a><i>...<b> $15.16 on Amazon</b></i><a href=\"http://amzn.to/1g8Rlak\">http://amzn.to/1g8Rlak</a></p><b>--“This Professor Knows Why You Hate Ted Cruz’s Face” --Washingtonian Staff</b><p>: “(He read the other candidates’ faces, too.)” <a href=\"http://bit.ly/1M1RDiU\">http://bit.ly/1M1RDiU</a></p><b>--“Emma Smith on The Best Plays of Shakespeare” – interviewed by Beatrice Wilford on FiveBooks.com:</b><p> “In the first of a series marking the 400th year since the playwright’s death, we ask Shakespearean scholar Emma Smith to pick her five favourite plays.” <a href=\"http://bit.ly/1MHUeyx\">http://bit.ly/1MHUeyx</a></p><b>--“Murder in Mayfair,” by Peter Pomerantsev in The London Review of Books,</b><p> reviewing “A Very Expensive Poison: The Definitive Story of the Murder of Litvinenko and Russia’s War with the West,” by Luke Harding: “As he 
lay dying Alexander Litvinenko ... found it increasingly hard to open his mouth to talk, as he became yellow and shrivelled, he cursed himself for letting his guard down: he had assumed he was safe after receiving asylum and citizenship in the UK.” <a href=\"http://bit.ly/1MHUqhg\">http://bit.ly/1MHUqhg</a> ... <a href=\"http://amzn.to/1RS7lfM\">http://amzn.to/1RS7lfM</a> (h/t TheBrowser.com)</p><b>$12.93 on Amazon</b><b>--“The Strange Case of a Nazi Who Became an Israeli Hitman,” by Forward’s Dan Raviv and Yossi Melman in Haaretz:</b><p> “Otto Skorzeny, one of the Mossad’s most valuable assets, was a former lieutenant colonel in Nazi Germany’s Waffen-SS and one of Adolf Hitler’s favorites.” <a href=\"http://bit.ly/1RRUhds\">http://bit.ly/1RRUhds</a></p><b>--“How to Hack an Election,” by Jordan Robertson, Michael Riley, and Andrew Willis on Bloomberg Businessweek’s</b><p> international cover: “Andrés Sepúlveda rigged elections throughout Latin America for almost a decade. He tells his story for the first time.” <a href=\"http://bloom.bg/1RRUqxm\">http://bloom.bg/1RRUqxm</a> ... <a href=\"http://bit.ly/1qkEUOn\">http://bit.ly/1qkEUOn</a></p><b>The cover</b><b>--“How Meryl Streep Battled Dustin Hoffman, Retooled Her Role, and Won Her First Oscar,” by Michael Schulman </b><p>on the cover of April’s Vanity Fair, in an adaptation of his upcoming biography “Her Again: Becoming Meryl Streep” (out April 26):“At 29, Meryl Streep was grieving for a dead lover, falling for her future husband, and starting work on Kramer vs. Kramer, the movie that would make her a star and sweep the 1980 Oscars. ... Schulman recounts the struggles—physical, emotional, and intellectual—that launched Streep’s legend.” <a href=\"http://bit.ly/1RRUw8l\">http://bit.ly/1RRUw8l</a> ... <a href=\"http://bit.ly/1RuAL4x\">http://bit.ly/1RuAL4x</a> ... 
<a href=\"http://amzn.to/1pUgx9P\">http://amzn.to/1pUgx9P</a> (h/t Longform.org)</p><b>The cover </b><b>$20.35 pre-order on Amazon </b><b>--“#Jihad: Why ISIS is winning the social media war,” by Brendan I. Koerner in Wired: </b><p>“The group’s closest peers are not just other terrorist organizations, then, but also the Western brands, marketing firms, and publishing outfits—from PepsiCo to BuzzFeed—who ply the Internet with memes and messages in the hopes of connecting with customers.” <a href=\"http://bit.ly/1Y4E0jF\">http://bit.ly/1Y4E0jF</a> ... <a href=\"http://cbsn.ws/1Y6qImE\">http://cbsn.ws/1Y6qImE</a></p><b>Video of</b><b>Koerner on "CBS This Morning: Saturday” </b><b>--“The Longform Guide to the Dark Side of Hollywood”: </b><p>8 pieces on “Corruption, venality, and tragedy: a collection of picks on what lies beneath the glitter.” <a href=\"http://bit.ly/1M7kPVN\">http://bit.ly/1M7kPVN</a></p><b>--“Crowd Source,” by Davy Rothbart in California Sunday Magazine</b><p>: “Inside the company that provides fake paparazzi, pretend campaign supporters, and counterfeit protesters.” <a href=\"http://bit.ly/1SGsWYC\">http://bit.ly/1SGsWYC</a> (h/t Longreads.com)</p><b>MEDIAWATCH – “Andrew Sullivan joins New York Magazine,” by Hadas Gold:</b><p> “Sullivan is joining New York Magazine [as] a contributing editor, the publication’s editor-in-chief Adam Moss announced on Friday. Sullivan ... will write features throughout the year and cover politics, including the 2016 Democratic and Republican National Conventions. In a note on Facebook, Sullivan said his first piece is on Donald Trump. ... Sullivan’s standalone blog, The Dish, stopped publishing in February 2015, with Sullivan citing financial and personal difficulties. ... Prior to his career as a blogger, Sullivan was editor of The New Republic and a writer for New York Times Magazine.” <a href=\"http://politi.co/1RRWYvm\">http://politi.co/1RRWYvm</a> ... 
<a href=\"http://bit.ly/1Y4FDh8\">http://bit.ly/1Y4FDh8</a></p><b>His letter to “Dishheads” </b><b>FINAL FOUR --“‘One Shining Moment’ gets team-specific twist with Ne-Yo” –AP/Houston</b><p>: “‘One Shining Moment’ is getting a new Grammy-winning voice and some team-specific highlights at the end of the NCAA Final Four. The song has been the backdrop for the highlight piece to wrap up NCAA Tournaments for three decades, and the version by the late Luther Vandross will continue to be used for the national broadcast on TBS. For TNT and truTV, a rendition by three-time Grammy Award winner Ne-Yo will be used after the first-ever team-specific broadcasts of the national championship game. Ne-Yo’s performance will accompany team-centric highlights of the schools being featured in the Team Stream presentations, following their quest leading up to and during the title game.” <a href=\"http://yhoo.it/1qb6FsM\">http://yhoo.it/1qb6FsM</a><i>...<b> Last year’s NCAA highlight piece</b></i><a href=\"http://bit.ly/21ZpB9r\">http://bit.ly/21ZpB9r</a> ... <a href=\"http://bit.ly/1pUoSKC\">http://bit.ly/1pUoSKC</a></p><b>Charles Barkley’s 30-second rendition </b><b>BIRTHWEEK (was yesterday):</b><p> Noah Schwartz … Susan Pisano</p><b>BIRTHDAYS:</b><p> Dr. Jud Feldman (hat tip: MBF) ... Brent Colburn ... Meridith Webster, director of comms. and public affairs at Bloomberg and an Obama West Wing alum -- fun fact: Favorite color is orange! (h/ts Ben Chang) ... Politico’s Dana Rubinstein and Josefa Velasquez … Lynda Tran, 270 Strategies founding partner and CBS News political contributor (h/t Heather Purcell and Eleis Brennan) ... Brian Austin, consultant at Kaiser Associates (h/t wife Emily Stephenson) ... Emily Steel, a TV and media reporter at the NYT and a WSJ and FT alum ... Carl Kasell, formerly of NPR (whose voice is on HIS answering machine?) … Robby Zirkelbach, SVP of comms at PhRMA and AHIP alum, mediocre golfer, avid Hawkeyes fan and great friend (h/t Joe Brettell) ... 
Dan Sallick, partner and co-founder of Subject Matter (h/t Peter Cherukuri) ... </p><b>... Joe Hack</b><p>, Sen. Deb. Fischer’s chief of staff and one of the youngest (if not the youngest) chiefs on the Senate side, is 29 (h/t colleague Brianna Puccini) ... Michael David Morgan, former deputy campaign manager for WA State Sen. Michael Baumgartner and current operations analyst for Optimus (h/t Cara Mathis) ... Sean Long, son of two former staffers, the pride of Clarendon, VA and an up and coming star at the DOJ, is 23 (h/t colleague John Sheehy) ... John McCauley, deputy comms. director at the Truman National Security Project ... Jim O’Grady, WNYC reporter and professional storyteller ... Nikhil Joshi, OFA and Obama WH alum now senior manager for biz ops and strategy at Lending Club in San Fran ... Emily Hartmann of Global Strategy Group ...</p><b>... Danny Kanner, </b><p>now of NBA comms and alum of DGA and OFA ... Dan Reilly, NEA’s senior campaigns and elections specialist and a Teamsters alum … Margo McNabb … Christy Agner … Greg Boatright … Rep. Chellie Pingree (D-Me.) (h/ts Teresa Vilmain) … Abe Dyk … Kimberly Woodard … Sarah Fenn ... Kristina Clara Konrad-Williams ... Clare Osdene Schapiro ... Carole Chouinard ... Alex Rosenwald ... singer Emmylou Harris is 69 ... social critic and author Camille Paglia is 69 ... Jesse Carmichael (Maroon 5) is 37 ... Lee Dewyze (“American Idol”) is 30 ... Aaron Kelly (“American Idol”) is 23 (h/ts AP)</p><b>MATT MACKOWIAK,</b><p> who for years has faithfully provided Playbookers with Sunday TV listings each weekend, is a candidate for the Man of the Year Campaign benefitting the Leukemia & Lymphoma Society (LLS), whose mission is to fund cutting-edge research and eradicate blood cancers. <a href=\"http://bit.ly/1SuSlSs\">http://bit.ly/1SuSlSs</a></p><b>Please thank Matt by checking out his fundraising page. 
</b><b>And here are THE SHOWS,</b><p> which @MattMackowiak filed from Austin:</p><p>--<b>NBC’s “Meet the Press”</b>: Hillary Clinton ... Reince Priebus; Ron Johnson; roundtable: WTMJ-TV’s Charles Benson, David Brooks, Helene Cooper and Amy Walter</p><p>--<b>ABC’s “This Week”</b>: John Kasich; Bernie Sanders; Reince Priebus; roundtable: Donna Brazile, Matthew Dowd, Hugh Hewitt and Juan Williams</p><p>--<b>CBS’s “Face the Nation”</b>: Donald Trump; Reince Priebus; roundtable: Peggy Noonan, Ed O’Keefe, Mark Leibovich and Ruth Marcus; new results from the CBS News 2016 Battleground Tracker Poll from Wisconsin, Pennsylvania and New York with CBS News’ Anthony Salvanto</p><p>--<b>“Fox News Sunday”</b>: Donald Trump; Reince Priebus; roundtable: George Will, Julie Pace, Stephen Dinan and Charles Lane; “Power Player of the Week” with John Ficklin, the 10<sup>th</sup> member of his family to work in the White House</p><p>--<b>CNN’s “State of the Union” </b>(9am ET / 12pm ET): Reince Priebus; Bernie Sanders; roundtable: Bakari Sellers, Amanda Carpenter, Nina Turner and Andre Bauer</p><p>-<b>-Univision’s “Al Punto”</b> (SUN 10am ET / 1pm PT) Jorge Casta<b>ñ</b>eda; Venezuelan opposition leader and wife of political prisoner (Leopoldo L<b>ó</b>pez) Lilian Tintori; Republican political analysts Helen Aguirre and Adryana Boyne; FARC leader Rodrigo Londo<b>ñ</b>o Echeverri “Timochenko”; Peruvian presidential candidate (Peruvians for Change) Pedro Pablo Kuczynski; Peruvian presidential candidate (Popular Action) Alfredo Barnechea (substitute anchor: Enrique Acevedo)</p><p>--<b>CNN’s “Inside Politics” with John King </b>(SUN 8am ET): Roundtable: Jackie Kucinich, Ed O’Keefe, Jeff Zeleny and Abby Phillip</p><p>--<b>Fox News’ “Sunday Morning Futures” </b>(10am ET / 9am CT): Peter King; Allianz chief economic adviser Mohamed El-Erian; actor Scott Baio; roundtable: WSJ’s Jon Hilsenrath, Ed Rollins and Scott Brown</p><p>--<b>CNN’s “Reliable Sources”</b>: (SUN 11am ET): Author Ariana 
Huffington (“The Sleep Revolution”); Connie Chung and Maury Povich; Matthew Dowd; former People Magazine managing editor Larry Hackett</p><p>--<b>Fox News’ “MediaBuzz” </b>(SUN 11am ET / 10am CT): Trump campaign spokesperson Katrina Pierson; Andrea Tantaros; Meghan McCain; Kennedy; Sandra Smith; Susan Ferrechio; Amy Holmes; Kirsten Powers; tech analyst Shana Glenzer</p><p>--<b>CNN’s “Fareed Zakaria GPS”</b>: (SUN 11am ET): National security roundtable: RAND’s Seth Jones, NYPD’s John Miller and former CIA and FBI counterterrorism analyst Phil Mudd; roundtable: E.J. Dionne, author and Republican Main Street Partnership’s Geoffrey Kabaservice (“The Downfall of Moderation and the Destruction of the Republican Party”), author and Princeton University’s Nell Irvin Painter (“Southern History Across the Color Line”) and David Remnick</p><b>--C-SPAN</b><p>: <b>“The Communicators”</b> (SAT 6:30pm ET): American Cable Association president & CEO Matthew Polka and American Cable Association board chair Robert Gessner ... <b>“Newsmakers”</b> (SUN 10am ET): Club for Growth president David McIntosh, questioned by Washington Examiner’s Gabby Morrongiello and Politico’s James Hohmann ... <b>“Q&A”</b> (SUN 8pm & 11pm ET): Discussion with high school students attending the U.S. Senate Youth Program</p><p>--<b>MSNBC’s “PoliticsNation with Rev. 
Al Sharpton”</b>: (SUN 8-9am ET): Ben Carson; Clinton campaign national political director Amanda Renteria; Reuters’ Erin McPike; Amy Holmes; author Michael D’Antonio (“Never Enough”); One Wisconsin Now executive director Scot Ross</p><p>--<b>MSNBC’s “The Place for Politics”</b>: (SUN 9-10am ET): Capital Times’ Jessie Opoien; The Boston Globe’s James Pindell; Trump campaign senior advisor Barry Bennett; Politico’s Gabe Debenedetti (hosted by MSNBC’s Chris Jansing live from Wisconsin)</p><p>--<b>MSNBC’s “The Place for Politics”</b>: (SUN 10am-12pm ET): MSNBC’s Irin Carmon; Clinton campaign national spokeswoman Karen Finney; Michael Steele; Public Religion Research Institute’s Robert Jones; Michael Eric Dyson; Nina Turner; John Dean; Wisconsin-based radio host Charlie Sykes; The Nation Magazine’s John Nichols; One Wisconsin Now’s Scot Ross (hosted by MSNBC’s Joy Reid live from New York)</p><p>--<b>MSNBC’s “The Place for Politics”</b>: (SUN 12-2pm ET): Wisconsin-based radio host Charlie Sykes; Elise Jordan; Howard Dean; Kasich supporter Jack Voight; USA Today’s Paul Singer; RCP’s Rebecca Berg; State Rep. Melissa Sargent (D-WI); former Romney and Rubio advisor Avik Roy; author and Milwaukee Journal-Sentinel’s Jason Stein (“Dropping the Bomb: Scott Walker, Unions and the Fight for a State”) (hosted by MSNBC’s Alex Witt live from New York)</p><p>--<b>MSNBC’s “The Place for Politics”</b>: (SUN 3-4pm ET): The University of Wisconsin’s Mordecai Lee; former Wisconsin Lt. Gov. 
Barbara Lawton; WaPo’s Robert Costa; Milwaukee Mayor Tom Barrett (hosted by MSNBC’s Chris Jansing live from Wisconsin)</p><p>--<b>PBS’s “To the Contrary” with Bonnie Erbé</b>: Roundtable: Eleanor Holmes Norton; former Judge and federal prosecutor Debra Carnahan; Washington Examiner columnist Ashe Schow; Republican strategist Rina Shah Bharara and Concerned Women for America’s Penny Nance</p><p>--<b>SiriusXM’s “No Labels Radio” </b>(SAT 10am ET & 6pm ET, SUN 1PM ET): Host Jon Huntsman, with co-hosts No Labels co-founder Stuart Holliday and Politico’s Daniel Lippman, will discuss the Puerto Rico debt crisis with Puerto Rico’s lead restructuring advisor Jim Millstein, the 2016 election cycle with pollster Frank Luntz and the Wisconsin primary with Ron Johnson. <a href=\"http://bit.ly/1qdGxOa\">http://bit.ly/1qdGxOa</a></p><b>Pic of the hosts this weekend</b><p>--<b>Sinclair’s “Full Measure” with Sharyl Attkisson</b> (SUN 9:30am ET on WJLA and airing on Sinclair stations nationwide): the dangerous underground world of smuggling, of both drugs and humans; correspondent Scott Thuman examines how Morocco, one of the most beautiful countries on Earth, is being associated with the most recent terrorist attacks. </p><b>** A message from the Embassy of the United Arab Emirates:</b><p> Together, the UAE and US are working to make the Middle East more secure and prevent the proliferation of nuclear weapons.</p><p>The UAE is the only country in the Arabian Gulf to have a civilian nuclear energy accord (also known as a 123 Agreement) with the US. Called the “gold standard” by nonproliferation experts and government leaders, the UAE has boldly pledged not to enrich uranium or extract plutonium.</p><p>Next year, the UAE’s first nuclear energy station will begin generating safe, clean electricity. 
A leader in developing new technologies for renewable and sustainable energy, the UAE is also home to the International Renewable Energy Agency and the Masdar Institute.</p><p>Learn about how the UAE and US are united for a better future: <a href=\"http://bit.ly/1WFzIyE\">bit.ly/1WFzIyE</a> **</p><b>SUBSCRIBE</b><p> to the Playbook family: <b>POLITICO Playbook </b><a href=\"http://go.politicoemail.com/?qs=b48cc63abe2b3bfdf6a1974229c1d9ebdd76fec80e3e0b9c4f389a6aebb15331\">http://politi.co/1M75UbX</a> ... <b>New York Playbook </b><a href=\"http://go.politicoemail.com/?qs=b48cc63abe2b3bfdd95a477c264b940d3d68e1bf06d895022c7331c2d59002ba\">http://politi.co/1ON8bqW</a> ... <b>Florida Playbook </b><a href=\"http://go.politicoemail.com/?qs=b48cc63abe2b3bfdf6adbc9a5e2ca9fb024caa7dd707c64d9f8d05282a7904f0\">http://politi.co/1JDm23W</a> ... <b>New Jersey Playbook </b><a href=\"http://go.politicoemail.com/?qs=b48cc63abe2b3bfd43ea324572eabb35237a2c8ff38089b777628beacc2d4337\">http://politi.co/1HLKltF</a> ...<b> Massachusetts Playbook </b><a href=\"http://go.politicoemail.com/?qs=b48cc63abe2b3bfd4109bd5463ef8db58d28d5151d70170c0d641b0aeacfb453\">http://politi.co/1Nhtq5v</a> ... <b>Illinois Playbook </b><a href=\"http://go.politicoemail.com/?qs=b48cc63abe2b3bfd6d4bff376738b3cc9b795cde953f294cf858b75e52b1dffc\">http://politi.co/1N7u5sb</a> ... <b>California Playbook</b><a href=\"http://go.politicoemail.com/?qs=b48cc63abe2b3bfdd58947052fda88f4ea11d09a07f7f5eb6c84873711479543\">http://politi.co/1N8zdJU</a> ... <b>Brussels Playbook </b><a href=\"http://go.politicoemail.com/?qs=b48cc63abe2b3bfdc6d26c38b244d39857441567733e7f3775cbd8ffde507bef\">http://politi.co/1FZeLcw</a></p>",
"extract_length" : 34299,
"extract_checksum" : "wwIxgMbCy1ndAg4z22wIU0FCl3k",
"summary_text" : "Dr. Jud Feldman (hat tip: MBF) ... Brent Colburn ... Meridith Webster, director of comms. and public affairs at Bloomberg and an Obama West Wing alum -- fun fact: Favorite color is orange! (h/ts Ben Chang) ... Politico’s Dana Rubinstein and Josefa Velasquez … Lynda Tran, 270 Strategies founding partner and CBS News political contributor (h/t Heather Purcell and Eleis Brennan) ... Brian Austin, consultant at Kaiser Associates (h/t wife Emily Stephenson) ... Emily Steel, a TV and media reporter at the NYT and a WSJ and FT alum ... Carl Kasell, formerly of NPR (whose voice is on HIS answering machine?) … Robby Zirkelbach, SVP of comms at PhRMA and AHIP alum, mediocre golfer, avid Hawkeyes fan and great friend (h/t Joe Brettell) ... Dan Sallick, partner and co-founder of Subject Matter (h/t Peter Cherukuri) ...\n\n",
"title" : "BERNIE GETS UNDER HILLARY’S SKIN -- MANAFOT RISING: Trump’s new guru builds empire, including veterans of GOP’s last contested convention – B’DAYS: Brent Colburn, Meridith Webster",
"published" : "2016-04-02T17:41:31Z",
"publisher" : "POLITICO",
"description" : "Inside the campaigns and Corey Lewandowski's shrinking role.",
"links" : [ "http://amzn.to/1RS7lfM", "http://amzn.to/1g8Rlak", "http://amzn.to/1pUgx9P", "http://api.addthis.com/oexchange/0.8/forward/facebook/offer?pco=tbx32nj-1.0&url=http://www.politico.com/playbook/2016/04/bernie-gets-under-hillarys-skin-manafot-rising-trumps-new-guru-builds-empire-including-veterans-of-gops-last-contested-convention-bdays-brent-colburn-meridith-webster-213545&pubid=politico.com", "http://api.addthis.com/oexchange/0.8/forward/googleplus/offer?pco=tbxnj-1.0&url=http://www.politico.com/playbook/2016/04/bernie-gets-under-hillarys-skin-manafot-rising-trumps-new-guru-builds-empire-including-veterans-of-gops-last-contested-convention-bdays-brent-colburn-meridith-webster-213545&pubid=politico.com&title=BERNIE+GETS+UNDER+HILLARY%E2%80%99S+SKIN+--+MANAFOT+RISING%3A+Trump%E2%80%99s+new+guru+builds+empire%2C+including+veterans+of+GOP%E2%80%99s+last+contested+convention+%E2%80%93+B%E2%80%99DAYS%3A+Brent+Colburn%2C+Meridith+Webster", "http://api.addthis.com/oexchange/0.8/forward/twitter/offer?pco=tbx32nj-1.0&url=http://www.politico.com/playbook/2016/04/bernie-gets-under-hillarys-skin-manafot-rising-trumps-new-guru-builds-empire-including-veterans-of-gops-last-contested-convention-bdays-brent-colburn-meridith-webster-213545&pubid=politico.com&text=BERNIE+GETS+UNDER+HILLARY%E2%80%99S+SKIN+--+MANAFOT+RISING%3A+Trump%E2%80%99s+new+guru+builds+empire%2C+including+veterans+of+GOP%E2%80%99s+last+contested+convention+%E2%80%93+B%E2%80%99DAYS%3A+Brent+Colburn%2C+Meridith+Webster", "http://apne.ws/1RS7xLX", "http://apne.ws/1pUb1Uo", "http://bit.ly/1M1RDiU", "http://bit.ly/1M7kPVN", "http://bit.ly/1MHUeyx", "http://bit.ly/1MHUqhg", "http://bit.ly/1RRUhds", "http://bit.ly/1RRUw8l", "http://bit.ly/1RuAL4x", "http://bit.ly/1SGsWYC", "http://bit.ly/1SuSlSs", "http://bit.ly/1UZby4C", "http://bit.ly/1UyiRAk", "http://bit.ly/1WFzIyE", "http://bit.ly/1Y4E0jF", "http://bit.ly/1Y4FDh8", "http://bit.ly/1pUoSKC", "http://bit.ly/1qdGxOa", "http://bit.ly/1qkEUOn", 
"http://bit.ly/21XVFdO", "http://bit.ly/21ZpB9r", "http://bit.ly/25D0qOL", "http://bloom.bg/1RRUqxm", "http://cbsn.ws/1Y6qImE", "http://cnn.it/1SGvhCL", "http://cnn.it/1SsWF4w", "http://for.tn/1VjJ7gB", "http://go.politicoemail.com/?qs=b48cc63abe2b3bfd4109bd5463ef8db58d28d5151d70170c0d641b0aeacfb453", "http://go.politicoemail.com/?qs=b48cc63abe2b3bfd43ea324572eabb35237a2c8ff38089b777628beacc2d4337", "http://go.politicoemail.com/?qs=b48cc63abe2b3bfd6d4bff376738b3cc9b795cde953f294cf858b75e52b1dffc", "http://go.politicoemail.com/?qs=b48cc63abe2b3bfdc6d26c38b244d39857441567733e7f3775cbd8ffde507bef", "http://go.politicoemail.com/?qs=b48cc63abe2b3bfdd58947052fda88f4ea11d09a07f7f5eb6c84873711479543", "http://go.politicoemail.com/?qs=b48cc63abe2b3bfdd95a477c264b940d3d68e1bf06d895022c7331c2d59002ba", "http://go.politicoemail.com/?qs=b48cc63abe2b3bfdf6a1974229c1d9ebdd76fec80e3e0b9c4f389a6aebb15331", "http://go.politicoemail.com/?qs=b48cc63abe2b3bfdf6adbc9a5e2ca9fb024caa7dd707c64d9f8d05282a7904f0", "http://nyti.ms/1RS6BqQ", "http://nyti.ms/1ZTJn6E", "http://politi.co/1LYiY5C", "http://politi.co/1M75UbX", "http://politi.co/1RRWYvm", "http://politi.co/1RRWsxo", "http://politi.co/1UJ0n0k", "http://politi.co/22WoUUd", "http://politi.co/22YORlU", "http://politi.co/25Bwouu", "http://www.autismspeaks.org", "http://www.politico.com/magazine/story/2016/03/2016-election-defense-military-industry-contractors-donations-money-contributions-presidential-hillary-clinton-bernie-sanders-republican-ted-cruz-213783", "http://www.politico.com/magazine/story/2016/03/donald-trump-2016-terrorist-attack-foreign-policy-213784", "http://www.politico.com/magazine/story/2016/03/doug-sosnik-memo-2016-is-over-213753", "http://www.politico.com/magazine/story/2016/04/donald-trump-ted-cruz-2016-911-muslims-tom-ridge-213785", "http://www.politico.com/magazine/story/2016/04/the-next-donald-trumps-213786", "http://www.politico.com/playbook", 
"http://www.politico.com/playbook/2016/03/great-mentioner-trumps-cabinet-jill-abramsons-new-column-david-gregorys-tv-gig-bday-peter-cherukuri-paul-farhi-robert-gibbs-steve-peoples-roger-simon-peter-velz-213455", "http://www.politico.com/playbook/2016/03/how-will-bill-take-trump-attacks-on-hillary-playbook-breakfast-on-the-court-tomorrow-white-house-counsel-neil-eggleston-senior-adviser-brian-deese-remembering-jennifer-frey-213503", "http://www.politico.com/playbook/2016/03/trump-to-people-im-the-highest-level-of-smart-4-hours-of-sleep-on-working-out-dont-have-to-when-youre-making-america-great-again-you-get-a-lot-of-exercise-213479", "http://www.politico.com/playbook/2016/04/bernie-gets-under-hillarys-skin-manafot-rising-trumps-new-guru-builds-empire-including-veterans-of-gops-last-contested-convention-bdays-brent-colburn-meridith-webster-213545", "http://www.politico.com/playbook/2016/04/bernie-gets-under-hillarys-skin-manafot-rising-trumps-new-guru-builds-empire-including-veterans-of-gops-last-contested-convention-bdays-brent-colburn-meridith-webster-213545#", "http://www.politico.com/playbook/2016/04/bernie-gets-under-hillarys-skin-manafot-rising-trumps-new-guru-builds-empire-including-veterans-of-gops-last-contested-convention-bdays-brent-colburn-meridith-webster-213545#superComments", "http://www.politico.com/playbook/2016/04/trump-would-be-most-unpopular-major-party-nominee-in-32-years-playbook-breakfast-with-white-house-counsel-neil-eggleston-and-senior-adviser-brian-deese-livestreams-7-57-am-213524", "http://www.politico.com/story/2016/03/ted-cruz-christian-evangelical-vote-221349", "http://www.politico.com/story/2016/04/donald-trump-corey-lewandowski-shrinking-role-campaign-221487", "http://www.politico.com/story/2016/04/donald-trump-delegates-north-dakota-gop-221480", "http://www.politico.com/story/2016/04/donald-trump-tennessee-gop-delegates-221489", "http://www.politico.com/story/2016/04/hillary-clinton-bernie-sanders-attacks-221484", 
"http://www.politico.com/story/2016/04/hillary-clinton-fbi-strategy-emails-221435", "http://www.politico.com/story/2016/04/obama-donald-trump-presser-221486", "http://www.politico.com/tipsheets/playbook/archive", "http://www.politico.com/tipsheets/the-2016-blast/2016/04/cruzs-evangelical-problem-democrats-lying-spat-clinton-aides-all-four-one-kasich-doesnt-like-lyin-213544", "http://yhoo.it/1qb6FsM" ],
"author_name" : "Kristen East",
"author_link" : "http://www.politico.com/staff/east-kristen",
"author_gender" : "FEMALE",
"image_src" : "http://static.politico.com/97/de/e15842c94e928ce8b561e20745a0/whitelogoondots.jpeg",
"card" : "SUMMARY_LARGE_IMAGE",
"type" : "POST",
"sentiment" : "POSITIVE",
"lang" : "en",
"categories" : {
"business" : 3.6933053736507154E-5,
"entertainment" : 0.003606990311947353,
"health" : 3.339112750008408E-8,
"politics" : 0.9963494407896508,
"science" : 2.1030360920429964E-11,
"sports" : 6.512794233646671E-6,
"technology" : 8.963827337022493E-8
},
"duplicates" : {
"1459618662000015199" : 1.0
},
"duplicates_count" : 1,
"metadata_score" : 124
}
Once authentication headers are provided, you can run normal Elasticsearch queries via our REST API.
You can read more about authentication, but all of the following examples already include proper authentication (using your credentials) if you have an account and are logged in.
You can copy and paste the examples into your tools to get started.
We provide a near-raw Elasticsearch endpoint. All Elasticsearch queries can be specified (except for calls which can be destructive or index new documents).
The best way to get more information about queries is to read the Elasticsearch documentation.
Spinn3r search is available via REST and JSON. The JSON results use the same document structure/schema as our firehose, which makes it easy to integrate the two APIs.
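As a rough illustration of calling such an endpoint over REST, the sketch below builds a POST request that carries an Elasticsearch query as JSON. The URL and the header name are placeholders invented for this example, not Spinn3r's actual values; substitute the endpoint and authentication headers shown for your account. The request is only constructed, not sent.

```python
import json
import urllib.request

# Hypothetical placeholders: replace with the endpoint and authentication
# headers that apply to your account.
SEARCH_URL = "https://search.example.invalid/_search"
AUTH_HEADERS = {"X-Example-Auth": "YOUR_CREDENTIALS"}


def build_search_request(query_body, url=SEARCH_URL, auth_headers=AUTH_HEADERS):
    """Build (but do not send) a POST request carrying an Elasticsearch query."""
    data = json.dumps(query_body).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    headers.update(auth_headers)
    return urllib.request.Request(url, data=data, headers=headers, method="POST")


query = {
    "size": 1,
    "query": {"query_string": {"query": "main:Firefox"}},
}
request = build_search_request(query)

# Sending is left commented out because the URL above is a placeholder:
# with urllib.request.urlopen(request) as response:
#     results = json.load(response)
```

Keeping request construction separate from sending makes it easy to inspect the body and headers before pointing the code at a real endpoint.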
User Interface vs API
Search results are made available via a web-based user interface as well as a REST-based JSON API.
Spinn3r is primarily an API-oriented service, so once in production you will simply be calling our APIs and fetching data.
However, we do ship a user interface which you can use to interact with the data and get a better understanding of the API.
Here’s a screenshot of a search within mainstream news:
Filters
Example query which uses a filter to constrain the results:
{
"size": 10,
"query": {
"query_string" : {
"query" : "main:Firefox"
}
},
"filter": {
"query" : {
"query_string" : {
"query" : "(lang:en OR lang:es) AND (source_publisher_type:WEBLOG OR source_publisher_type:MAINSTREAM_NEWS)"
}
}
}
}
You can add filters to a query to constrain the records we search over.
Note that we have two nested “query string” queries here.
Technically we could merge them into the first query, but a key feature of filters is that they are cached and VERY fast once calculated.
This means you can keep re-using this filter and your subsequent queries will be faster, because we can use the existing filter and only search over the documents it matches.
Filters are updated on the fly as new documents are indexed, so you don’t have to worry about recalculating them.
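As a minimal sketch of re-using one filter across several queries, the helper below builds request bodies in the same shape as the example in this section. The function name `filtered_search` and the second query (`main:Chrome`) are illustrative assumptions, not part of the Spinn3r API.

```python
import json


def filtered_search(query_string, filter_string, size=10):
    """Build a search body that keeps the filter separate from the query."""
    return {
        "size": size,
        "query": {"query_string": {"query": query_string}},
        # Sending the identical filter text with every request is what lets
        # an already-computed filter be re-used.
        "filter": {"query": {"query_string": {"query": filter_string}}},
    }


# The filter from the example above, defined once and shared.
ENGLISH_OR_SPANISH_SOURCES = (
    "(lang:en OR lang:es) AND "
    "(source_publisher_type:WEBLOG OR source_publisher_type:MAINSTREAM_NEWS)"
)

firefox = filtered_search("main:Firefox", ENGLISH_OR_SPANISH_SOURCES)
chrome = filtered_search("main:Chrome", ENGLISH_OR_SPANISH_SOURCES)  # hypothetical second query

print(json.dumps(firefox, indent=2))
```

Only the `query` part differs between the two bodies; the `filter` part is identical.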
Query String Query
{
"query": {
"query_string" : {
"query" : "source_publisher_type:MAINSTREAM_NEWS AND main:Ebola"
}
}
}
Elasticsearch has a “query string query” syntax that’s very handy for working with queries. It doesn’t require expressing the query in full JSON, which can make things much easier.
You can read more about it here
You can construct a query string query in JSON, as in the example in this section, which combines two fields using a boolean AND expression.
Most fields in our schema can be used to express queries using a combination of AND and OR clauses.
Some additional query examples:
query | explanation |
---|---|
source_publisher_type:WEBLOG | only return content from weblogs |
source_publisher_type:WEBLOG AND lang:en | only return weblogs in English (using our language classifier) |
source_publisher_type:WEBLOG OR source_publisher_type:MAINSTREAM_NEWS | weblogs or mainstream news |
tags:linux | tagged 'linux' (either hashtags or user specified tags) |
main:linux | contains the word linux in the 'main' field |
domain:cnn.com | content from the domain cnn.com |
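When queries are assembled programmatically, a couple of small helpers can keep the AND/OR structure readable. This is only a sketch: `any_of` and `all_of` are hypothetical helper names, and the field names come from the table above.

```python
def any_of(field, values):
    """Join several values for one field with OR, wrapped in parentheses."""
    return "(" + " OR ".join("%s:%s" % (field, value) for value in values) + ")"


def all_of(*clauses):
    """Join already-built clauses with AND."""
    return " AND ".join(clauses)


expression = all_of(
    any_of("source_publisher_type", ["WEBLOG", "MAINSTREAM_NEWS"]),
    "lang:en",
    "main:linux",
)

# The expression drops straight into a query string query body.
body = {"query": {"query_string": {"query": expression}}}
print(expression)
```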
Time filters
Absolute time filters:
{
"query": {
"query_string": {
"query": "source_publisher_type:MAINSTREAM_NEWS"
}
},
"filter": {
"range": {
"date_found": {
"gte": "2016-10-03T10:25:15Z",
"lte": "2016-10-03T10:55:15Z",
"format": "date_time_no_millis"
}
}
}
}
Relative time filters:
{
"query": {
"query_string": {
"query": "source_publisher_type:MAINSTREAM_NEWS"
}
},
"filter": {
"range": {
"date_found": {
"gte": "now-1h"
}
}
}
}
It’s also possible to limit the query by time.
It’s best to use ISO8601 time strings in filters, which avoid problems with timezones and are easier to read.
The absolute time filter example returns all posts from mainstream news found between the two given dates.
Relative time
You can also filter by relative time. In the relative time filter example, "gte": "now-1h" returns posts found within the last hour.
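A minimal sketch of producing the ISO8601 strings for an absolute time filter from Python `datetime` objects. The helper name `absolute_range` is an assumption; the field name and the `date_time_no_millis` format are taken from the example in this section.

```python
from datetime import datetime, timedelta, timezone


def absolute_range(start, end):
    """Range filter on date_found between two timezone-aware datetimes."""
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return {
        "range": {
            "date_found": {
                # Convert to UTC first so the trailing "Z" is always correct.
                "gte": start.astimezone(timezone.utc).strftime(fmt),
                "lte": end.astimezone(timezone.utc).strftime(fmt),
                "format": "date_time_no_millis",
            }
        }
    }


start = datetime(2016, 10, 3, 10, 25, 15, tzinfo=timezone.utc)
time_filter = absolute_range(start, start + timedelta(minutes=30))
print(time_filter)
```

Passing timezone-aware datetimes avoids guessing which zone a naive timestamp was meant to be in.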
Sorting by custom fields
{
"query": {
"query_string": {
"query": "source_publisher_type:MAINSTREAM_NEWS"
}
},
"sort" : [ {
"published" : {
"order" : "desc"
}
} ]
}
You can also sort by custom fields. Here’s an example of sorting by the published field in descending order.
Sorting by custom fields requires more CPU than normal, so we suggest only doing so when necessary.
Aggregations
Spinn3r provides support for Elasticsearch aggregations on top of our core data platform.
For example, one could search for “fishing” and get back the top 100 posts over the last 60 days. However, it would be nice to get a report of the volume of these posts, per hour, over the same time period.
This can be done with aggregations, wherein you specify a modified query which returns buckets, one per hour, with the number of posts matching your query within that hour.
Using too many aggregations can consume excessive memory. We only support a maximum of 2GB per node per query. However, we have more than 100 servers, so in practice more than 200GB of RAM is available, which is more than enough for most purposes.
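The hourly report described above could be sketched as follows. This assumes the standard Elasticsearch `date_histogram` aggregation with the `interval` parameter (the name used by the older Elasticsearch releases the other examples in this document target) and the `date_found` field. The aggregation name `posts_per_hour`, both helper functions, and the sample response are illustrative only; the response is hand-written, not real data.

```python
def hourly_volume_query(query_string):
    """Build a body that returns one bucket per hour instead of documents."""
    return {
        "size": 0,  # only the buckets are needed, not the matching posts
        "query": {"query_string": {"query": query_string}},
        "aggs": {
            "posts_per_hour": {
                "date_histogram": {"field": "date_found", "interval": "hour"}
            }
        },
    }


def bucket_counts(response):
    """Extract (hour, number of posts) pairs from an aggregation response."""
    buckets = response["aggregations"]["posts_per_hour"]["buckets"]
    return [(bucket["key_as_string"], bucket["doc_count"]) for bucket in buckets]


body = hourly_volume_query("main:fishing")

# Hand-written response in the usual Elasticsearch shape, for illustration.
sample_response = {
    "aggregations": {
        "posts_per_hour": {
            "buckets": [
                {"key_as_string": "2016-10-03T10:00:00.000Z", "doc_count": 12},
                {"key_as_string": "2016-10-03T11:00:00.000Z", "doc_count": 7},
            ]
        }
    }
}

for hour, count in bucket_counts(sample_response):
    print(hour, count)
```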
Location based queries
Specific geo fields:
{
...
"geo_location" : "Lorain, OH",
"geo_location_id" : "91d57ea9ae3b0bbd",
"geo_featurename" : "PPL",
"geo_point" : "41.45282,-82.18237",
"geo_name_id" : "5161262",
"geo_name" : "Lorain, Lorain County, Ohio, United States",
"geo_country" : "US",
"geo_state" : "Ohio",
"geo_city" : "Lorain"
...
}
Search for all documents in the United States:
{
"size": 1,
"query": {
"term" : { "geo_country" : "US" }
}
}
Our geocoding system extracts location information into denormalized fields.
Note that all country codes are ISO “alpha-2” country codes.
We have a full list of country codes available here
Searching for specific countries
You can search for specific countries and states by combining fields.
For example, geo_country:US AND geo_state:"New York" would find all posts located in the state of New York in the United States.
We have a full list of states available here
Geo_point queries
{
"size": 1,
"query": {
"geo_bounding_box": {
"geo_point": {
"top_left": {
"lat": 42,
"lon": -74
},
"bottom_right": {
"lat": 40,
"lon": -72
}
}
}
}
}
The geo_point field accepts latitude-longitude pairs and supports geometric operations, which can be used for queries such as the bounding box query shown in this section.
More information on geo_point based queries is available here
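Because the corners of a bounding box are easy to mix up, a small helper that takes the four edges by name can help. This is a sketch under stated assumptions: `bounding_box_query` is a hypothetical helper, and for simplicity it rejects boxes that cross the 180° meridian.

```python
def bounding_box_query(north, south, west, east, size=1):
    """Build a geo_bounding_box query on geo_point from the four box edges."""
    if not -90 <= south < north <= 90:
        raise ValueError("need -90 <= south < north <= 90")
    if not -180 <= west < east <= 180:
        raise ValueError("need -180 <= west < east <= 180")
    return {
        "size": size,
        "query": {
            "geo_bounding_box": {
                "geo_point": {
                    # top_left is the north-west corner of the box
                    "top_left": {"lat": north, "lon": west},
                    # bottom_right is the south-east corner of the box
                    "bottom_right": {"lat": south, "lon": east},
                }
            }
        },
    }


box = bounding_box_query(north=42, south=40, west=-74, east=-72)
print(box)
```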
Categories
We support categories in our search which are backed by our classifier API which puts content into categories like politics, entertainment, health, etc.
You can search for these by running queries for:
query | description |
---|---|
categories.business:>0.85 | Content with the category business and a greater than 85% probability |
The values are between 0.0 and 1.0 and all of them sum to 1.0.
One strategy is to pick a main category (say 'politics’) and then search for that.
The secondary categories (when sorted by rank) can also be used to determine what flavor of political content you’re indexing.
For example if the politics score is 0.9 and the technology score is 0.1 then you know this is a political story about technology.
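Ranking the categories of a returned document is a short sort. The sketch below uses the `categories` values from the sample document shown earlier in this document; the variable names are illustrative.

```python
# "categories" values copied from the sample search result.
categories = {
    "business": 3.6933053736507154e-5,
    "entertainment": 0.003606990311947353,
    "health": 3.339112750008408e-8,
    "politics": 0.9963494407896508,
    "science": 2.1030360920429964e-11,
    "sports": 6.512794233646671e-6,
    "technology": 8.963827337022493e-8,
}

# Sort by score, highest first: the first entry is the main category and
# the second one hints at the "flavor" of the content.
ranked = sorted(categories.items(), key=lambda item: item[1], reverse=True)
primary, secondary = ranked[0], ranked[1]

print("primary:", primary[0])
print("secondary:", secondary[0])
print("total:", sum(categories.values()))
```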
Hot / warm / cold architecture
We use a hot/warm/cold architecture for storing content long term to maximize both performance and content density.
alias | description |
---|---|
content_* | Access content in the last 30 days on ultra-fast SSD with about 25% of it cached in memory |
warm_content_* | Access an additional 6 months of content on a larger cluster of HDD machines |
cold_content_* | Access an additional 3 months of content on higher density HDD machines |
Hot content
Alias: content_*
Initially (for the first 30 days), content sits on our hot architecture.
This is based on SSD (solid state drives) and about 25% of the data is cached in memory. Searches within the hot window will execute very fast, usually within 1 second per query.
The index alias for this is just content_* so all searches using that alias will be very fast.
This should probably be your production alias.
Warm content
Alias: warm_content_*
Content under warm content aliases is stored on a separate hardware profile using hard disks. These are designed for bulk storage which means that query times for this content will be higher.
The cluster storing warm content has a large number of machines, so it is much faster than the cold content.
Cold content
Alias: cold_content_*
This content is also stored on HDD but we use higher density drives and more drives per server. This means the content is a bit slower than warm content but allows us to index a LOT more data.
Using multiple aliases to search more data
You can search over ALL indexes if you want to search the entire content range at once. This is accomplished by adding a comma between the indexes.
Alias: cold_content_*,warm_content_*,content_*
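A sketch of building the search URL for one or more aliases. The host pattern follows the Python example elsewhere in this document; the helper name `search_url` and the vendor code `acme` are placeholders, not real values.

```python
HOT = "content_*"
WARM = "warm_content_*"
COLD = "cold_content_*"


def search_url(vendor, aliases):
    """Build the _search URL for a comma-separated list of index aliases."""
    return "http://%s.elasticsearch.spinn3r.com/%s/_search" % (
        vendor,
        ",".join(aliases),
    )


# "acme" is a placeholder vendor code.
print(search_url("acme", [HOT]))
print(search_url("acme", [COLD, WARM, HOT]))
```

Searching only the hot alias keeps queries fast; adding the warm and cold aliases widens the time range at the cost of slower responses.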
Document clustering
{
"size" : 10,
"query" : {
"filtered" : {
"query" : {
"mlt" : {
"fields" : [ "main" ],
"like_text" : "US military conducted airstrike against Ansar Al-Sharia killing 8 commanders ... ",
"min_term_freq" : 1,
"max_query_terms" : 250
}
},
"filter" : {
"and" : {
"filters" : [ {
"range" : {
"date_found" : {
"gte": "2015-07-21T10:25:15Z",
"lte": "2016-07-21T10:55:15Z",
"format": "date_time_no_millis",
"include_lower" : true,
"include_upper" : true
}
}
}, {
"terms" : {
"source_publisher_type" : [ "MICROBLOG" ]
}
} ]
}
}
}
},
"fields" : [ "main", "date_found" ]
}
Elasticsearch supports finding documents similar to other documents.
This can be used for basic document clustering. If you need more advanced document clustering you should consider using something like k-means or support vector machines. However, this is a very quick and easy way to get basic clustering and related documents.
This is an advanced “more like this” (MLT) query showing how to identify similar documents within a time range and for a specific publisher type.
Fetching specific URLs
{
"query" : {
"match" : {
"permalink" : "http://www.voiceofsandiego.org/events/member-coffee-4/"
}
}
}
You can also search for the content for a specific URL. For example:
Cacheable filters for faster searches
Uncacheable filter (bad)
{
"range": {
"date_found": {
"gte": "2015-12-13T20:43:01Z",
"lte": "now"
}
}
}
By default Elasticsearch will try to cache filters. However, a filter like the uncacheable example won’t be cached, because the timestamp continues to change if you calculate an exact timestamp per query.
Cacheable filter (good)
{
"range": {
"date_found": {
"gte": "2015-12-13T20:00:00Z",
"lte": "now"
}
}
}
The revised filter, which rounds the timestamp down to the hour, is better: Elasticsearch will cache it (and update it as it changes), and your searches will be much faster since the filter doesn’t have to be rebuilt each time.
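One way to produce such a rounded timestamp in client code is to truncate it to the start of the hour before building the filter. This is a minimal sketch; the helper name `cacheable_since` is an assumption.

```python
from datetime import datetime, timezone


def cacheable_since(moment):
    """Range filter whose lower bound is rounded down to the start of the hour."""
    rounded = moment.astimezone(timezone.utc).replace(
        minute=0, second=0, microsecond=0
    )
    return {
        "range": {
            "date_found": {
                "gte": rounded.strftime("%Y-%m-%dT%H:%M:%SZ"),
                "lte": "now",
            }
        }
    }


# Two requests made at different moments within the same hour...
first = cacheable_since(datetime(2015, 12, 13, 20, 43, 1, tzinfo=timezone.utc))
second = cacheable_since(datetime(2015, 12, 13, 20, 57, 30, tzinfo=timezone.utc))

# ...produce exactly the same filter.
print(first == second)
```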
Avoid Expect headers with curl
Bypass curl expect header:
-H Expect:
If you’re using the command line app curl to work with search you may have problems with large queries.
Curl will add an Expect header which doesn’t work with our API auth layer.
You can bypass it by adding -H Expect: to the command line. Otherwise your large queries won’t work when debugging via curl and they will just time out.
There’s no downside to adding this option, and it doesn’t modify the query or HTTP data sent.
Search on exact fields
It’s best to search on exact fields and avoid using the main field if possible.
For example, you could search for hashtags by searching for:
Search for hashtags:
main:#linux
or you could search over the tags field:
Search for tags field:
tags:linux
Additionally, the tags field also matches tag types other than hashtags.
You can do the same thing with author mentions. For example a query for:
Mention search:
main:@barackobama
can be rewritten to:
Mention field search:
mentions:barackobama
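If existing queries already use the `main:#…` or `main:@…` form, the rewrite described above can be automated. This is a rough sketch using regular expressions; `prefer_exact_fields` is a hypothetical helper and it only handles simple word-like tags and handles.

```python
import re


def prefer_exact_fields(query):
    """Rewrite hashtag and mention searches on main to the exact fields."""
    # main:#linux -> tags:linux
    query = re.sub(r"\bmain:#(\w+)", r"tags:\1", query)
    # main:@barackobama -> mentions:barackobama
    query = re.sub(r"\bmain:@(\w+)", r"mentions:\1", query)
    return query


print(prefer_exact_fields("main:#linux"))
print(prefer_exact_fields("main:@barackobama AND lang:en"))
```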
Searching from Python
#!/usr/bin/env python3
import requests

####
VENDOR = "{{vendor}}"
VENDOR_AUTH = "{{vendor_auth}}"
####

# Define the query you want to run.
QUERY = """
{
  "size": 1500,
  "query": {
    "query_string" : {
      "query" : "main:Obama"
    }
  }
}
"""

NR_PAGES = 100

###
# generic function to just write data to disk
def handle_data(page, response):
    file_name = "%04d.json" % page
    print("Writing JSON data to: %s" % file_name)
    # response.content is bytes, so the file is opened in binary mode
    with open(file_name, "wb") as f:
        f.write(response.content)

###
# Perform the first request. The URL needs to be slightly different because
# we have to specify the index name here.
url = 'http://%s.elasticsearch.spinn3r.com/content*/_search?scroll=5m&pretty=true' % VENDOR
print("Fetching from %s" % url)
print("Running query: ")
print(QUERY)

## we have to add our vendor code information to the request now.
headers = {'X-vendor': VENDOR,
           'X-vendor-auth': VENDOR_AUTH}
response = requests.post(url, headers=headers, data=QUERY)
response.raise_for_status()

####
# Now that we have the first result we have to parse out the scroll ID, which
# is needed to fetch each subsequent page.
data = response.json()
scroll_id = data["_scroll_id"]
handle_data(0, response)
print("Query took: %sms" % data["took"])
print("Total hits: %s" % data["hits"]["total"])

url = 'http://%s.elasticsearch.spinn3r.com/_search/scroll?scroll=5m&pretty=true' % VENDOR
for page in range(1, NR_PAGES):
    response = requests.post(url, headers=headers, data=scroll_id)
    response.raise_for_status()
    page_data = response.json()
    # stop early once a page comes back without any documents
    if not page_data["hits"]["hits"]:
        break
    handle_data(page, response)
    scroll_id = page_data["_scroll_id"]
Both curl and the search UI are great for getting up and running quickly.
For more advanced usage you might want to use a language like Python.
Here’s a quick Python script for running queries, paging through the results, and writing them to disk.
Your vendor code has been updated below so you can just use the script as-is.
This example uses the scroll API to fetch the documents page by page and writes them to individual files with 1500 records each.
Searching from Java
JestClientFactory factory = new JestClientFactory();
factory.setHttpClientConfig(new HttpClientConfig
.Builder("http://{{vendor}}.elasticsearch.spinn3r.com")
.multiThreaded(true)
.build());
JestClient client = factory.getObject();
String query = "{\n" +
" \"size\": 1,\n" +
" \"query\": {\n" +
" \"query_string\" : {\n" +
" \"query\" : \"main:Firefox\"\n" +
" }\n" +
" } \n" +
"}";
Search search = new Search.Builder(query)
// multiple index or types can be added.
.addIndex("content_*")
.addType("content")
.setHeader( "X-vendor", "{{vendor}}" )
.setHeader( "X-vendor-auth", "{{vendor_auth}}" )
.build();
SearchResult result = client.execute(search);
System.out.printf( "%s\n", result.getTotal() );
When working within Java we recommend using Jest for performing REST / JSON calls against our search API.
Here’s an example using the Jest API for making calls into Spinn3r. Make sure to include the authentication headers below so that your requests are accepted.
Classifier
The Classifier API allows developers to analyze text and news articles for semantic meaning and to classify text into hashtags, stock ticker symbols, or any other classifications we’ve created.
The classifier works by taking a large set of training examples with labels generated per classification, then using a linear classifier to build a model which enables us to mathematically label future input text.
We provide pre-trained classifiers for public use based on public data sources which are classified by social media users.
Selecting Classifiers
Classifiers are identified by selectors driven by name/value tag pairs.
Right now we have two standard tags we use to identify classifiers:
name | description |
---|---|
textType | Type of text we’re classifying. NEWS is the only supported type at the moment. This allows us to specify which form of text a classifier was trained with. |
labelType | Specifies the type of labels to output. HASHTAG is the only supported type at the moment. This allows us to specify the type of the labels. For example, labels could be wikipedia pages, hashtags, stock ticker symbols, etc. |
Currently Supported Classifiers
lang | textType | labelType | description |
---|---|---|---|
en | NEWS | HASHTAG | Classifies news articles into hashtag labels |
Analyze Text
The analyze text API allows a developer to analyze raw text to compute labels based on the underlying text structure.
For example, you could give us a news article about politics and we would give you back labels like BarackObama, MittRomney, etc.
Endpoint
Use URL endpoint: http://{{vendor}}.classifier.spinn3r.com/v1/classifier/analyze-text
Requests are JSON and must have the Content-Type: application/json HTTP header.
Requests
Example request:
{
"classifiers": [ {
"selector": {
"textType": "NEWS",
"labelType": "HASHTAG"
}
} ],
"lang": "en",
"text": "(CNN)President Barack Obama's approval rating stands at 55% in a new CNN/ORC poll, the highest mark of his second term ... "
}
Requests are simply JSON messages specifying the language, text, and classifier we should use.
name | optional | description |
---|---|---|
classifiers | no | A JSON array describing selectors for the classifiers we wish to use to classify the given text |
lang | yes | language of the content. We recommend including this especially for short content. If you don’t specify it we will just use our language classifier to detect the language |
text | no | The text you would like to classify |
Classifier
A JSON object explaining which classifier you would like to use to classify your text.
Right now we support a selector which contains two tags, textType and labelType, specifying which type of classifier to use to classify the text.
name | optional | description |
---|---|---|
selector | no | A JSON object specifying which classifier to use for classification |
Multiple classifiers can be specified if necessary.
Selector
Contains a map of tag to value pairs used to select the classifier. These must specify at least textType and labelType to select the right classifier.
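The request structure described above can be assembled with a small helper. This is a sketch only: `analyze_text_request` is a hypothetical function name, the defaults are the only currently supported tag values listed earlier, and the sample text is a shortened form of the example request.

```python
import json


def analyze_text_request(text, lang=None, text_type="NEWS", label_type="HASHTAG"):
    """Build the JSON body for an analyze-text call."""
    body = {
        "classifiers": [
            {"selector": {"textType": text_type, "labelType": label_type}}
        ],
        "text": text,
    }
    # lang is optional; when omitted the language is detected server-side.
    if lang is not None:
        body["lang"] = lang
    return body


request_body = analyze_text_request(
    "President Barack Obama's approval rating stands at 55% in a new poll",
    lang="en",
)
headers = {"Content-Type": "application/json"}  # required by the endpoint
payload = json.dumps(request_body)
print(payload)
```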
Responses
{
"lang" : "en",
"classifications" : [
{
"classifier" : {
"selector" : {
"textType" : "NEWS",
"labelType" : "HASHTAG"
}
},
"labels" : [
{
"label" : "DNCleak",
"probability" : 0.18919783089447492
},
{
"label" : "politics",
"probability" : 0.12080862285793965
},
...
]
}
]
}
We return the lang and the list of classifications of the input text as well as the list of labels per classifier.
name | optional | description |
---|---|---|
lang | no | The language code that was used to classify the text |
classifications | no | The list of classifications, with metadata about each label |
Labels
An individual label includes the following columns:
name | optional | description |
---|---|---|
label | no | The label assigned to the category |
probability | no | The probability that the document belongs in the given label. |
Note that the returned labels MAY NOT sum to 1.0. We only include the primary labels. You can assume that the omitted labels account for the remainder, so that all labels together sum to 1.0.
For example, if the classifier has 4 total labels, and we return two labels of foo=0.6 and bar=0.1, these sum to 0.7 and we can assume the remaining labels sum to 0.3.
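The arithmetic in this example can be checked in a few lines. The helper `remaining_probability` is illustrative; the label values are the foo/bar example from the text.

```python
def remaining_probability(labels):
    """Probability mass left over for the labels that were not returned."""
    returned = sum(label["probability"] for label in labels)
    # Clamp at zero to guard against tiny floating point overshoot.
    return max(0.0, 1.0 - returned)


labels = [
    {"label": "foo", "probability": 0.6},
    {"label": "bar", "probability": 0.1},
]

print(round(remaining_probability(labels), 6))
```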
Analyze Link
The analyze link API functions exactly like the analyze-text API, but we allow the developer to specify a link / URL instead of text.
We then fetch the URL, perform content extraction (chrome removal) so that only the article text is included, and then perform classification.
Endpoint
Use URL endpoint: http://{{vendor}}.classifier.spinn3r.com/v1/classifier/analyze-link
Requests are JSON and must have the Content-Type: application/json HTTP header.
Request
Example request:
{
"classifiers": [ {
"selector": {
"textType": "NEWS",
"labelType": "HASHTAG"
}
} ],
"lang": "en",
"link": "http://www.cnn.com/2016/10/06/politics/obama-approval-rating-new-high/",
"textStrategy": "EXTRACT"
}
The request is almost identical to analyze-text; however, we use the following fields.
name | optional | description |
---|---|---|
classifiers | no | A JSON array describing selectors for the classifiers we should use |
lang | yes | language of the content. We recommend including this especially for short content. If you don’t specify it we will just use our language classifier to detect the language |
link | no | A link to a URL you would like to classify |
textStrategy | yes | The strategy used to compute the text to perform the classification. We support FULL to have the full text of the page used as well as EXTRACT which removes sidebar content and other page noise |
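A sketch of building an analyze-link request body with a check on the two documented textStrategy values. `analyze_link_request` is a hypothetical helper name; the field names follow the example request above.

```python
VALID_TEXT_STRATEGIES = ("FULL", "EXTRACT")


def analyze_link_request(link, lang=None, text_strategy=None):
    """Build the JSON body for an analyze-link call."""
    if text_strategy is not None and text_strategy not in VALID_TEXT_STRATEGIES:
        raise ValueError("textStrategy must be FULL or EXTRACT")
    body = {
        "classifiers": [
            {"selector": {"textType": "NEWS", "labelType": "HASHTAG"}}
        ],
        "link": link,
    }
    # Both lang and textStrategy are optional and only sent when given.
    if lang is not None:
        body["lang"] = lang
    if text_strategy is not None:
        body["textStrategy"] = text_strategy
    return body


link_request = analyze_link_request(
    "http://www.cnn.com/2016/10/06/politics/obama-approval-rating-new-high/",
    lang="en",
    text_strategy="EXTRACT",
)
print(link_request)
```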
Hive
SELECT tmp2.tag, COUNT(tmp2.tag) AS ranking FROM (
SELECT tmp1.author_handle, EXPLODE(tmp1.tags) AS tag FROM (
SELECT tmp.author_handle, COLLECT_SET(tmp.tag) AS tags FROM (
SELECT author_handle, EXPLODE(tags) AS tag FROM content WHERE
lang='en' AND
source_publisher_type='MICROBLOG' AND
source_followers >= 10000 AND
date_found > DATE_SUB(CURRENT_TIMESTAMP(), 30)
) AS tmp GROUP BY tmp.author_handle
) AS tmp1
) AS tmp2 GROUP BY tmp2.tag ORDER BY ranking DESC LIMIT 200000
Spinn3r has support for Apache Hive/Spark queries on top of our massive datastore.
We have over 50TB of content in our search index spread over 8 months.
This is a massive amount of data which can be used to build extremely powerful applications.
Unfortunately, access to this kind of data is not cheap. However, we can now host large batch jobs and provide static data exports, making it much more affordable for our customers.
All of this is powered via Apache Spark and Hive meaning that you can represent powerful exports in simple SQL format in an Open Source platform you’re already familiar with.
This is a recent SQL export we ran for a customer which exported top users and resulted in a 5TB static export.
At the moment we haven’t exposed a direct API for this functionality.
Right now Spark/Hive require a lot of per-job specific settings to run and we want something that is easier for our customers.
What we’re currently doing is allowing our customers to execute the SQL and then we provide a static tar.gz download for them.
Please contact support if you would like to execute a Hive query.
Parser
Basic request:
curl -XPOST 'http://{{vendor}}.rest.spinn3r.com/v1/parser/parse' --header "X-vendor: {{vendor}}" --header "X-vendor-auth: {{vendor_auth}}" -d '{
"link": "http://techcrunch.com/2015/09/27/google-announces-plan-to-put-wi-fi-in-400-train-stations-across-india/",
"publisherType": "WEBLOG"
}
'
Example output:
{
"parser" : "com.spinn3r.artemis.robot.metadata.instrumented.InstrumentedParser",
"request" : {
"link" : "http://techcrunch.com/2015/09/27/google-announces-plan-to-put-wi-fi-in-400-train-stations-across-india/",
"publisherType" : "WEBLOG",
"parseStrategy" : "PERMALINK"
},
"content" : [
{
"shortlink" : "http://wp.me/p1FaB8-56hH",
"canonical" : "http://social.techcrunch.com/2015/09/27/google-announces-plan-to-put-wi-fi-in-400-train-stations-across-india/",
"main" : "<article> \n <div> \n <div> \n <div> \n <div> \n <div> \n <div> \n <div> \n <img src=\"https://tctechcrunch2011.files.wordpress.com/2015/09/2042117809_4bf153cf11_b.jpg?w=738\" /> \n <p>Today, Google’s CEO Sundar Pichai <a href=\"http://googleblog.blogspot.com/2015/09/bringing-the-internet-to-more-indians.html\">shared details on a new plan</a> to bring more Indian residents online. He notes that there’s still over a billion of them in his native country that aren’t connected.</p> \n <p>The key? India’s train system. And a plan to bring Wi-Fi to its 10 million rail passengers a day. And it’s free (to start). Pichai shared Google’s plans, while sharing his own story about his days using Chennai Central station to get to <a href=\"http://www.iitkgp.ac.in/\">school</a>.</p> \n <blockquote>\n <p>We’d like to help get these next billion Indians online—so they can access the entire web, and all of its information and opportunity. And not just with any old connection—with fast broadband so they can experience the best of the web. That’s why, today, on the occasion of Indian Prime Minister Narendra Modi’s visit to our U.S. headquarters, and in line with his Digital India initiative, we announced a new project to provide high-speed public Wi-Fi in 400 train stations across India.</p>\n </blockquote> \n <p>All of the big tech companies have been getting a visit from Indian Prime Minister Narendra Modi, with Facebook being <a href=\"https://techcrunch.com/2015/09/27/modiberg/\">one of them</a>. 
Each company seems to have its own ideas on how to expand Internet availability and Google’s is definitely unique.</p> \n <p><img src=\"https://tctechcrunch2011.files.wordpress.com/2015/09/modi-sundar-alt-twitter.jpg?w=640&h=474\" alt=\"modi-sundar alt twitter\" width=\"640\" height=\"474\" /></p> \n <p>Here’s a map of the first 100 train stations that will get Wi-Fi by the end of 2016:</p>\n <div>\n <div></div>\n </div> \n <p><img src=\"https://tctechcrunch2011.files.wordpress.com/2015/09/indiawifi_zoomed-our.jpg?w=640&h=399\" alt=\"IndiaWifi_zoomed our\" width=\"640\" height=\"399\" /></p> \n <p>Google will be working with <a href=\"http://www.indianrailways.gov.in/\">Indian Railways</a> and RailTel on the initiative.</p> \n <p><img src=\"https://tctechcrunch2011.files.wordpress.com/2015/09/sundar_pichai_cropped.jpg?w=221&h=300\" alt=\"Sundar_Pichai_(cropped)\" width=\"221\" height=\"300\" />Pichai outlined why just 100 stations will speed up the process of getting more of India’s residents on the Internet:</p> \n <blockquote>\n <p>Even with just the first 100 stations online, this project will make Wi-Fi available for the more than 10 million people who pass through every day. This will rank it as the largest public Wi-Fi project in India, and among the largest in the world, by number of potential users. It will also be fast—many times faster than what most people in India have access to today, allowing travelers to stream a high definition video while they’re waiting, research their destination, or download some videos, a book or a new game for the journey ahead. 
Best of all, the service will be free to start, with the long-term goal of making it self-sustainable to allow for expansion to more stations and other places, with RailTel and more partners, in the future.</p>\n </blockquote> \n <p>This is the first big initiative for Sundar Pichai as <a href=\"https://techcrunch.com/2015/08/10/meet-alphabet-googles-new-corporate-boss-as-sundar-pichai-takes-over-the-search-company/\">the CEO of Google</a>, which will also be holding a <a href=\"https://techcrunch.com/2015/09/18/you-only-have-one-shot/\">major hardware event this week</a>. He noted “It’s my hope that this Wi-Fi project will make all these things a little easier.” This initiative, along with others like the <a href=\"https://techcrunch.com/2014/06/25/android-one/\">Android One</a> project should help the next generations of the residents of India get — and stay — online.</p> \n <small>Featured Image: <a href=\"https://www.flickr.com/photos/superk550i/2042117809/in/photolist-47sotK-7spnaC-podmnG-gzu6y7-gzu4cR-kQHiMB-8ZUVna-6dMB1r-47snyZ-47wtxQ-47wtCG-47sjFM-ekdEGQ-7adwcX-osi4Np-a3y2iZ-7skxR6-67Lz1x-a3AUXy-9gsZjb-bHdyuP-nWnzdd-gztXDY-oPYzci-gztFed-r2VZYK-gztjRj-s1BxN9-prqnaT-ag9oJ4-8NcLVe-8MRTHi-7Biiee-ag9h1K-6dMBPn-prkeHA-dUFjzm-fydF5p-b3BnjZ-bHdy6K-6dRJZJ-dsjdpp-buiLFy-oLMegv-apTuag-nRccft-nueVin-ecovbQ-6oonig-bVMAE7\">superk550i</a>/<a href=\"https://www.flickr.com/\">Flickr</a> UNDER A <a href=\"http://creativecommons.org/licenses/by/2.0/\">CC BY 2.0</a> LICENSE</small> \n </div> \n <div> \n <ul> \n <li> <h5>0</h5><br /> <small>SHARES</small> </li> \n <li> <a href=\"http://techcrunch.com/2015/09/27/google-announces-plan-to-put-wi-fi-in-400-train-stations-across-india/#\" rel=\"external\"></a> </li> \n <li> <a href=\"http://techcrunch.com/2015/09/27/google-announces-plan-to-put-wi-fi-in-400-train-stations-across-india/#\" rel=\"external\"></a> </li> \n <li> <a 
href=\"http://techcrunch.com/2015/09/27/google-announces-plan-to-put-wi-fi-in-400-train-stations-across-india/#\" rel=\"external\"></a> </li> \n <li> <a href=\"http://techcrunch.com/2015/09/27/google-announces-plan-to-put-wi-fi-in-400-train-stations-across-india/#\" rel=\"external\"></a> </li> \n <li> <a href=\"http://techcrunch.com/2015/09/27/google-announces-plan-to-put-wi-fi-in-400-train-stations-across-india/#\" rel=\"external\"></a> </li> \n <li> <a href=\"http://techcrunch.com/2015/09/27/google-announces-plan-to-put-wi-fi-in-400-train-stations-across-india/#\" rel=\"external\"></a> </li> \n <li> <a href=\"http://techcrunch.com/2015/09/27/google-announces-plan-to-put-wi-fi-in-400-train-stations-across-india/#\" rel=\"external\"></a> </li> \n <li> <a href=\"http://techcrunch.com/2015/09/27/google-announces-plan-to-put-wi-fi-in-400-train-stations-across-india/#\" rel=\"external\"></a> </li> \n </ul> \n </div> \n <div></div> \n <div></div> \n <div> \n <small> <a href=\"https://techcrunch.com/advertise/\" title=\"Advertise on TechCrunch\"> Advertisement </a> </small> \n <div></div> \n <div></div> \n </div> \n </div> \n </div> \n <div> \n <div> \n <small> <a href=\"https://techcrunch.com/advertise/\" title=\"Advertise on TechCrunch\"> Advertisement </a> </small> \n <div></div> \n </div> \n <div> \n <h3>CrunchBase</h3> \n <div> \n <ul> \n <li> <h4> <a href=\"https://crunchbase.com/organization/google/\"> Google </a> </h4> \n <div> \n <ul> \n <li> <strong>Founded</strong> 1998 </li> \n <li> <strong>Overview</strong> Google is a multinational corporation that is specialized in internet-related services and products. 
The company’s product portfolio includes Google Search, which provides users with access to information online; Knowledge Graph that allows users to search for things, people, or places as well as builds systems recognizing speech and understanding natural language; Google Now, which provides information … </li> \n <li> <strong>Location</strong> <a href=\"http://crunchbase.com/location/mountain-view/37bfe551197e02269a805d7bec8a50dd\">Mountain View, CA</a> </li> \n <li> <strong>Categories</strong> <a href=\"http://crunchbase.com/category/search/0d39287e5b1377a6970274294daded43\">Search</a>, <a href=\"http://crunchbase.com/category/email/687e34ef30e2af08a178436d6b53d633\">Email</a>, <a href=\"http://crunchbase.com/category/blogging-platforms/1ded70549e6d2ede57adc95e83be0c06\">Blogging Platforms</a>, <a href=\"http://crunchbase.com/category/information-technology/dbca89faf0835438b4add3fdeceb78e7\">Information Technology</a>, <a href=\"http://crunchbase.com/category/video-streaming/b1b3b2d785ed2cb1fc603e2b6a3b5ddd\">Video Streaming</a>, <a href=\"http://crunchbase.com/category/software/c08b5441a05b9777b7a6012728caddd9\">Software</a> </li> \n <li> <strong>Website</strong> <a href=\"http://www.google.com/\">http://www.google.com/</a> </li> \n <li> <a href=\"https://crunchbase.com/organization/google\">Full profile for Google</a> </li> \n </ul> \n </div> </li> \n <li> <h4> <a href=\"https://crunchbase.com/person/sundar-pichai/\"> Sundar Pichai </a> </h4> \n <div> \n <ul> \n <li> <strong>Bio</strong> Sundar Pichai joined Google in 2004 and became CEO in August of 2015. Prior to that, he led the product management and innovation efforts for a suite of Google's consumer products, including Google Toolbar, Chrome and Chrome OS. He was also responsible for the HTML5 and open web platform efforts at Google. 
Before joining Google, he held various engineering and product management positions at Applied … </li> \n <li> <a href=\"https://crunchbase.com/person/sundar-pichai\">Full profile for Sundar Pichai</a> </li> \n </ul> \n </div> </li> \n </ul> \n </div> \n </div> \n <div> \n <h2> Newsletter Subscriptions </h2> \n <div> \n <div> \n <div> \n <strong>The Daily Crunch</strong> Get the top tech stories of the day delivered to your inbox \n <strong>TC Weekly Roundup</strong> Get a weekly recap of the biggest tech stories \n <strong>CrunchBase Daily</strong> The latest startup funding announcements \n </div> Enter Address Subscribe \n </div> \n </div> \n </div> \n <div> \n <div></div> \n </div> \n </div> \n </div> \n </div> \n </div> \n <div> \n <div> \n <ul> \n <li> \n <div> \n <a href=\"https://techcrunch.com/tag/india/\"> india </a> \n </div> </li> \n <li> \n <div> \n <a href=\"https://techcrunch.com/tag/sundar-pichai/\"> Sundar Pichai </a> \n </div> </li> \n <li> \n <div> \n <a href=\"https://techcrunch.com/topic/company/google/\"> Google </a> \n </div> </li> \n <li> \n <div> \n <a href=\"https://techcrunch.com/transportation/\"> Transportation </a> \n </div> </li> \n <li> \n <div>\n <a>Popular Posts</a>\n </div> \n <div> \n <ul> \n <div></div> \n </ul> \n </div> </li> \n </ul> \n </div> \n </div> \n </div> \n</article>",
"main_length" : 10445,
"main_checksum" : "YjV73cp7OlxVhcej6HxKmT8-6CQ",
"main_format" : "HTML",
"extract" : "<h1>Google Announces Plan To Put Wi-Fi In 400 Train Stations Across India</h1><div>\n Posted \n</div><h4>Medvedev To Hold A Google Hangout On Russia’s Tech Future</h4><p>Today, Google’s CEO Sundar Pichai <a href=\"http://googleblog.blogspot.com/2015/09/bringing-the-internet-to-more-indians.html\">shared details on a new plan</a> to bring more Indian residents online. He notes that there’s still over a billion of them in his native country that aren’t connected.</p><p>The key? India’s train system. And a plan to bring Wi-Fi to its 10 million rail passengers a day. And it’s free (to start). Pichai shared Google’s plans, while sharing his own story about his days using Chennai Central station to get to <a href=\"http://www.iitkgp.ac.in/\">school</a>.</p><p>We’d like to help get these next billion Indians online—so they can access the entire web, and all of its information and opportunity. And not just with any old connection—with fast broadband so they can experience the best of the web. That’s why, today, on the occasion of Indian Prime Minister Narendra Modi’s visit to our U.S. headquarters, and in line with his Digital India initiative, we announced a new project to provide high-speed public Wi-Fi in 400 train stations across India.</p><p>All of the big tech companies have been getting a visit from Indian Prime Minister Narendra Modi, with Facebook being <a href=\"https://techcrunch.com/2015/09/27/modiberg/\">one of them</a>. 
Each company seems to have its own ideas on how to expand Internet availability and Google’s is definitely unique.</p><p>Here’s a map of the first 100 train stations that will get Wi-Fi by the end of 2016:</p><p>Google will be working with <a href=\"http://www.indianrailways.gov.in/\">Indian Railways</a> and RailTel on the initiative.</p><p>Pichai outlined why just 100 stations will speed up the process of getting more of India’s residents on the Internet:</p><p>Even with just the first 100 stations online, this project will make Wi-Fi available for the more than 10 million people who pass through every day. This will rank it as the largest public Wi-Fi project in India, and among the largest in the world, by number of potential users. It will also be fast—many times faster than what most people in India have access to today, allowing travelers to stream a high definition video while they’re waiting, research their destination, or download some videos, a book or a new game for the journey ahead. Best of all, the service will be free to start, with the long-term goal of making it self-sustainable to allow for expansion to more stations and other places, with RailTel and more partners, in the future.</p><p>This is the first big initiative for Sundar Pichai as <a href=\"https://techcrunch.com/2015/08/10/meet-alphabet-googles-new-corporate-boss-as-sundar-pichai-takes-over-the-search-company/\">the CEO of Google</a>, which will also be holding a <a href=\"https://techcrunch.com/2015/09/18/you-only-have-one-shot/\">major hardware event this week</a>. He noted “It’s my hope that this Wi-Fi project will make all these things a little easier.” This initiative, along with others like the <a href=\"https://techcrunch.com/2014/06/25/android-one/\">Android One</a> project should help the next generations of the residents of India get — and stay — online.</p>",
"extract_length" : 3317,
"extract_checksum" : "JnZAbJDNEyf_5JEHTmPgJnN39qc",
"summary_text" : "Pichai outlined why just 100 stations will speed up the process of getting more of India’s residents on the Internet:\n\nEven with just the first 100 stations online, this project will make Wi-Fi available for the more than 10 million people who pass through every day. This will rank it as the largest public Wi-Fi project in India, and among the largest in the world, by number of potential users. It will also be fast—many times faster than what most people in India have access to today, allowing travelers to stream a high definition video while they’re waiting, research their destination, or download some videos, a book or a new game for the journey ahead. Best of all, the service will be free to start, with the long-term goal of making it self-sustainable to allow for expansion to more stations and other places, with RailTel and more partners, in the future.\n\n",
"title" : "Google Announces Plan To Put Wi-Fi In 400 Train Stations Across India",
"publisher" : "TechCrunch",
"description" : "Today, Google’s CEO Sundar Pichai shared details on a new plan to bring more Indian residents online. He notes that there’s still over a billion of them in his native country that…",
"links" : [ "http://creativecommons.org/licenses/by/2.0/", "http://crunchbase.com/category/blogging-platforms/1ded70549e6d2ede57adc95e83be0c06", "http://crunchbase.com/category/email/687e34ef30e2af08a178436d6b53d633", "http://crunchbase.com/category/information-technology/dbca89faf0835438b4add3fdeceb78e7", "http://crunchbase.com/category/search/0d39287e5b1377a6970274294daded43", "http://crunchbase.com/category/software/c08b5441a05b9777b7a6012728caddd9", "http://crunchbase.com/category/video-streaming/b1b3b2d785ed2cb1fc603e2b6a3b5ddd", "http://crunchbase.com/location/mountain-view/37bfe551197e02269a805d7bec8a50dd", "http://googleblog.blogspot.com/2015/09/bringing-the-internet-to-more-indians.html", "http://techcrunch.com/2015/09/27/google-announces-plan-to-put-wi-fi-in-400-train-stations-across-india/#", "http://www.google.com/", "http://www.iitkgp.ac.in/", "http://www.indianrailways.gov.in/", "https://crunchbase.com/organization/google", "https://crunchbase.com/organization/google/", "https://crunchbase.com/person/sundar-pichai", "https://crunchbase.com/person/sundar-pichai/", "https://techcrunch.com/2014/06/25/android-one/", "https://techcrunch.com/2015/08/10/meet-alphabet-googles-new-corporate-boss-as-sundar-pichai-takes-over-the-search-company/", "https://techcrunch.com/2015/09/18/you-only-have-one-shot/", "https://techcrunch.com/2015/09/27/modiberg/", "https://techcrunch.com/advertise/", "https://techcrunch.com/tag/india/", "https://techcrunch.com/tag/sundar-pichai/", "https://techcrunch.com/topic/company/google/", "https://techcrunch.com/transportation/", "https://www.flickr.com/", 
"https://www.flickr.com/photos/superk550i/2042117809/in/photolist-47sotK-7spnaC-podmnG-gzu6y7-gzu4cR-kQHiMB-8ZUVna-6dMB1r-47snyZ-47wtxQ-47wtCG-47sjFM-ekdEGQ-7adwcX-osi4Np-a3y2iZ-7skxR6-67Lz1x-a3AUXy-9gsZjb-bHdyuP-nWnzdd-gztXDY-oPYzci-gztFed-r2VZYK-gztjRj-s1BxN9-prqnaT-ag9oJ4-8NcLVe-8MRTHi-7Biiee-ag9h1K-6dMBPn-prkeHA-dUFjzm-fydF5p-b3BnjZ-bHdy6K-6dRJZJ-dsjdpp-buiLFy-oLMegv-apTuag-nRccft-nueVin-ecovbQ-6oonig-bVMAE7" ],
"published" : "2015-09-27T19:32:27Z",
"author_name" : "Drew Olanoff",
"author_link" : "http://techcrunch.com/author/drew-olanoff/",
"author_gender" : "MALE",
"image_src" : "https://tctechcrunch2011.files.wordpress.com/2015/09/2042117809_4bf153cf11_b.jpg?w=764&h=400&crop=1",
"card" : "SUMMARY_LARGE_IMAGE",
"type" : "POST",
"sentiment" : "POSITIVE",
"lang" : "en",
"categories" : {
"business" : 6.697868530957982E-6,
"entertainment" : 1.0460869636330763E-7,
"health" : 4.692317650333774E-8,
"politics" : 6.811966674982687E-10,
"science" : 4.314704857471789E-9,
"sports" : 3.8005285377463125E-9,
"technology" : 0.9999931418031668
},
"metadata_score" : 334
}
]
}
The Spinn3r Parser API provides a granular way to request documents and get back parsed metadata for a specific permalink, news article, or blog post.
If a URL is not indexed yet, is older, or is something we might not index, this allows you to still fetch the content and work with it using our content schema and machine learning infrastructure.
When we index the content we perform the following operations:
- fetch the URL content
- language classification
- sentiment text calculation
- chrome (sidebar+navigation) removal
- gender detection
- category classification (tech, politics, science, etc)
- image analysis
Requests
Request POST example:
{
"link" : "http://techcrunch.com/2015/09/27/google-announces-plan-to-put-wi-fi-in-400-train-stations-across-india/",
"publisherType" : "WEBLOG",
"parseStrategy" : "PERMALINK"
}
Use URL endpoint: http://{{vendor}}.rest.spinn3r.com/v1/parser/parse
Requests are JSON and must include the Content-Type: application/json HTTP header.
The POST body is a simple JSON structure with two basic fields:
name | description |
---|---|
link | The URL to fetch and perform metadata extraction |
publisherType | A Spinn3r publisher type. Should be either WEBLOG or MAINSTREAM_NEWS |
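As a rough illustration of the request described above, the following Python sketch builds the POST without sending it. The endpoint pattern, header, and field names come from this section; the function name `build_parse_request`, the `my-vendor` code, and the example URL are hypothetical. The `parseStrategy` field appears in the request example but not in the field table, so it is treated as optional here. Authentication is not covered by this sketch.

```python
import json
import urllib.request

# Endpoint pattern from the documentation; the vendor code is supplied by Spinn3r.
PARSER_ENDPOINT = "http://{vendor}.rest.spinn3r.com/v1/parser/parse"


def build_parse_request(vendor, link, publisher_type="WEBLOG", parse_strategy=None):
    """Build (but do not send) a POST request for the Parser API."""
    if publisher_type not in ("WEBLOG", "MAINSTREAM_NEWS"):
        raise ValueError("publisherType should be WEBLOG or MAINSTREAM_NEWS")
    payload = {"link": link, "publisherType": publisher_type}
    if parse_strategy is not None:
        # Seen in the request example (e.g. "PERMALINK") but not in the field table.
        payload["parseStrategy"] = parse_strategy
    return urllib.request.Request(
        PARSER_ENDPOINT.format(vendor=vendor),
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# "my-vendor" and the link are placeholders for illustration only.
request = build_parse_request("my-vendor", "http://example.com/some-post/")
# To send it: urllib.request.urlopen(request), after adding whatever
# authentication your account requires (not shown in this sketch).
```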
Responses
Responses contain a list of objects in the Spinn3r content schema.
name | description |
---|---|
request | Contains metadata about the request that generated this response |
content | A list of objects conforming to the Spinn3r content schema |
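A response can then be walked like any other JSON document. The sketch below uses a trimmed, illustrative response (the `request` object is left empty because its fields are not listed here) and picks the highest-scoring entry from the `categories` map shown in the sample response. The helper name `top_category` is hypothetical.

```python
import json

# A trimmed response in the shape described above; the values are illustrative.
sample_response = json.loads("""
{
  "request": {},
  "content": [
    {
      "title": "Google Announces Plan To Put Wi-Fi In 400 Train Stations Across India",
      "lang": "en",
      "sentiment": "POSITIVE",
      "categories": {"business": 6.7e-06, "politics": 6.8e-10, "technology": 0.99999}
    }
  ]
}
""")


def top_category(document):
    """Return the highest-scoring category name, or None if there are none."""
    categories = document.get("categories") or {}
    if not categories:
        return None
    return max(categories, key=categories.get)


for document in sample_response.get("content", []):
    print(document.get("title"), "->", top_category(document))
```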
Firehose
To get started, first download the latest version of the client.
The Firehose client provides 99% of the heavy lifting of connecting to Spinn3r and fetching content in real time.
Quick start
We detail the required steps below, but for a quick overview:
- Verify that your system meets the requirements
- Download the firehose client
- Provision a firehose connection locally
- Start the fetcher daemon
Requirements
A vendor code provided by Spinn3r.
Linux, or another platform that can run Java. Windows will probably work but we recommend Linux.
Java >= 1.8
20-50 megabit connection to the Internet. Note that this needs to be 2-4x faster than the content you are actually acquiring so that you can resume quickly. You can measure the performance easily by using our speed test.
A hard drive with at least 100GB of free space.
You must not run the client in a cron job. We need you to keep it running 24/7 as a daemon.
Download
To download the latest release please visit http://public.maven.spinn3r.com/dist/
We provide both debian packages and tar.gz distributions.
Debian / Ubuntu
We have .debs which can be easily installed on Debian and Ubuntu. You will need a Java 8 release in the PATH but there are no other significant requirements.
The .deb packages set the client up as a regular Unix daemon, which you can start after you’ve provisioned your client. For example:
/etc/init.d/spinn3r-artemis-client-fetcher start
tar.gz binaries
If you’re on Redhat, Solaris, or another OS, we provide binary packages in tar.gz format.
For tar.gz, once uncompressed you will see a filesystem layout like the following:
drwxr-xr-x 5 burton 170 Sep 6 16:13 .
drwxr-xr-x 3 burton 102 Sep 6 16:13 ./etc
drwxr-xr-x 3 burton 102 Sep 6 16:13 ./etc/init.d
-rwxr-xr-x 1 burton 2824 Sep 6 16:10 ./etc/init.d/spinn3r-artemis-client-fetcher
-rwxr-xr-x 1 burton 2824 Sep 6 16:10 ./install.sh
-rw-r--r-- 1 burton 259 Aug 22 16:58 ./README.md
drwxr-xr-x 3 burton 102 Sep 6 16:13 ./usr
drwxr-xr-x 3 burton 102 Sep 6 16:13 ./usr/share
drwxr-xr-x 3 burton 102 Sep 6 16:13 ./usr/share/spinn3r-artemis-client-fetcher
drwxr-xr-x 15 burton 510 Sep 6 16:13 ./usr/share/spinn3r-artemis-client-fetcher/lib
-rw-r--r-- 1 burton 6260 Sep 6 16:09 ./usr/share/spinn3r-artemis-client-fetcher/lib/artemis-http-lib-5.1.21.jar
-rw-r--r-- 1 burton 8833 Sep 6 16:09 ./usr/share/spinn3r-artemis-client-fetcher/lib/artemis-schema-core-5.1.21.jar
-rw-r--r-- 1 burton 34065 Sep 6 16:09 ./usr/share/spinn3r-artemis-client-fetcher/lib/artemis-util-5.1.21.jar
-rw-r--r-- 1 burton 185140 Jun 18 20:50 ./usr/share/spinn3r-artemis-client-fetcher/lib/commons-io-2.4.jar
-rw-r--r-- 1 burton 2172168 Jun 30 20:15 ./usr/share/spinn3r-artemis-client-fetcher/lib/guava-15.0.jar
-rw-r--r-- 1 burton 35058 Jun 16 22:57 ./usr/share/spinn3r-artemis-client-fetcher/lib/jackson-annotations-2.3.0.jar
-rw-r--r-- 1 burton 197981 Jun 16 22:57 ./usr/share/spinn3r-artemis-client-fetcher/lib/jackson-core-2.3.0.jar
-rw-r--r-- 1 burton 914028 Jun 16 22:57 ./usr/share/spinn3r-artemis-client-fetcher/lib/jackson-databind-2.3.0.jar
-rw-r--r-- 1 burton 481535 Jun 16 22:57 ./usr/share/spinn3r-artemis-client-fetcher/lib/log4j-1.2.16.jar
-rw-r--r-- 1 burton 26768 Sep 6 16:10 ./usr/share/spinn3r-artemis-client-fetcher/lib/spinn3r-artemis-client-api-5.1.21.jar
-rw-r--r-- 1 burton 3457 Sep 6 16:10 ./usr/share/spinn3r-artemis-client-fetcher/lib/spinn3r-artemis-client-core-5.1.21.jar
-rw-r--r-- 1 burton 26821 Sep 6 16:10 ./usr/share/spinn3r-artemis-client-fetcher/lib/spinn3r-artemis-client-fetcher-5.1.21.jar
-rw-r--r-- 1 burton 14740 Sep 6 16:10 ./usr/share/spinn3r-artemis-client-fetcher/lib/spinn3r-artemis-client-lib-5.1.21.jar
At this point you can just run install.sh, which will place the files into the right directories in your OS.
You can then move on to provisioning to create a new firehose connection.
Usage
The client first needs to be provisioned. The provisioning step defines various parameters needed for indexing content and then creates the directories on the filesystem needed to run the client.
There are then two main daemons that need to be started. One fetches content from the server and spools it to disk (the fetcher). The other watches for new content on disk, parses it, and imports the content into your database (the watcher).
Provision
Provision command:
java -cp "/usr/share/spinn3r-artemis-client-fetcher/lib/*" com.spinn3r.artemis.client.fetcher.Provisioner \
--dir=/var/spool/spinn3r-artemis-client/default \
--vendor={{vendor}} \
--after=-1hour \
--processPolicy=DELETE \
--fetchListenerClassName=com.spinn3r.artemis.client.watcher.LoggingFetcherListener
You first need to provision a new client in a given directory. The client will write a resume.checkpoint file so that you can start/stop the client and it will automatically resume.
NOTE: Make sure the quotes are included in the Java classpath or the command won’t run due to bash file name expansion.
Arguments
The following arguments must be specified:
dir
The directory which will contain your spool and where the client will save your files.
Use /var/spool/spinn3r-artemis-client/default as your main/default spool directory.
You can create additional directories for custom filters or for backfilling data if you fall behind.
after
The 'after' parameter accepts both absolute time (if specified in ISO 8601) as well as relative time.
Relative time is in the format:
-1hour -2hours -1day +1hour +1day
The - (negative) prefix is used to denote time in the past.
The + (positive) prefix is used to denote time in the future.
If you would like to start from the current moment in time you could specify: +0minutes
value | meaning |
---|---|
+0minutes | Start indexing from now |
-1hours | Start indexing one hour in the past |
-2hours | Start indexing two hours in the past |
-30minutes | Start indexing 30 minutes in the past |
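To make the relative-time semantics concrete, here is a small Python sketch that resolves such a value against a reference time. It is not the client's own parser: the accepted units (minute, hour, day, with an optional plural "s") are inferred from the examples above, and the function name `resolve_relative` is hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

# Sign, amount, and unit, e.g. "-1hour" or "+0minutes" (grammar inferred from the examples).
_RELATIVE = re.compile(r"^([+-])(\d+)(minute|hour|day)s?$")
_UNIT_SECONDS = {"minute": 60, "hour": 3600, "day": 86400}


def resolve_relative(value, now):
    """Resolve a relative time such as '-1hour' or '+0minutes' against `now`."""
    match = _RELATIVE.match(value)
    if match is None:
        raise ValueError(f"not a relative time: {value!r}")
    sign, amount, unit = match.groups()
    delta = timedelta(seconds=int(amount) * _UNIT_SECONDS[unit])
    # A '-' prefix points into the past, a '+' prefix into the future.
    return now - delta if sign == "-" else now + delta


now = datetime(2015, 9, 27, 12, 0, tzinfo=timezone.utc)
for example in ("+0minutes", "-1hours", "-2hours", "-30minutes"):
    print(example, "->", resolve_relative(example, now).isoformat())
```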
processPolicy
Specify what the Watcher daemon should do with files once they have been processed.
This only applies when you’re using Java.
name | description |
---|---|
DELETE | Delete the files when they are processed. The advantage of deleting files is that it’s easy to free up disk space. The disadvantage is that you can’t re-process them if something goes wrong. |
MOVE_TO_PROCESSED | Move the files to a ‘processed’ directory. You then need to have a cron script or some other process manage purging files once you think they’re ok to delete. This has the advantage of being able to re-import files but the disadvantage of potentially running out of disk space. |
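If you choose MOVE_TO_PROCESSED, the purge job mentioned above could look something like the following Python sketch. The retention period and the function name `purge_processed` are assumptions for illustration; pick a retention that matches how long you may need to re-import files.

```python
import os
import time


def purge_processed(processed_dir, max_age_days, now=None):
    """Delete regular files in processed_dir older than max_age_days; return their names."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    removed = []
    for entry in os.scandir(processed_dir):
        # Compare each file's modification time with the cutoff.
        if entry.is_file() and entry.stat().st_mtime < cutoff:
            os.remove(entry.path)
            removed.append(entry.name)
    return sorted(removed)


# Example call from a scheduled job (path taken from the provisioning example,
# seven days is an arbitrary retention):
#   purge_processed("/var/spool/spinn3r-artemis-client/default/processed", 7)
```

Running the purge from cron is fine because it is a short housekeeping task; the restriction on cron applies to the client itself.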
filter
If you’d like to run with a client-side filter you can add a --filter option when you provision the client and the filter will be applied locally.
We support both client-side and server-side filters. When both are configured, the server-side filter is applied together with the client-side filter for all requests.
The server-side filter is deployed with your Spinn3r account based on what data you’re purchasing.
See the filtering documentation for how to set up a filter and for some examples of using the language.
If you have trouble writing one please just contact support and we’ll write one for you.
Spool directory contents
After provisioning, if you ls the directory you will see:
root@my-host:/usr/share/spinn3r-artemis-client# ls -al /var/spool/spinn3r-artemis-client/default
drwxr-xr-x 8 root root 4096 Sep 6 21:04 .
drwxr-xr-x 3 root root 4096 Sep 6 21:04 ..
drwxr-xr-x 2 root root 4096 Sep 6 21:04 data
drwxr-xr-x 2 root root 4096 Sep 6 21:04 dead
drwxr-xr-x 2 root root 4096 Sep 6 21:04 lib
drwxr-xr-x 2 root root 4096 Sep 6 21:04 logs
drwxr-xr-x 2 root root 4096 Sep 6 21:04 processed
-rw-r--r-- 1 root root 298 Sep 6 21:04 resume.checkpoint
drwxr-xr-x 2 root root 4096 Sep 6 21:04 tmp
The data directory stores all the json files that the client fetches.
The logs directory contains the logs of the client running in the background.
Run the Fetcher
Now just run the client.
/etc/init.d/spinn3r-artemis-client-fetcher start
This will start up the fetcher, which will use the spool directory you just created.
Indexing
After fetching content from Spinn3r, files are spooled to disk which allows you to parse them and import the data into your database.
If you’re using Python or another language you can parse the files directly.
File format
Example header:
// START-META
// request-URL: http://api.artemis.spinn3r.com/api/v1/content.stream...
// END-META
All files are formatted as JSON with UTF8 encoding.
The only exception to standard JSON is that files have a header, which some libraries may not handle since comments are not part of the JSON spec.
This is easy to strip if your JSON parser doesn’t support comments. Just strip everything up to and including the // END-META line.
The header is provided so that you can re-download the URL should you find that the file is corrupt.
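Stripping the header can be sketched in a few lines of Python. This assumes the header appears only once, at the top of the file, and makes no assumption about the top-level structure of the JSON that follows. The function names and the sample text are hypothetical.

```python
import json

END_MARKER = "// END-META"


def parse_spool_text(text):
    """Split spool file text into (header_lines, parsed_json)."""
    lines = text.splitlines()
    header = []
    body_start = 0
    for index, line in enumerate(lines):
        if line.strip().startswith(END_MARKER):
            # Everything up to and including the END-META line is header.
            header = lines[: index + 1]
            body_start = index + 1
            break
    return header, json.loads("\n".join(lines[body_start:]))


def parse_spool_file(path):
    """Read a spool file as UTF-8 and parse it."""
    with open(path, encoding="utf-8") as handle:
        return parse_spool_text(handle.read())


# Illustrative input only; real files carry the actual request URL in the header.
sample = (
    "// START-META\n"
    "// request-URL: http://example.invalid/stream\n"
    "// END-META\n"
    '{"content": []}'
)
header, payload = parse_spool_text(sample)
```

Keeping the header lines around, rather than discarding them, makes it easy to log the request URL if a file later turns out to be corrupt.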
Designing your parser
Here are some rules to design a parser that handles changes to our API moving forward.
The format will always be UTF8.
The JSON output will never change to another format unless it’s another endpoint.
You MUST handle additional fields. Just ignore them until you have a chance to review the documentation and see how they can benefit your application.
You MUST NOT rely on the output format being pretty printed. In general it’s best to use a real JSON parser like Jackson so that any idiosyncrasies in parsing are transparently handled.
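One way to follow these rules in Python is to copy only the fields your application knows about and silently drop the rest, as in this sketch. The `Post` class and the `brand_new_field` key are hypothetical; the other field names come from the content schema.

```python
import json
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Post:
    """Only the fields this hypothetical application cares about."""
    permalink: Optional[str] = None
    title: Optional[str] = None
    lang: Optional[str] = None
    published: Optional[str] = None


def post_from_json(document):
    """Build a Post from a parsed JSON object, ignoring unknown fields."""
    known = {f.name for f in fields(Post)}
    # Unknown keys are dropped instead of raising, so new API fields are harmless.
    return Post(**{key: value for key, value in document.items() if key in known})


# Compact (not pretty-printed) input with a field the application has never seen.
compact = '{"title":"CDC: Nurse with Ebola should not have traveled","lang":"en","brand_new_field":42}'
post = post_from_json(json.loads(compact))
```

Because the input goes through a real JSON parser, whitespace and key order in the files do not matter.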
Potential Issues and Warnings
- The client may write the same content twice to the local spool. This is very rare, but if your client crashes before writing a recent checkpoint it will re-fetch the same content on restart. Your database should use a primary key to prevent duplicates. You can use our 'sequence' value in the stream as a primary key, as that’s guaranteed to always be unique on our end.
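The primary-key approach can be sketched with SQLite from the Python standard library. The table layout is hypothetical, and storing `sequence` as text is an assumption since its type is not stated here; the same idea applies to any database with unique keys.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open a database with a content table keyed on the 'sequence' value."""
    connection = sqlite3.connect(path)
    connection.execute(
        "CREATE TABLE IF NOT EXISTS content ("
        " sequence TEXT PRIMARY KEY,"
        " permalink TEXT,"
        " title TEXT)"
    )
    return connection


def import_document(connection, document):
    """Insert one document; return True if it was new, False if a duplicate."""
    cursor = connection.execute(
        "INSERT OR IGNORE INTO content (sequence, permalink, title) VALUES (?, ?, ?)",
        (str(document["sequence"]), document.get("permalink"), document.get("title")),
    )
    connection.commit()
    # rowcount is 0 when the primary key already existed and the row was skipped.
    return cursor.rowcount == 1
```

With this in place, re-importing a file after a crash is harmless: rows that already exist are skipped.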
Best Practices
In order for us to provide production-level and high quality service to our customers, and for Spinn3r to meet our SLA, we require our customers to meet our best practices for using our APIs.
These ensure that service levels are maintained and that Spinn3r can provide high quality service.
Additionally, we’ve found common anti-patterns in customer configuration and setups that we strongly want to avoid.
Required version upgrades
VERY rarely we may notify customers that a version upgrade is required.
In nearly ten years of business we’ve done this exactly once.
Once a customer has been notified that a mandatory version upgrade is required they MUST upgrade to this version in order to meet our SLA.
Minimum required bandwidth
A Spinn3r bandwidth test must be done periodically (every 60 days at least) to ensure at least 2x throughput is available to our servers.
In practice this means you need about 30Mbit to fetch data.
The additional bandwidth is required for resume. If your client goes offline and you don’t have more capacity than the stream requires, it will never be able to catch up.
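The reasoning behind the 2x requirement can be worked through numerically. While the client is offline a backlog builds up at the stream rate, and once it is back the backlog only drains at the difference between your link speed and the stream rate. The sketch below assumes a steady stream rate; the 15Mbit figure is illustrative (half of the 30Mbit mentioned above), and the function name is hypothetical.

```python
def catch_up_hours(stream_mbit, link_mbit, offline_hours):
    """Hours needed to clear the backlog after an outage, or None if it never clears."""
    spare_mbit = link_mbit - stream_mbit
    if spare_mbit <= 0:
        return None  # new content arrives at least as fast as it can be fetched
    # Backlog accumulated during the outage, in megabits.
    backlog_megabits = stream_mbit * offline_hours * 3600
    # The backlog drains at the spare rate while new content keeps arriving.
    return backlog_megabits / spare_mbit / 3600


print(catch_up_hours(15, 30, 2))   # 2x capacity: catching up takes as long as the outage
print(catch_up_hours(15, 60, 3))   # 4x capacity: a third of the outage
print(catch_up_hours(15, 15, 1))   # no spare capacity: never catches up
```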
Spool directory on NTFS or HDFS
Due to added latencies on IO, we require customers to write directly to a regular HDD or SSD drive connected to a local computer.
We understand that many of our customers want to work with their own database, filesystem, or queueing technology, but these have consistently added extra latencies and proven to slow down our firehose client.
These can be used in production easily by moving files into your database, filesystem, etc as soon as they are written to disk.
HTTP proxies
All HTTP proxies must be standard RFC compliant proxies.
They MAY cache documents but MUST only cache documents with proper HTTP caching headers. They MUST NOT cache documents based on their own policies.
They MUST NOT block requests which are legitimate.
Examples of broken proxy configurations include malware-scanning and content-censorship proxies that detect patterns of content and block them.
DNS
All DNS servers must be standard RFC DNS servers.
They MAY cache DNS requests but only according to proper DNS caching TTLs.
They MUST NOT create arbitrary caching policies.
Specifically, we use low DNS cache TTLs and your DNS recursors must respect this or it may lead to an outage.
Intrusion Detection Systems
We strongly recommend that intrusion detection systems (IDS) be disabled when using Spinn3r, or that they at least only warn on issues rather than block traffic.
Because Spinn3r uses a large number of HTTP requests, especially in the Firehose API, an IDS can block Spinn3r and break production applications.
This is especially true if it’s performing any type of content analysis, as some Internet content may be controversial and flagged as inappropriate by the IDS, thus breaking Spinn3r.
Architecture
The client works by reading new data via HTTP, encoded as JSON, and writes the data to a local directory you specify via the command line.
Your application just needs to watch the directory for new files, parse them, and import them into your database.
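A single pass of such a watcher might look like the Python sketch below. It assumes that files only appear in the data directory once they are completely written (the spool layout includes a tmp directory, but this is not confirmed here) and it processes every regular file regardless of name. The function name and the choice to move files into the processed directory are illustrative.

```python
import os
import shutil


def process_new_files(data_dir, processed_dir, handler):
    """Run handler(path) on each file in data_dir, then move it to processed_dir."""
    handled = []
    for name in sorted(os.listdir(data_dir)):
        path = os.path.join(data_dir, name)
        if not os.path.isfile(path):
            continue
        handler(path)  # parse the file and import it into your database
        # Only move the file once the handler has returned without raising.
        shutil.move(path, os.path.join(processed_dir, name))
        handled.append(name)
    return handled


# A long-running watcher would call this in a loop, for example:
#   while True:
#       process_new_files(data_dir, processed_dir, import_file)
#       time.sleep(5)
```

If the handler raises, the file stays in the data directory and is retried on the next pass, which is why the import itself should be safe to repeat.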
Custom clients not supported
Custom clients that connect to Spinn3r are NOT supported and we consider them an anti-pattern that will eventually break. Unfortunately, there are far too many issues with implementing HTTP correctly that will eventually cause custom clients to fail. HTTP connect and read timeouts, gzip encoding, DNS load balancing and caching, resume, and HTTP headers and encoding all amount to a very complicated implementation which we don’t want to break for our customers. Further, providing the client ourselves allows us to push out new features without asking our customers to implement them.
Protocol Overview
The raw protocol works by fetching content from a given date range.
However, we hide all of this from you and provide you with a simple daemon you just run on your servers.
Message Integrity
All JSON messages, delivered as HTTP responses, are given a SHA256 checksum which the client then verifies.
With high-volume throughput it’s possible (but rare) for messages to become corrupt during transfer, and the checksum lets us verify that nothing was corrupted.
Filtering
We support an easy to use filtering language that can be used to mine data from the stream of events that we publish.
The language is designed to filter messages as they pass through the stream.
Language
First, see the content schema definition for documentation on all fields.
The language is very similar to Java/Python conditional statements and is modeled after XPath.
Boolean logic and grouping
Arbitrary boolean logic and grouping is supported:
foo = 'foo' and ( cat = 'cat' or dog = 'dog' )
Contains
The contains() function can be used to see if a field contains a substring. It can also be used for set membership.
contains( title , 'Linux' )
Filter for a specific language code
This will filter out all objects except those with this exact language code.
lang = 'en'
Filtering a specific source
Filter out all sources except TechCrunch:
source_resource = 'http://techcrunch.com'
Filtering one publisher type
You can filter for one specific publisher type:
source_publisher_type = 'MAINSTREAM_NEWS'
Filtering for more than one publisher type
If you would like more than one publisher type in the stream, you can specify it with a logical 'or':
source_publisher_type='MAINSTREAM_NEWS' or source_publisher_type='WEBLOG'
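To clarify what these expressions select, here is the equivalent logic written as a plain Python predicate over a parsed content object. This only illustrates the semantics and is not how the filter engine is implemented. Combining the clauses into one expression, case-sensitive matching, and treating a missing field as a non-match are all assumptions.

```python
def contains(value, needle):
    """Substring test for strings, membership test for lists and sets."""
    if value is None:
        return False
    return needle in value


def matches(document):
    # Python equivalent of the filter expression:
    #   lang = 'en'
    #   and ( source_publisher_type = 'MAINSTREAM_NEWS'
    #         or source_publisher_type = 'WEBLOG' )
    #   and contains( title , 'Linux' )
    return (
        document.get("lang") == "en"
        and document.get("source_publisher_type") in ("MAINSTREAM_NEWS", "WEBLOG")
        and contains(document.get("title"), "Linux")
    )
```

A predicate like this can be handy for checking a handful of spooled documents locally before asking for a filter to be deployed.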
Logging and Monitoring
Both the Fetcher and Watcher have support for logging messages via log4j.
Standard input / output
The standard output of the daemon is written to /var/log/{{daemon_name}}
The fetcher writes to:
/var/log/spinn3r-artemis-client-fetcher
These may contain critical exceptions if the daemons fail to start up properly.
Runtime logs
Once running, all logs are written to the logs directory under your spool dir.
We log both the progress of the client as well as any HTTP requests being performed, their URLs, as well as any exceptions being thrown.
Monitoring
Both daemons open a port using HTTP which can be used to monitor the state of the daemon.
For security reasons, the daemon only binds to localhost (127.0.0.1) so that these ports are not accessible to the world.
Ports
The fetcher runs on port 20400
URLs to monitor
Each daemon has the following endpoints:
/threads
Dumps all threads and their associated stack trace. Can be used to detect a dead fetcher and probably only useful when requested by Spinn3r support.
/ping
Returns with the phrase 'pong’. Useful for monitoring that the process is live.
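A liveness check against the /ping endpoint could be sketched as follows. The port and the 'pong' response come from this section; the function name, the timeout, and the decision to bypass any configured HTTP proxy (the daemon only listens on localhost) are assumptions.

```python
import urllib.error
import urllib.request


def is_alive(port=20400, host="127.0.0.1", timeout=5.0):
    """Return True if the daemon's /ping endpoint answers with 'pong'."""
    url = f"http://{host}:{port}/ping"
    # An empty ProxyHandler bypasses proxy settings from the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
    except (urllib.error.URLError, OSError):
        return False
    return "pong" in body
```

Since the port is bound to 127.0.0.1, a check like this has to run on the same host as the daemon.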
Checkpoints
Here’s an example resume.checkpoint file:
{
"after" : 1410109500000000000,
"before" : 9223372036854775807,
"vendor" : "{{vendor}}",
"comment" : "Spinn3r checkpoint file. You MUST shut down the fetcher and watcher before editing (and create a backup first).",
"processPolicy" : "DELETE",
"fetchListenerClassName" : "com.spinn3r.artemis.client.watcher.LoggingFetcherListener"
}
Periodically the fetcher will write a new checkpoint file to disk keeping track of its progress. This way when it’s restarted it doesn’t have to download all identical content again.
The file can be edited but you MUST stop the fetcher first.
This will allow you to change various config directives including the process policy, etc.
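For inspection only, the checkpoint can be read with any JSON parser. The sketch below interprets `after` as nanoseconds since the Unix epoch, which is an inference from the magnitude of the example value and not something stated here; the `before` value in the example equals the largest signed 64-bit integer and is treated as "open ended". The function name is hypothetical.

```python
import json
from datetime import datetime, timezone

NANOS_PER_SECOND = 1_000_000_000
OPEN_ENDED = 2**63 - 1  # 9223372036854775807, the "before" value in the example


def describe_checkpoint(text):
    """Summarize a resume.checkpoint document without modifying it."""
    checkpoint = json.loads(text)
    after_seconds = checkpoint["after"] // NANOS_PER_SECOND  # assumes nanoseconds
    return {
        "vendor": checkpoint.get("vendor"),
        "after": datetime.fromtimestamp(after_seconds, tz=timezone.utc),
        "open_ended": checkpoint.get("before") == OPEN_ENDED,
        "processPolicy": checkpoint.get("processPolicy"),
    }


example = '{"after": 1410109500000000000, "before": 9223372036854775807, "processPolicy": "DELETE"}'
info = describe_checkpoint(example)
print(info["after"].isoformat(), info["open_ended"], info["processPolicy"])
```

Reading the file this way is safe while the fetcher is running; only edits require stopping it first.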
Proxy server
First, make sure you absolutely need to run an HTTP proxy. It’s an additional piece of middleware and could cause a problem interacting with Spinn3r.
HTTP proxies are a somewhat stable piece of technology so for most uses it should work just fine.
If you would like to use the fetcher with an HTTP proxy you need to perform the following steps.
Update init.d 'default’ file
Edit the file:
/etc/default/spinn3r-artemis-client-fetcher
This file is loaded from the daemon startup script and allows you to add extra command line arguments to your daemon.
You will need to add the line:
export JAVA_OPTS="-Dhttp.proxyHost=localhost -Dhttp.proxyPort=8080"
The host and port will need to be changed to the host and port in your environment.
Once the daemon has been restarted it will use the HTTP proxy.
Verify that it’s working
Restart the daemon by running:
/etc/init.d/spinn3r-artemis-client-fetcher restart
On startup you will see the following message if it correctly loaded the default file:
Loading init.d default file: /etc/default/spinn3r-artemis-client-fetcher
Then you should run ps aux, look for the daemon, and verify that the above options were added to the daemon command line.
At that point you should be using a proxy server with the fetcher and all requests will go through the proxy.
Data Directory
Some users require the data directory to be placed outside of /var. This can be useful for various reasons, including usage on different volumes.
This can be done by editing the daemon defaults file.
Changing the defaults file
Create a file named
/etc/default/spinn3r-artemis-client-fetcher
Then add:
export EXTRA_OPTS="--basedir=/path/to/new/basedir"
This will tell the daemon to read spool directories from this base directory.
You should have a default spool directory provisioned here which is your main connection to the Spinn3r data.
Firehose archives
By default all firehose customers have access to the last 5 days worth of content.
It’s your responsibility to keep a client up and running and listening to data.
Do not run a client via crontab. Instead, keep it running in the background as a daemon.
If you require access to data older than 5 days you can contact us to purchase access to that data via full-text search and fetch all the content in the time window you require.
Standalone mode
The Spinn3r client can be installed in a standalone mode where it doesn’t have to be run as a daemon and instead can be run by command line.
This is not the preferred method to install the client and is only really a good idea if you’re running in a non-standard or custom environment or on an OS that supports Java but might not necessarily be Linux/Unix (Windows or Mac OSX).
Provisioning
java -cp "./spinn3r-artemis-client-fetcher/lib/*" com.spinn3r.artemis.client.fetcher.Provisioner \
--dir=./spool/spinn3r-artemis-client/default \
--vendor={{vendor}} \
--after=-1hour \
--processPolicy=DELETE \
--fetchListenerClassName=com.spinn3r.artemis.client.watcher.LoggingFetcherListener
The provision commands are all the same. The main difference is that you’re going to want to specify a directory other than the default.
(please see the provisioning documentation for more information on how to provision the client)
This assumes that the spinn3r-artemis-client-fetcher directory is in your current directory and that your spool should be in your current directory as well.
These can be placed anywhere you like. We would advise AGAINST placing this on a network mounted share. If it fails at ANY future point it means your Spinn3r client will also fail. The best strategy here is to write locally and then have a background find command copy the files to your network share (probably via crontab).
Starting the fetcher
java -cp "./spinn3r-artemis-client-fetcher/lib/*" com.spinn3r.artemis.client.fetcher.Fetcher --basedir=./spool/spinn3r-artemis-client
This will output the following:
Loading log4j config from: jar:file:/Users/burton/test/spinn3r-artemis-client-fetcher/lib/spinn3r-artemis-client-fetcher-5.1.292.jar!/config/log4j.xml
Starting HTTP server on port 20400 (useLocalhost=true)...
For more information about Spinn3r please see:
http://spinn3r.com
And for instructions on how to use the client please see:
https://github.com/burtonator/spinn3r-artemis-client-example
Now all we need to do is run a fetcher which points to our newly provisioned directory.
The daemon will start in the foreground and block. It won’t return control to the terminal. You will either have to place it in screen or some sort of job control system to keep it running.
Common Problems
There are many common problems that our customers run into that we want to highlight:
Speed test.
If you’re having any problems with the client locking up or lagging, make sure to run a speed test.
Nine times out of ten the problem is bandwidth. Even if you think you have enough bandwidth, make sure to run a speed test.
WAN links, traffic shaping, proxies, and firewalls can sometimes catch our client as we push a lot of bandwidth through the API.
MICROBLOG and text vs HTML.
Microblog content comes through the API as text. This may seem like just HTML but we have a main_format element which is a symbol for either TEXT or HTML which you can use to differentiate between the two.
POSTs vs COMMENTs
Each piece of content has a type associated with it. Make sure to process this correctly as you may wish to handle comments and posts in different manners.
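Both checks can be sketched in a few lines of Python. The `main_format` values TEXT and HTML and the `type` value POST appear in this documentation; the exact spelling COMMENT is assumed from the heading above. The regular-expression tag stripping is a crude stand-in for a real HTML parser, and the function names are hypothetical.

```python
import html
import re

_TAG = re.compile(r"<[^>]+>")


def plain_text(document):
    """Return the main content as plain text, whatever its main_format."""
    main = document.get("main") or ""
    if document.get("main_format") == "HTML":
        # Crude tag stripping for illustration; use a real HTML parser in production.
        return html.unescape(_TAG.sub(" ", main)).strip()
    return main


def route(document):
    """Pick a handler name based on the document type."""
    # "COMMENT" is an assumed value; anything else is handled as a post.
    return "comments" if document.get("type") == "COMMENT" else "posts"
```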
Migration
If you’re a Spinn3r 4.0 customer, and are migrating an existing client, you may need to migrate on a field by field basis to the new Spinn3r 5.0 JSON format.
The full JSON mapping is kept in our content schema.
The following table will help you map from existing fields to new fields:
Mapping
Protostream
Protostream files are generated by Spinn3r 4.0. They were our preferred file format but we may have some older customers using XML files. The mapping from XML to protostream is located in the Spinn3r 4.0 wiki:
Spinn3r 5.0 JSON
The Spinn3r 5.0 JSON format is designed to be concise and human readable as well as have reasonable parsing performance.
Fields may have a prefix. For example, source_title is the title for the source object.
For the most part, there’s a 1:1 mapping from protostream to Spinn3r 5.0 JSON.
In some cases there are additional fields for 5.0. For example, we now support partial dates (dates without timezones or seconds).
protostream | JSON |
---|---|
permalink.title | title |
permalink.hashcode | hashcode |
permalink.author.name | author_name |
permalink.author.email | n/a. This was almost never included due to people protecting their identity and email from spammers. |
permalink.author.link.href | author_link |
permalink.spam_probability | n/a. will be added in the future. Seldom used. |
permalink.last_published | last_published |
permalink.date_found | date_found |
permalink.link.href | link |
permalink.link.resource | resource |
permalink.lang.code | lang |
permalink.content.data | html |
permalink.content_extract.data | main |
source.link.href | source_link |
source.link.resource | source_resource |
source.title | source_title |
source.description | source_description |
source.hashcode | source_hashcode |
source.lang.code | n/a for now. This field wasn’t extensively used in 4.0. However, each piece of content has its own language code. We may add this in the future as a histogram of the languages posted to the source from historical content. |
source.lang.probability | n/a for now. |
source.date_found | source_date_found |
source.resource_status | n/a |
source.last_posted | source_last_posted |
source.tier | n/a for now. This wasn’t a widely used field. |
source.publisher_type | source_publisher_type |
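The table above can be applied mechanically when migrating a client. The sketch below assumes a Spinn3r 4.0 record has already been decoded into nested Python dictionaries whose keys follow the dotted protostream paths; that decoded shape and the helper names `get_path` and `to_json_format` are hypothetical. Fields marked n/a in the table are left out of the mapping.

```python
# Protostream path -> Spinn3r 5.0 JSON field, copied from the mapping table.
PROTOSTREAM_TO_JSON = {
    "permalink.title": "title",
    "permalink.hashcode": "hashcode",
    "permalink.author.name": "author_name",
    "permalink.author.link.href": "author_link",
    "permalink.last_published": "last_published",
    "permalink.date_found": "date_found",
    "permalink.link.href": "link",
    "permalink.link.resource": "resource",
    "permalink.lang.code": "lang",
    "permalink.content.data": "html",
    "permalink.content_extract.data": "main",
    "source.link.href": "source_link",
    "source.link.resource": "source_resource",
    "source.title": "source_title",
    "source.description": "source_description",
    "source.hashcode": "source_hashcode",
    "source.date_found": "source_date_found",
    "source.last_posted": "source_last_posted",
    "source.publisher_type": "source_publisher_type",
}


def get_path(record, path):
    """Walk a dotted path through nested dicts; return None if a step is missing."""
    current = record
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def to_json_format(record):
    """Flatten a decoded 4.0 record into 5.0-style field names."""
    flat = {}
    for path, field in PROTOSTREAM_TO_JSON.items():
        value = get_path(record, path)
        if value is not None:
            flat[field] = value
    return flat


legacy = {
    "permalink": {
        "title": "Hello",
        "link": {"href": "http://example.com/a"},
        "author": {"name": "Alice", "email": "alice@example.com"},
    },
    "source": {"title": "Example"},
}
migrated = to_json_format(legacy)
```

Here the author email is dropped because the table lists it as n/a in 5.0. Keeping the mapping in one dictionary makes it easy to audit against the table field by field.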
Schema
content schema
A basic example of data in JSON format.
{
"bucket" : 0,
"resource" : "http://cnn.com/2014/10/15/health/texas-ebola-outbreak",
"date_found" : "2014-06-22T01:08:52Z",
"index_method" : "PERMALINK_TASK",
"html" : "<html><body><p>Full HTML of the content</p></body></html>",
"html_length" : 57,
"source_hashcode" : "COH0cFU4G1sMlRHd9gEvS-n3FFI",
"source_resource" : "http://cnn.com",
"source_link" : "http://cnn.com",
"source_publisher_type" : "MAINSTREAM_NEWS",
"source_publisher_subtype" : "MAINSTREAM_NEWS",
"source_date_found" : "2014-06-22T01:08:52Z",
"source_update_interval" : 900000,
"source_setting_update_strategy" : "CYCLICAL",
"source_setting_index_strategy" : "DEFAULT",
"source_title" : "CNN",
"permalink" : "http://www.cnn.com/2014/10/15/health/texas-ebola-outbreak/index.html",
"canonical" : "http://www.cnn.com/2014/10/15/health/texas-ebola-outbreak/index.html",
"main" : "<p>Full HTML of the content</p>",
"main_length" : 31,
"main_checksum" : "I7QyvW_g9AjGg3vWjmcxwo7wjXs",
"main_format" : "HTML",
"summary_text" : "Another nurse who contracted Ebola after caring for a man who died of the virus was on a flight from Cleveland to Dallas.",
"title" : "CDC: Nurse with Ebola should not have traveled",
"publisher" : "CNN",
"section" : "Technology",
"description" : "Another nurse who contracted Ebola after caring for a man who died of the virus was on a flight from Cleveland to Dallas.",
"tags" : [ "nurse", "outbreak", "ebola" ],
"published" : "2014-06-22T01:08:52Z",
"author_name" : "Holly Yan",
"author_link" : "https://twitter.com/HollyYanCNN",
"lang" : "en"
}
Stores content in our index including the full HTML as well as the metadata we were able to extract. Some of these fields are HTML, which is cleaned of any unsafe elements that might cause cross-site scripting attacks or other vulnerabilities. Additionally, all URLs are fully expanded. Encoding is UTF-8. We may reference external objects, such as the Source metadata, which is then fully denormalized with a source_ prefix.
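Because source metadata is denormalized into each content document, a client that wants a separate source object can regroup the fields by prefix. This is a minimal sketch; the helper name `split_source` is hypothetical and the input is a trimmed version of the example document above.

```python
import json

SOURCE_PREFIX = "source_"


def split_source(document):
    """Separate source_-prefixed fields from the content's own fields."""
    content, source = {}, {}
    for key, value in document.items():
        if key.startswith(SOURCE_PREFIX):
            # Strip the prefix so source_title becomes title on the source.
            source[key[len(SOURCE_PREFIX):]] = value
        else:
            content[key] = value
    return content, source


raw = """
{
  "title": "CDC: Nurse with Ebola should not have traveled",
  "lang": "en",
  "source_title": "CNN",
  "source_link": "http://cnn.com"
}
"""
content, source = split_source(json.loads(raw))
```

After the call, `content` holds title and lang, and `source` holds title and link for the CNN source.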
Member Name | Type | Description |
---|---|---|
The bucket to write this content (timestamp prefix and suffix valued from 0-99). This allows us to use the random partitioner and still get decent parallel client read performance. | ||
The time our robot saw the post and wrote it to the database. This is a sequence timestamp and supports distributed writes. This can be used as an external primary key as it’s guaranteed to always be unique. The value is opaque and not designed to be readable by humans, and the format can change at any time. | ||
The sequence as a range of values between 0 and 999,999 (sequence % 100000). This allows you to filter values by range to accept just a sample of values. | ||
base64filesafe(sha1(resource)) … Essentially the base 64 (filesafe) encoding of the sha1 of the tokenized permalink/url | ||
Tokenized form of the permalink. | ||
The time we fetched and added this content to our index. | ||
The method that we used to discover and index the content. We have various algorithms to discover content and this lets the algorithm tag the content. | ||
The method we used to detect this URL was new and recently published. | ||
The HTML content of this permalink as fetched by our robot. Note that this is RAW content. No cleanup is done. JavaScript is present, etc. If you want to work with this content you must make sure to clean/sanitize it yourself. See the ‘body’ field for a clean version of the document. In some situations it’s possible to not have any HTML. An example is when we’re using an API or firehose where the original full HTML isn’t present or would just be wasteful. | ||
The length of the HTML | ||
The SHA1 checksum of the HTML. | ||
zlib compressed HTML content from our crawler. Used for legacy customers who need full HTML. | ||
The length of the HTML | ||
The SHA1 checksum of the HTML. | ||
The version of Spinn3r used to write this content. | ||
The last time we updated the metadata on this content. On initial record creation last_updated and date_found will be identical but last_updated will change over time as we update metadata. | ||
base64filesafe(sha1(resource)) of the source. Essentially the base 64 (filesafe) encoding of the sha1 of the tokenized permalink/url of the source. | ||
The tokenized URL for this source. | ||
The non-tokenized URL for this source. Use this URL if you would like to fetch this source via HTTP. | ||
The publisher type (mainstream news, weblog, forum, etc) of this source encoded as an int. | ||
A string representing the publisher sub type which is more specific than the publisher type. The publisher subtype is usually the name of the social network hosting the content. | ||
The time we added this source to our index. This is the time we found the source not when it was created. | ||
The last time our crawler visited the source and processed it with a task. This is always incremented even if the site isn’t updated or even if the site is HTTP 500 or other network/transient errors. This may not be updated if we aren’t fetching the source via HTTP. | ||
The last time this source published a new HTML file (as measured by content_sha1). This may not be updated if we aren’t fetching the source via HTTP. | ||
The last time this source posted a new piece of content | ||
The number of milliseconds between updates to re-fetch this source. This is used for cyclical updates of sources and usually depends on how often the source posts updates. | ||
The HTTP status code of the last request to this source. | ||
The probability, between 0 and 1, that this source is a spam source. -1.0 if we have not yet classified it. | ||
The length, in bytes, of this HTML from the last time we fetched the page. | ||
The SHA1 checksum of the content. | ||
The set of tags assigned to this source by either customers or Spinn3r (globally). This is used so that your client can filter by assigned tags or search by them as well. This is not to be confused with the tags field, which is assigned by the site. These tags are opaque strings and not human readable to avoid giving away any customer information in the API. Any sources you manually register are assigned tags with your vendor auth code. This will allow you to register sources, and then filter / search over them. | ||
The update strategy for computing the update interval. | ||
The update strategy for computing the update interval. | ||
Policy on handling author metadata. | ||
The PSHB hub this source is using. | ||
The PSHB topic this source is using. | ||
The last time this source posted and sent a message to the PSHB endpoint. | ||
The time this PSHB lease expires. | ||
The number of user interactions from other sources on this social network computed from the graph as we index content. This is periodically computed and loaded into our source index. This could be the number of at mentions, comment replies, etc. | ||
The minimum metadata score before we can persist content | ||
The next time we’ve scheduled the source to update | ||
The title of the source. | ||
A short description of the source. | ||
Unique handle for this source across the entire social media property. | ||
The number of favorites this source has according to the website or social network. | ||
The number of followers this source has according to the website or social network. | ||
The number of users / friends this source is following. | ||
True when this user account is verified to be authentic. | ||
Set of URLs on other social networking sites and weblogs for this user. These are essentially alternate profiles for the user. Their twitter site, facebook site, etc. | ||
The human readable location of the source. Example: 'Washington, DC’ | ||
The URL to the img which represents this source. | ||
The width of the image. | ||
The height of the image. | ||
The telephone number for this source. Only present in limited situations. Specifically around REVIEW sites. | ||
Tags for the source provided by the user’s profile. | ||
The rating for this item provided by the user. | ||
The URL to the favicon which represents this source. | ||
The width of the favicon. | ||
The height of the favicon. | ||
The time this account was created and is provided from the source. | ||
The number of Facebook likes for this source. | ||
A set of tags, optionally assigned by a site, which relate to this specific source. Supported for medium.com only (for now) | ||
The number of posts parsed/found when we last indexed this source. | ||
The maximum number of parsed posts we’ve ever seen. If parsed_posts_max is greater than zero and parsed_posts is 0 then we are probably hitting a throttle or failing to parse the content. | ||
The URL of the RSS feed. | ||
The title of the feed. | ||
The format of the feed as a token. RSS or ATOM, etc. | ||
The unique URL to the content. | ||
A platform specific unique identifier for this post. Note that this is NOT always present as some platforms lack the concept of unique identifiers. Additionally, this may conflict with another identifier from another platform. | ||
Same as permalink but if the site performs a 301 or 302 redirect this is the URL we were redirected to. | ||
The domain for the permalink_redirect. Identical in semantics to the domain field. | ||
The site for the permalink_redirect. The full hostname. For example, www.cnn.com, alice.blogspot.com, etc. | ||
The primary link to the content. The vast majority of the time, this is identical to permalink. However, some publisher types (MEMETRACKER) have a different link to the content which is external to the site. If the link is NOT the same as the permalink, then we include it in the links field for search and accuracy purposes. | ||
The domain for the link. Identical in semantics to the domain field. | ||
The site for the link. The full hostname. For example, www.cnn.com, alice.blogspot.com, etc. | ||
The shortlink URL, if known. This is the preferred 'short’ URL discovered from either the content itself or through metadata. | ||
The canonical URL to the content (as specified by the publisher) in rel=canonical (and other specs such as og:url). | ||
The domain name of the permalink. blogspot.com, example.com, etc. | ||
The site of the permalink including the full host name. www.cnn.com would be a site and cnn.com would be a domain. | ||
The actual main content of the article. The authoritative 'main’ of the post derived by removing sidebar content (HTML). This content is sanitized and cleaned so that JavaScript, event handlers, etc. are removed. This is analogous to the HTML5 main element, i.e. the main content of the page, with no header, footer, or sidebar content. | ||
The length of the main field, in bytes. | ||
The checksum of the main field. | ||
True when the main content is 100% accurate and the extract is not needed. | ||
The format of the main element (either HTML or text) | ||
The extract of the content with chrome/boilerpipe removal algorithms applied. | ||
The length of the extract field, in bytes. | ||
The checksum of the extract field. | ||
A summary of the document computed by our document summarizer. This summary is in plain text. If multiple paragraphs are present they are separated by a newline. If you would like to separate the paragraphs in your UI and you’re rendering HTML you can split the summary text by newline and wrap each paragraph in a P element. | ||
The title of the post. | ||
The publisher name. (CNN, MSNBC, Techcrunch, etc) | ||
Articles may belong to one or more 'sections’ in a magazine or newspaper, such as Sports, Lifestyle, etc. | ||
A short description of the item (HTML) | ||
Tags for the item. | ||
Username mentions for users within the content of this post. | ||
All outbound links in the main element. Since main is the authoritative content, without chrome or sidebar content, this can be used for ranking purposes. | ||
Date of first broadcast/publication. | ||
The date on which the content was most recently modified. | ||
This is identical to published except it’s a partial value. If an exact date is found, both fields are populated, but if we only have a partial date then we only specify this field. The value is ISO 8601. For example, 2014-01-01. | ||
This is identical to modified except it’s a partial value. If an exact date is found, both fields are populated, but if we only have a partial date then we only specify this field. The value is ISO 8601. For example, 2014-01-01. | ||
The name of the author. This is the human readable name like 'Barack Obama’ or 'Michael Jordan’ | ||
The link for the author. | ||
The handle of the author. This is a unique token/handle for the author across the whole site. For example, 'barackobama’, which would never conflict with another account. | ||
The number of followers for this author. | ||
The location for this author. | ||
The URL to the img which is an avatar for the user who posted this content. | ||
The width of the avatar img. | ||
The height of the avatar img. | ||
User ID in the target platform (when available) | ||
When present, the gender of the author. | ||
The human readable location of the source. Example: 'Washington, DC’ | ||
The location identifier (if available) for this location. This is platform specific. | ||
Name of the feature we’re representing. | ||
A point contains a single latitude-longitude pair, separated by whitespace. | ||
A bounding box is a rectangular region, often used to define the extents of a map or a rough area of interest. A box contains two space-separated latitude-longitude pairs, with each pair separated by whitespace. The first pair is the lower corner, the second is the upper corner. | ||
Id in geonames database. | ||
The human readable location including its parent locations | ||
The human readable country derived from geo_location. These are represented as ISO 3166-1 alpha-2: https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2 | ||
The human readable state derived from geo_location. | ||
The human readable city derived from geo_location. | ||
Contains the name of the field used to parse the geo data | ||
The rating for this item provided by the user. | ||
The URL to the favicon which represents this source. | ||
The width of the favicon. | ||
The height of the favicon. | ||
The URL to the img which represents this content. | ||
The width of the image. | ||
The height of the image. | ||
One of the image URLs representing this content. | ||
The width of the image. | ||
The height of the image. | ||
One of the image URLs representing this content. | ||
The width of the image. | ||
The height of the image. | ||
One of the image URLs representing this content. | ||
The width of the image. | ||
The height of the image. | ||
One of the image URLs representing this content. | ||
The width of the image. | ||
The height of the image. | ||
One of the image URLs representing this content. | ||
The width of the image. | ||
The height of the image. | ||
One of the image URLs representing this content. | ||
The width of the image. | ||
The height of the image. | ||
True when this source was not published by the original user but actually shared from someone the source follows. On microblogging platforms this is a retweet. On others it’s a shared post. | ||
The type of shared content. | ||
Deprecated: See shared_author_link | ||
Deprecated: See shared_author_name | ||
The link to the profile of the person who originally posted this story. | ||
The title of the profile of the person who originally posted this story. | ||
User ID in the target platform (when available) | ||
A platform specific unique identifier for this post. | ||
The unique URL to the content. | ||
The handle of the author. This is a unique token/handle for the author across the whole site. For example, 'barackobama’, which would never conflict with another account. | ||
The URL to the img which is an avatar for the user who originally posted this content. | ||
True when this source was a reply, false otherwise. | ||
The link to the profile of the person being replied to. | ||
The title of the profile of the person being replied to. | ||
When present, the type of card that can be used to display this content within web applications | ||
The URL to an iframe which can be embedded to play this video. HTTPS URL to iframe player. This must be an HTTPS URL that does not generate active mixed content warnings in a web browser. | ||
The width of the player iframe. | ||
The height of the player iframe. | ||
The URL to one of the iframes which can be embedded to play this video. HTTPS URL to iframe player. This must be an HTTPS URL that does not generate active mixed content warnings in a web browser. | ||
The width of one of the player iframes. | ||
The height of one of the player iframes. | ||
The URL to one of the iframes which can be embedded to play this video. HTTPS URL to iframe player. This must be an HTTPS URL that does not generate active mixed content warnings in a web browser. | ||
The width of one of the player iframes. | ||
The height of one of the player iframes. | ||
The URL to one of the iframes which can be embedded to play this video. HTTPS URL to iframe player. This must be an HTTPS URL that does not generate active mixed content warnings in a web browser. | ||
The width of one of the player iframes. | ||
The height of one of the player iframes. | ||
The URL to one of the iframes which can be embedded to play this video. HTTPS URL to iframe player. This must be an HTTPS URL that does not generate active mixed content warnings in a web browser. | ||
The width of one of the player iframes. | ||
The height of one of the player iframes. | ||
The URL to one of the iframes which can be embedded to play this video. HTTPS URL to iframe player. This must be an HTTPS URL that does not generate active mixed content warnings in a web browser. | ||
The width of one of the player iframes. | ||
The height of one of the player iframes. | ||
The URL to one of the iframes which can be embedded to play this video. HTTPS URL to iframe player. This must be an HTTPS URL that does not generate active mixed content warnings in a web browser. | ||
The width of one of the player iframes. | ||
The height of one of the player iframes. | ||
The type of this content as either a POST or a COMMENT. This allows us to index posts and comments through the same API. | ||
The overall sentiment for this content | ||
ISO language code for this source. All our language codes are ISO 639 two letter lang codes. We use the special lang code of U when we are unable to determine the language from the underlying text - usually because we don’t have enough data. | ||
Provides a map between algorithmically determined categories (entertainment, politics, technology, science, sports, business, health) and their probabilities. The probabilities are between 0.0 and 1.0 and if you sum them all they will equal 1.0. | ||
Provides data on previously posted documents which are duplicates of this document. Keys are sequence values for the documents and the value is a double between 0.0 and 1.0, where 0.0 is no duplication and 1.0 is full duplication. | ||
The total number of duplicates. | ||
Provides a map between algorithmically determined classifications driven by customers. The keys are the keys given to customers to identify their classification and the value is the probability of that classification. The values DO NOT sum to 1.0 as there may be multiple classifications here. | ||
See content.hashcode | ||
See content.permalink | ||
See content.title | ||
See content.lang | ||
See content.resource | ||
The number of likes for this post (when we first find it). | ||
The number of dislikes for this post (when we first find it). | ||
The number of comments for this post (when we first find it). | ||
The number of views for this post (when we first find it). | ||
This only applies to video platforms. The description of how many times this video was watched. Note that this field DOES NOT update dynamically. | ||
This only applies to video platforms. How many subscriptions to the channel were made from this video. Note that this field DOES NOT update dynamically. | ||
The quality of the metadata on this post. Used internally to audit the quality of Spinn3r data. Not very applicable to customer use. | ||
The number of shares for this post. For some microblogging platforms this could be a retweet but for others it’s a share. Most platforms have this concept. | ||
The number of updates to metadata we have | ||
True when the user has pinned this content to their profile, effectively locking the post in place. | ||
Users tagged in the image and their coordinates within the image. Coordinates are expressed as a factor of the width and height using 0,0 as the top left corner. That is, in a 100x100 px image the position 0.23, 0.55 is 23 px from the left and 55 px from the top. This field is only valid for image social data such as Instagram or Facebook. | ||
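The hashcode members in the table above are described as base64filesafe(sha1(resource)). The sketch below shows one way a client might compute such a value. It assumes that "filesafe" means the URL- and filename-safe Base64 alphabet with trailing padding removed, which is consistent with the 27-character sample values in the example document but is not confirmed here. It also assumes the input is already the tokenized resource; the tokenization rules are not described in this section and are not reproduced. The function name `hashcode` is hypothetical.

```python
import base64
import hashlib


def hashcode(resource):
    """Base64 (URL/filename-safe, unpadded) encoding of the SHA-1 of a resource."""
    # Assumption: the resource string is hashed as UTF-8 bytes.
    digest = hashlib.sha1(resource.encode("utf-8")).digest()
    encoded = base64.urlsafe_b64encode(digest).decode("ascii")
    # A 20-byte digest encodes to 28 characters including one '=' of padding.
    return encoded.rstrip("=")


value = hashcode("http://cnn.com")
```

If a value computed this way does not match the hashcode returned by the API for the same document, the most likely cause is a difference in how the resource was tokenized before hashing.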
Enum index_method
This is an enum type for index_method. The following values are accepted:
Enum Name | Description |
---|---|
PERMALINK_TASK | Content indexed by the permalink task. |
SOURCE_TASK | Content indexed by the source task. |
PSHB | Content indexed by pubsubhubbub hub push. |
SOURCE_TASK_COMPOSITE | Content indexed by the source task for a composite post. |
FEED_TASK | Content indexed by the feed task. |
TWITTER_TASK | Content indexed by the twitter task. |
Enum detection_method
This is an enum type for detection_method. The following values are accepted:
Enum Name | Description |
---|---|
SOURCE | Found via the source. |
FEED | Found via the feed. |
Enum source_publisher_type
This is an enum type for source_publisher_type. The following values are accepted:
Enum Name | Description |
---|---|
UNKNOWN | Unknown publisher type. |
WEBLOG | Weblog. Defined as a smaller site, usually owned by an individual. |
MAINSTREAM_NEWS | Mainstream news source. Generally owned by a corporation with multiple paid writers. |
CLASSIFIED | Classified site. Craigslist, Backpage, etc. |
FORUM | Forum sites like phpBB, phorum, vbulletin, etc |
REVIEW | Review site. Like epinions, amazon reviews, etc. |
MEMETRACKER | Memetracker like reddit, digg, techmeme, google news, etc |
MICROBLOG | Microblog content such as Twitter, identi.ca, etc. |
SOCIAL_MEDIA | Social media sites (facebook, instagram, etc). |
VIDEO | Video hosting site like Youtube, Vimeo, etc. |
PHOTO | Photo sharing site like instagram, flickr, etc. |
Enum source_setting_update_strategy
This is an enum type for source_setting_update_strategy. The following values are accepted:
Enum Name | Description |
---|---|
CYCLICAL | Default update strategy. Essentially just update the source at a regular rate. |
ADAPTIVE | Adapt the update interval based on the posting frequency of the source. This way we update sources less frequently if they post once per month compared to sources that update once per hour. |
PUSH | The source content is pushed to us directly via push (pshb) |
SEARCH | The source is updated only from a search feed. It’s not updated directly but a parent source updates it. |
PING | We receive a ping via an external update mechanism. This is a notice that the blog has been updated. Then we launch a task to fetch the content. |
INDIRECT | This source is NOT updated directly but rather is updated indirectly via another source. Usually a search feed. |
FEED | The source is updated via an RSS / Atom feed. We don’t index it directly. |
NONE | The source is not updated. |
SCHEDULED | This source is using the scheduled setting based on cron. |
Enum source_setting_index_strategy
This is an enum type for source_setting_index_strategy. The following values are accepted:
Enum Name | Description |
---|---|
DEFAULT | Default index strategy. Just a normal source. No special or fancy strategy. |
SEARCH_KEYWORD | This source is a search source driven by keywords. |
SEARCH_USERNAME | This source is a search source driven by ranked usernames. |
SEARCH_KEYWORD_AND_CITY | This source is a search source driven by keywords but we include the city as well. |
COVERING_SET | This source is indexed via a covering set of subscriptions. |
Enum source_setting_author_policy
This is an enum type for source_setting_author_policy. The following values are accepted:
Enum Name | Description |
---|---|
DEFAULT | Default author policy. Which is essentially take no special action. |
COMPOSITE | This is a composite source. At this point we create a new source or update an existing source for each author. |
Enum source_feed_format
This is an enum type for source_feed_format. The following values are accepted:
Enum Name | Description |
---|---|
UNKNOWN | Unknown feed format |
ATOM | Atom feed format |
RSS | RSS feed format |
Enum main_format
This is an enum type for main_format. The following values are accepted:
Enum Name | Description |
---|---|
HTML | HTML form of the content. This is the default for the vast majority of the content we index. |
TEXT | The content is formatted as plain text. Which is primarily used for Twitter and other microblogging services. |
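A client that renders main in a web page needs to treat the two formats differently. The sketch below is one possible approach under stated assumptions: HTML content is passed through unchanged because main is described as sanitized, while TEXT content is escaped and each newline-separated paragraph is wrapped in a P element. The function name `render_main` and the paragraph wrapping for TEXT are choices made for this example, not documented behavior.

```python
import html


def render_main(document):
    """Return an HTML fragment for a document's main field."""
    main = document.get("main", "")
    if document.get("main_format") == "TEXT":
        # Plain text must be escaped before it is placed in a page.
        paragraphs = [p for p in main.split("\n") if p.strip()]
        return "".join("<p>" + html.escape(p) + "</p>" for p in paragraphs)
    # HTML (the default for most content) is returned as-is.
    return main


tweet = {"main": "a < b\nsecond line", "main_format": "TEXT"}
article = {"main": "<p>Full HTML of the content</p>", "main_format": "HTML"}
tweet_html = render_main(tweet)
article_html = render_main(article)
```

The same split-and-wrap step can be applied to summary_text, which is plain text with paragraphs separated by newlines.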
Enum author_gender
This is an enum type for author_gender. The following values are accepted:
Enum Name | Description |
---|---|
MALE | Male |
FEMALE | Female |
UNKNOWN | Unknown |
Enum geo_method
This is an enum type for geo_method. The following values are accepted:
Enum Name | Description |
---|---|
DEFAULT | The default strategy was used which is the location specified in the content. |
SOURCE_LOCATION | The location of the source was used to compute the geo data. |
Enum shared_type
This is an enum type for shared_type. The following values are accepted:
Enum Name | Description |
---|---|
NONE | This is not shared |
RAW | This is shared content but no additional text has been given. IE it is raw. |
REPLY | Shared content but the user has added additional content/text. |
Enum card
This is an enum type for card. The following values are accepted:
Enum Name | Description |
---|---|
SUMMARY | Basic summary of the content. |
SUMMARY_LARGE_IMAGE | Basic summary of the content using a large image |
PHOTO | The content is a photo |
GALLERY | The content is a photo gallery with multiple images |
PLAYER | The content is an embedded video player |
Enum type
This is an enum type for type. The following values are accepted:
Enum Name | Description |
---|---|
POST | A blog post, mainstream news article, tweet, etc. |
COMMENT | A reply to a post inline, usually by members of the community. |
Enum sentiment
This is an enum type for sentiment. The following values are accepted:
Enum Name | Description |
---|---|
POSITIVE | Positive sentiment |
NEGATIVE | Negative sentiment |
NEUTRAL | Neutral sentiment (neither positive nor negative) |
source schema
Stores metadata for representing a source in our index. Weblog, twitter, mainstream news, etc.
Member Name | Type | Description |
---|---|---|
base64filesafe(sha1(resource)) of the source. Essentially the base 64 (filesafe) encoding of the sha1 of the tokenized permalink/url of the source. | ||
The tokenized URL for this source. | ||
The non-tokenized URL for this source. Use this URL if you would like to fetch this source via HTTP. | ||
The publisher type (mainstream news, weblog, forum, etc) of this source encoded as an int. | ||
A string representing the publisher sub type which is more specific than the publisher type. The publisher subtype is usually the name of the social network hosting the content. | ||
The time we added this source to our index. This is the time we found the source not when it was created. | ||
The last time our crawler visited the source and processed it with a task. This is always incremented even if the site isn’t updated or even if the site is HTTP 500 or other network/transient errors. This may not be updated if we aren’t fetching the source via HTTP. | ||
The last time this source published a new HTML file (as measured by content_sha1). This may not be updated if we aren’t fetching the source via HTTP. | ||
The last time this source posted a new piece of content | ||
The number of milliseconds between updates to re-fetch this source. This is used for cyclical updates of sources and usually depends on how often the source posts updates. | ||
The HTTP status code of the last request to this source. | ||
The probability, between 0 and 1, that this source is a spam source. -1.0 if we have not yet classified it. | ||
The length, in bytes, of this HTML from the last time we fetched the page. | ||
The SHA1 checksum of the content. | ||
The set of tags assigned to this source by either customers or Spinn3r (globally). This is used so that your client can filter by assigned tags or search by them as well. This is not to be confused with the tags field, which is assigned by the site. These tags are opaque strings and not human readable to avoid giving away any customer information in the API. Any sources you manually register are assigned tags with your vendor auth code. This will allow you to register sources, and then filter / search over them. | ||
Set of hashcodes for URLs that were present on the page during the last fetch. Used to prevent duplicate indexing. | ||
The priority of a source for queueing purposes. Acceptable values are from 0-9 with 0 being the lowest priority and 9 being the highest. This allows us to efficiently rebuild the queue based on priorities. It will also (in the future) allow the scheduler to schedule higher priority items first. | ||
When true, tracing is enabled on this source to write log messages to cassandra for debug purposes. | ||
JSON parser definition (valhalla) for our robot parsing rules for extracting content from the raw HTML. This is either generated by hand by Spinn3r internally or trained via an external process like mturk. | ||
How to handle posts when we find them on the front page of content. For microblogs we can probably just write them. For other types of sources we should probably not write them and follow instead. | ||
Regular expression for matching posts (extracted from metadata) to crawl and index. Any post URL on the page that matches the pattern will be crawled. To disable just use NoFollowPolicy which generates a regex that always fails. | ||
Regular expression for matching URLs to crawl and index. Any permalink URL on the page that matches the pattern will be crawled. To disable just use NoFollowPolicy which generates a regex that always fails. | ||
Override the default user agent. These are present in a global repository of user agents which robots cache locally. | ||
Override the default proxy setting (which is probably true). | ||
Override the default proxy host. | ||
When greater than zero, this source is marked as disabled. | ||
The update strategy for computing the update interval. | ||
The update strategy for computing the update interval. | ||
Policy on handling author metadata. | ||
True if this is an indirect source which we don’t monitor directly. Direct sources approach 100% accuracy as we directly index the source’s content. Indirect sources are only the result of indexing some type of secondary system which doesn’t have all the posts from the source. | ||
When true, we enable crawling via setting_crawl_permalink_pattern and a same site policy. | ||
The PSHB hub this source is using. | ||
The PSHB topic this source is using. | ||
The last time this source posted and sent a message to the PSHB endpoint. | ||
The time this PSHB lease expires. | ||
The ID that tasks are assigned which match this source. This way we can avoid duplicate message enqueue since we verify that the task has the right message_owner. | ||
The set of tags assigned to this source by the loader. The loader can assign tags to both a source and discovery object and if we have an error we can back out a specific set of loaded sources. | ||
The last throttle key used to execute this source. This can be used to quickly re-execute a source without computing a new throttle key, or to audit a specific source and potentially how far behind it is in the queue. | ||
Cookies used when requesting content from this source. | ||
HTTP request headers sent when requesting content from this source. | ||
The number of user interactions from other sources on this social network computed from the graph as we index content. This is periodically computed and loaded into our source index. This could be the number of at mentions, comment replies, etc. | ||
The template identifier used for this source (when present). | ||
The template identifier used to index permalinks published from this source (when present). | ||
The update strategy for fetching/indexing permalinks on this source. | ||
The number of milliseconds between updates to re-fetch this permalink. This is used for cyclical updates. | ||
The maximum amount of time (in seconds) where we should index permalinks. | ||
How should we handle new content. By default we index all content on a source the first time we see it. | ||
Hard coded strategy for avoiding duplicate URLs | ||
The minimum metadata score before we can persist content | ||
The maximum number of pages (overriding the default) to fetch for this source. | ||
The maximum number of HTTP retries (overriding the default) to fetch within one task without rescheduling / retrying it for future execution. | ||
The schedule (cron syntax) for re-executing this message in the future. | ||
The next time we’ve scheduled the source to update | ||
Certain sources need to be re-indexed for a given time window as posts update to collect likes, shares, etc. This is the amount of time, in seconds, to go through and scan sources making sure we’ve indexed everything. | ||
The title of the source. | ||
A short description of the source. | ||
Unique handle for this source across the entire social media property. | ||
The number of favorites this source has according to the website or social network. | ||
The number of followers this source has according to the website or social network. | ||
The number of users / friends this source is following. | ||
True when this user account is verified to be authentic. | ||
Set of URLs on other social networking sites and weblogs for this user. These are essentially alternate profiles for the user. Their twitter site, facebook site, etc. | ||
The human readable location of the source. Example: ‘Washington, DC’ | ||
The URL to the img which represents this source. | ||
The width of the image. | ||
The height of the image. | ||
The telephone number for this source. Only present in limited situations. Specifically around REVIEW sites. | ||
Tags for the source provided by the user’s profile. | ||
The rating for this item provided by the user. | ||
The URL to the favicon which represents this source. | ||
The width of the favicon. | ||
The height of the favicon. | ||
The time this account was created and is provided from the source. | ||
The number of Facebook likes for this source. | ||
A set of tags, optionally assigned by a site, which relate to this specific source. Supported for medium.com only (for now) | ||
The number of posts parsed/found when we last indexed this source. | ||
The maximum number of parsed posts we’ve ever seen. If parsed_posts_max is greater than zero and parsed_posts is 0, then we are probably hitting a throttle or failing to parse the content. | ||
The URL of the RSS feed. | ||
The title of the feed. | ||
The format of the feed as a token. RSS or ATOM, etc. |
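The relationship between parsed_posts and parsed_posts_max described in the table above lends itself to a simple health check. The following is only a minimal sketch, assuming a source record is available on your side as a Python dict carrying those two fields; the function name `looks_throttled` and the sample records are hypothetical.

```python
def looks_throttled(source: dict) -> bool:
    """Return True when a source that used to yield posts now yields none."""
    # Missing or null values are treated as 0 (an assumption of this sketch).
    parsed_posts = source.get("parsed_posts") or 0
    parsed_posts_max = source.get("parsed_posts_max") or 0
    return parsed_posts_max > 0 and parsed_posts == 0


# Made-up sample records for illustration only.
sources = [
    {"title": "healthy", "parsed_posts": 12, "parsed_posts_max": 20},
    {"title": "suspect", "parsed_posts": 0, "parsed_posts_max": 20},
    {"title": "never parsed", "parsed_posts": 0, "parsed_posts_max": 0},
]
suspects = [s["title"] for s in sources if looks_throttled(s)]
print(suspects)
```

A source flagged this way is only a candidate for investigation; the check cannot tell a throttle apart from a parsing failure.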
Enum publisher_type
This is an enum type for publisher_type. The following values are accepted:
Enum Name | Description |
---|---|
UNKNOWN | Unknown publisher type. |
WEBLOG | Weblog. Defined as a smaller site, usually owned by an individual. |
MAINSTREAM_NEWS | Mainstream news source. Generally owned by a corporation with multiple paid writers. |
CLASSIFIED | Classified site. Craigslist, Backpage, etc. |
FORUM | Forum sites like phpBB, phorum, vbulletin, etc |
REVIEW | Review site. Like epinions, amazon reviews, etc. |
MEMETRACKER | Memetracker like reddit, digg, techmeme, google news, etc |
MICROBLOG | Microblog content such as Twitter, identi.ca, etc. |
SOCIAL_MEDIA | Social media sites (facebook, instagram, etc). |
VIDEO | Video hosting site like Youtube, Vimeo, etc. |
PHOTO | Photo sharing site like instagram, flickr, etc. |
Enum setting_post_persist_policy
This is an enum type for setting_post_persist_policy. The following values are accepted:
Enum Name | Description |
---|---|
WRITE | Write posts when found on a source using the metadata extraction. |
NOWRITE | DO NOT write posts. |
Enum setting_disabled
This is an enum type for setting_disabled. The following values are accepted:
Enum Name | Description |
---|---|
ENABLED | Default state. The source is enabled. |
DISABLED | The source is disabled but without a specific reason. |
SPAM | The source has been marked as spam. |
DUPLICATE | Duplicate of another source. |
INVALID | Invalid source. Not anything we are interested in indexing. |
Enum setting_update_strategy
This is an enum type for setting_update_strategy. The following values are accepted:
Enum Name | Description |
---|---|
CYCLICAL | Default update strategy. Essentially just update the source at a regular rate. |
ADAPTIVE | Adapt the update interval based on the posting frequency of the source. This way we update sources less frequently if they post once per month compared to sources that update once per hour. |
PUSH | The source content is pushed to us directly via push (pshb) |
SEARCH | The source is updated only from a search feed. It’s not updated directly but a parent source updates it. |
PING | We receive a ping via an external update mechanism. This is a notice that the blog has been updated. Then we launch a task to fetch the content. |
INDIRECT | This source is NOT updated directly but rather is updated indirectly via another source. Usually a search feed. |
FEED | The source is updated via an RSS / Atom feed. We don’t index it directly. |
NONE | The source is not updated. |
SCHEDULED | This source is using the scheduled setting based on cron. |
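To make the difference between CYCLICAL and ADAPTIVE more concrete, the sketch below shows one way an adaptive interval could be derived from posting frequency. It is an illustration only and not Spinn3r's actual scheduler: the function `adaptive_interval`, the mean-gap heuristic, and the `MIN_INTERVAL` / `MAX_INTERVAL` bounds are all assumptions.

```python
from datetime import datetime, timedelta
from typing import Sequence

MIN_INTERVAL = timedelta(minutes=15)  # assumed lower bound, not from the API
MAX_INTERVAL = timedelta(days=7)      # assumed upper bound, not from the API


def adaptive_interval(post_times: Sequence[datetime]) -> timedelta:
    """Estimate an update interval as the mean gap between posts, clamped."""
    if len(post_times) < 2:
        # Not enough history to estimate a posting rate; fall back to the slowest rate.
        return MAX_INTERVAL
    ordered = sorted(post_times)
    mean_gap = (ordered[-1] - ordered[0]) / (len(ordered) - 1)
    return max(MIN_INTERVAL, min(MAX_INTERVAL, mean_gap))


hourly = [datetime(2024, 1, 1, hour) for hour in range(5)]
monthly = [datetime(2024, month, 1) for month in (1, 2, 3)]
hourly_interval = adaptive_interval(hourly)
monthly_interval = adaptive_interval(monthly)
print(hourly_interval, monthly_interval)
```

A source that posts hourly ends up being revisited about once an hour, while a source that posts monthly is capped at the assumed maximum interval.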
Enum setting_index_strategy
This is an enum type for setting_index_strategy. The following values are accepted:
Enum Name | Description |
---|---|
DEFAULT | Default index strategy. Just a normal source. No special or fancy strategy. |
SEARCH_KEYWORD | This source is a search source driven by keywords. |
SEARCH_USERNAME | This source is a search source driven by ranked usernames. |
SEARCH_KEYWORD_AND_CITY | This source is a search source driven by keywords but we include the city as well. |
COVERING_SET | This source is indexed via a covering set of subscriptions. |
Enum setting_author_policy
This is an enum type for setting_author_policy. The following values are accepted:
Enum Name | Description |
---|---|
DEFAULT | Default author policy. Which is essentially take no special action. |
COMPOSITE | This is a composite source. At this point we create a new source or update an existing source for each author. |
Enum setting_permalink_update_strategy
This is an enum type for setting_permalink_update_strategy. The following values are accepted:
Enum Name | Description |
---|---|
NONE | Never update this permalink |
CYCLICAL | Default update strategy. Essentially just update the permalink at a regular rate. |
Enum setting_content_init_strategy
This is an enum type for setting_content_init_strategy. The following values are accepted:
Enum Name | Description |
---|---|
FUTURE_ONLY |
Enum setting_content_filter_strategy
This is an enum type for setting_content_filter_strategy. The following values are accepted:
Enum Name | Description |
---|---|
NONE | No explicit strategy |
ROBOT_FLAT_LINK_FILTER | Use the flat link filter. |
Enum feed_format
This is an enum type for feed_format. The following values are accepted:
Enum Name | Description |
---|---|
UNKNOWN | Unknown feed format |
ATOM | Atom feed format |
RSS | RSS feed format |
Near Duplicates
Near duplicates are common on the web. Some of the most common examples are the Associated Press and Reuters which both syndicate their content to other websites.
This can lead to the same content present on hundreds of websites.
Additionally, the same media property could publish the same content under different URLs, for example on a different subdomain such as cnn.com vs. money.cnn.com.
Detecting Near Duplicates
Search example collapsing near duplicates:
{
  "query": {
    "query_string": {
      "query": "source_publisher_type:MAINSTREAM_NEWS AND duplicates_count:>10"
    }
  },
  "aggs": {
    "by_duplicate_identifier": {
      "terms": {
        "field": "duplicate_identifier",
        "size": 1
      },
      "aggs": {
        "primary_documents": {
          "top_hits": {
            "size": 1
          }
        }
      }
    }
  }
}
Fortunately, Spinn3r has built-in near-duplicate detection.
We can discover when content is identical and suppress it, either by letting you see which document IDs are identical or by letting you group by duplicate_identifier.
There are two main facilities for suppressing duplicates. The first, collapsing with an aggregation, covers the most common case; the second is a more generic approach.
You can run a general query for content but include an Elasticsearch aggregation to suppress/collapse the duplicates and only return the first.
Each document has a duplicate identifier which is essentially a cluster of duplicates. You’re basically collapsing them here and just returning the first.
This is probably the easiest way to get started.
Note that the above query includes duplicates_count:>10. This is only there to select documents that have many duplicates, so that the effect of collapsing is easy to see.
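The aggregation query above can be built and its response unpacked along the following lines. This is a sketch under stated assumptions: it relies on the response following the usual Elasticsearch layout for a terms aggregation with a nested top_hits aggregation, it makes no network call, and the helper `primary_documents`, the sample response, and its `permalink` value are hypothetical.

```python
import json

# Same query as above, expressed as a Python dict.
query = {
    "query": {
        "query_string": {
            "query": "source_publisher_type:MAINSTREAM_NEWS AND duplicates_count:>10"
        }
    },
    "aggs": {
        "by_duplicate_identifier": {
            "terms": {"field": "duplicate_identifier", "size": 1},
            "aggs": {"primary_documents": {"top_hits": {"size": 1}}},
        }
    },
}


def primary_documents(response: dict) -> list:
    """Return one representative document (_source) per duplicate cluster."""
    aggregation = response.get("aggregations", {}).get("by_duplicate_identifier", {})
    docs = []
    for bucket in aggregation.get("buckets", []):
        hits = bucket["primary_documents"]["hits"]["hits"]
        if hits:
            docs.append(hits[0]["_source"])
    return docs


# Hand-written stand-in for a response; the values are made up.
sample_response = {
    "aggregations": {
        "by_duplicate_identifier": {
            "buckets": [
                {
                    "key": "cluster-1",
                    "doc_count": 14,
                    "primary_documents": {
                        "hits": {
                            "hits": [
                                {"_id": "a", "_source": {"permalink": "http://example.com/a"}}
                            ]
                        }
                    },
                }
            ]
        }
    }
}

body = json.dumps(query)  # JSON request body to send to the search endpoint
docs = primary_documents(sample_response)
print(docs)
```

Because the terms aggregation uses "size": 1, only one cluster is returned; raising that value returns more clusters, each still collapsed to a single document by top_hits.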
Additionally, you can run a full query and/or store results on your end. There’s a ‘duplicates’ field which holds a map of document ID to Jaccard coefficient, which is a similarity score. Values range from 0.0 to 1.0 inclusive, i.e. [0.0, 1.0].
You can also collapse stored documents on your end using this duplicates field.
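A client-side collapse over stored documents might look like the sketch below. It assumes each stored document is a dict with the duplicates map described above plus some identifier field; the name `id`, the `threshold` parameter, and the helper `collapse_duplicates` are hypothetical choices for illustration.

```python
def collapse_duplicates(documents, threshold=0.0, id_field="id"):
    """Keep the first document of each duplicate cluster, in input order."""
    suppressed = set()
    kept = []
    for doc in documents:
        if doc[id_field] in suppressed:
            continue  # already covered by an earlier, kept document
        kept.append(doc)
        # Suppress every later document this one marks as similar enough.
        for other_id, similarity in (doc.get("duplicates") or {}).items():
            if similarity >= threshold:
                suppressed.add(other_id)
    return kept


# Made-up stored documents for illustration only.
stored = [
    {"id": "a", "duplicates": {"b": 0.97, "c": 0.42}},
    {"id": "b", "duplicates": {"a": 0.97}},
    {"id": "c", "duplicates": {"a": 0.42}},
    {"id": "d", "duplicates": {}},
]
kept_ids = [d["id"] for d in collapse_duplicates(stored, threshold=0.9)]
print(kept_ids)
```

The pass is greedy: whichever document of a cluster appears first is kept, so sort the input (for example by publication time) if a particular representative is preferred.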
Geo countries
Spinn3r supports the following GEO country codes:
# | Country | Code |
---|---|---|
1 | Afghanistan | AF |
2 | Aland Islands | AX |
3 | Albania | AL |
4 | Algeria | DZ |
5 | American Samoa | AS |
6 | Andorra | AD |
7 | Angola | AO |
8 | Anguilla | AI |
9 | Antarctica | AQ |
10 | Antigua and Barbuda | AG |
11 | Argentina | AR |
12 | Armenia | AM |
13 | Aruba | AW |
14 | Australia | AU |
15 | Austria | AT |
16 | Azerbaijan | AZ |
17 | Bahamas | BS |
18 | Bahrain | BH |
19 | Bangladesh | BD |
20 | Barbados | BB |
21 | Belarus | BY |
22 | Belgium | BE |
23 | Belize | BZ |
24 | Benin | BJ |
25 | Bermuda | BM |
26 | Bhutan | BT |
27 | Bolivia | BO |
28 | Bosnia and Herzegovina | BA |
29 | Botswana | BW |
30 | Bouvet Island | BV |
31 | Brazil | BR |
32 | British Indian Ocean Territory | IO |
33 | British Virgin Islands | VG |
34 | Brunei Darussalam | BN |
35 | Bulgaria | BG |
36 | Burkina Faso | BF |
37 | Burundi | BI |
38 | Cambodia | KH |
39 | Cameroon | CM |
40 | Canada | CA |
41 | Cape Verde | CV |
42 | Cayman Islands | KY |
43 | Central African Republic | CF |
44 | Chad | TD |
45 | Chile | CL |
46 | China | CN |
47 | Christmas Island | CX |
48 | Cocos (Keeling) Islands | CC |
49 | Colombia | CO |
50 | Comoros | KM |
51 | Congo (Brazzaville) | CG |
52 | Congo, Democratic Republic of the | CD |
53 | Cook Islands | CK |
54 | Costa Rica | CR |
55 | Croatia | HR |
56 | Cuba | CU |
57 | Cyprus | CY |
58 | Czech Republic | CZ |
59 | Côte d'Ivoire | CI |
60 | Denmark | DK |
61 | Djibouti | DJ |
62 | Dominica | DM |
63 | Dominican Republic | DO |
64 | Ecuador | EC |
65 | Egypt | EG |
66 | El Salvador | SV |
67 | Equatorial Guinea | GQ |
68 | Eritrea | ER |
69 | Estonia | EE |
70 | Ethiopia | ET |
71 | Falkland Islands (Malvinas) | FK |
72 | Faroe Islands | FO |
73 | Fiji | FJ |
74 | Finland | FI |
75 | France | FR |
76 | French Guiana | GF |
77 | French Polynesia | PF |
78 | French Southern Territories | TF |
79 | Gabon | GA |
80 | Gambia | GM |
81 | Georgia | GE |
82 | Germany | DE |
83 | Ghana | GH |
84 | Gibraltar | GI |
85 | Greece | GR |
86 | Greenland | GL |
87 | Grenada | GD |
88 | Guadeloupe | GP |
89 | Guam | GU |
90 | Guatemala | GT |
91 | Guernsey | GG |
92 | Guinea | GN |
93 | Guinea-Bissau | GW |
94 | Guyana | GY |
95 | Haiti | HT |
96 | Heard Island and Mcdonald Islands | HM |
97 | Holy See (Vatican City State) | VA |
98 | Honduras | HN |
99 | Hong Kong, Special Administrative Region of China | HK |
100 | Hungary | HU |
101 | Iceland | IS |
102 | India | IN |
103 | Indonesia | ID |
104 | Iran, Islamic Republic of | IR |
105 | Iraq | IQ |
106 | Ireland | IE |
107 | Isle of Man | IM |
108 | Israel | IL |
109 | Italy | IT |
110 | Jamaica | JM |
111 | Japan | JP |
112 | Jersey | JE |
113 | Jordan | JO |
114 | Kazakhstan | KZ |
115 | Kenya | KE |
116 | Kiribati | KI |
117 | Korea, Democratic People’s Republic of | KP |
118 | Korea, Republic of | KR |
119 | Kuwait | KW |
120 | Kyrgyzstan | KG |
121 | Lao PDR | LA |
122 | Latvia | LV |
123 | Lebanon | LB |
124 | Lesotho | LS |
125 | Liberia | LR |
126 | Libya | LY |
127 | Liechtenstein | LI |
128 | Lithuania | LT |
129 | Luxembourg | LU |
130 | Macao, Special Administrative Region of China | MO |
131 | Macedonia, Republic of | MK |
132 | Madagascar | MG |
133 | Malawi | MW |
134 | Malaysia | MY |
135 | Maldives | MV |
136 | Mali | ML |
137 | Malta | MT |
138 | Marshall Islands | MH |
139 | Martinique | MQ |
140 | Mauritania | MR |
141 | Mauritius | MU |
142 | Mayotte | YT |
143 | Mexico | MX |
144 | Micronesia, Federated States of | FM |
145 | Moldova | MD |
146 | Monaco | MC |
147 | Mongolia | MN |
148 | Montenegro | ME |
149 | Montserrat | MS |
150 | Morocco | MA |
151 | Mozambique | MZ |
152 | Myanmar | MM |
153 | Namibia | NA |
154 | Nauru | NR |
155 | Nepal | NP |
156 | Netherlands | NL |
157 | Netherlands Antilles | AN |
158 | New Caledonia | NC |
159 | New Zealand | NZ |
160 | Nicaragua | NI |
161 | Niger | NE |
162 | Nigeria | NG |
163 | Niue | NU |
164 | Norfolk Island | NF |
165 | Northern Mariana Islands | MP |
166 | Norway | NO |
167 | Oman | OM |
168 | Pakistan | PK |
169 | Palau | PW |
170 | Palestinian Territory, Occupied | PS |
171 | Panama | PA |
172 | Papua New Guinea | PG |
173 | Paraguay | PY |
174 | Peru | PE |
175 | Philippines | PH |
176 | Pitcairn | PN |
177 | Poland | PL |
178 | Portugal | PT |
179 | Puerto Rico | PR |
180 | Qatar | QA |
181 | Romania | RO |
182 | Russian Federation | RU |
183 | Rwanda | RW |
184 | Réunion | RE |
185 | Saint Helena | SH |
186 | Saint Kitts and Nevis | KN |
187 | Saint Lucia | LC |
188 | Saint Pierre and Miquelon | PM |
189 | Saint Vincent and Grenadines | VC |
190 | Saint-Barthélemy | BL |
191 | Saint-Martin (French part) | MF |
192 | Samoa | WS |
193 | San Marino | SM |
194 | Sao Tome and Principe | ST |
195 | Saudi Arabia | SA |
196 | Senegal | SN |
197 | Serbia | RS |
198 | Seychelles | SC |
199 | Sierra Leone | SL |
200 | Singapore | SG |
201 | Slovakia | SK |
202 | Slovenia | SI |
203 | Solomon Islands | SB |
204 | Somalia | SO |
205 | South Africa | ZA |
206 | South Georgia and the South Sandwich Islands | GS |
207 | South Sudan | SS |
208 | Spain | ES |
209 | Sri Lanka | LK |
210 | Sudan | SD |
211 | Suriname | SR |
212 | Svalbard and Jan Mayen Islands | SJ |
213 | Swaziland | SZ |
214 | Sweden | SE |
215 | Switzerland | CH |
216 | Syrian Arab Republic (Syria) | SY |
217 | Taiwan, Republic of China | TW |
218 | Tajikistan | TJ |
219 | Tanzania, United Republic of | TZ |
220 | Thailand | TH |
221 | Timor-Leste | TL |
222 | Togo | TG |
223 | Tokelau | TK |
224 | Tonga | TO |
225 | Trinidad and Tobago | TT |
226 | Tunisia | TN |
227 | Turkey | TR |
228 | Turkmenistan | TM |
229 | Turks and Caicos Islands | TC |
230 | Tuvalu | TV |
231 | Uganda | UG |
232 | Ukraine | UA |
233 | United Arab Emirates | AE |
234 | United Kingdom | GB |
235 | United States Minor Outlying Islands | UM |
236 | United States of America | US |
237 | Uruguay | UY |
238 | Uzbekistan | UZ |
239 | Vanuatu | VU |
240 | Venezuela (Bolivarian Republic of) | VE |
241 | Viet Nam | VN |
242 | Virgin Islands, US | VI |
243 | Wallis and Futuna Islands | WF |
244 | Western Sahara | EH |
245 | Yemen | YE |
246 | Zambia | ZM |
247 | Zimbabwe | ZW |
Geo States
state |
---|
Alabama |
Alaska |
Arizona |
Arkansas |
California |
Colorado |
Connecticut |
Delaware |
Florida |
Georgia |
Hawaii |
Idaho |
Illinois |
Indiana |
Iowa |
Kansas |
Kentucky |
Louisiana |
Maine |
Maryland |
Massachusetts |
Michigan |
Minnesota |
Mississippi |
Missouri |
Montana |
Nebraska |
Nevada |
New Hampshire |
New Jersey |
New Mexico |
New York |
North Carolina |
North Dakota |
Ohio |
Oklahoma |
Oregon |
Pennsylvania |
Rhode Island |
South Carolina |
South Dakota |
Tennessee |
Texas |
Utah |
Vermont |
Virginia |
Washington |
Washington, D.C. |
West Virginia |
Wisconsin |
Wyoming |