One of the most common uses of Yahoo Pipes is to mash up several RSS feeds and perform a few operations on them, such as sorting by date or filtering for keywords. This is a relatively easy Pipe to build, and you have many options for filtering and/or sorting, depending on your needs. Before you continue reading this tutorial, please read up on basic Yahoo Pipes functionality.

Sample Yahoo Pipe Details

In the sample Pipe for this tutorial, we'll mash together the RSS feeds of two blogs, Performancing and Blog Herald, sort reverse chronologically by date, then truncate the resulting RSS "stream" at 20 items.

Basic Yahoo Pipe Building Process

Let's go over a basic "feed mashup" example, then discuss variations. The most basic feed mashup is to simply supply two or more RSS feed URLs in a Yahoo Pipe. (For simplicity, let's start with a static list of feeds.) But that's not useful since you don't know how items in the resulting "stream" are ordered. So let's sort the mashed up feed in reverse chronological order (newest to oldest). The process is as follows:
  1. Supply two or more RSS feed URLs.
  2. Fetch the feeds.
  3. Combine the feeds together.
  4. Sort them in reverse chronological order.
  5. Truncate the mashed feed to 20 items.
  6. Output the resulting stream as an RSS feed.
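
If you think better in code than in boxes and wires, the combine, sort, and truncate steps above can be sketched in a few lines of Python. This is only a rough equivalent, not how Pipes works internally. It assumes the feeds have already been fetched into lists of dicts, and the `title` and `pubDate` keys, the sample items, and the `mash_feeds` name are all made up for illustration.

```python
from datetime import datetime, timezone
from itertools import chain


def mash_feeds(feeds, limit=20):
    """Combine several lists of feed items, newest first, capped at `limit`."""
    combined = chain.from_iterable(feeds)  # step 3: combine the feeds
    newest_first = sorted(
        combined, key=lambda item: item["pubDate"], reverse=True
    )  # step 4: reverse chronological order
    return newest_first[:limit]  # step 5: truncate


# Hypothetical, already-fetched items standing in for steps 1 and 2.
blog_a = [
    {"title": "A1", "pubDate": datetime(2007, 7, 1, tzinfo=timezone.utc)},
    {"title": "A2", "pubDate": datetime(2007, 7, 3, tzinfo=timezone.utc)},
]
blog_b = [
    {"title": "B1", "pubDate": datetime(2007, 7, 2, tzinfo=timezone.utc)},
]

mashed = mash_feeds([blog_a, blog_b], limit=2)
print([item["title"] for item in mashed])
```

Sorting after combining is what makes the result predictable, since the order of the combined items is otherwise undefined.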

Yahoo Pipes Modules to Use

That is the most basic process, and to produce an actual Yahoo Pipe, we need to use only 4 modules, in the following order:
  1. Fetch Feed. Specify one or more feeds, grab their items, and combine them into a single feed. Notes:
    • This Pipes module is found under the Sources sub-menu.
    • Assume resulting item ordering is undefined.
    • In our example, shown below, we'll use the Performancing and Blog Herald RSS feeds.
  2. Sort. Specify a sorting field and criteria. In this case, sort by Publication Date (item.pubDate) in descending order. Notes:
    • This module is found under the Operators sub-menu.
    • In the past, Pipes' Sort module has not always worked consistently, but we are at its mercy.
  3. Truncate. Take the mashed stream and truncate it after X items. Notes:
    • This module is found under the Operators sub-menu.
    • We're limiting the result to 20 items.
  4. Output. Output the mashed stream. This will be available from the standard Yahoo Pipes "run" interface. From there, you can retrieve the dynamic RSS feed URL for the output stream. Note:
    • This module is automatically displayed in a new Pipe, once you drag and drop any other module.
A screenshot of a sample Pipe is displayed below. Provided that you have a Yahoo Mail account, you can access the sample Pipe, clone a copy into your account, and tweak to your heart's content.



Basic Pipe Variations

You can apply a number of simple variations to the sample Pipe above.
  1. Sort in chronological order (oldest article first).
  2. Filter by date. E.g., articles newer than a certain day.
  3. User-supplied date filter. Pick the maximum
  4. User-supplied truncation limit. So instead of hard-coding "20" as the number of items in the mashed feed, let the end user of the Pipe supply the value.
  5. User-supplied URLs. You are limited to a fixed number of URLs that can be specified.
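
Here is a rough Python sketch of how a few of these variations change the basic mashup: an optional date cutoff, a caller-supplied limit, and a switch for oldest-first ordering. The parameter names (`newer_than`, `oldest_first`, `limit`) and the sample items are my own assumptions for illustration. In a real Pipe these would be user input modules wired into Filter, Sort and Truncate.

```python
from datetime import datetime


def mash_with_options(feeds, limit=20, newer_than=None, oldest_first=False):
    """Merge feeds, optionally drop old items, sort, then truncate."""
    items = [item for feed in feeds for item in feed]
    if newer_than is not None:
        # Date filter: keep only items published after the cutoff.
        items = [item for item in items if item["pubDate"] > newer_than]
    # reverse=True gives newest first; oldest_first flips that.
    items.sort(key=lambda item: item["pubDate"], reverse=not oldest_first)
    return items[:limit]


# Hypothetical sample items.
feed_a = [
    {"title": "A1", "pubDate": datetime(2007, 7, 1)},
    {"title": "A2", "pubDate": datetime(2007, 7, 5)},
]
feed_b = [{"title": "B1", "pubDate": datetime(2007, 7, 3)}]

recent = mash_with_options(
    [feed_a, feed_b], limit=5, newer_than=datetime(2007, 7, 2), oldest_first=True
)
print([item["title"] for item in recent])
```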

Advanced Pipe Variations

Here are some more advanced variations for our feed mashup Pipe.
  1. Remove extraneous data fields in input streams. If you look at the results of the example Pipe, you'll see more than just the item description and title. One of the input feeds has extra links thanks to its Feedburner settings. If you don't want them appearing in the result, you have to filter these fields out before the two feeds are mashed together.
  2. Truncate the number of items used on a per feed basis rather than on the entire mashed feed. E.g., the 5 most recent items for each feed.
  3. Use a dynamic external list of feed URLs. The best way to do this is to build Pipe #1 to process a single feed: sort reverse chronologically and truncate to X items (hard-coded or user-supplied). Then build Pipe #2 to read an external list of feed URLs and loop through each, supplying each stream to Pipe #1, then mashing up the results of all streams.
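
The two-Pipe idea in the last variation might look something like this in Python. Treat it as a sketch under assumptions: `latest_from_feed` plays the role of Pipe #1 and `mash_per_feed` the role of Pipe #2, the feeds are already-fetched lists of dicts, and all names and sample data are hypothetical.

```python
from datetime import datetime


def latest_from_feed(items, per_feed=5):
    """'Pipe #1': newest-first, truncated view of a single feed."""
    return sorted(items, key=lambda item: item["pubDate"], reverse=True)[:per_feed]


def mash_per_feed(feeds, per_feed=5):
    """'Pipe #2': run every feed through Pipe #1, then merge and re-sort."""
    merged = []
    for items in feeds:
        merged.extend(latest_from_feed(items, per_feed))
    return sorted(merged, key=lambda item: item["pubDate"], reverse=True)


# A busy feed with ten items and a quiet feed with one older item.
busy = [
    {"title": f"busy-{day}", "pubDate": datetime(2007, 7, day)}
    for day in range(1, 11)
]
quiet = [{"title": "quiet-1", "pubDate": datetime(2007, 6, 15)}]

result = mash_per_feed([busy, quiet], per_feed=2)
print([item["title"] for item in result])
```

Truncating per feed means a quiet feed still shows up in the result, even when a busier feed has many newer items.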

Requests for Custom Pipe

Obviously, I cannot provide example Pipes for each variation listed above, but what I will do is two things:
  1. Take suggestions for an advanced Pipe as described above, create it, then either share it in the comments here or cover it in the next Yahoo Pipes By Example post. So I'll build a free custom Pipe, provided it's not too complex, can be done in a few hours, and is generic enough that it'll be useful to someone other than yourself. (I can only work on it on weekends.)
  2. Progressively include other Pipes modules in harder examples.
So if there's a Pipe you need and something like it is described in the Advanced section above, feel free to ask. Be specific about the details. I'll be doing the next Yahoo Pipes By Example post next weekend (one per week).
Raj Dash's blog

This is the last part in a short series about Yahoo Pipes and how to use it to analyze Digg homepage stories and any topical trends. Part 1 covered sorting by submitter. Part 2 covered sorting by category and submitter. Part 3’s Pipes filter stories by number of votes, then sort by any or all of submitter, category and votes.

All of the general Pipes principles presented in these examples can be used to analyze other social media sites, and thus could be a handy tool for building site profiles. The only big difference in the 4 examples in this Part, compared to the others, is that we’re introducing a numeric variable and a filtering rule. To keep this post short, I’m leaving out most of the sections that I had in the previous posts. Please see them for explanations of Digg variables, Pipes variables, regular expression pattern rules, etc.

Video

The video is long (nearly 10 min) because it covers four variations of a Pipe based on vote sorting and filtering. See the Links section, below the video, for links to the actual Pipes, in case you feel like cloning and tweaking them.

Links

If you feel like playing around with the Pipes used in this example, here are the links:

  1. Digg sorted by votes - very simple: just inserts the number of votes in square brackets at the beginning of each story title.

  2. Digg over X votes - only shows home page stories with more than X votes, where you can specify X when running the Pipe.
  3. Digg over X votes, sorted by category and votes - same as #2, but also sorts stories first by category and then by number of votes.
  4. Digg over X votes, sorted by submitter and votes - same as #2, but also sorts stories first by submitter and then by number of votes.
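
For readers who would rather see the logic than watch the video, here is a rough Python sketch of the four Pipes. The dict keys (`votes`, `category`, `submitter`) are placeholder names I made up, not the actual Digg feed fields, and I am assuming that votes are sorted highest first within each group.

```python
def tag_with_votes(item):
    """Pipe 1: put the vote count in square brackets before the title."""
    tagged = dict(item)
    tagged["title"] = f"[{item['votes']}] {item['title']}"
    return tagged


def over_x_votes(items, min_votes, then_by=None):
    """Pipes 2 to 4: keep stories with more than X votes, optionally sorted."""
    kept = [item for item in items if item["votes"] > min_votes]
    if then_by is None:
        return kept
    # Sort by the chosen field first, then by votes (highest first, assumed).
    return sorted(kept, key=lambda item: (item[then_by], -item["votes"]))


# Hypothetical home page stories.
stories = [
    {"title": "S1", "category": "Apple", "submitter": "zed", "votes": 900},
    {"title": "S2", "category": "Apple", "submitter": "amy", "votes": 1500},
    {"title": "S3", "category": "Linux", "submitter": "amy", "votes": 300},
    {"title": "S4", "category": "Gaming", "submitter": "bob", "votes": 700},
]

by_category = over_x_votes(stories, 500, then_by="category")
by_submitter = over_x_votes(stories, 500, then_by="submitter")
print([tag_with_votes(story)["title"] for story in by_category])
```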

Summary

These Pipes examples for Digg have been fairly simple in nature. There are more complex examples over at Yahoo Pipes. Just search there for “Digg” and browse the results.

This is part two of a short series that uses Yahoo Pipes to analyze information about Digg home page stories. Part 1 covered sorting by submitter (member name). Anything not explained here is probably explained in the last post - so please check there.

This part looks at sorting first by story category, then by category and submitter. The former shows any topical bias for the site (e.g., Digg members obviously like articles about Apple), and the latter shows any topical bias for popular members.

Of course, you could guess at some of this information, but having a social media analysis tool lets you collect data automatically over a long term. You can use similar pipes to build a profile for each social media site. Or if you don’t feel like saving the daily data, you can just subscribe to the resulting modified RSS feed in your favorite feed reader. Note that each social media site has different feed information and thus you cannot reuse the Pipes in this example as is. However, the general principles are the same.

Digg feed variables used

We’ll use two Digg feed variables in Yahoo Pipes this time:

digg:category
digg:submitter.digg:username

The regex rule used for the title is

replace (^.*$) with [${digg:category}] [${digg:submitter.digg:username}] $1.

However, we could also use:

replace ^ with [${digg:category}] [${digg:submitter.digg:username}] .

They both mean the same thing: insert the Digg category and username at the beginning of the story title. (Note that there’s a space at the end of the second version’s replacement pattern string.)
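
If you want to convince yourself that the two rules are equivalent, here is a small check using Python's `re` module. This only mimics the Pipes Regex module: the `${...}` substitution is replaced by ordinary Python string formatting, and the function names and the sample category are made up for the example.

```python
import re


def prefix_with_capture(title, category, username):
    """Rule 1: match the whole title as group 1, then re-emit it after the prefix."""
    prefix = f"[{category}] [{username}] "
    return re.sub(r"(^.*$)", lambda m: prefix + m.group(1), title, count=1)


def prefix_with_anchor(title, category, username):
    """Rule 2: match only the (empty) start of the title and insert the prefix there."""
    prefix = f"[{category}] [{username}] "  # note the trailing space
    return re.sub(r"^", lambda m: prefix, title, count=1)


# The category value here is a hypothetical example.
first = prefix_with_capture("Paris' Sob Story", "Celebrity", "RainbowPhoenix")
second = prefix_with_anchor("Paris' Sob Story", "Celebrity", "RainbowPhoenix")
print(first)
print(first == second)
```

The replacement is passed as a function so that any backslashes in a category or username are not interpreted as regex escapes.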

Process

The process for building these Pipes is fairly simple.

Pipe 2a: by category only

  1. Grab the Digg home page feed.
  2. Insert the story category, in square brackets, at the beginning of each title.
  3. Replace each story’s item.description field with nothing.
  4. Sort the list of home page stories by modified title.

Pipe 2b: by category and submitter

  1. Grab the Digg home page feed.
  2. Insert both the story category and submitter, each in square brackets and in that order.
  3. Replace each story’s item.description field with nothing.
  4. Sort the list of home page stories by modified title.
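
The steps for Pipe 2b can be sketched in Python as follows. It is a simplified stand-in for the Regex and Sort modules, and the flat `category` and `submitter` keys are hypothetical substitutes for the Digg feed variables named earlier.

```python
def by_category_and_submitter(items):
    """Prefix each title with [category] [submitter], blank the description, sort by title."""
    rewritten = []
    for item in items:
        tagged = dict(item)
        tagged["title"] = f"[{item['category']}] [{item['submitter']}] {item['title']}"
        tagged["description"] = ""  # step 3: replace the description with nothing
        rewritten.append(tagged)
    # Step 4: a plain string sort on the modified title groups the stories
    # by category first, then by submitter within each category.
    return sorted(rewritten, key=lambda item: item["title"])


# Hypothetical home page stories.
home_page = [
    {"title": "Kernel notes", "category": "Linux", "submitter": "bob", "description": "..."},
    {"title": "iPhone news", "category": "Apple", "submitter": "zed", "description": "..."},
    {"title": "Mac tips", "category": "Apple", "submitter": "amy", "description": "..."},
]

for story in by_category_and_submitter(home_page):
    print(story["title"])
```

Dropping the `submitter` part of the prefix gives you Pipe 2a.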

Yahoo Pipes modules used

These are the modules used in these Pipes:

  1. Fetch Feed
  2. Regex
  3. Sort
  4. Pipe Output

Video

Digg home page by category

Digg home page by category and submitter

Links

Here are the links to the two Yahoo Pipes used in this example. If you have a Yahoo Mail account, you can clone and tweak these pipes:

  1. Digg homepage by category.
  2. Digg homepage by category and submitter.

Summary

Part 3 of this series will use Yahoo Pipes to filter out home page story categories you’re not interested in, as well as ignore stories with fewer than X votes, where you can specify X.

For those of you that like to follow social media sites such as Digg, an easy analysis tool may be of some use to you. Yahoo Pipes lets you very quickly put together a suite of tools to organize a web feed’s items. In this example, I’m going to sort the Digg homepage RSS feed by the submitter of each story.

To do that, we need to manipulate some of the content of the Digg feed using the Yahoo Pipes Regex (regular expressions) module. Otherwise, all the information we need is in the feed.

Regular expression patterns:
I’m not going to get into an elaborate discussion of regexes. Instead, I’ll just list what I’ve used in the screencast video. (If you’re familiar with regexes already, bear with me.)

  1. ^ - caret - match the beginning of a string.
  2. $ - dollar - match the end of a string.
  3. .* - dot star - match any sequence of characters.
  4. ^.*$ - match the entire string.
  5. (^.*$) - match the entire string and save it in parameter 1, aka $1.

Digg feed variables used:
The Digg home page RSS feed has a number of fields/ variables that we can access in Yahoo Pipes. In this example, I’ve only used one:

digg:submitter.digg:username

Within Yahoo Pipes, to access it, we prefix it with a dollar sign and place braces (curly brackets) around it:

${digg:submitter.digg:username}

Process:
These are the steps I take in the video below.

  1. Grab the Digg home page feed.
  2. Insert the digg username (of the story submitter) in the item.title field’s values, at the beginning of the title, surrounded by square brackets.
  3. Do the same with the item.y:title field. (This is probably redundant, but it’s not a big deal.)
  4. Replace the item.description fields with nothing - i.e., an empty string. For our analysis, getting rid of the description reduces visual clutter in the results. It’s just easier to see only the title and submitter.
  5. Sort the resulting manipulated feed by the item.title.

What we’re doing is replacing a story title such as

Paris’ Sob Story

with

[RainbowPhoenix] Paris’ Sob Story

for each home page story. The string in the square brackets is the name of the Digg member that submitted the article. So ^.*$ matches “Paris’ Sob Story”, and the parentheses assign this string to $1. Thus the Regex replace rule for item.title matches the whole title with (^.*$) and replaces it with the current Digg username in square brackets, followed by the original title:

[${digg:submitter.digg:username}] $1

Other than getting rid of the story description, this is all we’re really doing, followed by a sort on the title values.

Yahoo Pipes modules used:

  1. Fetch Feed
  2. Regex
  3. Sort
  4. Output

Here’s a SplashCast screencast showing the process of creating the Pipe. (Apologies for the choppy narration, as I had to use an earlier voiceover due to upload problems.)

Yahoo Pipes - digg homepage sorted by submitter

You can take my Digg by Submitter pipe, clone and tweak it to your heart’s content. Or wait for the next one. In the next part of this mini-series, we’ll sort the Digg homepage by category (and prove an Apple bias for the home page).

Sphinn is a brand new player in the social media space that many of you are already familiar with. It’s still young, but the calling of new, fresh data to analyze got the better of the math geek in me and I built a few Yahoo! Pipes on their RSS feeds. [This post is a continuation of an earlier Yahoo! Pipes: Analyzing Digg (By Submitter, By Category and Submitter, Filter by Votes) on Search Engine Journal, but without a screencast video showing the building of a Pipe.]

Pipes and Processes

There isn’t a great deal of information in Sphinn RSS feeds just yet (see the Wishlist section), compared to, say, Digg feeds. However, there’s enough that I could build a few Pipes. Here they are, all of which you can clone and tweak, if you have a Yahoo! Mail account.

  1. Sphinn new item category count. Yahoo Pipe results/ feed.
    Take the New Items feed and produce a count of stories in each category.
    1. Grab feed.
    2. Use Unique module on category.
    3. Hack the category name into a section URL on Sphinn. (Some errors may exist because this had to be a manual hack, due to lack of section URL info in the feed.) This allows you to click on a category in the Pipes results and go to the corresponding section on Sphinn.
    4. Output results.
  2. Digg new item category count. Yahoo Pipe results/ feed.
    The process for this one is exactly the same. Only the feed URL and the fields are different in the Pipe.
  3. Sphinn new + hot searchable. Yahoo Pipe results.
    This Pipe merges the Sphinn New and Hot feeds and lets you search them. Remember to run the Pipe with your query before subscribing to the resulting feed. Note to the Sphinn boys and girls… This might make a good tag line: “Sphinn: New, Hot, and Searchable.”
    1. Grab both feeds and merge them.
    2. Eliminate duplicate items by title.
    3. Sort in reverse chronological order.
    4. Apply user’s search term.
    5. Apply user’s limit for number of results.
    6. Output results. (The link on each item is to the Sphinn snippet, not the actual article.)
  4. Most active Sphinn commenters at the moment. Yahoo Pipe results/ feed.
    Want to know which Sphinn members are the most active in terms of commenting on stories? This Pipe provides this metric, but is limited by the fact that only 40 comments are in the feed. So you get an idea of fresh commenting activity. (If you want overall commenting activity since Sphinn began, you would first have to scrape the All-Users pages to get a list of members. So a members RSS feed would be nice.)
    1. Grab the comments feed.
    2. Apply the Unique module by dc:creator (commenter).
    3. Sort in descending order by number of comments for that person, in the current list.
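
The counting Pipes (category count and active commenters) boil down to "group by a field, count, sort by count". Here is a minimal Python sketch of that idea, using `collections.Counter` as a stand-in for the Unique and Sort modules. The `most_active` name and the sample comments are made up. Only the `dc:creator` field name comes from the feed.

```python
from collections import Counter


def most_active(comments, field="dc:creator"):
    """Count items per value of `field`, most frequent first."""
    counts = Counter(comment[field] for comment in comments)
    return counts.most_common()


# Hypothetical comments feed with six entries.
comments = [
    {"dc:creator": "ann"},
    {"dc:creator": "bob"},
    {"dc:creator": "ann"},
    {"dc:creator": "cat"},
    {"dc:creator": "ann"},
    {"dc:creator": "bob"},
]

for name, count in most_active(comments):
    print(name, count)
```

Passing a different `field`, such as a category key, gives the category count instead.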

yahoo pipes -sphinn -active commenters screen snap

Wishlist

Sphinn is pretty new, so infrastructure quirks are to be expected. But because the Sphinn feeds do not carry as much info as the Digg feed, there is very little analysis that can be done in a Yahoo Pipe. Here’s a bit of a wishlist for Sphinn RSS feeds that would help data lusters like myself.

  • More than 40 items in a feed.
  • More information in the feeds.
  • A category URI in each story item so that it’s easy to link to a category’s home page on Sphinn. Or, alternately, an easier mapping from compound category names to the corresponding category home page. Digg drops a story into a URL that contains the category path.
  • Inclusion of the end story’s URL.
  • More member feeds. (These are coming, according to Sphinn.)

Conclusion

How you use the information generated by these Yahoo! Pipes is up to you, but if you’re a data miner/ data luster like myself, you’ll figure out something useful.

Steve Rubel Twittered last night saying:

Checking out blognation. Like it but wish I could subscribe to individual bloggers. http://us.blognation.com/

He raises a great point. It can be annoying on multi-author blogs to have to read everything when you're only interested in the perspectives of some of the authors. On Technology Evangelist, we address this with individual author feeds on each author page.

However, another way to achieve this is to use Yahoo Pipes to filter a blog feed by author. As an example, I created a Yahoo Pipes feed filter for Blognation that filters for the author of your choice. I arbitrarily chose Marc Orchant as the default author, so clicking the Run button will filter the feed for Mr. Orchant unless you swap in another author's name.
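
In code terms, the filter amounts to something like the Python sketch below. It is not the actual Pipe. I'm assuming each item carries an `author` field and that a case-insensitive "contains" match is good enough. The function name and the sample items are made up.

```python
def filter_by_author(items, author="Marc Orchant"):
    """Keep only the items whose author field contains the requested name."""
    wanted = author.strip().lower()
    return [item for item in items if wanted in item.get("author", "").lower()]


# Hypothetical feed items.
posts = [
    {"title": "Post one", "author": "Marc Orchant"},
    {"title": "Post two", "author": "Another Author"},
    {"title": "Post three", "author": "Marc Orchant"},
    {"title": "Post four"},  # no author field at all
]

for post in filter_by_author(posts):
    print(post["title"])
```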

Giving people control over what they consume is going to happen whether you enable it or not. Clearly, few people are filtering RSS feeds on Yahoo Pipes today, but stuff like this is going to happen.


One of the easiest ways for a non-programmer to combine, aggregate and filter multiple RSS feeds into one is to use Yahoo! Pipes (YP). YP uses a sleek visual editor that allows the user to fetch and manipulate data sources, add user defined inputs and filter the content in a number of ways.

I used YP to combine nine* popular SEO feeds into one and then published it on pipes.yahoo.com where anybody can now use it. Try it in your favorite reader - Composite SEO News Feed.

By using the WordPress plugins FeedList and RunPHP I can also easily display the Composite SEO News Feed right here:

Remember this is the actual feed not just a graphic so whenever you are viewing this page the feed will be up to date.

When you first look at the drag-and-drop interface of YP, it may seem a little daunting, but here is a step-by-step walkthrough using the practical example above. You can, of course, combine any feeds you choose.

First you need to sign in to YP with your Yahoo ID (create an ID if you don’t have one). When you’re signed in, click Create a pipe and click the untitled tab to give your pipe a name. Drag a Fetch Feed module into the workspace.


Enter a feed url, which you will find on most sites by clicking the RSS, XML or Atom link or icon. If you see a “?” icon in the Fetch Feed module, that means you have entered an invalid feed address.

Copy and paste the feed url.

Click the url icon to enter a second feed.


Enter the second feed url.


Repeat until you have entered all the feed urls that you want to combine.

Complete the addition of feed urls.

Drag a Sort module to the workspace. Pipe the Fetch Feed module to the Sort module by clicking the circle on top of the Sort module and dragging it to the circle at the bottom of the Fetch Feed module. A blue pipe will appear and connect the two.


Sort by date in descending order by selecting PubDate from the first drop-down menu and Descending from the second drop-down menu.


Drag a Truncate module to the workspace. Pipe the Sort module to the Truncate module by clicking the circle at the bottom of the Sort module and dragging it to the circle at the top of the Truncate module. Enter a value for the maximum number of items you require from your combined feed.


Pipe the Truncate module to the Pipe Output and the Debug area will fill up with your new feed’s output.


Finally click Save and then click Publish. In the pop-up window enter a description for your pipe and when you click Publish again your Pipe will go public.
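
If you ever need the same composite feed without YP, the Fetch Feed, Sort and Truncate chain can be approximated with Python's standard library. This is a sketch under assumptions: it works on RSS 2.0 text you have already downloaded, reads only the `title` and `pubDate` elements, and the two sample feeds and function names are invented for the example.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime


def parse_items(rss_text):
    """Pull title and publication date out of each <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_text)
    items = []
    for node in root.iter("item"):
        items.append({
            "title": node.findtext("title", default=""),
            # RSS 2.0 dates use the RFC 822 format, which email.utils can parse.
            "pubDate": parsedate_to_datetime(node.findtext("pubDate")),
        })
    return items


def composite_feed(rss_texts, limit=10):
    """Combine several feeds, sort by date descending, keep at most `limit` items."""
    items = [item for text in rss_texts for item in parse_items(text)]
    items.sort(key=lambda item: item["pubDate"], reverse=True)
    return items[:limit]


# Two tiny, made-up feeds.
FEED_ONE = """<rss version="2.0"><channel><title>One</title>
<item><title>First post</title><pubDate>Mon, 02 Jul 2007 09:00:00 +0000</pubDate></item>
<item><title>Third post</title><pubDate>Wed, 04 Jul 2007 09:00:00 +0000</pubDate></item>
</channel></rss>"""
FEED_TWO = """<rss version="2.0"><channel><title>Two</title>
<item><title>Second post</title><pubDate>Tue, 03 Jul 2007 09:00:00 +0000</pubDate></item>
</channel></rss>"""

top_two = composite_feed([FEED_ONE, FEED_TWO], limit=2)
print([item["title"] for item in top_two])
```

Real feeds are messier than this (Atom instead of RSS, missing dates), so expect to add some error handling before using it on live feeds.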

By combining YP with mashup tools like Dapper or OpenKapow you will be able to construct an RSS feed from almost anything that you can find on the Web.

*The nine feeds combined in the Composite SEO News Feed:
SEO by the SEA
Search Engine Land
Search Engine Roundtable
Matt Cutts
SEO Book
SEO Blog
SEOMoz
Threadwatch
Marketing Pilgrim