Q:Format unstructured string

I have tried several methods (by position, by whitespace, regex) but cannot figure out how best to parse the following lines as a table. For example, let's say the two lines I want to parse are the ones below the header row:

Bonds               Bid   Offer  (mm)   (mm) Chng
STACR 2015-HQA1 M1  125    120    5   x 1.5   0
STACR 2015-HQA12 2M2  265  5   x       -2

I want them to parse as follows, as [BondName] [Bid] [Offer]:

[STACR 2015-HQA1 M1] [125] [120]

[STACR 2015-HQA12 2M2] [265] [null]

Note that null is an actual value, and that the spaces in the bond name should be retained. FYI, the bond name will always contain 2 spaces, as in the examples above.

Edit: Since many of you have asked for code, here it is. The number of spaces between the data points can range from 1 to 5, so I cannot rely on spaces (it would have been straightforward otherwise).

string bondName = quoteLine.Substring(0, 19);
string bid = quoteLine.Substring(19, 5).Trim();
string offer = quoteLine.Substring(24, 6).Trim();

The only way I can see this working is that:

  • 1st data point is STACR (Type)
  • 2nd data point is the year and Series (e.g. 2015-HQA1)
  • 3rd data point is Tranche (M1)
  • 4th data point is bid (e.g. 125 ** bid is always available **)
  • 5th data point is offer (e.g. 120 but can be blank or whitespace which introduces complexity)

Answer 1:

With the current set of requirements, I'm assuming the following:
1. The string starts with a 3-part bond name
2. Followed by the bid
3. Followed by the offer (optional)
4. After that, we'll have something like ... x ... ... (we'll use x as a reference point)

If these assumptions hold, you can use the following code:

using System;
using System.Linq;

var str = "STACR 2015-HQA1 M1  125    120    5   x 1.5   0"; // your data
// Split on spaces and drop the empty entries produced by runs of spaces
var parts = str.Split(new char[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).ToList();

// we'll use this pattern: <3-part bond name> <bid> <offer/null> <something x ....>
var xIsAt = parts.IndexOf("x"); // we'll use x as the reference point
if (xIsAt > 2) // the first three parts are the bond name
    parts.RemoveRange(xIsAt - 1, parts.Count - xIsAt + 1); // remove "5 x 1.5 ..."
var bond = string.Join(" ", parts.Take(3)); // first 3 parts are the bond name
var bid = parts.Count > 3 ? parts.ElementAt(3) : null; // 4th is the bid
var offer = parts.Count > 4 ? parts.ElementAt(4) : null; // 5th is the offer, or null if absent
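
To check the blank-offer case from the question, the same logic can be wrapped in a small helper and run against both sample lines. This is only a sketch: `ParseQuote` is a hypothetical name, and it relies on the same assumptions listed above (3-part bond name, bid always present, a standalone `x` token after the first size).

```csharp
using System;
using System.Linq;

// ParseQuote is a hypothetical helper, not part of any library.
// Assumed layout: <3-part bond name> <bid> [offer] <size> x ...
static (string Bond, string Bid, string Offer) ParseQuote(string line)
{
    var parts = line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).ToList();

    // Drop the size just before "x" and everything after it.
    var xIsAt = parts.IndexOf("x");
    if (xIsAt > 2)
        parts.RemoveRange(xIsAt - 1, parts.Count - xIsAt + 1);

    var bond = string.Join(" ", parts.Take(3));    // first 3 parts
    var bid = parts.Count > 3 ? parts[3] : null;   // 4th part
    var offer = parts.Count > 4 ? parts[4] : null; // 5th part, null when blank
    return (bond, bid, offer);
}

var samples = new[]
{
    "STACR 2015-HQA1 M1  125    120    5   x 1.5   0",
    "STACR 2015-HQA12 2M2  265  5   x       -2",
};

foreach (var sample in samples)
{
    var quote = ParseQuote(sample);
    var offerText = quote.Offer ?? "null";
    Console.WriteLine($"[{quote.Bond}] [{quote.Bid}] [{offerText}]");
}
// prints:
// [STACR 2015-HQA1 M1] [125] [120]
// [STACR 2015-HQA12 2M2] [265] [null]
```

Note that joining the parts with a single space normalizes any extra spaces inside the bond name; with the sample data this matches the requested output.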

Answer 2:

[EDIT] I did not account for a blank 'Offer', so this method will fail when 'Offer' is blank. It looks like someone already has a working answer, but I'll leave the LINQ example for anyone who finds it useful. [END EDIT]

LINQ-based option.

Split the string on spaces and remove the empty entries. Then reverse the order so you can start from the back and work your way forward. The data appears more normalized at the end of the string.

For each successive part of the line, you skip the parts already consumed and take only what you need. For the last part, the bond name, you skip everything that comes after it in the line, reverse the order back to normal, and join the segments with spaces.

using System;
using System.Linq;

string test = "STACR 2015-HQA1 M1  125    120    5   x 1.5   0";

// Tokens in reverse order: 0, 1.5, x, 5, 120, 125, M1, 2015-HQA1, STACR
var split_string_remove_empty = test.Split(new char[]{ ' ' }, StringSplitOptions.RemoveEmptyEntries).Reverse();

var change = split_string_remove_empty.Take(1)
                                      .SingleOrDefault();
var mm2 = split_string_remove_empty.Skip(1)
                                   .Take(1)
                                   .SingleOrDefault();
// Skip(3) rather than Skip(2) jumps over the "x" token; this is the first (mm) column
var mm1 = split_string_remove_empty.Skip(3)
                                   .Take(1)
                                   .SingleOrDefault();
var offer = split_string_remove_empty.Skip(4)
                                     .Take(1)
                                     .SingleOrDefault();
var bid = split_string_remove_empty.Skip(5)
                                   .Take(1)
                                   .SingleOrDefault();
// Whatever remains is the bond name; restore its original order before joining
var bonds = string.Join(" ", split_string_remove_empty.Skip(6)
                                                      .Reverse());
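
As the EDIT note says, counting positions from the end breaks when 'Offer' is blank. Since the question is tagged regex, one possible alternative is a pattern that anchors on the standalone `x` token and makes the offer group optional. This is a sketch, not a tested production parser: the names `quotePattern` and `OrNull` are made up for illustration, and it assumes a 3-token bond name, a bid that is always present, and exactly one size token between the offer and `x`.

```csharp
using System;
using System.Text.RegularExpressions;

// Assumed layout: <3-token bond name> <bid> [offer] <size> x ...
// The literal "x" anchors the match, so a missing offer leaves its group unmatched.
var quotePattern = new Regex(
    @"^\s*(?<bond>\S+\s+\S+\s+\S+)\s+(?<bid>\S+)(?:\s+(?<offer>\S+))?\s+\S+\s+x(?:\s|$)");

// Hypothetical helper: turn an unmatched group into null.
static string OrNull(Group g) => g.Success ? g.Value : null;

var rows = new[]
{
    "Bonds               Bid   Offer  (mm)   (mm) Chng",
    "STACR 2015-HQA1 M1  125    120    5   x 1.5   0",
    "STACR 2015-HQA12 2M2  265  5   x       -2",
};

foreach (var row in rows)
{
    var m = quotePattern.Match(row);
    if (!m.Success)
        continue; // the header row has no "x" token, so it is skipped

    var bondName = m.Groups["bond"].Value; // spacing inside the name is kept as-is
    var bidText = m.Groups["bid"].Value;
    var offerText = OrNull(m.Groups["offer"]) ?? "null";
    Console.WriteLine($"[{bondName}] [{bidText}] [{offerText}]");
}
// prints:
// [STACR 2015-HQA1 M1] [125] [120]
// [STACR 2015-HQA12 2M2] [265] [null]
```

Unlike the split-and-join approaches, the `bond` group keeps the original spacing inside the bond name, which may matter if the names ever contain more than one space between parts.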

c#  regex  parsing